
From the preprint, it looks like they sequenced the Y chromosome of HG002, a long-established Genome in a Bottle benchmark sample that is still held in deep freeze at a number of biobanks.

Short-read sequencing data is notoriously bad for reconstructing the low-complexity / repetitive regions of genomes, so until recently the most commonly used reference genomes have left many of these regions "dark". According to the preprint, the Y chromosome has the highest density of these low-complexity regions. It's also something of a bioinformatic nuisance when constructing a generic human reference genome, as it's only present in roughly half of the population.



Isn't the problem the absence of random DNA?

I wouldn't call random data 'complex', but it is easy to assemble from short reads.
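
To make the point concrete, here is a rough toy sketch (not anything from the preprint). It builds two made-up 5,000 bp sequences, one fully random and one with a 3,000 bp tandem repeat in the middle, and counts how many read-length windows occur at more than one position, i.e. reads that could not be placed unambiguously. The function name, the sequence sizes, and the read lengths are all assumptions picked for illustration, and real reads also have errors and uneven coverage that this ignores.

```python
import random


def ambiguous_read_fraction(genome: str, read_len: int) -> float:
    """Fraction of read-length windows whose sequence occurs at 2+ positions."""
    n_windows = len(genome) - read_len + 1
    counts = {}
    for i in range(n_windows):
        window = genome[i:i + read_len]
        counts[window] = counts.get(window, 0) + 1
    # Every copy of a repeated window is a read that cannot be uniquely placed.
    ambiguous = sum(c for c in counts.values() if c > 1)
    return ambiguous / n_windows


def random_dna(rng: random.Random, length: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(length))


rng = random.Random(0)  # fixed seed so the toy example is reproducible

# Hypothetical "random" genome: no repeated structure at all.
random_seq = random_dna(rng, 5000)

# Hypothetical repetitive genome: a 50 bp unit repeated 60 times (3,000 bp),
# flanked by 1,000 bp of random sequence on each side.
unit = random_dna(rng, 50)
repetitive_seq = random_dna(rng, 1000) + unit * 60 + random_dna(rng, 1000)

short_random = ambiguous_read_fraction(random_seq, 100)
short_repeat = ambiguous_read_fraction(repetitive_seq, 100)
long_repeat = ambiguous_read_fraction(repetitive_seq, 3200)

print(f"100 bp reads, random sequence:      {short_random:.2f}")
print(f"100 bp reads, repetitive sequence:  {short_repeat:.2f}")
print(f"3200 bp reads, repetitive sequence: {long_repeat:.2f}")
```

In this toy setup, 100 bp reads are all unique in the random sequence, but more than half of them are ambiguous in the repetitive one, because any read that falls entirely inside the tandem array looks the same as many others. Reads long enough to reach unique flanking sequence do not have that problem, which is roughly why repeats stay "dark" with short reads.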



