When the sequencing of the human genome was announced two decades ago by the Human Genome Project and biotech firm Celera Genomics, the sequence was not truly complete. About 15% was missing: technological limitations left researchers unable to work out how certain stretches of DNA fitted together, especially those where there were many repeating letters (or base pairs). Scientists solved some of the puzzle over time, but the most recent human genome, which geneticists have used as a reference since 2013, still lacks 8% of the full sequence.
Now, researchers in the Telomere-to-Telomere (T2T) Consortium, an international collaboration that comprises around 30 institutions, have filled in those gaps. In a 27 May preprint (ref. 1) entitled ‘The complete sequence of a human genome’, genomics researcher Karen Miga at the University of California, Santa Cruz, and her colleagues report that they’ve sequenced the remainder, in the process discovering about 115 new genes that code for proteins, for a total of 19,969.
The newly sequenced genome — dubbed T2T-CHM13 — adds nearly 200 million base pairs to the 2013 version of the human genome sequence.
This time, instead of taking DNA from a living person, the researchers used a cell line derived from what’s known as a complete hydatidiform mole, a type of tissue that forms in humans when a sperm fertilizes an egg with no nucleus. The resulting cell contains chromosomes only from the father, so the researchers did not have to distinguish between two sets of chromosomes from different people.
Miga says the feat probably wouldn’t have been possible without new sequencing technology from Pacific Biosciences in Menlo Park, California, which uses lasers to scan long stretches of DNA isolated from cells — up to 20,000 base pairs at a time. Conventional sequencing methods read DNA in chunks of only a few hundred base pairs at a time, and researchers reassemble these stretches like puzzle pieces. The larger pieces are much easier to put together, because they are more likely to contain sequences that overlap.
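The puzzle-piece idea can be illustrated with a toy example. The Python sketch below is not the T2T team's method or any real assembler; it is a minimal greedy approach written for illustration, and the function names (overlap, greedy_assemble), the min_len threshold and the short made-up reads are all assumptions. It repeatedly joins the two reads whose end and start share the longest matching stretch.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the best match is shorter than `min_len` (hypothetical cutoff).
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Toy greedy assembly: keep merging the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining pair overlaps enough to join
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three made-up overlapping reads from a 14-letter toy sequence.
toy_reads = ["ATGCGTAC", "TACGTTA", "TTAGC"]
print(greedy_assemble(toy_reads))
```

In this toy case the three reads join into a single sequence. When a repeated stretch is longer than the reads themselves, several joins look equally good and an approach like this can pick the wrong one, which is the difficulty that longer reads help to avoid.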
T2T-CHM13 is not the last word on the human genome, however. The T2T team had trouble resolving a few regions on the chromosomes, and estimates that about 0.3% of the genome might contain errors. There are no gaps, but Miga says quality-control checks have proved difficult in those areas. And the sperm cell that formed the hydatidiform mole carried an X chromosome, so the researchers have not yet sequenced a Y chromosome, which typically triggers male biological development.