To mark the 20th anniversary of the completion of the first draft sequence of the human genome on 26 June 2000, we thought we would revisit six numbers related to this seminal scientific breakthrough. The draft was a ground-breaking achievement: the result of many years of work and the joint effort of 2,400 scientists across six countries.
(Image: a graphic representation of the genome of a cancer cell and its various mutations, known as a Circos plot; adapted from Stratton et al., Nature, 2009)
2 meters: The sequencing project built on the discovery of the double helix some 50 years earlier. Nearly every cell in our body contains about 2 meters of this double-helical DNA. Each double helix consists of 2 intertwined strands of DNA and is right-handed (it twists like a standard right-handed screw). The molecule measures about 20 angstroms across; since an angstrom is one ten-millionth of a millimeter, that is roughly two millionths of a millimeter.
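As a rough check on the 2-meter figure, the Python sketch below multiplies the number of base pairs in a cell by the length each base pair adds to the helix. The inputs are assumptions rather than figures stated in this entry: a typical cell carries two copies of the genome (one from each parent), each about 3 billion base pairs long, and each base pair adds roughly 0.34 nanometers (3.4 angstroms) along the helix.

```python
# Rough check of the "about 2 meters of DNA per cell" figure.
# Assumed inputs (not stated in this entry):
#   - a typical cell is diploid, holding two copies of the genome
#   - each copy is about 3 billion base pairs long
#   - each base pair adds about 0.34 nm (3.4 angstroms) along the double helix

HAPLOID_BASE_PAIRS = 3_000_000_000  # base pairs in one copy of the genome
COPIES_PER_CELL = 2                 # diploid: one copy from each parent
RISE_PER_BASE_PAIR_M = 0.34e-9      # 0.34 nanometers, expressed in meters


def dna_length_per_cell_m(base_pairs: int = HAPLOID_BASE_PAIRS,
                          copies: int = COPIES_PER_CELL,
                          rise_m: float = RISE_PER_BASE_PAIR_M) -> float:
    """Return the total end-to-end length of DNA in one cell, in meters."""
    return base_pairs * copies * rise_m


print(f"DNA per cell: about {dna_length_per_cell_m():.2f} m")
```

With these round inputs the result comes out at just over 2 meters, in line with the figure quoted above.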
3 billion: The human genome contains approximately 3 billion base pairs of DNA.
3.5 minutes: According to the Sanger Institute, it now sequences DNA at a rate equivalent to one human genome every 3.5 minutes. The Institute has sequenced more than 10 petabases of DNA since 1992: the first five petabases took 25 years, the next five just 13 months.
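A quick back-of-the-envelope sketch shows what these Sanger Institute figures imply. It assumes that output was spread evenly over each period and treats "25 years" and "13 months" as exact durations; the resulting speed-up and daily genome count are rough illustrations, not statistics published by the Institute.

```python
# Back-of-the-envelope rates implied by the Sanger Institute figures.
# Assumptions: output is averaged evenly over each period, and
# "25 years" / "13 months" are treated as exact durations.

MONTHS_PER_YEAR = 12
MINUTES_PER_DAY = 24 * 60


def monthly_rate(petabases: float, months: float) -> float:
    """Average sequencing output in petabases per month over a period."""
    return petabases / months


first_five = monthly_rate(5, 25 * MONTHS_PER_YEAR)  # first 5 petabases
next_five = monthly_rate(5, 13)                     # next 5 petabases
speedup = next_five / first_five                    # how much faster the second half went

genomes_per_day = MINUTES_PER_DAY / 3.5             # one genome every 3.5 minutes

print(f"Average speed-up between the two halves: about {speedup:.0f}x")
print(f"Genome-equivalents per day at that pace: about {genomes_per_day:.0f}")
```

On these assumptions, the second five petabases were produced roughly 23 times faster on average than the first five, and one genome every 3.5 minutes works out to about 411 genome-equivalents a day.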
23: As is now widely known, our genome is organized into 23 pairs of chromosomes. The first gene on chromosome 1 encodes a protein that senses smell in the nose, while the last gene on chromosome X encodes a protein that modulates the interaction between cells of the immune system. (The “first” and “last” chromosomes are arbitrarily assigned; chromosome 1 is the longest.)
33: One of the landmark projects that grew out of the success of human genome sequencing was The Cancer Genome Atlas (TCGA), a cancer genomics program that characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types. A joint effort between the National Cancer Institute and the National Human Genome Research Institute, it ran from 2006 to 2018.
98 percent: Oddly, genes make up only a minuscule fraction of our genome. A bewildering 98% of the sequence does not code for protein; instead it consists of enormous stretches of DNA that lie between genes (intergenic DNA) or within them (introns). These stretches exist either because they regulate gene expression, for reasons we do not yet understand, or for no reason whatsoever (“junk” DNA).
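To put the 98 percent figure in absolute terms, the sketch below converts the percentages into base pairs. It assumes the round figure of 3 billion base pairs for the whole genome; published estimates of both the genome size and the coding fraction vary slightly.

```python
# Convert "98 percent" into base pairs, using round figures.
# Assumption: a genome of 3 billion base pairs; integer arithmetic avoids
# floating-point rounding in the percentages.

GENOME_BASE_PAIRS = 3_000_000_000
NON_CODING_PERCENT = 98

non_coding_bp = GENOME_BASE_PAIRS * NON_CODING_PERCENT // 100  # the ~98%
coding_bp = GENOME_BASE_PAIRS - non_coding_bp                  # the ~2% left for genes

print(f"Outside genes: ~{non_coding_bp:,} base pairs")
print(f"Genes:         ~{coding_bp:,} base pairs")
```

The remaining 2 percent still amounts to roughly 60 million base pairs, but it is dwarfed by the nearly 2.94 billion base pairs around it.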
P.S. A printed version of the full human genome sequence is on display at the Wellcome Collection in London, in 109 hardback volumes of more than 1,000 pages each.