Astonishing DNA complexity uncovered
Published: 20 June 2007 (GMT+10)
When the Human Genome Project announced the completion of the human genome sequence in 2003, researchers already held certain things to be well established. These included:
- Coding segments (genes that coded for proteins) were a minor component of the total amount of DNA in each cell. It was embarrassing to find that we have only about as many genes as mice (about 25,000), and that these genes make up only about 3% of the entire genome.
- The non-coding sections (i.e. the remaining 97%) were nearly all of unknown function. Many called it ‘junk DNA’; they thought it was the miscopied and mutation-riddled left-overs abandoned by our ancestors over millions of years. Molecular taxonomists routinely use this ‘junk DNA’ as a ‘molecular clock’—a silent record of mutations that have been undisturbed by natural selection for millions of years because it does not do anything. They have constructed elaborate evolutionary histories for all different kinds of life from it.
- Genes were known to be functional segments of DNA (exons) interspersed with non-functional segments (introns) of unknown purpose. When the gene is copied (transcribed into RNA), the introns are spliced out and the exons are joined up to produce the mature messenger RNA, which is then translated into protein.
- Copying (transcription) of the gene began at a specially marked START position, and ended at a special STOP sign.
- Gene switches (the molecules involved are collectively called transcription factors) were located on the chromosome adjacent to the START end of the gene.
- Transcription proceeded one way, from the START end to the STOP end.
- Genes were scattered throughout the chromosomes, somewhat like beads on a string, although some areas were gene-rich and others gene-poor.
- DNA is a double helix molecule, somewhat like a coiled zipper. Each strand of the DNA zipper is the complement of the other: as on a clothing zipper, one side has a lump that fits into a cavity on the other side. Only one side of the DNA ‘zipper’ (called the ‘sense’ strand) makes the correct protein sequence. The complementary strand is called the ‘anti-sense’ strand. The sense strand is like an electrical extension cord where the ‘female’ end is safe to leave open until an appliance is attached, but the protruding ‘male’ end is active and for safety’s sake only works when plugged into a ‘female’ socket. Thus, protein production usually only comes from copying the sense strand, not the anti-sense strand. The anti-sense strand provides a template for copying the sense strand, in the way that a photographic negative is used to produce a positive print. Some exceptions to this rule were known (i.e. in some cases anti-sense strands were used to make protein), but no one expected the whole anti-sense strand to be transcribed.
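The textbook picture in the list above (complementary strands, a single RNA copy, introns spliced out and exons joined) can be illustrated with a small Python sketch. This is a toy model only: the sequences, the exon coordinates and the function names `antisense`, `transcribe` and `splice` are invented for illustration and do not come from any real gene or library.

```python
# Toy model of the pre-ENCODE textbook view. All sequences are hypothetical.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def antisense(sense: str) -> str:
    """Return the complementary (anti-sense) strand, read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(sense))


def transcribe(sense: str) -> str:
    """Return an RNA copy carrying the same sequence as the sense strand (T becomes U)."""
    return sense.replace("T", "U")


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon intervals (0-based, end-exclusive), dropping the introns between them."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical two-exon gene: exon, intron, exon.
EXON_1, INTRON, EXON_2 = "ATGGCC", "GTAAGTTTCAG", "GATTGA"
sense_strand = EXON_1 + INTRON + EXON_2
exon_coords = [
    (0, len(EXON_1)),
    (len(EXON_1) + len(INTRON), len(sense_strand)),
]

pre_mrna = transcribe(sense_strand)    # full copy, intron still present
mrna = splice(pre_mrna, exon_coords)   # intron removed, exons joined

print(antisense(sense_strand))
print(pre_mrna)
print(mrna)
```

In this simplified model each gene yields exactly one spliced message from one strand, which is the assumption the ENCODE findings below call into question.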
This whole structure of understanding has now been turned on its head. A project called ENCODE recently reported an intensive study of the transcripts (RNA copies produced from the DNA) of just 1% of the human genome.1,2 Their findings include the following inferences:
- About 93% of the genome is transcribed (not 3%, as expected). Further study with more wide-ranging methods may raise this figure to 100%. Because much energy and coordination is required for transcription, this means that probably the whole genome is used by the cell and there is no such thing as ‘junk DNA’.
- Exons are not gene-specific but are modules that can be joined to many different RNA transcripts. One exon (i.e. one part of one gene) can be used in combination with up to 33 different genes located on 14 different chromosomes. This means that one exon can specify one part shared in common by many different proteins.
- There is no ‘beads on a string’ linear arrangement of genes, but rather an interleaved structure of overlapping segments, with typically 5, 7, 9 or more transcripts coming from the one ‘gene’.
- Not just one strand, but both strands (sense and anti-sense) of the DNA are fully transcribed.
- Transcription proceeds not just one way but both backwards and forwards.
- Transcription factors can be tens or hundreds of thousands of base-pairs away from the gene that they control, even on different chromosomes.
- There is not just one START site, but many, in each particular gene region.
- There is not just one transcription triggering (switching) system for each region, but many.
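A figure such as "93% of the genome is transcribed" is, in essence, the fraction of bases covered by at least one transcript once overlapping transcripts are merged. The Python sketch below shows that calculation on invented data. The function name `covered_fraction`, the region length and the transcript coordinates are all hypothetical, and this is not the ENCODE project's actual method.

```python
def covered_fraction(region_length: int, transcripts: list[tuple[int, int]]) -> float:
    """Fraction of bases in [0, region_length) covered by at least one transcript.

    Transcripts are (start, end) intervals, 0-based and end-exclusive.
    They may overlap, and transcripts from both strands are pooled.
    """
    covered = 0
    current_end = 0  # everything before this position has already been counted
    for start, end in sorted(transcripts):
        start = max(start, current_end)   # skip bases already counted
        end = min(end, region_length)     # ignore anything past the region
        if end > start:
            covered += end - start
            current_end = end
    return covered / region_length


# Hypothetical 100-base region with three overlapping transcripts.
toy_transcripts = [(0, 40), (30, 70), (65, 93)]
print(covered_fraction(100, toy_transcripts))
```

Because the intervals are merged before counting, a base transcribed several times over, or from both strands, is counted only once.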
The authors conclude:
‘An interleaved genomic organization poses important mechanistic challenges for the cell. One involves the [use of] the same DNA molecules for multiple functions. The overlap of functionally important sequence motifs must be resolved in time and space for this organization to work properly. Another challenge is the need to compartmentalize RNA or mask RNAs that could potentially form long double-stranded regions, to prevent RNA-RNA interactions that could prompt apoptosis [programmed cell death].’
This concern for the safety of so many RNA molecules being produced in such a small space is well-founded. RNA is a long single-strand molecule not unlike a long piece of sticky-tape—it will stick to any nearby surface, including itself! Unless properly coordinated, it will all scrunch up into a sticky mess.
These results are so astonishing, so shocking, that it is going to take an awful lot more work to untangle what is really going on in cells. And the molecular taxonomists, who have been drawing up evolutionary histories (‘phylogenies’) for everything, are going to have to undo all their years of ‘junk DNA’-based historical reconstructions and wait for the full implications to emerge before they try again. One of the supposedly ‘knock-down’ arguments that humans have a common ancestor with chimpanzees is shared ‘non-functional’ DNA coding. That argument just got thrown out the window.
- Birney, E., et al., Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project, Nature 447: 799–816, 2007.
- Kapranov, P., Willingham, A.T. and Gingeras, T.R., Genome-wide transcription and the implications for genomic organization, Nature Reviews Genetics 8: 413–423, 2007.