Astonishing DNA complexity update
Published: 3 July 2007 (GMT+10)
Recently we reported astonishing new discoveries about the complexity of the information content stored in the DNA molecule.1 Notably, the 97% of the human DNA that does not code for protein is not leftover ‘junk DNA’ from our evolutionary past, as previously thought, but is virtually all being actively used right now in our cells.
Here are a few more exciting details from the ENCODE (Encyclopedia of DNA Elements) pilot project report.2 Some background helps in understanding them: DNA is a very stable molecule, ideal for storing information. In contrast, RNA is a very active (and unstable) molecule that does much of the work in our cells. To use the information stored in our DNA, our cells copy it onto RNA transcripts, which then do the work as instructed by that information.
- Traditional ‘beads-on-a-string’ type genes do form the basis of the protein-producing code, even though much greater complexity has now been uncovered. Genes found in the ENCODE project differ by only about 2% from the existing catalogue of known protein-coding genes.
- We reported previously that the transcripts overlap the gene regions, but the overlaps are huge compared to the size of the genes. On average, the transcripts are 10 to 50 times the size of the gene region, overlapping on both sides. And as many as 20% of transcripts range up to more than 100 times the size of the gene region. This would be like photocopying a page in a book and having to get information from 10, 50 or even 100 other pages in order to use the information on that page.
- The untranslated regions (now called UTRs, rather than ‘junk’) are far more important than the translated regions (the genes), as measured by the number of DNA bases appearing in RNA transcripts. Genic regions are transcribed on average in five different overlapping and interleaved ways, while UTRs are transcribed on average in seven different overlapping and interleaved ways. Since there are about 33 times as many bases in UTRs as in genic regions, that makes the ‘junk’ about 50 times more active than the genes (33 × 7/5 ≈ 46).
- Transcription activity can best be predicted by just one factor: the way that the DNA is packaged into chromosomes. The DNA is coiled around protein globules called histones, then coiled again into a rope-like structure, then super-coiled in two stages around scaffold proteins to produce the thick chromosomes that we see under the microscope. This suggests that DNA information normally exists in a form similar to a closed book: all the coiling prevents the coded information from coming into contact with the transcription machinery. When the cell wants some information it opens a particular page, ‘photocopies’ the information, then closes the book again. Other recent work3 shows that this is physically accomplished as follows:
- The chromosomes in each cell are stored in the membrane-bound nucleus. The nuclear membrane has about 2000 pores in it, through which molecules can be passed in and out. The required chromosome is brought near to one of these nuclear pores.
- The section of DNA to be transcribed is placed in front of the pore.
- The supercoil is unwound to expose the transcription region.
- The histone coils are twisted so as to expose the required copying site.
- The double-helix of the DNA is unzipped to expose the coded information.
- The DNA is grasped into a loop by the enzymes that do the copying, and this loop is copied onto an RNA transcript. The transcript is then checked for accuracy (and is degraded and recycled if it is faulty). The RNA transcript is then specially tagged for export, and is exported through the pore and carried to wherever it is needed in the cell.
- The ‘book’ of DNA information is then closed by a reversal of the coiling process and movement of the chromosome away from the nuclear pore region.
- The most surprising result, according to the ENCODE authors, is that 95% of the functional transcripts (genic and UTR transcripts with at least one known function) show no sign of selection pressure (i.e. they are not noticeably conserved and are mutating at the average rate). This contradicts Charles Darwin’s theory that natural selection is the major cause of our evolution. It also creates an interesting paradox: cell architecture, machinery and metabolic cycles are all highly conserved (e.g. the human insulin gene has been put into bacteria to produce human insulin on an industrial scale), while most of the chromosomal information is freely mutating. How could this state of affairs be maintained for the supposed 3.8 billion years since bacteria first evolved? A better answer might be that life is only thousands, not billions, of years old. It also looks like cells, not genes, are in control of life, the direct opposite of what neo-Darwinists have long assumed.
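The ‘about 50 times’ figure in the list above can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this article (33 times as many bases, seven versus five transcription patterns, and the 97% non-coding share). The variable and function names are illustrative, and treating activity as bases multiplied by the number of transcription patterns is an assumed reading of the comparison, not a formula taken from the ENCODE report.

```python
# Figures quoted in the article (not independently measured here).
NONCODING_SHARE = 0.97  # share of human DNA that does not code for protein
BASE_RATIO = 33         # bases outside genes per base in genic regions
WAYS_GENIC = 5          # average transcription patterns, genic regions
WAYS_NONGENIC = 7       # average transcription patterns, non-genic regions


def base_ratio_from_share(noncoding_share: float) -> float:
    """Non-coding bases per coding base implied by a non-coding share."""
    return noncoding_share / (1.0 - noncoding_share)


def relative_activity(base_ratio: float, ways_nongenic: float, ways_genic: float) -> float:
    """Assumed measure: transcribed bases scale as bases x transcription patterns."""
    return base_ratio * ways_nongenic / ways_genic


implied_ratio = base_ratio_from_share(NONCODING_SHARE)
activity = relative_activity(BASE_RATIO, WAYS_NONGENIC, WAYS_GENIC)

print(f"Base ratio implied by the 97% share: {implied_ratio:.1f}")
print(f"Relative transcription activity: {activity:.1f}")
```

On these inputs the product comes to roughly 46, which the article rounds to ‘about 50’, and the 97% share implies a base ratio of roughly 32, in line with the ‘about 33’ quoted.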
1. Alex Williams, Astonishing DNA complexity uncovered.
2. Ewan Birney et al., Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project, Nature 447:799–816, 2007.
3. Asifa Akhtar & Susan M. Gasser, The nuclear envelope and transcriptional control, Nature Reviews Genetics 8:507–517, 2007.