‘Simple’? Whole Bacterial Genome Sequenced


A team of 40 scientists in the United States recently achieved a landmark by sequencing the entire genome of the bacterium Haemophilus influenzae Rd.¹ This is the first free-living bacterium to be fully sequenced. The team also demonstrated that random sequencing is a practical way to sequence whole bacterial genomes, and possibly eukaryotic genomes as well.

Haemophilus influenzae was chosen for sequencing because no physical gene map existed and, with a genome size of about 1.8 million bases, it was considered ‘typical among bacteria’.

The project involved an enormous amount of laboratory work and considerable computer analysis of the data generated by each of the laboratories involved. The DNA was physically sheared into random pieces, and pieces between 1,600 and 2,000 bases in length were selected for sequencing. Enough segments were sequenced to cover the equivalent of six full genomes. The sequences were entered into a database, and computer programmes were used to match up overlapping segment sequences and so derive the sequences of larger segments (‘contigs’) of the full genome. One part of this data analysis took 30 hours on a SPARCcenter 2000 computer with 512 MB of RAM. Statistical studies suggest that such a procedure is likely to leave only a small number of gaps in the assembled sequence. These gaps were then filled by other techniques to complete the final sequence. The authors estimated their final error rate as between 1 base in 5,000 and 1 base in 10,000.
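The statistical expectation mentioned above can be illustrated with a rough calculation. The sketch below assumes the simplest random-sequencing model (reads land uniformly at random, and the minimum overlap needed to join two reads is ignored), which is not necessarily the exact model the authors used. The function name and the read length of 460 bases are illustrative assumptions, not figures from this article.

```python
import math


def shotgun_coverage_stats(genome_size, coverage, read_length):
    """Estimate gaps left by random sequencing under a simple Poisson model.

    Assumptions (not from the article): reads start at uniformly random
    positions, and any overlap at all is enough to join two reads.
    """
    # Number of reads needed to reach the requested fold coverage.
    n_reads = coverage * genome_size / read_length
    # Probability that a given base is covered by no read at all.
    fraction_uncovered = math.exp(-coverage)
    # Expected number of bases that no read touches.
    expected_unsequenced_bases = genome_size * fraction_uncovered
    # Expected number of gaps: each read ends a contig with probability e^-c.
    expected_gaps = n_reads * math.exp(-coverage)
    return {
        "n_reads": n_reads,
        "fraction_uncovered": fraction_uncovered,
        "expected_unsequenced_bases": expected_unsequenced_bases,
        "expected_gaps": expected_gaps,
    }


if __name__ == "__main__":
    # Genome size and sixfold coverage come from the article;
    # the read length of 460 bases is an assumed, illustrative value.
    stats = shotgun_coverage_stats(1_830_137, 6.0, 460)
    for name, value in stats.items():
        print(f"{name}: {value:.4g}")
```

Under these assumptions, sixfold coverage leaves well under 1 per cent of the genome unsequenced, spread over a few dozen gaps. This is consistent with gaps being few enough to close individually by other techniques.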

The resulting genome has 1,830,137 base pairs containing an estimated 1,743 coding regions (‘genes’). The sequence was compared with the sequences in a public database of gene sequences, GenBank (release 87). From this comparison, 1,007 of the coding regions, or 58 per cent, were tentatively assigned a role, but 736, or 42 per cent, could not be assigned one. In other words, there is an enormous amount of work yet to be done to confirm and elucidate the functions of each of the coding regions identified.
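The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text. The helper name is hypothetical, and the "base pairs per coding region" value is a simple average spacing along the genome, not a measured gene length.

```python
GENOME_BASE_PAIRS = 1_830_137
CODING_REGIONS = 1_743
ASSIGNED_ROLE = 1_007
NO_ROLE = 736


def share_percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# The two groups should account for every coding region.
total_matches = ASSIGNED_ROLE + NO_ROLE == CODING_REGIONS

assigned_pct = share_percent(ASSIGNED_ROLE, CODING_REGIONS)
unassigned_pct = share_percent(NO_ROLE, CODING_REGIONS)

# Average genome length available per coding region (spacing, not gene size).
bp_per_region = GENOME_BASE_PAIRS / CODING_REGIONS

print(total_matches, assigned_pct, unassigned_pct, round(bp_per_region))
```

The two groups sum to the stated total, and the rounded percentages agree with the 58 and 42 per cent given in the text.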

The putatively identified coding regions were categorised by function into 102 biological roles, which were in turn grouped into 14 broader role categories. It is interesting to see that some 87 genes code for proteins/enzymes involved in DNA replication alone. There are many more involved in transcription and translation, not to mention biosynthesis, energy metabolism, transport, etc. How many of the 1,743 genes are essential for life?

It is clearly becoming more and more untenable to believe that any sort of self-reproducing cell could ever have been ‘simple’ enough to allow for its naturalistic origin. Anyone who believes in ‘simple’ bacteria should look at the genome map for Haemophilus influenzae; it should cure them for good. Furthermore, if a prokaryote such as a typical bacterium were to be transformed into a human over some billions of years, one has to add the information for about a further 100,000 genes, an impossible task for mutations to achieve.


  1. Fleischmann, R. D., Adams, M. D., et al., 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269:496-512.