More or less information?
How do we define information in biology?
One of the most powerful creationist arguments concerns information. Understanding this issue can enable us to deflect the common anti-creationist equivocation of calling any change ‘evolution’. Creationists do not deny that living things change, and even speciate, but nearly all the changes acclaimed as ‘evolution’ do not actually involve the increase in information content required for microbes-to-man evolution, but rather go in the opposite direction. For an illustration of this see: How information is lost when creatures adapt to their environment.
This week’s feedback comes from Casey P, who encountered a vexatious question about how to define information. The evolutionist who posed the question to Casey P erred by presupposing a simplistic definition, while Andrew Lamb’s reply shows that information must be considered at several levels to understand its role in biology.
I’m curious to know, so perhaps you could fill me in on this. Which one has the most information, and what exactly are these two sequences?
Sequence 1: cag tgt ctt ggg ttc tcg cct gac tac gag acg cgt ttg tct tta cag gtc ctc ggc cag cac ctt aga caa gca ccc ggg acg cac ctt tca gtg ggc act cat aat ggc gga gta cca agg agg cac ggt cca ttg ttt tcg ggc cgg cat tgc tca tct ctt gag att tcc ata ctt
Sequence 2: tgg agt tct aag aca gta caa ctc tgc gac cgt gct ggg gta gcc act tct ggc cta atc tac gtt aca gaa aat ttg agg ttg cgc ggt gtc ctc gtt agg cac aca cgg gtg gaa tgg ggg tct ctt acc aaa ggg ctg ccg tat cag gta cga cgt agg tat tgc cgt gat aga ctg
Thanks for your help here. God bless.
Dear Mr P
Thank you for your email of 17 January, submitted via our website.
In response to creationist arguments about genetic information, some evolutionists disingenuously object that, since there is no single measure of information content applicable to all situations, genetic information therefore doesn’t exist! But even hardened atheists like the eugenicist Richard Dawkins recognize that DNA contains information. In fact, there is a burgeoning field of science called bioinformatics that is devoted largely to genetic information.
With respect to the two sequences you presented, one would need to know their functions before it would be possible to compare which carries more information. If their functions (assuming they are not just gobbledygook) were dissimilar, then attempting to compare their information content would be fairly meaningless. For example, if one were a genetic sequence coding for an enzyme and the other a genetic sequence coding for a structural protein, then asking which has more information would be as meaningless as asking, say, ‘Which has more information: 60 grams of apple or 60 grams of orange?’
If the meaning/function is similar, then an information-content comparison may be possible. Consider the following two sequences:
She has a yellow vehicle.
She has a yellow car.
Both are English sentences. The first is 25 characters long and the second is 21 characters long. The first sentence has more characters, but the second has more information, because it is more specific (cars being just one of scores of different types of vehicle), and specificity is one measure of information content. Specificity relates to the purpose of the information, not to the way it is expressed or to the length of the message in some particular language.
Dr Werner Gitt distinguishes five levels of information:
- statistics (symbols and their frequencies)
- syntax (patterns of arrangement of symbols)
- semantics (meaning)
- pragmatics (function/result/outcome)
- apobetics (purpose/plan/design)
Specificity relates to the semantic, pragmatic or apobetic level.
Gitt’s Theorem 9 states that ‘Only that which contains semantics is information’. This is a crucial point. Many evolutionists err by restricting the measurement of information to the statistical level, that is, to ‘Shannon information’. So-called ‘Shannon information’ is not a measure of information per se, but merely a measure, based on symbol frequencies, of the minimum number of characters/units (such as bits) needed to represent a sequence, regardless of whether the sequence is meaningful or not. Gobbledygook can have more ‘Shannon information’ than a meaningful sentence in a recognised language.
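Shannon’s measure works only at the statistical level. A minimal Python sketch of one common estimate of it, the average number of bits per symbol computed from a string’s own symbol frequencies, makes the point concrete. The example strings are chosen here for illustration and are not from the letter: reversing a sentence destroys its meaning yet leaves the score unchanged, and a string of 14 unrelated letters scores higher than the meaningful sentence.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Average Shannon information in bits per symbol, estimated from the
    frequencies of the symbols in `text` (the statistical level only)."""
    n = len(text)
    # Summing over the sorted counts makes the result identical for any
    # rearrangement of the same symbols.
    return -sum((c / n) * math.log2(c / n) for c in sorted(Counter(text).values()))


sentence = "She has a yellow car."
scrambled = sentence[::-1]    # same symbols, reversed: meaning destroyed
gibberish = "qzxjvkwpmfbgyh"  # 14 different letters, no meaning at all

for label, s in [("sentence", sentence), ("scrambled", scrambled), ("gibberish", gibberish)]:
    print(f"{label:10} {shannon_entropy(s):.3f} bits/symbol")
```

The first two lines print identical values and the third is higher. At no point does the calculation consult what any word means.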
So, if the two sequences you were presented with were composed randomly, then it is highly unlikely that either contains any information. However, for argument’s sake, I will assume that they may be meaningful, and compare them.
Both sequences contain the same amount of information at the statistical level: each consists of 180 letters, about 240 characters when the spaces between triplets are included.
Both sequences appear the same at the syntactic level: each consists of 60 space-separated triplets composed of the symbols a, c, g and t.
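Both observations can be checked mechanically. The Python sketch below assumes the sequences are exactly as typed in the letter; it counts the triplets, letters and individual bases (the statistical level) and confirms that every unit is a three-letter word over a, c, g and t (the syntactic level). Nothing in it refers to meaning.

```python
from collections import Counter

# The two sequences exactly as given in the letter.
SEQ1 = ("cag tgt ctt ggg ttc tcg cct gac tac gag acg cgt ttg tct tta cag gtc ctc ggc cag "
        "cac ctt aga caa gca ccc ggg acg cac ctt tca gtg ggc act cat aat ggc gga gta cca "
        "agg agg cac ggt cca ttg ttt tcg ggc cgg cat tgc tca tct ctt gag att tcc ata ctt")
SEQ2 = ("tgg agt tct aag aca gta caa ctc tgc gac cgt gct ggg gta gcc act tct ggc cta atc "
        "tac gtt aca gaa aat ttg agg ttg cgc ggt gtc ctc gtt agg cac aca cgg gtg gaa tgg "
        "ggg tct ctt acc aaa ggg ctg ccg tat cag gta cga cgt agg tat tgc cgt gat aga ctg")


def describe(seq: str) -> dict:
    """Summarise a sequence at the statistical and syntactic levels only."""
    triplets = seq.split()
    letters = "".join(triplets)
    return {
        "triplets": len(triplets),
        "letters": len(letters),
        # Statistical level: which symbols occur, and how often.
        "base_counts": dict(sorted(Counter(letters).items())),
        # Syntactic level: every unit is exactly three of the letters a, c, g, t.
        "well_formed": all(len(t) == 3 and set(t) <= set("acgt") for t in triplets),
    }


for name, seq in [("Sequence 1", SEQ1), ("Sequence 2", SEQ2)]:
    print(name, describe(seq))
```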
At the semantic level, I recognize that these letter triplets are the same as those used to represent triplets of DNA bases (codons) that code for particular amino acids. All 64 possible triplets have a meaning in the DNA code, and neither sequence contains any of the three ‘stop codons’ (taa, tga, tag). It follows that both sequences could be regarded as having the same amount of information at the semantic level, since, if processed by the appropriate genetic machinery, each could probably produce a protein chain 60 amino acids long.
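This semantic-level reading can also be sketched in Python. The sketch assumes the standard genetic code (NCBI translation table 1) and that each sequence is read in frame from its first letter. It deliberately ignores start codons, regulatory signals and everything else a real cell does; it only shows that each string maps to 60 amino-acid letters with no stop codon.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with t, c, a, g as the order for the first, second and third positions;
# '*' marks the three stop codons (taa, tag, tga).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product("tcag", repeat=3), AMINO_ACIDS)
}

# The two sequences exactly as given in the letter.
SEQ1 = ("cag tgt ctt ggg ttc tcg cct gac tac gag acg cgt ttg tct tta cag gtc ctc ggc cag "
        "cac ctt aga caa gca ccc ggg acg cac ctt tca gtg ggc act cat aat ggc gga gta cca "
        "agg agg cac ggt cca ttg ttt tcg ggc cgg cat tgc tca tct ctt gag att tcc ata ctt")
SEQ2 = ("tgg agt tct aag aca gta caa ctc tgc gac cgt gct ggg gta gcc act tct ggc cta atc "
        "tac gtt aca gaa aat ttg agg ttg cgc ggt gtc ctc gtt agg cac aca cgg gtg gaa tgg "
        "ggg tct ctt acc aaa ggg ctg ccg tat cag gta cga cgt agg tat tgc cgt gat aga ctg")


def translate(seq: str, code: dict = STANDARD_CODE) -> str:
    """Map each space-separated codon to its one-letter amino acid symbol."""
    return "".join(code[codon] for codon in seq.split())


for name, seq in [("Sequence 1", SEQ1), ("Sequence 2", SEQ2)]:
    protein = translate(seq)
    print(f"{name}: {len(protein)} residues, {protein.count('*')} stop codons")
    print(protein)
```

Sequence 1 begins with Q (cag, glutamine) and Sequence 2 with W (tgg, tryptophan). A translation like this says nothing about what, if anything, the resulting chain would do.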
However, when it comes to the pragmatics level, as far as I can determine (I was unable to locate these sequences in any online gene library, such as NCBI’s Entrez Nucleotide database), both sequences apparently carry the same amount of meaningful information: zilch.
At the apobetics level, I have no idea what outcomes would result from processing the two sequences. Conceivably, at one extreme, they could result in production of an enzyme that kills the cell, or even a toxin that kills the organism to which the cell belongs. At the other extreme, they could (for all I know) prevent aging and thus extend lifespan. I simply do not know. However, if they are indeed random sequences, then the outcome of processing them into proteins in living cells would likely be destructive, since random changes overwhelmingly tend to be harmful rather than beneficial.
The final configuration of the protein that results from a particular DNA sequence is affected by cellular machines called chaperonins, which influence protein folding. One of the most difficult problems in molecular biology has been computing the final protein configuration from an amino acid sequence (see a current project). Without chaperonins, the amino-acid chain produced from the DNA sequence might misfold into a deadly prion rather than fold correctly into the needed protein. This is the likely cause of the fatal brain conditions Creutzfeldt–Jakob disease and bovine spongiform encephalopathy (BSE), also known as mad cow disease (see also Did God create life? Ask a protein, and Discoveries that undermine the one gene → one protein idea).
Note also that each creature has its own unique set of cellular machinery, so the outcomes of reading these genetic sequences could be very different depending on which organism’s machinery reads them. For example, the genetic sequence of HIV is apparently harmless when read by the cellular machinery in apes’ cells, but ultimately lethal when read by human cellular machinery: very different outcomes at the apobetics level from the same genetic sequence. Also, some organisms use slightly different genetic codes, so the same semantic information would be read differently, resulting in different pragmatic and apobetic information.
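That last point can be illustrated with one well-documented variant, chosen here as an example rather than taken from the letter: the vertebrate mitochondrial code (NCBI translation table 2) reads aga and agg as stop signals, ata as methionine and tga as tryptophan. The Python sketch below, again reduced to bare codon lookup, translates Sequence 1 under both codes. Under the standard code it reads through all 60 codons; under table 2 it would stop at codon 23 (aga).

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered t, c, a, g.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product("tcag", repeat=3), AMINO_ACIDS)}

# Vertebrate mitochondrial code (NCBI table 2): the same table with four
# codons reassigned.
MITO_CODE = {**STANDARD_CODE, "aga": "*", "agg": "*", "ata": "M", "tga": "W"}

# Sequence 1 exactly as given in the letter.
SEQ1 = ("cag tgt ctt ggg ttc tcg cct gac tac gag acg cgt ttg tct tta cag gtc ctc ggc cag "
        "cac ctt aga caa gca ccc ggg acg cac ctt tca gtg ggc act cat aat ggc gga gta cca "
        "agg agg cac ggt cca ttg ttt tcg ggc cgg cat tgc tca tct ctt gag att tcc ata ctt")


def translate(seq: str, code: dict) -> str:
    """Map each space-separated codon to a one-letter amino acid symbol."""
    return "".join(code[codon] for codon in seq.split())


standard = translate(SEQ1, STANDARD_CODE)
mito = translate(SEQ1, MITO_CODE)
# 1-based positions of codons that the two codes read differently.
differences = [i + 1 for i, (a, b) in enumerate(zip(standard, mito)) if a != b]

print("table 1:", standard)
print("table 2:", mito)
print("codons read differently:", differences)
```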
I hope this helps. We have many articles on our website on the issue of information in living organisms. Some can be found listed under the topic ‘Information Theory’ in the Frequently Asked Questions index on this website.
Re-featured on homepage: 18 February 2023