Genetic algorithms—do they show that evolution works?

by Don Batten

A genetic algorithm (GA) is a computer program that supposedly simulates biological evolution. GAs have found limited application in generating novel engineering solutions—for example, an electronic circuit that filters out a particular frequency. GAs use mathematical constructs that parallel mutations (random changes in the variables/coefficients), natural selection (elimination of variations in a circuit, for example, that do not move toward the objective of a response to a particular frequency), and even some type of ‘recombination’ (as happens in sexual reproduction). Because of this, some apologists for evolution claim that these programs show that biological evolution can create the information needed to proceed from less complex to more complex organisms (i.e. with more genetic information).
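To make the three ingredients named above concrete, here is a minimal sketch of a GA in Python. It is not taken from any particular published GA: the function names, the fitness measure (squared distance to a target set of coefficients) and all parameter values are assumptions chosen purely for illustration.

```python
import random


def fitness(genome, target):
    """Quantitative fitness: negative squared distance to the target (higher is better)."""
    return -sum((g - t) ** 2 for g, t in zip(genome, target))


def mutate(genome, rate, scale, rng):
    """'Mutation': randomly perturb each coefficient with probability `rate`."""
    return [g + rng.gauss(0.0, scale) if rng.random() < rate else g for g in genome]


def recombine(a, b, rng):
    """'Recombination': one-point crossover between two parent genomes."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def run_ga(target, pop_size=50, generations=200, rate=0.2, scale=0.1,
           survivors=5, elitism=True, seed=0):
    """Return the best fitness seen in each generation (all settings are illustrative)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in target] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        ranked = sorted(population, key=lambda g: fitness(g, target), reverse=True)
        best_history.append(fitness(ranked[0], target))
        parents = ranked[:survivors]  # 'selection': only the top few reproduce
        children = []
        if elitism:
            # Carry the current best forward unmutated, so it can never be lost.
            children.append(list(ranked[0]))
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(recombine(a, b, rng), rate, scale, rng))
        population = children
    return best_history


history = run_ga(target=[0.3, -0.5, 0.8])
print(history[0], history[-1])
```

With `elitism=True` the best score can never fall from one generation to the next, because an untouched copy of the current best is always carried forward. Note also that the fitness function measures distance to a goal supplied by the programmer.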

However, GAs do not mimic or simulate biological evolution because with a GA:

A ‘trait’ can only be quantitative so that any move towards the objective can be selected for. Many biological traits are qualitative—it either works or it does not, so there is no step-wise means of getting from no function to the function.
A GA can only select for a very limited number of traits. Even with the simplest bacteria, which are not at all simple, hundreds of traits have to be present for it to be viable (survive); selection has to operate on all traits that affect survival.
Something always survives to carry on the process. There is no rule in evolution that says that some organism(s) in the evolving population will remain viable no matter what mutations occur. In fact, the GAs that I have looked at artificially preserve the best of the previous generation and protect it from mutations or recombination in case nothing better is produced in the next iteration. This has a ratchet effect that ensures that the GA will generate the desired outcome—any move in the right direction is protected.
Perfect selection (selection coefficient, s = 1.0) is often applied, so that in each generation only the best survives to ‘reproduce’ and produce the next generation. In the real world, selection coefficients of 0.01 or less are considered realistic, in which case it would take many generations for an information-adding mutation to spread through a population. Put another way, the cost of substitution is completely ignored in GAs (see ReMine’s The Biotic Message for a thorough run-down of this; see also Population genetics, Haldane’s Dilemma, etc.).
The flip side to this is that high rates of ‘reproduction’ are used. Bacteria can only double their numbers per generation. Many ‘higher’ organisms can only do a little better, but GAs commonly produce hundreds or thousands of ‘offspring’ per generation. For example, if a population of 1,000 bacteria had only one survivor (999 died), then it would take 10 generations of doubling to get back to 1,000.
Generation time is ignored. A generation can happen in a computer in microseconds, whereas even the fastest-dividing bacteria take about 20 minutes. Multicellular organisms have far longer generation times.
The mutation rate is artificially high (by many orders of magnitude). This is sustainable because the ‘genome’ is small (see next point) and artificial rules are invoked to protect the best ‘organism’ from mutations, for example. Such mutation rates in real organisms would result in all the offspring being non-viable (error catastrophe). This is why living things have exquisitely designed editing machinery to minimize copying errors to a rate of about one in a billion per cell division.
The ‘genome’ is artificially small and only does one thing. The smallest real world genome is over 0.5 million base pairs (and it is an obligate parasite, which depends on its host for many of the substrates needed) with several hundred proteins coded. This is equivalent to over a million bits of information. Even if a GA generated 1800 bits of real information, as one of the commonly-touted ones claims, that is equivalent to maybe one small enzyme, and that was achieved with totally artificial mutation rates, generation times, selection coefficients, etc. In fact, this is also how the body’s immune system develops specific antibodies, under designed conditions totally different from those facing any whole organism. This is pointed out in more detail by biophysicist Dr Lee Spetner in his refutation of a skeptic.
In real organisms, mutations occur throughout the genome, not just in a gene or section that specifies a given trait. This means that all the deleterious changes to other traits have to be eliminated along with selecting for the rare desirable changes in the trait being selected for. This is ignored in GAs. With genetic algorithms, the program itself is protected from mutations; only target sequences are mutated. Indeed, if it were not quarantined from mutations, the program would very quickly crash. However, the reproduction machinery of an organism is not protected from mutations.
There is no problem of irreducible complexity with GAs. Many biological traits require many different components to be present, functioning together, for the trait to exist at all (e.g. protein synthesis, DNA replication, reproduction of a cell, blood clotting, every metabolic pathway, etc.).
Polygeny (where a trait is determined by the combined action of more than one gene) and pleiotropy (where one gene can affect several different traits) are ignored. Furthermore, recessive genes are ignored (recessive genes cannot be selected for unless present as a pair; i.e. homozygous), which multiplies the number of generations needed to get a new trait established in a population. The problem of recessive genes leads to one facet of Haldane’s Dilemma, where the well-known evolutionist J.B.S. Haldane pointed out that, based on the theorems of population genetics, there has not been enough time for sexual organisms with low reproductive rates and long generation times to evolve. See review of ReMine’s analysis of Haldane’s Dilemma.
Multiple coding genes are ignored. From the human genome project, it appears that, on average, each gene codes for at least three different proteins (see Genome Mania — Deciphering the human genome). In microbes, genes have been discovered that code for one protein when ‘read’ in one direction and a different protein when read backwards, or when the ‘reading’ starts one letter on. Creating a GA to generate such information-dense coding would seem to be out of the question. Such coding demands an intelligence vastly superior to that of human beings for its creation.
The outcome with a GA is ‘pre-ordained’ (‘formal’). Evolution is by definition purposeless, so no computer program that has a pre-determined goal can simulate it—period. This is most obviously true of Dawkins’ ‘weasel’ program, where the selection of each letter sequence is determined entirely on its match with the pre-programmed goal sequence (see further reading below). That GAs are not valid simulations of evolution because of this fundamental problem has been acknowledged—see this 2009 quote. Perhaps if the programmer could come up with a program that allowed any random change to happen and then measured the survivability of the ‘organisms’, it might be getting closer to what evolution is supposed to do! Of course that is impossible (as is evolution).
With a particular GA, we need to ask how much of the ‘information’ generated by the program is actually specified in the program, rather than being generated de novo. A number of modules or subroutines are normally specified in the program, and the ways these can interact is also specified. The GA program finds the best combinations of modules and the best ways of interacting them. The amount of new information generated is usually quite trivial, even with all the artificial constraints designed to make the GA work.
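The arithmetic behind several of the points above can be checked with a short script. This is only a sketch. The doubling and bits-per-base calculations follow directly from the figures quoted in the text. The spread-of-a-mutation calculation uses the textbook deterministic haploid selection recursion p' = p(1 + s)/(1 + ps), which is an assumption introduced here for illustration: it ignores chance loss of the mutation, recessiveness and the cost of substitution, so it should be read as a lower bound rather than a realistic estimate. The starting and finishing frequencies are likewise hypothetical.

```python
def doublings_to_recover(survivors, target_size):
    """Generations of pure doubling needed to grow from `survivors` to at least `target_size`."""
    generations = 0
    size = survivors
    while size < target_size:
        size *= 2
        generations += 1
    return generations


def generations_to_spread(s, start_freq, end_freq):
    """Generations for a variant with relative fitness 1 + s to rise from
    start_freq to end_freq under p' = p(1 + s) / (1 + p*s) (no drift, haploid)."""
    p = start_freq
    generations = 0
    while p < end_freq:
        p = p * (1 + s) / (1 + p * s)
        generations += 1
    return generations


def genome_bits(base_pairs, bits_per_base=2):
    """Raw capacity: four possible bases means log2(4) = 2 bits per base pair."""
    return base_pairs * bits_per_base


print(doublings_to_recover(1, 1000))   # 10, since 2**10 = 1024 >= 1000
print(genome_bits(500_000))            # 1000000

# One carrier in a population of 1,000, spreading until 99% carry the variant.
for s in (0.01, 0.001):
    print(s, generations_to_spread(s, start_freq=0.001, end_freq=0.99))
```

Even in this idealized model, a selection coefficient of 0.01 needs on the order of a thousand generations for a single new variant to take over a population of 1,000, and the number grows roughly in inverse proportion to s.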

For the above reasons (and some of them overlap), and no doubt there are more that could be added, GAs do not validate biological evolution. It does not take long with a decent calculator to see that the information space available for a minimal real world organism of just several hundred proteins is so huge that no naturalistic iterative real world process could have accounted for it—or even the development of one new protein with a fundamentally new function.
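The ‘decent calculator’ exercise can be sketched in a few lines of Python. The calculation below simply counts the number of possible amino-acid sequences of a given length, using the standard alphabet of 20 amino acids. The protein lengths are hypothetical values picked for illustration, and the count is of raw sequence space only; it does not say what fraction of those sequences would be functional.

```python
import math

AMINO_ACIDS = 20  # size of the standard amino-acid alphabet


def sequence_space(length, alphabet=AMINO_ACIDS):
    """Exact number of distinct sequences of the given length."""
    return alphabet ** length


def sequence_space_log10(length, alphabet=AMINO_ACIDS):
    """Base-10 logarithm of the sequence space, for sizes too large to read easily."""
    return length * math.log10(alphabet)


# Illustrative protein lengths (assumptions, not figures from the article).
for length in (50, 100, 300):
    exponent = sequence_space_log10(length)
    print(f"{length} residues: about 10^{exponent:.0f} possible sequences")
```

For a single protein of 100 residues the space is already about 10^130 sequences, and the exponents for separate proteins add, so an organism with several hundred proteins occupies a space vastly larger again.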

Another type of ‘simulation’ is that of antitheist T.D. Schneider.¹ Schneider claims that his program simulates the naturalistic formation of DNA binding sites for gene control. This exercise has led some evolutionists to grandstand that it proves creationists wrong. However, many of the same problems outlined above also apply to this programming exercise. For example, the selection coefficient is extremely high, the genome is extremely small, the mutation rate is high, no possibility of extinction is permitted, etc. For many other problems, see the critique by Dr Royal Truman.

Note that we are not saying that mutations and natural selection cannot generate any information. It’s just that with real world generation times, real-world sized genomes and real-world organisms which have to survive through multi-dimensional adaptive traits, there has not been enough time to generate even a tiny amount of the biological information seen in living things. As Spetner says, if mutations and natural selection have generated all the information we see, then we should easily be able to find some examples of new information (i.e. an increase in specified complexity) arising today. The best that anyone has come up with is a GA, which does not simulate real world evolution, for the reasons outlined above.

Reference

Schneider, T.D., Evolution of biological information, Nucleic Acids Research 28(14):2794–2799, 2000. In this paper, Schneider acknowledges the input of fellow atheist Richard Dawkins.