Monday, October 10, 2005

Avida Loca

ID: The Future published this article, which irritated some acquaintances of mine, who argued that IDTF had failed to seriously interact with this paper. So here are a couple of more formal informal comments on the paper.
... all populations explored only a tiny fraction of the total genotypic space. Given the ancestral genome of length 50 and 26 possible instructions at each site, there are c.5.6x10^70 genotypes; and even this number underestimates the genotypic space because length evolves.

This range of "genotypes" for the digital organisms is very small. For comparison, there are about 10^72 different DNA sequences for a length of 120 bases (coding for 40 amino acids), and about 10^71 unique proteins with a length of 55 amino acids; both figures are within two orders of magnitude of the Avida number. This being the case, whilst this simulation might be appropriate to look at how the evolution of a single protein might occur (proteins, with secondary and tertiary structures, might themselves be considered to be "irreducibly complex" in their eventual form), it can hardly be said to model the evolution of organisms. The DNA of E. coli is about 5 million bases long. The genotypic space of this "simple" bacterium is thus - well, ten to the power of roughly three million. I didn't do logs at school, but I can see where that's going.
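For anyone who did do logs at school, the comparison above can be checked in a few lines. This is only a rough sketch: the helper name `log10_space` is my own, and it simply assumes that every position in a sequence can independently take any symbol from the alphabet (26 instructions, 4 bases, or 20 amino acids).

```python
import math

def log10_space(alphabet_size: int, length: int) -> float:
    """Return log10 of alphabet_size ** length, i.e. the exponent of the
    number of possible sequences, without building the huge integer."""
    return length * math.log10(alphabet_size)

# Figures taken from the discussion above; the labels are illustrative only.
spaces = {
    "Avida genome (26 instructions, length 50)": log10_space(26, 50),
    "DNA (4 bases, length 120)": log10_space(4, 120),
    "Protein (20 amino acids, length 55)": log10_space(20, 55),
    "E. coli-sized genome (4 bases, 5,000,000 bases)": log10_space(4, 5_000_000),
}

for name, exponent in spaces.items():
    print(f"{name}: about 10^{exponent:.1f}")
```

The first three exponents come out at roughly 70.7, 72.2 and 71.6, while the bacterial genome comes out at about three million, which is the gap the argument rests on.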

The significance of this is that the space over which Avida has to search for functionality is tiny compared to the space over which a "real live" organism would have to search for functionality. As the required sequence gets longer, it rapidly becomes harder to start from nothing and get "an answer" - especially if the answer, rather than being a sequence of the order of 40 units long, were of the order of 400, or 4000.
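To put some rough numbers on "rapidly becomes harder", here is a small sketch of how the search space scales with length. The assumptions are mine, not the paper's: I reuse the 26-instruction alphabet for every length, and the budget of 10^12 genotypes tried is a purely hypothetical figure chosen for illustration.

```python
import math

ALPHABET = 26          # Avida's instruction count; reused here as an assumption
LOG10_TRIALS = 12.0    # hypothetical search budget: 10^12 genotypes examined

def log10_sequences(length: int, alphabet: int = ALPHABET) -> float:
    """Exponent (base 10) of the number of sequences of the given length."""
    return length * math.log10(alphabet)

def log10_fraction_explored(length: int, log10_trials: float = LOG10_TRIALS) -> float:
    """Exponent of the fraction of the space a fixed budget could cover.
    Capped at 0, since a search cannot cover more than the whole space."""
    return min(0.0, log10_trials - log10_sequences(length))

for length in (40, 400, 4000):
    print(
        f"length {length}: about 10^{log10_sequences(length):.0f} sequences, "
        f"fraction explored about 10^{log10_fraction_explored(length):.0f}"
    )
```

The exponent grows in proportion to the length, so each tenfold increase in length multiplies the exponent by ten rather than the count itself. A fixed search budget therefore covers a vanishingly smaller share of the space at every step.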

The handwritten ancestral genome was 50 instructions long, of which 15 were required for efficient self replication; the other 35 were tandem copies of a single no-operation instruction that performed no function when executed

So Avida gives organisms a huge head start. In this simulation, the digital organisms are given a nice, 70% blank genotype. This would be selected against in real life - it adds to the energetic burden on the organism, whilst providing no additional functionality. It seems unlikely that a real organism would actually arrive at a state where it had even 10% of its DNA "free" to be changed with no deleterious effect on the organism. And yet all of the digital organisms are conveniently started off in this state. This represents a huge "head start" in evolutionary terms - because changes to the majority of the "DNA" don't damage existing functionality.

Furthermore, in real life, the tools for transcribing DNA, reproduction and so on also have to be encoded in the DNA. In Avida, the tools that interpret the instructions are coded in the computer, and are thus "safeguarded" - they won't be corrupted. (Though, by the same token I guess, they don't evolve, either.) Again, this represents another significant evolutionary advantage.

Bottom line:
On the basis of what I've read so far, I have little reason to disagree with the ID: The Future comments that the claims made for the achievements of this study are substantially exaggerated. I suspect that if the authors were to go no further than to say that this was an analogue of how a protein might evolve, their claims might be considered more reasonable. The fact that they present this not merely as a model of evolution, but describe it as "digital life", cries out for a close analysis of their claims. In statistical terms, such an analysis simply highlights that the model substantially underestimates the problems of real-life evolution.

Below the bottom line:
I have downloaded, installed, and run (several times) Avida. Despite the cheap joke in the title, it is an interesting piece of software. Further comments can be found here.