Thursday, October 20, 2005

More about Avida

As promised, here are some more comments on the Avida “artificial life” software. I am interacting with this paper and, unless I say otherwise, I am using the parameters used by the researchers in preparing it (see the Methods section at the end of their paper).

Firstly, as a piece of software, it is good. It does what it says on the tin – you can download it, adjust a vast array of parameters, run it, rerun it, rewrite the source and recompile it (apparently – I have been happy enough with the "basic package" not to try this myself yet) and so on. You can’t save it "mid-run", as far as I can tell, so I ended up with a computer running overnight, and (perhaps because I was using a beta version) it tended to fall over if you fiddled with the wrong things at the wrong time. You at least get the results saved as it goes along, but it was a little frustrating to have a run stop after several hours and 30,200 updates because I asked it to display the wrong thing. Still, there aren’t many science experiments these days that you can try at home, and I have spent some time running Avida on a couple of PCs, which has allowed me to evaluate the software and also to consider the research that has already been done. Few areas of research can make themselves available to review by both professionals and amateurs – all credit to the Avida team for doing just that.

However, given my previous posts, it will hardly come as a surprise that my comments on what the software shows diverge from the researchers’ conclusions.

Interaction with the Avida software can be found on the ARN discussion forum (here is one of the threads). The reason I investigated it in more detail was the reaction against this post in the ID: The Future blog, which various acquaintances thought wasn’t taking the research seriously.

The software clearly demonstrates how a selective advantage will propagate through a population. In the paper, the digital organisms, through random mutation, may generate new functionality, in the sense of giving as an output the result of a logical operation not built into the genetic language of the organism. In simple terms, the "harder" the logical operation – that is, the more instructions that would be needed in the genetic language to achieve the operation – the greater the "reward", expressed as a larger proportion of CPU time offered to the organism that is expressing this functionality. In the paper, nine different logic operations are examined. The gain in fitness is cumulative (the rewards multiply together), and the reward increases geometrically for the more complex functions. Thus, for expressing either of the simplest functions (NOT and NAND), the fitness of the organism is doubled. For expressing the most complex function (EQU), the fitness is multiplied by 32. The total impact of expressing all nine functions is that an organism would get about 33 million times more CPU time than an organism of the same length expressing no functions.
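The 33 million figure can be checked with a few lines of Python. This is only a sketch: the post states the rewards for NOT, NAND and EQU, and the names and multipliers of the six intermediate functions below are my assumption (rewards doubling at each level of difficulty, two functions per level), chosen because they reproduce the stated total.

```python
from math import prod

# Reward multipliers per logic function.
# Stated in the post: NOT and NAND double fitness, EQU multiplies it by 32.
# ASSUMED for illustration: the six intermediate names and values.
REWARDS = {
    "NOT": 2, "NAND": 2,
    "AND": 4, "ORN": 4,
    "OR": 8, "ANDN": 8,
    "NOR": 16, "XOR": 16,
    "EQU": 32,
}

def fitness_multiplier(functions):
    """Combined multiplier for an organism expressing the given functions.

    Rewards multiply together; expressing nothing gives a multiplier of 1.
    """
    return prod(REWARDS[name] for name in functions)

# Multiplier for an organism expressing all nine functions (2**25).
total = fitness_multiplier(REWARDS)
print(total)
```

Under these assumed values the product is 2^25, which is where "33 million" comes from. It also shows why a single hard function matters so much: EQU alone is worth as much as NOT, NAND and OR put together.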

The consequence of this is that, if a viable digital organism appears that expresses some of the more advanced functionality, the new functionality will rapidly dominate the population. For example, on a run of organisms of starting length 150, after a false start that didn’t get anywhere (perhaps the expression of the function was too sensitive to change, or perhaps it had appeared at the expense of a couple of other functions, resulting in a loss of overall fitness), the number of organisms expressing EQU rose from 2 (out of 3600) at update 15600 to 2000 at update 16600.
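To put a number on "rapidly", here is a rough calculation of the growth rate implied by those two counts. It assumes steady exponential growth between the two updates, which is a simplification of mine: in a fixed population of 3600 the spread must slow down as the new type approaches saturation.

```python
import math

def growth_per_update(start_count, end_count, updates):
    """Average multiplicative growth per update, assuming exponential growth."""
    return (end_count / start_count) ** (1.0 / updates)

# Counts and update numbers are taken from the run described above.
factor = growth_per_update(2, 2000, 16600 - 15600)

# Number of updates needed for the EQU-expressing organisms to double.
doubling_updates = math.log(2) / math.log(factor)

print(f"growth per update: {factor:.5f}")
print(f"doubling time: {doubling_updates:.1f} updates")
```

On that assumption the EQU-expressing organisms doubled roughly every hundred updates. Note that this is per update, not per generation.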

In a sense, this is kind of obvious – if an animal has a significant selective advantage, it is trivial that this advantage will spread across the population. Neither am I unhappy with the multiplicative nature of selective advantages. In the same way that protein functionality can easily and drastically be lost through even single changes to the amino acid sequence (as per the sickle cell anaemia mutation), it seems likely that comparatively small changes can result in abrupt improvements in functionality that can be expressed. There have been papers on how "anti-freeze" proteins arise in fish, and how small changes to genes can result in bacteria able to digest synthetic compounds that they would not have experienced in nature. However, the neat spread of a new colour across the map on Avida doesn’t generally correspond with nature, for various reasons.

Firstly, as I mentioned below, the domain space that relates to these digital organisms is much smaller than that of any organism in nature – the evolution that we are looking at is closer in scale to the evolution of a protein than an organism. The number of possible digital organisms of genome length 50 is getting on for 10^71 – which corresponds roughly with a stretch of DNA of about 120 bases, or a protein with a sequence of about 55 amino acids. Within this, the simplest function that would give an increase in fitness would be coded for in only four or five codons. The most complex can be coded for in 19 codons. Of course, the likelihood of this arising by chance is small, but not vanishingly small. Dembski proposes a universal probability bound of 10^-150 – in other words, if a specified event is less likely than this to occur, then it is reasonable to assume that if it has occurred, it didn’t occur by chance. Even picking one particular genome of length 50 at random, at about 1 in 10^71, is far more likely than that. As Avida runs, it will end up randomly trying millions of candidate organisms for functional improvements – and it is obvious, given the size of the domain space, that this will yield organisms with increasing fitness.
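The size comparison is easy to reproduce. The sketch below works in powers of ten and assumes an instruction set of 26 instructions for the digital organisms; that number is not given in this post, so treat it as my assumption (it is the value that gives "getting on for 10^71" at length 50).

```python
import math

def log10_sequences(alphabet_size, length):
    """log10 of the number of distinct sequences of the given length."""
    return length * math.log10(alphabet_size)

AVIDA_INSTRUCTIONS = 26   # ASSUMED size of the instruction set
UPB_LOG10 = -150          # Dembski's universal probability bound, as quoted

avida = log10_sequences(AVIDA_INSTRUCTIONS, 50)  # digital genome, length 50
dna = log10_sequences(4, 120)                    # DNA, 120 bases
protein = log10_sequences(20, 55)                # protein, 55 amino acids

for label, exponent in [("Avida", avida), ("DNA", dna), ("protein", protein)]:
    print(f"{label}: about 10^{exponent:.1f} possible sequences")

# Chance of hitting one specific length-50 genome in a single random draw,
# as a power of ten, compared with the probability bound.
single_draw = -avida
print(single_draw > UPB_LOG10)
```

The three exponents come out within a couple of orders of magnitude of each other (120 bases is nearer 10^72 than 10^71), which is close enough for the comparison being made here.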

But what is the likelihood of new functionality arising in a real organism that will give it a selective advantage? The antifreeze proteins and the enzymes that digest synthetic compounds, mentioned above, weren’t found through randomly trying everything – they were the result of minor changes to parts of the genome that had an existing role in the organism. What is the likelihood of (say) a proto-heme arising by chance, or being co-opted through minor modifications from another protein? What is the likelihood of the enzymes arising by chance that are at the heart of transcription and reproduction? As was observed in the ARN discussion, the developers of Avida didn’t wait for random changes to produce the genetic code of the digital organisms – and yet, this is a key step in a naturalistic biogenesis process. Similarly, the digital organisms don’t end up setting their own fitness criteria – they are constrained to what is programmed – they don’t break out of the code. Is this only a matter of time and chance? How many generations would that take?

Following on from this, whilst the genomic mutation rate – at 0.225 – may have been comparable to that of a real organism (see Methods), the total number of codons was far lower, and so the mutation rate per codon was much higher. This means that evolution in the model will happen at a far higher rate than in real life.
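The arithmetic behind this is simple division. In the sketch below the genomic rate of 0.225 is the figure quoted above; the genome length of 50 is my assumption for the digital organisms, and the one-million-site genome is a purely hypothetical stand-in for a real organism, used only to show the scale of the effect.

```python
def per_site_rate(genomic_rate, genome_length):
    """Average mutations per site per replication, spread evenly over the genome."""
    return genomic_rate / genome_length

GENOMIC_RATE = 0.225            # mutations per genome per replication (quoted)
DIGITAL_LENGTH = 50             # ASSUMED length of the digital genome
HYPOTHETICAL_LENGTH = 1_000_000 # HYPOTHETICAL genome, for illustration only

digital = per_site_rate(GENOMIC_RATE, DIGITAL_LENGTH)
hypothetical = per_site_rate(GENOMIC_RATE, HYPOTHETICAL_LENGTH)

print(f"digital organism: {digital:.4f} per site")
print(f"hypothetical genome: {hypothetical:.3e} per site")
print(f"ratio: {digital / hypothetical:.0f}")
```

With the same genomic rate, the ratio of per-site rates is just the ratio of the genome lengths, so the shorter the genome, the harder each individual site is being hit.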

All of these are acceptable simplifications for the purpose of modelling something – but it is important to be aware of how these simplifications impact the validity of the model when it is related to what it is supposed to be modelling.

To be continued ....