Friday, December 08, 2006

High probability start points for natural selection?

(This whole post may be based on a misapprehension of an argument from commenters. If so, please say so in the comments.)

The argument has been presented in comments that, in looking for primitive, low-specification versions of cytochrome c, I am missing the point. In fact, I am told, recent work has suggested that smaller polypeptide sequences of relatively high probability, capable of binding enzymes, could be the earliest units of selection: effectively a bridge between an amino acid soup and well-organised life.

This sounds really neat: we no longer need to worry about the random appearance of low-probability polypeptides in early organisms. Presumably, instead, we would find relatively short sequences of DNA or RNA in the organism, which encode polypeptides with a length of (say) 10 amino acids. These bind proteins that are present in the environment (again, proteins formed at random, but not yet having to be genetically encoded in the organism), which are then available for the organism to use. Are people happy with that as a description of the process being proposed?

This post is a first attempt to think through the implications of this. I don't think that one can simply say, “Ah! We don't need low-probability events for evolution to start! Here is a process that only needs relatively high-probability events.” The reason is that, whilst no high specification is required for the short binding polypeptide in the organism, we have quietly introduced low probability elsewhere in the process. We need to make sure that the new process is no less probable than the process we were hoping to dispense with.

Where do the proteins that these low-specification sequences bind to come from? Are they of sufficiently high probability to be fairly abundant in the environment in which the proto-organism finds itself?

(I am being vague about the “environment” and the “organism” at the moment. My concept at this stage is that the primitive organism has some means of maintaining access to a supply of material from the “environment”, without the complex cell wall and gated transfer mechanisms we see today. The precise nature of the “cell wall” in a primitive organism is obviously one more thing to add to the “to work out” list.)

The abundance of amino acids in the environmental medium would be some value; let's say k moles per litre. It seems reasonable to assume that the formation of peptide bonds between amino acids is not strongly favoured, or the entire concoction would fairly abruptly become scrambled egg (or, to put it more scientifically, long polypeptide sequences would precipitate out of the mixture). Only while the mixture remains liquid can the process continue, with different forms being tried out. On the other hand, it seems reasonable that if two amino acids are positioned appropriately, they would be fairly likely to form a bond.

If all the single amino acid molecules joined with other single amino acids, you would have a mixture of dipeptides with half the abundance (k/2) of the single amino acids. By the same reasoning, if a protein requires a sequence of n amino acids, the abundance of polypeptides of that length can be no more than k/n, and is probably substantially less (I'll try to think a bit more about this). Furthermore, as the length of proteins increases, the number of possible combinations increases rapidly: with 20 standard amino acids to choose from, there are 20ⁿ different sequences of length n.

We find ourselves back looking at specifications again. What is the specification of the protein that has to be bound by our short polypeptide sequence? It is hardly likely to be less specified than the polypeptide that is to bind it; otherwise it would be more likely that the organism would simply be producing the protein, rather than the binding polypeptide. So both the binding polypeptide and a protein with a specification no smaller must be present. The probability of the first was estimated at around 10⁻¹¹; let's assume the specification of the protein is the same. Then, even if the specification of the binding polypeptide is as suggested by Art, the probability of it being present together with a suitable protein to bind is no more than 10⁻¹¹ × 10⁻¹¹ = 10⁻²² (treating the two as independent). That is not beyond the probability boundary, but it is a lot less favourable than the specification of the binding polypeptide alone would suggest. And it seems unlikely that being able to bind a protein that is no more specified than the binding polypeptide would be of any real advantage to an organism; surely it is more evolutionarily obvious to manufacture the protein itself. So I would suggest that the bound protein would in fact have to be significantly more specified than the binding polypeptide, which means that the probability of it being present gets even smaller.
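As a rough sketch of the arithmetic above, the Python below computes the k/n upper bound on the abundance of chains of length n, the number of possible sequences of that length, and the joint probability of a binder and its target both being present. The concentration k = 1.0 mol/L and the helper names (max_abundance, sequence_count, joint_probability) are hypothetical illustrations, not values or terms from the argument itself; the joint probability assumes the two events are independent, and the 10⁻¹¹ figure is the estimate quoted in the post.

```python
# Sketch of the abundance and probability arithmetic.
# All inputs are illustrative assumptions, not measured values.

ALPHABET_SIZE = 20  # the 20 standard proteinogenic amino acids


def max_abundance(k: float, n: int) -> float:
    """Upper bound on the concentration of chains of length n.

    Each chain of length n consumes n free amino acids, so even if every
    amino acid (total concentration k) ended up in such chains, there
    could be at most k / n of them.
    """
    return k / n


def sequence_count(n: int) -> int:
    """Number of distinct sequences of length n over the alphabet."""
    return ALPHABET_SIZE ** n


def joint_probability(p_binder: float, p_target: float) -> float:
    """Probability that a binder and its target are both present,
    assuming (hypothetically) that the two events are independent."""
    return p_binder * p_target


if __name__ == "__main__":
    k = 1.0  # hypothetical free amino acid concentration, mol/L
    for n in (2, 10, 100):
        print(f"n={n:3d}  abundance <= {max_abundance(k, n):.3g} mol/L  "
              f"sequences = {sequence_count(n):.3g}")
    # Binder probability as estimated in the post; target assumed no less specified.
    print(f"joint probability <= {joint_probability(1e-11, 1e-11):.0e}")
```

Running it shows the bound on abundance shrinking as 1/n while the number of possible sequences grows as 20ⁿ, which is the tension the argument turns on.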

You then have the issue that having polypeptides that can bind to a protein is, by itself, of little value to an organism. An enzyme, or a protein such as cytochrome c, is actually a complex little machine in its own right. You could write a list of requirements that are necessary for a generic (or genetic!) machine to work: input, output, power, action and control. The amazing feat that life represents is the co-ordination of thousands of biochemical machines within a cell, or tens of thousands within a multicellular organism. Conceptually, what this proposal suggests is that the “input” component could arise separately, at random. It does not suggest where the other components come from: the “output” (an “un-binding” mechanism), the “power” (a system for harnessing energy from somewhere else), the “action” (a means of reconfiguring the “input” to make the “output”) and the “control” (a means whereby the organism can cause this to occur according to requirements).
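The five-part list of machine requirements can be sketched as a small data structure, which makes it easy to see which parts a given proposal leaves unexplained. This is a hypothetical Python illustration: the names Component, PROPOSAL_SUPPLIES and unaddressed are invented for the sketch, and the assumption that the short-binding-polypeptide proposal supplies only the “input” follows the paragraph above.

```python
from enum import Enum


class Component(Enum):
    """The five requirements listed for a working machine."""
    INPUT = "binding of what is to be worked on"
    OUTPUT = "an un-binding mechanism"
    POWER = "a system for harnessing energy from elsewhere"
    ACTION = "reconfiguring the input to make the output"
    CONTROL = "the organism causing this to occur as required"


# Hypothetical assumption: the proposal supplies only the input component.
PROPOSAL_SUPPLIES = {Component.INPUT}


def unaddressed(supplied: set) -> list:
    """Return the components not supplied, in the order they are listed."""
    return [c for c in Component if c not in supplied]


if __name__ == "__main__":
    for c in unaddressed(PROPOSAL_SUPPLIES):
        print(f"{c.name.lower():8s} {c.value}")
```

Four of the five components come back as unaddressed, which is the gap the paragraph points to.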

It would be too much to say that an enzyme was “irreducibly complex” because it had all of these requirements. I have discussed elsewhere the fact that small enzymes are not apparently beyond the probability boundaries, which I think is an unspoken part of the “irreducible complexity” concept. However, there are certainly levels of complexity in the functionality of proteins used by organisms that this proposal has not yet addressed. It is not clear why the “input” part of a machine on its own should confer a selective advantage on an organism. Furthermore, since the organism has no means of making the protein it requires (which, remember, is no less specified than the binding polypeptide, and probably substantially more), it has no guarantee that the binding polypeptide will continue to be of any use.

It should go without saying that you can't assume that other bits (of enzymes or whatever) are being made elsewhere for use at the same time. As soon as you do this, you are simply slipping back in, whilst nobody is looking, the improbabilities that you took out to start with.

Incidentally, it is also worth reminding ourselves that we are still making assumptions here: that there is a suitable environment in which this process can occur, and that fairly well-specified equipment for synthesising proteins from RNA or DNA is present. As I said in earlier posts, for the sake of getting a handle on the area we are looking at, these issues can be set aside, as long as we come back to them at some stage.