Tuesday, November 28, 2006

Evolution - moving the discussion on

Why bother with all that stuff about the specification or probability of cytochrome c? Well, the aim is to move on the debate between evolution, creationism and intelligent design. Certain classes of argument - on both sides of the debate - can be clearly labelled as flawed, on the basis of this work. This isn't to attract glory to me, incidentally - I have little doubt that this work has already been done better elsewhere. As much as anything, it is written down here to give me somewhere to point people to when they present these customary, sloppy arguments.

A frequent creationist allegation is that "evolution can't happen because a protein with a specific sequence of amino acids is incredibly improbable." Here's an example, from "Answers in Genesis."
However, ignoring all such problems, and many others that could be detailed, what is the probability of getting just 100 amino acids lined up in a functional manner? Since there are 20 different amino acids involved, it is (1/20)^100, which is 10^-130. To try to get this in perspective, there are about 10^80 fundamental particles (electrons, etc) in the universe. If every one of those particles were an experiment at getting the right sequence with all the correct amino acids present, every microsecond of 15 billion years, that amounts to 4.7 x 10^103 experiments. We are still 10^27 experiments short of getting an even chance of it happening. In other words, this is IMPOSSIBLE!
Cytochrome c is a 100 amino acid protein, for all intents and purposes. But I have demonstrated that it is not anywhere near as improbable as 10^-130 - in fact, more than forty orders of magnitude more probable - and in a less specified format, as it would have been when it first appeared, it would have been less improbable again.
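
As a rough check on the arithmetic in the quotation, here is a minimal Python sketch that works in log10 to avoid underflow. The figures are the ones quoted; the length of a year (365.25 days) is an assumption of this sketch, not something the quotation states.

```python
import math

# Figures taken from the quotation above.
AMINO_ACIDS = 20
CHAIN_LENGTH = 100
LOG10_PARTICLES = 80      # "about 10^80 fundamental particles"
AGE_YEARS = 15e9          # "15 billion years"

# Assumption of this sketch: a year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# log10 of (1/20)^100
log10_probability = -CHAIN_LENGTH * math.log10(AMINO_ACIDS)

# One experiment per particle per microsecond for the whole period.
microseconds = AGE_YEARS * SECONDS_PER_YEAR * 1e6
log10_experiments = LOG10_PARTICLES + math.log10(microseconds)

# How many orders of magnitude the experiments fall short of 1/probability.
log10_shortfall = -log10_probability - log10_experiments

print(f"log10(probability) = {log10_probability:.1f}")
print(f"log10(experiments) = {log10_experiments:.1f}")
print(f"log10(shortfall)   = {log10_shortfall:.1f}")
```

The results land close to the quoted 10^-130 and 4.7 x 10^103, with a shortfall of between 26 and 27 orders of magnitude. So the arithmetic in the quotation holds up; the objection is to its starting assumption of a single valid sequence.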

There is an argument from improbability - but it doesn't start from a single specific sequence of amino acids.

Similarly, the vague darwinist assertion that "there are gazillions of bacteria, each of which has the opportunity to make random sequences, so really the suggestion that there is any problem with improbability is absurd" is just as flawed. TalkOrigins is careful to avoid this - there are other points at which their argument could be challenged, but this isn't one. I'm not sure that the same care is taken by people who casually engage in this debate, and stop counting when they get to 10^20 bacteria.

What I have suggested - and again, I have little doubt that the same calculations have been carried out elsewhere - is that there is an effective probability boundary of 10^-60 for proteins generated at around the time of the origin of life, in the conventional evolutionary model. If a protein has to be specified such that it has a lower probability of arising than this, then there aren't sufficient probabilistic resources. A similar analysis can be applied to the appearance of proteins and genes at other stages in evolutionary history.

From here, the analysis opens out in several directions. For a start, in determining a minimum for the improbability of proteins when they initially appear, and determining a current improbability, it is possible to determine how much "work" evolution has done, in refining proteins through evolutionary history. Are the mechanisms available to evolution able to do this work?

Also, the figures that I put down can be more accurately determined - for example, the time at which cytochrome c first appeared can be pinned down. More accurate estimates of the current improbability of cytochrome c can be determined. (I notice that in a paper cited on the TalkOrigins page given above, Yockey says that the improbability of cytochrome c is nearer 10^-68 - which makes me feel quite smug about my guess of 10^-70!) More thought-out figures for the amount of carbon available for forming "proto-proteins" can be determined. Big issues related to how the genetic code and translation systems might appear remain unresolved.

Starting to attach numbers to the specifications of proteins has an impact on the debate about irreducible complexity. Implicit in this concept is that the different components are themselves improbable - and thus the requirement for them to appear simultaneously requires the multiplication of small probabilities. The probabilities ought to be hardened up. If we can get a handle on the value of these probabilities, we can again move away from the "yes it is, no it isn't" style of debate that the current state of the art boils down to.

Both sides in the debate suggest that this sort of analysis is something that the other side ought to be doing. In truth, without attaching numbers to the propositions that are being kicked around, both sides are debating not science, but presuppositions.

Beautiful Day

Man, I love this song! Don't forget to read the comments, to see how U2 themselves make reference to Jeremiah 33.

H/T U2 Sermons.

Monday, November 27, 2006

Christian philosophy

The Constructive Curmudgeon, Douglas Groothuis, writes a eulogy for Dr. Robert T. Herbert, who died recently.
Professor Herbert and I disagreed on many things. In a Philosophy of Religion seminar he was leading, I voiced concerns about a paper in which he argued that believers come down with faith as one comes down with a cold. That is, the faith is neither rational nor irrational. (The paper was subsequently published in Faith and Philosophy.) He asked me to write a response to his paper. I agreed with some hesitation. (Whether I really had a choice, I do not know.) When I received the paper back, it was filled with red ink comments challenging nearly every one of my criticisms. At the bottom of the last page was the grade: A+.
I have been involved in an email discussion recently relating, in effect, to epistemology - that is, the foundation of knowledge, both for Christians and non-Christians. This pointed me in the direction of Cornelius van Til. If you follow this link, you will find his "Credo", at the Center for Reformed Theology and Apologetics. I was surprised at how closely it fit with my own (less organised!) philosophical thoughts - and I was both gratified and a little disturbed to find this perspective described as "calvinist" - a label I had always shied away from for various reasons.

Sunday, November 26, 2006

The B of the Bang

... is the name of a sculpture, conceived of by Thomas Heatherwick, which was built in Manchester, UK, for the 2002 Commonwealth Games. It derives its name from a quotation from Linford Christie, who said that he had to start not just at the bang of the starting pistol, but at the "B of the Bang".

It's a stunning sculpture - it stands taller than any other sculpture in Britain. I want to post some comments on Heatherwick as a designer, from an article in yesterday's Daily Telegraph.
Heatherwick's creations are certainly eccentric, but they never stray from his central belief that good design should be "readable rather than impenetrable": you need no background knowledge to be touched by their immediate, exhilarating brilliance. After studying design in an era when the world of architecture seemed impossibly rarefied - "There were all sorts of discussions about sacred geometries and things, but to my mind nothing interesting was being built" - he is determined to make things that will appeal to a universal audience.

"Is something dumbed-down if it appeals to an eight-year-old, and therefore of no interest to someone from an academic perspective?" he asks. "Or is there potential for something to have meaning across both levels? It is my aspiration to prove that there is."

Telegraph Review, 25/11/2006

Thursday, November 23, 2006

For the record ...

Doubtless not as liberal as Bec wishes, but definitely more liberal than most US citizens. (But then, who isn't?!) Thanks Miss Mellifluous for reminding me of this, and showing me I could blog it!
You are a

Social Moderate
(50% permissive)

and an...

Economic Liberal
(31% permissive)

You are best described as a:

Centrist

Link: The Politics Test on Ok Cupid

In Our Time ...

... in case you missed it, or are unaware of it, featured Richard Dawkins (amongst others) talking to the presenter Melvyn Bragg about altruism.

I didn't hear it all - if you are interested, the BBC has a "Listen Again" feature that doesn't in fact require you to have heard it once already. Go to this page and follow the link.

There was the odd bit even in what I heard. In talking about the different effects of culture, Dawkins talked about the "darwinian foundation" - but then added that the zeitgeist might be different from decade to decade. One was tempted to ask - in that case, what exactly is the predictive significance of the darwinian foundation?

And David Stove's commentary on Darwinism was not mentioned, as far as I know.

Evolutionary history of cytochrome c

This post is a little provocative, and is based in part on my earlier posts about the specification of cytochrome c. If this analytical process is fair, then it should make possible the sort of explanation outlined below. Note that, in my opinion, this is substantially more detailed than the traditional evolutionary explanations offered. However, I would be interested in other people's thoughts on the numbers involved, or pointers to other places where similar analysis has been carried out.

Cytochrome c is genetically coded. Other forms of electron transport are possible, but this one could not have predated the genetic code and transcription mechanisms. This means that it is likely that cytochrome c would not have substantially predated the appearance of prokaryotic life. On the other hand, the gene that became cytochrome c was established in prokaryotic life by the time it became dominant – say 3500 Myr. The reason for this conclusion is that we don't see any forms of life that don't utilize cytochrome c, with the exception of parasitic forms which would not have predated this point.

We can determine how many possible events there were that might have led to the first appearance of cytochrome c, and thus the “probabilistic resources” available for this evolutionary step. The total time available is (let's say) of the order of 1000 Myr. Assuming that there are biochemical processes that can occur 100 times per second (I suspect that there are very few biochemical processes that are energetically neutral to such an extent that they could oscillate at this frequency – the mechanisms that allow cells to tick “more rapidly” are dependent upon the sort of systems that we are seeking to explain), this represents of the order of 10^18 biochemical “ticks”. There are of the order of 10^22 moles of organic carbon available – that is, of the order of 10^46 atoms in total, of which let's say 1% is in an appropriate environment to be a resource for finding proto-cytochrome c, and proto-cytochrome c requires of the order of 100 carbon atoms.

Thus, the total number of reconfigurations of hundreds of carbon atoms available to try and find proto-cytochrome c is about 10^60 (1% of 10^46 atoms, in groups of 100, gives about 10^42 groups, each reconfigured 10^18 times). If proto-cytochrome c was substantially more specified (less probable) than this, then there would not have been sufficient probabilistic resources available to consider its appearance to be a likely event in this timescale. Note also that this says nothing about the requirement of existing genetic systems.
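
The estimate above can be reproduced with a few lines of Python. This is only a sketch of the order-of-magnitude reasoning: the inputs are the round figures given in the text, while Avogadro's number and the 365.25-day year are standard values supplied here.

```python
import math

# Round figures from the text.
YEARS_AVAILABLE = 1000e6        # about 1000 Myr
TICKS_PER_SECOND = 100          # assumed biochemical "tick" rate
MOLES_ORGANIC_CARBON = 1e22
USABLE_FRACTION = 0.01          # 1% in an appropriate environment
ATOMS_PER_PROTO_PROTEIN = 100   # carbon atoms per proto-cytochrome c

# Standard values supplied by this sketch.
AVOGADRO = 6.022e23
SECONDS_PER_YEAR = 365.25 * 24 * 3600

log10_ticks = math.log10(YEARS_AVAILABLE * SECONDS_PER_YEAR * TICKS_PER_SECOND)
log10_atoms = math.log10(MOLES_ORGANIC_CARBON * AVOGADRO)

# Number of 100-atom groups that can be rearranged at each tick.
log10_groups = (log10_atoms
                + math.log10(USABLE_FRACTION)
                - math.log10(ATOMS_PER_PROTO_PROTEIN))

# Total trials = groups x ticks.
log10_trials = log10_groups + log10_ticks

print(f"log10(ticks)  = {log10_ticks:.1f}")
print(f"log10(atoms)  = {log10_atoms:.1f}")
print(f"log10(trials) = {log10_trials:.1f}")
```

Each input is good to an order of magnitude at best, so the result should be read as "about 10^60" and no more precisely than that.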

By the time of the appearance of multicellular, eukaryotic life – say 600 Myr: 100 Myr in either direction has little effect on the argument – cytochrome c was already well specified. The reason for this conclusion is that all modern life has a similar high specification – there aren't any substantially different forms in different phyla. It would be possible that the convergence to a generally single form of cytochrome c took place in different phyla after the Cambrian period – but with prokaryotic organisms having faster generation times, asexual reproduction and being dominated for selection by fewer factors, it seems more likely that a highly specified form would have been established before this point in time.

Based on the list of 113 versions of cytochrome c referenced earlier, the probability of a random sequence of DNA coding for one of these versions of cytochrome c is about 10^-85. It was observed that going from 103 to 113 different versions increased this probability by two orders of magnitude (from about 10^-87). It is pretty arbitrary, but for the sake of argument, let's say that the probability of a random sequence of DNA coding for any version of cytochrome c (which today would be well-specified) is of the order of 10^-70. This is then a measure of the specification of cytochrome c today, following all the years of natural selection on the earlier forms.

If these figures (or ones like them) are accepted, then we can see the scale of the work done by natural selection from the time random mutation produced proto-cytochrome c to today. Cytochrome c has evolved from requiring a DNA sequence no less probable than 1 in 10^60 to having a DNA sequence that we are saying has a probability of 1 in 10^70, in around 3000 Myr.

Note that arguing for a lower specification for proto-cytochrome c increases the demand placed on natural selection to arrive at its modern specification. Arguing for too high a specification for proto-cytochrome c runs the risk of exhausting reasonable probabilistic resources for its initial appearance.

Tuesday, November 21, 2006

How Music Works with Howard Goodall

This was an excellent programme. Now where can I get a DVD of it? Channel 4 say it's not going to be made available for sale, and there will only be one repeat on More4 sometime next month. Did anybody video it?

Saturday, November 18, 2006

Variability of cytochrome c across species

(Sorry, I don't know why this image came out like that - click it to get the proper version)

Using the spreadsheet referred to below, I did this graph of the moving average of the variability of cytochrome c across the 113 species. The x axis represents distance along the amino acid chain. It starts below zero, as some species have chains of amino acids that come before the start of the reference sequence. The y axis represents the number of codons that could be used to encode each position across the species - this drops as low as 2 in some locations, where only one amino acid is present in the cytochrome c of every species. The average rises to almost 30, signifying that around half the 20 occurring amino acids might be used at a particular location in cytochrome c.
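
For anyone wanting to reproduce a graph like this outside a spreadsheet, a moving average is easy to sketch in Python. The window size and the sample counts below are made up for illustration; the post does not say what window the spreadsheet used.

```python
def moving_average(values, window):
    """Return the simple moving average of `values` over `window` points.

    The result has len(values) - window + 1 entries, one for each
    full window.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the new value, drop the oldest one.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


# Hypothetical per-position counts of usable codons (not the real data).
sample_counts = [2, 4, 6, 8, 30, 28, 4, 2]
WINDOW = 3  # hypothetical window size

print(moving_average(sample_counts, WINDOW))
```

A wider window smooths the curve but blurs the sharply conserved positions, so the choice of window affects how low the plotted minimum appears.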

Friday, November 17, 2006

The specification of proteins - part 3

The table referred to in my earlier posts has a list of cytochrome c sequences for 113 different species, all aligned as far as possible using the horse heart cytochrome c as a reference sequence. So, this sequencing and alignment having been carried out, the same process can be carried out for all the positions of all the cytochrome c sequences in the table.

Incidentally, I would like to point out that I made one change, in sequence 21 (Ceratotherium simum) – at position 48, there was a series of amino acids that seemed to be incorrectly aligned with the reference sequence. Inserting two empty locations before the sequence DANKNKG, and removing two of the empty locations afterwards gives better alignment.

So the list of species provided was rematched with the amino acid sequences in the table, and then the amino acid sequences were “exploded” into individual columns, giving a table with 113 rows and 118 columns, each containing either a letter or a hyphen. I then counted the number of each different amino acid (letter) in each column. Each amino acid can, as was discussed in the previous post, be encoded by one or more codons. So, if I know which particular amino acids can be present at a location, and I know how many codons encode these amino acids, then I can work out how many of the 64 available codons could work at each position. Thence, I can determine an estimate of the probability that, given a random sequence of DNA, it will code for a functional cytochrome c protein.
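
The column-by-column counting described above can be sketched in Python as follows. The codon counts come from the standard genetic code; the three-sequence alignment is a toy example invented for illustration and is not taken from the table. The function name is likewise just a label for this sketch.

```python
# Number of codons for each amino acid in the standard genetic code.
# The 20 values sum to 61; the remaining 3 codons are stop codons.
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def valid_codons_per_column(aligned_sequences):
    """Count, for each alignment column, the codons that encode any
    amino acid seen in that column. Hyphens (gaps) are ignored, so a
    column containing only gaps gets a count of 0."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    counts = []
    for column in zip(*aligned_sequences):
        residues = set(column) - {"-"}
        counts.append(sum(CODONS_PER_RESIDUE[r] for r in residues))
    return counts


# Toy alignment (hypothetical): three "species", three positions.
toy_alignment = ["GDT", "GDS", "GE-"]
print(valid_codons_per_column(toy_alignment))
```

In the toy alignment the first column only ever holds glycine (4 codons), the second holds aspartic acid or glutamic acid (2 + 2), and the third holds threonine or serine (4 + 6).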

Discarding the first 9 places in the table – in other words, accepting position 1 on the horse heart cytochrome c as the first significant position – and the last 2 places, which generally aren't part of the amino acid sequences, I can multiply up the total number of valid codons for each position, to give the total number of possible cytochrome c sequences, assuming that any amino acid that works at a location can be put there to make a viable cytochrome c protein. This comes to about 1.5 x 10^112. There are 106 places under consideration: 64^106 is 2.8 x 10^191.

The proportion of valid cytochrome c sequences in this domain space is the ratio of these – that is, 1 in 1.9 x 10^79.
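
Numbers this size are easiest to handle as logarithms. The Python sketch below first checks the ratio quoted above, then shows how the same proportion could be computed directly from a list of per-position codon counts. The function name and the three-element list are hypothetical, used only to show the method.

```python
import math

TOTAL_CODONS = 64


def log10_valid_fraction(codons_per_position):
    """log10 of the fraction of random codon sequences that are valid,
    given the number of acceptable codons at each position."""
    return sum(math.log10(n / TOTAL_CODONS) for n in codons_per_position)


# Check of the figures in the text: 1.5 x 10^112 valid out of 64^106.
log10_valid = math.log10(1.5) + 112
log10_space = 106 * math.log10(TOTAL_CODONS)
log10_ratio = log10_space - log10_valid

print(f"log10(64^106) = {log10_space:.2f}")
print(f"log10(ratio)  = {log10_ratio:.2f}")

# Hypothetical per-position counts, to show the method on a tiny case.
toy_counts = [4, 4, 10]
print(f"log10(valid fraction, toy) = {log10_valid_fraction(toy_counts):.3f}")
```

Summing logarithms position by position is equivalent to multiplying up the counts and dividing by 64 raised to the number of positions, which is the calculation described in the text.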

Let's unpack the significance of this a little more. I have not assumed that only one cytochrome c sequence is valid – a challenge directed at many ID proponents. Neither have I assumed that only the 113 given cytochrome c sequences are valid. I have assumed that, if an amino acid appears at a given position in any of these cytochrome c sequences, then that is a “possible answer”. This analysis allows me to construct a very large number of possible cytochrome c sequences, only 113 of which happen to constitute the table, and all of which I am assuming would be functional. Despite this, the proportion of valid cytochrome c sequences in the domain space of 106 amino acid polypeptides is of the order of 1 in 10^79. For reference, the total number of atoms in the earth is around 10^50.

The range of species that is covered by this survey is very large – everything from humans to rice to saccharomyces. It is likely that additional species would add to the number of conceivable cytochrome c combinations – by showing that different amino acids would work at positions not covered already. However, given how widely the net has been cast with this approach, and the range of species considered, my hunch is that the increase would not be more than a few orders of magnitude.

However, this can be investigated. This process could be carried out omitting several of the sequences, and seeing what effect this has on the ratio. Or, if other candidate sequences of cytochrome c are available, they could be added, again determining the effect that this has.

To consider this from a naturalistic perspective, we can assume that given the key role that cytochrome c has within cells, and given its ubiquity, selection pressures on it would be strong, and in billions of years of evolutionary history, this would have allowed it to arrive at a highly specified form. It is possible to argue that this being so, all versions of cytochrome c that we observe in the world today are far more specified than would have been necessary in the most primitive organisms. In fact, this analysis is also useful from a naturalistic perspective. In determining how specified (improbable) cytochrome c is today, and in estimating the probabilistic resources available in early stages of evolutionary history, we can calculate how effective evolutionary processes are in improving the specification of proteins. This sort of analysis would be very useful if darwinists wish to move away from "hand-waving" explanations towards a solid empirical foundation for their beliefs.

Thursday, November 16, 2006

That's my king!

I'd never heard this before. If you are a Christian, this will help to remind you of why. If you aren't a Christian, it is a different experience from my stumbling and inaccurate apologetics.

H/T Andrew and Cora.

Pictures from South America

My wife's cousin James has been living in the west of South America for quite a few years now. Initially, he was doing tour guiding, but more recently he has focussed on photography. He has his own website, through which it is possible to buy copies of the stunning images.

Wednesday, November 15, 2006

The specification of proteins - part 2

For part 1, see below.

I said below that, in considering how specified Cytochrome C is, “we need to determine what the probability is of a random sequence of DNA coding for Cytochrome C, rather than what the probability is of a random polypeptide being Cytochrome C”. So let's start with Cytochrome C for a horse – the first line in the table referenced before. The sequence of amino acids starts:

GDVEKGKKIFVQKCA ...

and ends:

... KKTEREDLIAYLKKATNE[Stop]

Each amino acid can be coded by one or more DNA codons. (Incidentally, I will show my ignorance by pointing out that I am working on the basis that Cytochrome C is encoded in normal genes, rather than mitochondrial genes. I understand this to be the case – see here. However, even if Cyt C were encoded in the mitochondria, the principles discussed here could be rewritten to apply to this.) Given that there are 64 codons and 21 different items encoded (20 amino acids and the stop sequence), there is an average of 3 codons per item encoded. But, with a table of the genetic code, we can be more precise than this. Four codons can encode the first G (glycine) in the polypeptide. Two can then encode the next D (aspartic acid) and so on, through to the gene termination, which can be one of three codons.

The entire gene for horse Cytochrome C, then, is encoded by a sequence of 104 codons. There are 64^104 possible sequences of codons – that is, 7 x 10^187 – but by multiplying together the numbers of possible codons that would encode the given amino acids in each position, we discover that there are 2 x 10^45 permutations that would encode exactly this sequence of amino acids. So the probability of any given sequence of 104 codons encoding precisely the sequence for horse Cytochrome C is the second number divided by the first – that is, about 3.5 x 10^-143.
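
To make the multiplication concrete, here is a minimal Python sketch applied only to the opening fragment quoted above (GDVEKGKKIFVQKCA). The full sequence is not reproduced in this post, so the sketch does not attempt the whole gene. The codon counts are those of the standard genetic code.

```python
import math

# Number of codons for each amino acid in the standard genetic code.
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}
TOTAL_CODONS = 64

# Opening fragment of the horse sequence, as quoted in the post.
fragment = "GDVEKGKKIFVQKCA"

# Number of distinct codon sequences that encode exactly this fragment.
encodings = math.prod(CODONS_PER_RESIDUE[residue] for residue in fragment)

# log10 of the chance that random codons of the same length encode it.
log10_probability = (math.log10(encodings)
                     - len(fragment) * math.log10(TOTAL_CODONS))

print(f"encodings of fragment = {encodings}")
print(f"log10(probability)    = {log10_probability:.2f}")

# Size of the full search space quoted in the text: 64^104.
print(f"log10(64^104)         = {104 * math.log10(TOTAL_CODONS):.2f}")
```

Extending this to all 103 amino acids plus the stop codon (3 possibilities) is the calculation that produces the figures in the text.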

It is worth noticing that this is 13 orders of magnitude less probable than that a random sequence of 103 amino acids would turn out to be horse Cytochrome C, and it would be interesting to know whether this was generally the case (that is, amino acids used in proteins are more frequently those encoded by less than the average number of codons).

However, this isn't the whole story (“But that is not all, no that is not all”). We have 113 different versions of Cytochrome C, and we now need to consider what effect these other versions have on what we can say about the specification of this protein. We can continue with the amino acid sequence for a zebra – sequence number 25 in the table. This differs in just one position from that of the horse – at position 47, it has serine (S) rather than threonine (T). Here is where assumptions start to become important. If we assume that this is a neutral substitution, and is simply evidence of evolutionary divergence, then we can conclude that we could now have any one of ten codons at position 47 – one of the six for serine, or one of the four for threonine. This single change more than doubles the probability of a random sequence of codons encoding Cytochrome C (ten acceptable codons in place of four) – albeit only to the unhopeful-looking probability of about 8 x 10^-143, but we can start to see the direction this will move in.

To be continued ...

Monday, November 13, 2006

The specification of proteins - part 1

Part of the problem in the scientific debate between ID proponents and opponents is that much of it is conducted with relation to intractable universal problems. The debate then tends towards a pythonesque “Yes it does”, “No it doesn't” series of contradictions. I am keen to try and use some of the ideas to look at more tractable specific problems, and use principles learned working on these to extend the debate to areas where more definite conclusions can be drawn.

Proteins are specified. That is to say, as we find them today, they aren't simply random sequences of amino acids. The information that they incorporate allows them to express functionality that is of use to a cell or to an organism. Take Cytochrome C, for example. It has a functional specification – a substantial one. Its function can be described in terms of what it achieves for an organism – its Wikipedia entry gives detail about this. It can also be functionally described in terms of the low level biochemical reactions that it catalyzes.

Since Cytochrome C is a protein, it is also coded by a specific sequence of amino acids. Actually, this isn't strictly true. The sequence of amino acids that codes for Cytochrome C is different in different organisms. A table listing different amino acid sequences for Cytochrome C in 113 different species can be found from here.

Just how much specified information does Cytochrome C contain in the sequence of its amino acids? If we know how much information it contains, then we can calculate how likely it is that Cytochrome C would appear by chance – the probability that a random polypeptide would happen to be Cytochrome C (or close enough to be useful for natural selection). And if we can determine this, we can say how reasonable is the chance hypothesis for explaining the initial appearance of Cytochrome C.

If there were only one sequence of 100 amino acids that was universally used to code for Cytochrome C, the probability of it (20 possible amino acids, 100 places in the chain) appearing as a random polypeptide sequence could be simplistically expressed as 1 in 20^100 – that is about 10^-130. This probability is low – but it is above Dembski's UPB, and much more significantly for opponents of ID, it substantially overestimates the specification of Cytochrome C even within our understanding.

But to make sure that we get our foundations right, it's necessary to see that we have already gone wrong at this point, since we are not properly considering the reference frame. The reference frame is actually not simply the sequence of amino acids that make up Cytochrome C, but the genetic coding of these amino acids. The organism doesn't record Cytochrome C as a polypeptide, but as a DNA sequence. So we need to determine what the probability is of a random sequence of DNA coding for Cytochrome C, rather than what the probability is of a random polypeptide being Cytochrome C.

It's also important to point out at this stage the fact that we have not derived the reference frame. To understand this, consider the fact that the words of this post are an improbable sequence of letters that convey information. But they only convey information given the pre-existing reference frame of the English language (ignoring the additional layers of complexity which are represented by the medium on which this is being read) – they say nothing about how the English language came about in the first place. The information required for Cytochrome C to be present in an organism is the DNA sequence that encodes for it in the genes of the organism, but it is also the reference frame which includes the mechanism to convert the DNA sequence into a protein. The task of darwinism – or any “ism” that addresses the issue of origins – isn't only to explain the appearance of Cytochrome C (for example), but also to explain the presence of the reference frame which allows Cytochrome C to be encoded and manufactured to demand.

The question is more subtle even than this. For example, the darwinian presumption is likely to be that the 113 different Cytochrome C sequences enumerated in the table above are functionally identical, and the differences in amino acid sequence simply represent evolutionary divergence. However, it is conceivable that rather than being functionally identical, each version of Cytochrome C is actually specific to the species in which it is found – that the reference frame isn't simply a generic DNA coding and expression framework, but is the specific organism in the case of each protein. This is perhaps unlikely for a relatively simple protein like Cytochrome C, but may be more relevant for complex and specific proteins. This issue is at the heart of the ID objection to many of the co-option scenarios that are proposed to explain the appearance of complex biochemical systems – that it is an unjustified darwinian assumption that proteins can arbitrarily be re-used or re-located within an organism, ignoring the reference frame.

However, these issues can be put aside for now, as long as they don't disappear off the radar indefinitely.

To be continued ...

Wednesday, November 08, 2006

Hours per degree

So people on different courses do different amounts of work? Well, how amazing!

For the record, on my CompSci degree, we had four hours of lectures six days a week, 9 am to 1 pm. We also had two 2-hour practicals in the afternoons. That's 28 hours of scheduled work per week. In addition to that, we had projects that we had to work on, and programming course work. I see the medical student started at 10 am. No such luck for us. In large measure, we simply ended up skipping the 9 am lecture.... Natural Science Part I entailed a similar workload - I think we had three practicals, rather than 2, but on two days a week, we only had 3 hours of lectures. I still remember the regular dash down Tennis Court Road from the Old Museums Site to the Chemistry Department at 11 am - 300 cyclists in tight formation.

In the meantime, there were rumours that the Land Economy course at the same university involved just 2 hours of lectures a week. That made the student newspaper at the time.

My nephew has timetabled (I believe) 7 hours of lectures per week on his English degree.