We are including those that could never have survived even if they had come into existence, as well as those that might have survived if they had existed but as a matter of fact never came into existence.
Movement from one point in the landscape to another is mutation, interpreted in its broadest sense to include large-scale changes in the genetic system as well as point mutations at loci within existing genetic systems. In principle, by a sufficiently contrived piece of genetic engineering - artificial mutation - it is possible to move from any point in the landscape to any other. There exists a recipe for transforming the genome of a human into the genome of a hippo or into the genome of any other animal, actual or conceivable. It would normally be a very large recipe, involving changes to many of the genes, deletion of many genes, duplication of many genes, and radical reorganizations of the genetic system. Nevertheless, the recipe is in principle discoverable, and obeying it can be represented as equivalent to taking a single giant leap from one point to another in our mathematical space. In practice, viable mutations are normally relatively small steps in the landscape: children are only slightly different from their parents even if, in principle, they could be as different as a hippo is from a human. Evolution consists of step-by-step trajectories through the genetic space, not large leaps.
*I find this image, which is modified from the venerable American population geneticist Sewall Wright, a helpful way to think about evolution. I first made use of it in The Blind Watchmaker and gave it two chapters in Climbing Mount Improbable, where I called it a 'museum' of all possible animals. Museum is superficially better than landscape because it is three-dimensional, although actually, of course, we are usually dealing with many more than three dimensions. Daniel Dennett's version, in Darwin's Dangerous Idea, is a library, the vividly named 'Library of Mendel'.
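To make the geometry concrete, here is a minimal sketch in Python - my own illustrative framing, not anything from the essay: genomes as strings over the DNA alphabet, a point mutation as a one-letter change, and the distance between two genomes as the number of positions at which they differ. Any genome can reach any other by a chain of such single steps, the plain-text analogue of the 'recipe' for turning one genome into another.

```python
import random

ALPHABET = "ACGT"

def point_mutation(genome: str) -> str:
    """One small step in genetic space: change a single letter."""
    i = random.randrange(len(genome))
    new_letter = random.choice([c for c in ALPHABET if c != genome[i]])
    return genome[:i] + new_letter + genome[i + 1:]

def distance(a: str, b: str) -> int:
    """Hamming distance: the minimum number of single-letter steps apart."""
    return sum(x != y for x, y in zip(a, b))

start = "ACGTACGTAC"
child = point_mutation(start)
print(distance(start, child))          # 1: a child differs only slightly
print(distance(start, "TGCATGCATG"))   # a giant leap would need many steps
```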
Evolution, in other words, is gradualistic. There is a general reason why this has to be so, a reason that I shall now develop.
Even without formal mathematical treatment, we can make some statistical statements about our landscape. First, in the landscape of all possible genetic combinations and the 'organisms' that they might generate, the proportion of viable organisms to nonviable organisms is very small. 'However many ways there may be of being alive, it is certain that there are vastly more ways of being dead.'61 Second, taking any given starting point in the landscape, however many ways there may be of being slightly different, it is obvious that there are vastly more ways of being very different. The number of near neighbours in the landscape may be large, but it is dwarfed by the number of distant neighbours. As we consider hyperspheres of ever increasing size, the number of progressively more distant genetic neighbours that the spheres envelop mounts as a power function and rapidly becomes for practical purposes infinite.
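The combinatorics behind 'vastly more ways of being very different' can be checked directly. For genomes of length L over a four-letter alphabet, the number of neighbours exactly k mutations away is C(L, k) x 3^k, which grows explosively with k. A small sketch (the formula is standard combinatorics; the genome length is an arbitrary illustrative choice):

```python
from math import comb

L = 1000  # an arbitrary, modest genome length
for k in (1, 2, 5, 10, 50):
    # choose which k sites differ, times 3 alternative letters per site
    neighbours = comb(L, k) * 3 ** k
    print(f"exactly {k:>2} mutations away: {neighbours:.3e} genomes")
```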
The statistical nature of this argument points up an irony in the claim, frequently made by lay opponents of evolution, that the theory of evolution violates the Second Law of thermodynamics, the law of increasing entropy or chaos* within any closed system. The truth is opposite. If anything appeared to violate the law (nothing really does), it would be the facts†, not any particular explanation of those facts! The Darwinian explanation, indeed, is the only viable explanation we have for those facts that shows us how they could have come into being without violating the laws of physics. The law of increasing entropy is, in any case, subject to an interesting misunderstanding, which is worthy of a brief digression because it has helped to foster the mistaken claim that the idea of evolution violates the law.

*Chaos here has its original and still colloquial meaning, not the technical meaning which it has recently acquired.

†About life's functional complexity or high 'information content'.

The Second Law originated in the theory of heat engines,62 but the form of it that is relevant to the evolutionary argument can be stated in more general statistical terms. Entropy was characterized by the physicist Willard Gibbs as the 'mixed-upness' of a system. The law states that the total entropy of a system and its surroundings will not decrease. Left to itself, without work being contributed from outside, any closed system (life is not a closed system) will tend to become more mixed-up, less orderly. Homely analogies - or they may be more than analogies - abound. If there is not constant work being put in by a librarian, the orderly shelving of books in a library will suffer relentless degradation due to the inevitable if low probability that borrowers will return them to the wrong shelf. We have to import a hard-working librarian into the system from outside, who, Maxwell's-Demon-like, methodically and energetically restores order to the shelves.
The common error to which I referred is to personify the Second Law: to invest the universe with an inner urge or drive towards chaos; a positive striving towards an ultimate nirvana of perfect disorder. It is partly this error that has led people to accept the foolish notion that evolution is a mysterious exception to the law. The error can most simply be exposed by reference to the library analogy. When we say that an unattended library tends to approach chaos as time proceeds, we do not mean that any particular state of the shelves is being approached, as though the library were striving towards a goal from afar. Quite the contrary. The number of possible ways of shelving the N books in a library can be calculated, and for any nontrivial library it is a very, very large number indeed. Of these ways, only one, or a very few, would be recognized by us as a state of order. That is all there is to it. Far from there being any mystical urge towards disorder, it is just that there are vastly more ways of being recognized as disorderly than of being recognized as orderly. So, if a system wanders anywhere in the space of all possible arrangements, it is almost certain - unless special, librarian-like steps are taken - that we shall perceive the change as an increase in disorder. In the present context of evolutionary biology, the particular kind of order that is relevant is adaptation, the state of being equipped to survive and reproduce.
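The arithmetic of the library example is easy to exhibit: the number of ways of shelving N books is N factorial, and almost none of those ways count as ordered. A minimal sketch (the library size is my own arbitrary figure):

```python
import math

N = 10_000  # books in a modest library
# N! is far too big to print, so compute its number of digits instead:
# log10(N!) = lgamma(N + 1) / ln(10)
digits = math.lgamma(N + 1) / math.log(10)
print(f"{N} books can be shelved in roughly 10^{digits:.0f} ways")
# Only a negligible handful of those arrangements would strike anyone
# as 'ordered'; all the rest we lump together and perceive as 'chaos'.
```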
Returning to the general argument in favour of gradualism, to find viable life forms in the space of all possible forms is like searching for a modest number of needles in an extremely large haystack. The chance of happening to land on one of the needles if we take a large random mutational leap to another place in our multidimensional haystack is very small indeed. But one thing we can say is that the starting point of any mutational leap has to be a viable organism - one of the rare and precious needles in the haystack. This is because only organisms good enough to survive to reproductive age can have offspring of any kind, including mutant offspring. Finding a viable body-form by random mutation may be like finding a needle in a haystack, but given that you have already found one viable body-form, it is certain that you can hugely increase your chances of finding another viable one if you search in the immediate neighbourhood rather than more distantly.
The same goes for finding an improved body-form. As we consider mutational leaps of decreasing magnitude, the absolute number of destinations decreases but the proportion of destinations that are improvements increases. Fisher gave an elegantly simple argument to show that this increase tends towards 50 per cent for mutational changes of very small magnitude.* His argument seems inescapable for any single dimension of variation considered on its own. Whether his precise conclusion (50 per cent) generalizes to the multidimensional case I shall not discuss, but the direction of the argument is surely indisputable. The larger the leap through genetic space, the lower is the probability that the resulting change will be viable, let alone an improvement. Gradualistic, step-by-step walking in the immediate vicinity of already discovered needles in the haystack seems to be the only way to find other and better needles. Adaptive evolution must in general be a crawl through genetic space, not a series of leaps.

*He used the analogy of perfecting the focus of a microscope. A very small movement of the objective lens has a 50 per cent chance of being in the right direction (which will improve the focus). A large movement is bound to make things worse (even if it was in the right direction, it will overshoot).
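Fisher's microscope argument is easy to test numerically. In the sketch below - my own Monte Carlo rendering, not Fisher's original mathematics - the current design sits one unit from an optimum, a mutation is a step of fixed size in a uniformly random direction, and we count how often the step lands closer to the optimum. As the step shrinks the success rate approaches 50 per cent; steps larger than twice the distance can never improve matters.

```python
import math
import random

def improvement_rate(step, dim=10, trials=20_000):
    """Fraction of random mutations of a given size that land closer
    to the optimum (at the origin), starting one unit away."""
    start = [1.0] + [0.0] * (dim - 1)
    better = 0
    for _ in range(trials):
        # uniform random direction: normalized Gaussian vector
        direction = [random.gauss(0, 1) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in direction))
        new = [s + step * d / norm for s, d in zip(start, direction)]
        if math.sqrt(sum(x * x for x in new)) < 1.0:
            better += 1
    return better / trials

for step in (0.01, 0.1, 0.5, 1.0, 2.0):
    print(f"step {step:4}: {improvement_rate(step):.1%} of mutations improve")
```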
But are there any special occasions when macromutations are incorporated into evolution? Macromutations certainly occur in the laboratory.† Our theoretical considerations say only that viable macromutations should be exceedingly rare in comparison with viable micromutations. But even if the occasions when major saltations are viable and incorporated into evolution are exceedingly rare, even if they have occurred only once or twice in the whole history of a lineage from Precambrian to present, that is enough to transform the entire course of evolution. I find it plausible, for instance, that the invention of segmentation occurred in a single macromutational leap, once during the history of our own vertebrate ancestors and again once in the ancestry of arthropods and annelids. Once this had happened, in each of these two lineages, it changed the entire climate in which ordinary cumulative selection of micromutations went on. It must have resembled, indeed, a sudden catastrophic change in the external climate. Just as a lineage can, after appalling loss of life, recover and adapt to a catastrophic change in the external climate, so a lineage might, by subsequent micromutational selection, adapt to the catastrophe of a macromutation as large as the first segmentation.
†Macromutations, or saltations, are mutations of large magnitude. A famous example in fruit flies is antennapedia. Mutant flies grow a leg where an antenna should be.

In the landscape of all possible animals, our segmentation example might look like this. A wild macromutational leap from a perfectly viable parent lands in a remote part of the haystack, far from any needle of viability. The first segmented animal is born: a freak; a monster none of whose detailed bodily features equip it to survive its new, segmented architecture. It should die. But by chance the leap in genetic space has coincided with a leap in geographical space. The segmented monster finds itself in a virgin part of the world where the living is easy and competition is light. What can happen when any ordinary animal finds itself in a strange place, a new continent, say, is that, although ill-adapted to the new conditions, it survives by the skin of its teeth. In the competition vacuum, its descendants survive for enough generations to adapt, by normal, cumulative natural selection of micromutations, to the alien conditions. So it might have been with our segmented monster. It survived by the skin of its teeth, and its descendants adapted, by ordinary micromutational cumulative selection, to the radically new conditions imposed by the macromutation. Though the macromutational leap landed far from any needle in the haystack, the competition vacuum enabled the monster's descendants subsequently to inch their way towards the nearest needle. As it turned out, when all the compensating evolution at other genetic loci had been completed, the body plan represented by that nearest needle eventually emerged as superior to the ancestral unsegmented body plan. The new local optimum, into whose vicinity the lineage wildly leapt, eventually turned out superior to the local optimum on which it had previously been trapped.
This is the kind of speculation in which we should indulge only as a last resort. The argument stands that only gradualistic, inch-by-inch walking through the genetic landscape is compatible with the sort of cumulative evolution that can build up complex and detailed adaptation. Even if segmentation, in our example, ended up as a superior body form, it began as a catastrophe that had to be weathered, just like a climatic or volcanic catastrophe in the external environment. It was gradualistic, cumulative selection that engineered the step-by-step recovery from the segmentation catastrophe, just as it engineers recoveries from external climatic catastrophes. Segmentation, according to the speculation I have just given, survived not because natural selection favoured it but because natural selection found compensatory ways of survival in spite of it. The fact that advantages in the segmented body plan eventually emerged is an irrelevant bonus. The segmented body plan was incorporated into evolution, but it may never have been favoured by natural selection.
But in any case gradualism is only a part of core Darwinism. A belief in the ubiquity of gradualistic evolution does not necessarily commit us to Darwinian natural selection as the steering mechanism guiding the search through genetic space. It is highly probable that Motoo Kimura
is right to insist that most of the evolutionary steps taken through genetic space are unsteered steps. To a large extent the trajectory of small, gradualistic steps actually taken may constitute a random walk rather than a walk guided by selection. But this is irrelevant if - for the reasons given above - our concern is with adaptive evolution as opposed to evolutionary change per se. Kimura himself rightly insists* that his 'neutral theory is not antagonistic to the cherished view that evolution of form and function is guided by Darwinian selection'. Further,
the theory does not deny the role of natural selection in determining the course of adaptive evolution, but it assumes that only a minute fraction of DNA changes in evolution are adaptive in nature, while the great majority of phenotypically silent molecular substitutions exert no significant influence on survival and reproduction and drift randomly through the species.
*'Insists' may be putting it a bit strongly. Now that Professor Kimura is dead, the rather endearing story told by John Maynard Smith can be included. It is true that Kimura's book includes the statement that natural selection must be involved in adaptive evolution but, according to Maynard Smith, Kimura could not bear to write the sentence himself and he asked his friend, the distinguished American geneticist James Crow, to write it for him. The book is M. Kimura, The Neutral Theory of Molecular Evolution (Cambridge, Cambridge University Press, 1983).

The facts of adaptation compel us to the conclusion that evolutionary trajectories are not all random. There has to be some nonrandom guidance towards adaptive solutions because nonrandom is what adaptive solutions precisely are. Neither random walk nor random saltation can do the trick on its own. But does the guiding mechanism necessarily have to be the Darwinian one of nonrandom survival of random spontaneous variation? The obvious alternative class of theory postulates some form of nonrandom, i.e. directed, variation.
Nonrandom, in this context, means directed towards adaptation. It does not mean causeless. Mutations are, of course, caused by physical events, for instance, cosmic ray bombardment. When we call them random, we mean only that they are random with respect to adaptive improvement.63 It could be said, therefore, that, as a matter of logic, some kind of theory of directed variation is the only alternative to natural selection as an explanation for adaptation. Obviously, combinations of the two kinds of theory are possible.
The theory nowadays attributed to Lamarck is typical of a theory of directed variation. It is normally expressed as two main principles. First, organisms improve during their own lifetime by means of the principle of use and disuse; muscles that are exercised as the animal strives for a particular kind of food enlarge, for instance, and the animal is consequently better equipped to procure that food in the future. Second, acquired characteristics - in this case acquired improvements due to use - are inherited so, as the generations go by, the lineage improves. Arguments offered against Lamarckian theories are usually factual. Acquired characteristics are not, as a matter of fact, inherited. The implication, often made explicit, is that if only they were inherited, Lamarckism would be a tenable theory of evolution.64 Ernst Mayr, for instance, wrote,

Accepting his premises, Lamarck's theory was as legitimate a theory of adaptation as that of Darwin. Unfortunately, these premises turned out to be invalid.
Francis Crick65 showed an awareness of the possibility that general a priori arguments might be given, when he wrote,

As far as I know, no one has given general theoretical reasons why such a mechanism must be less efficient than natural selection.

I have since offered two such reasons, following an argument that the inheritance of acquired characteristics is in principle incompatible with embryology as we know it.66
First, acquired improvements could in principle be inherited only if embryology were preformationistic rather than epigenetic. Preformationistic embryology is blueprint embryology. The alternative is recipe, or computer-program, embryology. The important point about blueprint embryology is that it is reversible. If you have a house, you can, by following simple rules, reconstruct its blueprint. But if you have a cake, there is no set of simple rules that enables you to reconstruct its recipe.

All living things on this planet grow by recipe embryology, not blueprint embryology. The rules of development work only in the forward direction, like the rules in a recipe or computer program. You cannot, by inspecting an animal, reconstruct its genes. Acquired characteristics are attributes of the animal. In order for them to be inherited, the animal would have to be scanned and its attributes reverse-transcribed into the genes. There may be planets whose animals develop by blueprint embryology. If so, acquired characteristics might there be inherited. This argument says that if you want to find a Lamarckian form of life, don't bother to look on any planet whose life forms develop by epigenesis rather than preformationism. I have an intuitive hunch that there may be a general, a priori argument against preformationistic, blueprint embryology, but I have not developed it yet.
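The irreversibility point can be made with a toy 'development' function. In this sketch - entirely my own illustration, with a deliberately crude developmental rule - the recipe-style rules run only forwards, and because many genomes develop into the same phenotype there is no general way to run them backwards: no scan of the body recovers the genes.

```python
from collections import Counter

def develop(genome: str) -> tuple:
    """A recipe-like rule: the phenotype depends only on how many of
    each 'ingredient' the genome contains, not on their order."""
    counts = Counter(genome)
    return tuple(sorted(counts.items()))

a = develop("ACGTACGT")
b = develop("AACCGGTT")   # a different genome...
print(a == b)             # True: same phenotype, so develop() has no inverse
# A blueprint embryology would be a one-to-one, reversible map;
# a recipe embryology, like this one, throws information away as it runs.
```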
Second, most acquired characteristics are not improvements. There is no general reason why they should be, and use and disuse does not really help here. Indeed, by analogy with wear and tear on machines, we might expect use and disuse to be positively counterproductive. If acquired characteristics were indiscriminately inherited, organisms would be walking museums of ancestral decrepitude, pock-marked from ancestral plagues, limping relics of ancestral misfortune. How is the organism supposed to 'know' how to respond to the environment in such a way as to improve itself? If there is a minority of acquired characteristics that are improvements, the organism would have to have some way of selecting these to pass on to the next generation, avoiding the much more numerous acquired characteristics that are deleterious. Selecting, here, really means that some form of Darwinian process must be smuggled in. Lamarckism cannot work unless it has a Darwinian underpinning.
Third, even if there were some means of choosing which acquired characteristics should be inherited, which discarded at the current generation, the principle of use and disuse is not powerful enough to fashion adaptations as subtle and intricate as we know them to be. A human eye, for instance, works well because of countless pernickety adjustments of detail. Natural selection can fine-tune these adjustments because any improvement, however slight and however deeply buried in internal architecture, can have a direct effect upon survival and reproduction. The principle of use and disuse, on the other hand, is in principle incapable of such fine-tuning. This is because it relies upon the coarse and crude rule that the more an animal uses a bit of itself, the bigger that bit ought to be. Such a rule might tune the blacksmith's arms to his trade, or the giraffe's neck to the tall trees. But it could hardly be responsible for improving the lucidity of a lens or the reaction time of an iris diaphragm. The correlation between use and size is too loose to be responsible for fine-grained adaptation.
I shall refer to these three arguments as the 'Universal Darwinism' arguments. I am confident that they are arguments of the kind that Crick was calling for, although whether he or anyone else accepts these three particular arguments is another matter. If they are correct, the case for Darwinism, in its most general form, is enormously strengthened.
I suspect that other armchair arguments about the nature of life all over the universe, more powerful and watertight than mine, are waiting to be discovered by those better equipped than I am. But I cannot forget that Darwin's own triumph, for all that it could have been launched from any armchair in the universe, was in fact the spin-off of a five-year circumnavigation of this particular planet.
The 'Information Challenge'
In September 1997, I allowed an Australian film crew into my house in Oxford without realizing that their purpose was creationist propaganda. In the course of a suspiciously amateurish interview, they issued a truculent challenge to me to 'give an example of a genetic mutation or an evolutionary process which can be seen to increase the information in the genome'. It is the kind of question only a creationist would ask in that way, and it was at this point I tumbled to the fact that I had been duped into granting an interview to creationists - a thing I normally don't do, for good reasons.* In my anger I refused to discuss the question further, and told them to stop the camera. However, I eventually withdrew my peremptory termination of the interview, because they pleaded with me that they had come all the way from Australia specifically in order to interview me. Even if this was a considerable exaggeration, it seemed, on reflection, ungenerous to tear up the legal release form and throw them out. I therefore relented.
My generosity was rewarded in a fashion that anyone familiar with fundamentalist tactics might have predicted. When I eventually saw the film a year later,† I found that it had been edited to give the false impression that I was incapable of answering the question about information content.‡ In fairness, this may not have been quite as intentionally deceitful as it sounds. You have to understand that these people really believe that their question cannot be answered! Pathetic as it sounds, their entire journey from Australia seems to have been a quest to film an evolutionist failing to answer it.
*See 'Unfinished Correspondence with a Darwinian Heavyweight' (pp. 218-22).
†The producers never deigned to send me a copy: I completely forgot about it until an American colleague called it to my attention.
‡See Barry Williams, 'Creationist deception exposed', the Skeptic 18 (1998), 3, pp. 7-10, for an account of how my long pause (trying to decide whether to throw them out) was made to look like hesitant inability to answer the question, followed by an apparently evasive answer to a completely different question.
With hindsight - given that I had been suckered into admitting them into my house in the first place - it might have been wiser simply to answer the question. But I like to be understood whenever I open my mouth - I have a horror of blinding people with science - and this was not a question that could be answered in a soundbite. First you have to explain the technical meaning of 'information'. Then the relevance to evolution, too, is complicated - not really difficult but it takes time. Rather than engage in further recriminations and disputes about exactly what happened at the time of the interview, I shall try to redress the matter now in constructive fashion by answering the original question, the 'Information Challenge', at adequate length - the sort of length you can achieve in a proper article.
The technical definition of 'information' was introduced by the American engineer Claude Shannon in 1948. An employee of the Bell Telephone Company, Shannon was concerned to measure information as an economic commodity. It is costly to send messages along a telephone line. Much of what passes in a message is not information: it is redundant.
You could save money by recoding the message to remove the redundancy. Redundancy was a second technical term introduced by Shannon, as the inverse of information. Both definitions are mathematical, but we can convey Shannon's intuitive meaning in words.* Redundancy is any part of a message that is not informative, either because the recipient already knows it (is not surprised by it) or because it duplicates other parts of the message. In the sentence 'Rover is a poodle dog', the word 'dog' is redundant because 'poodle' already tells us that Rover is a dog. An economical telegram would omit it, thereby increasing the informative proportion of the message. 'Arr JFK Fri pm pls mt BA Cncrd flt' carries the same information as the much longer, but more redundant, 'I'll be arriving at John F Kennedy airport on Friday evening; please meet the British Airways Concorde flight'. Obviously the brief, telegraphic message is cheaper to send (although the recipient may have to work harder to decipher it - redundancy has its virtues if we forget economics). Shannon wanted to find a mathematical way to capture the idea that any message could be broken into the information (which is worth paying for), the redundancy (which can, with economic advantage, be deleted from the message because, in effect, it can be reconstructed by the recipient) and the noise (which is just random rubbish).
*It is important not to blame Shannon for my verbal and intuitive way of expressing what I think of as the essence of his idea. Mathematical readers should go straight to the original, C. Shannon and W. Weaver, The Mathematical Theory of Communication (University of Illinois Press, 1949). Claude Shannon, by the way, had an imaginative sense of humour. He once built a box with a single switch on the outside. If you threw the switch, the lid of the box slowly opened, a mechanical hand appeared, reached down and switched off the box. It then put itself away and the lid closed. As Arthur C. Clarke said, 'There is something unspeakably sinister about a machine that does nothing - absolutely nothing - except switch itself off.'
'It rained in Oxford every day this week' carries relatively little information, because the receiver is not surprised by it. On the other hand, 'It rained in the Sahara desert every day this week' would be a message with high information content, well worth paying extra to send. Shannon wanted to capture this sense of information content as 'surprise value'. It is related to the other sense - 'that which is not duplicated in other parts of the message' - because repetitions lose their power to surprise. Note that Shannon's definition of the quantity of information is independent of whether it is true. The measure he came up with was ingenious and intuitively satisfying. Let's estimate, he suggested, the receiver's ignorance or uncertainty before receiving the message, and then compare it with the receiver's remaining ignorance after receiving the message. The quantity of ignorance-reduction is the information content. Shannon's unit of information is the bit, short for 'binary digit'. One bit is defined as the amount of information needed to halve the receiver's prior uncertainty, however great that prior uncertainty was (mathematical readers will notice that the bit is, therefore, a logarithmic measure).
In practice, you first have to find a way of measuring the prior uncertainty - that which is reduced by the information when it comes. For particular kinds of simple message, this is easily done in terms of probabilities. An expectant father watches the birth of his child through a window. He can't see any details, so a nurse has agreed to hold up a pink card if it is a girl, blue for a boy. How much information is conveyed when, say, the nurse flourishes the pink card to the delighted father? The answer is one bit - the prior uncertainty is halved. The father knows that a baby of some kind has been born, so his uncertainty amounts to just two possibilities - boy and girl - and they are (for purposes of this discussion) equiprobable. The pink card halves the father's prior uncertainty from two possibilities to one (girl). If there'd been no pink card but a doctor walked out of the room, shook the father's hand and said, 'Congratulations old chap, I'm delighted to be the first to tell you that you have a daughter', the information conveyed by the 17-word message would still be only one bit.
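As a quick check on the arithmetic, here is a minimal sketch: the bit count for a message that narrows n equally likely possibilities down to one is log2 of n, and the pink card's single bit falls out at once (the function name and framing are mine, not Shannon's notation).

```python
import math

def information_bits(before: int, after: int = 1) -> float:
    """Bits gained when equiprobable possibilities shrink from 'before'
    to 'after': one bit per halving of the receiver's uncertainty."""
    return math.log2(before / after)

print(information_bits(2))   # boy or girl -> 'girl': exactly 1.0 bit
# The doctor's 17-word congratulation conveys precisely the same 1 bit.
```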
Computer information is held in a sequence of noughts and ones. There are only two possibilities, so each 0 or 1 can hold one bit. The memory capacity of a computer, or the storage capacity of a disk or
tape, is often measured in bits, and this is the total number of 0s or 1s that it can hold. For some purposes, more convenient units of measurement are the byte (8 bits), the kilobyte (1000 bytes), the megabyte (a million bytes) or the gigabyte (1000 million bytes).* Notice that these figures refer to the total available capacity. This is the maximum quantity of information that the device is capable of storing. The actual amount of information stored is something else. The capacity of my hard disk happens to be 4.2 gigabytes. Of this, about 1.4 gigabytes are actually being used to store data at present. But even this is not the true information content of the disk in Shannon's sense. The true information content is smaller, because the information could be more economically stored. You can get some idea of the true information content by using one of those ingenious compression programs like 'Stuffit'. Stuffit looks for redundancy in the sequence of 0s and 1s, and removes a hefty proportion of it by recoding - stripping out internal predictability. Maximum information content would be achieved (probably never in practice) only if every 1 or 0 surprised us equally. Before data is transmitted in bulk around the internet, it is routinely compressed to reduce redundancy.†
That's good economics. But on the other hand it is also a good idea to keep some redundancy in messages, to help correct errors. In a message that is totally free of redundancy, after there's been an error there is no means of reconstructing what was intended. Computer codes often incorporate deliberately redundant 'parity bits' to aid in error detection. DNA, too, has various error-correcting procedures which depend upon redundancy. When I come on to talk of genomes, I'll return to the three-way distinction between total information capacity, information capacity actually used, and true information content.
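The 'Stuffit' point is easy to reproduce with any general-purpose compressor; the sketch below uses Python's zlib as a stand-in, my choice rather than anything named in the essay. A highly redundant message collapses to a fraction of its stored size, while incompressible noise barely shrinks, which is why compressed size gives a rough upper estimate of true information content.

```python
import os
import zlib

repetitive = b"It rained in Oxford. " * 500   # highly redundant, 10,500 bytes
noise = os.urandom(10_500)                    # random rubbish, same size

for label, message in (("repetitive", repetitive), ("noise", noise)):
    packed = zlib.compress(message, 9)        # level 9: maximum effort
    print(f"{label}: {len(message)} -> {len(packed)} bytes after compression")
# Note the trade-off discussed above: strip out all redundancy and a single
# transmission error becomes uncorrectable; parity bits add some back on purpose.
```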
*These round figures are all decimal approximations. In the world of computers, the standard metric prefixes, 'kilo', 'giga' etc. are borrowed for the nearest convenient power of 2. Thus a kilobyte is not 1000 bytes but 2^10 or 1024 bytes; a megabyte is not a million bytes but 2^20 or 1,048,576 bytes. If we had evolved with 8 fingers or 16, instead of 10, the computer might have been invented a century earlier. Theoretically, we could now decide to teach all children octal instead of decimal arithmetic. I'd love to give it a go, but realistically I recognize that the immense short-term costs of the transition would outweigh the undoubted long-term benefits of the change. For a start, we'd all have to learn our multiplication tables again from scratch.
†A powerful application of this aspect of information theory is Horace Barlow's idea that sensory systems are wired up to remove massive amounts of redundancy before passing their messages on to the brain. One way they do this is by signalling change in the world (what mathematicians would call differentiating) rather than continuously reporting the current state of the world (which is highly redundant because it doesn't fluctuate rapidly and randomly). I discussed Barlow's idea in Unweaving the Rainbow (London, Penguin, 1998; Boston, Houghton Mifflin, 1998), pp. 257-66.
It was Shannon's insight that information of any kind, no matter what it means, no matter whether it is true or false, and no matter by what physical medium it is carried, can be measured in bits, and is translatable into any other medium of information. The great biologist J. B. S. Haldane used Shannon's theory to compute the number of bits of information conveyed by a worker bee to her hivemates when she 'dances' the location of a food source (about 3 bits to tell about the direction of the food and another 3 bits for the distance of the food). In the same units, I recently calculated that I'd need to set aside 120 megabits of laptop computer memory to store the triumphal opening chords of Richard Strauss's Also Sprach Zarathustra (the '2001 theme'), which I wanted to play in the middle of a lecture about evolution. Shannon's economics enable you to calculate how much modem time it'll cost you to email the complete text of a book to a publisher in another land. Fifty years after Shannon, the idea of information as a commodity, as measurable and interconvertible as money or energy, has come into its own.
DNA carries information in a very computer-like way, and we can measure the genome's capacity in bits too, if we wish. DNA doesn't use a binary code, but a quaternary one. Whereas the unit of information in the computer is a 1 or a 0, the unit in DNA can be T, A, C or G. If I tell you that a particular location in a DNA sequence is a T, how much information is conveyed from me to you? Begin by measuring the prior uncertainty. How many possibilities are open before the message T arrives? Four. How many possibilities remain after it has arrived? One. So you might think the information transferred is four bits, but actually it is two. Here's why (assuming that the four letters are equally probable, like the four suits in a pack of cards). Remember that Shannon's metric is concerned with the most economical way of conveying the message. Think of it as the number of yes/no questions that you'd have to ask in order to narrow down to certainty, from an initial uncertainty of four possibilities, assuming that you planned your questions in the most economical way. 'Is the mystery letter before D in the alphabet?'* No. That narrows it down to T or G, and now we need only one more question to clinch it. So, by this method of measuring, each 'letter' of the DNA has an information capacity of 2 bits.
*A chemist would more naturally ask, 'Is it a pyrimidine?', but that sends the wrong signal for my purposes. It is only incidentally true that the four letters of the DNA alphabet fall naturally into two chemical families, purines and pyrimidines.

Whenever the prior uncertainty of the recipient can be expressed as a number of equiprobable alternatives N, the information content of a message which narrows those alternatives down to one is log2 N (the power to which 2 must be raised in order to yield the number of alternatives N). If you pick a card, any card, from a normal pack, a statement of the identity of the card carries log2 52, or 5.7 bits of information. In other words, given a large number of guessing games, it would take 5.7 yes/no questions on average to guess the card, provided the questions are asked in the most economical way. The first two questions might establish the suit (Is it red? Is it a diamond?); the remaining three or four questions would successively divide and conquer the suit (Is it a 7 or higher? etc.), finally homing in on the chosen card. When the prior uncertainty is some mixture of alternatives that are not equiprobable, Shannon's formula becomes a slightly more elaborate weighted average, but it is essentially similar. By the way, Shannon's weighted average is the same formula as physicists have used, since the nineteenth century, for entropy. The point has interesting implications but I shall not pursue them here.*

*Ecologists also use the formula as an index of diversity.
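For the non-equiprobable case, Shannon's 'slightly more elaborate weighted average' is H = -sum of p log2 p over the alternatives, which collapses to log2 N when all N probabilities are equal. A small sketch of both (the formula is standard; the example probabilities are my own):

```python
import math

def entropy_bits(probs):
    """Shannon's weighted average: H = -sum(p * log2 p), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(math.log2(52))                 # a named playing card: ~5.70 bits
print(entropy_bits([1 / 52] * 52))   # the same figure via the general formula
print(entropy_bits([0.25] * 4))      # one DNA letter, equiprobable: 2.0 bits
print(entropy_bits([0.9, 0.1]))      # biased alternatives: ~0.47 bits, under 1
```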
That's enough background on information theory. It is a theory which has long held a fascination for me, and I have used it in several of my research papers over the years. Let's now think how we might use it to ask whether the information content of genomes increases in evolution. First, recall the three-way distinction between total information capacity, the capacity that is actually used, and the true information content when stored in the most economical way possible. The total information capacity of the human genome is measured in gigabits. That of the common gut bacterium Escherichia coli is measured in megabits. We, like all other animals, are descended from an ancestor which, were it available for our study today, we'd classify as a bacterium. So during the billions of years of evolution since that ancestor lived, the information capacity of our genome has gone up perhaps three orders of magnitude (powers of ten) - about a thousandfold. This is satisfyingly plausible and comforting to human dignity.
Should human dignity feel wounded, then, by the fact that the crested newt, Triturus cristatus, has a genome capacity estimated at 40 gigabits, an order of magnitude larger than the human genome? No, because, in any case, most of the capacity of the genome of any animal is not used to store useful information. There are many nonfunctional pseudogenes (see below) and lots of repetitive nonsense, useful for forensic detectives but not translated into protein in the living cells. The crested newt has a bigger 'hard disk' than we have, but since the great bulk of both our hard disks is unused, we needn't feel insulted. Related species of newt have much smaller genomes. Why the Creator should have played fast and loose with the genome sizes of newts in such a capricious way is a problem that creationists might like to ponder. From an evolutionary point of view the explanation is simple.*

*My suggestion (The Selfish Gene, 1976) that surplus DNA is parasitic was later taken up and developed by others under the catch-phrase 'Selfish DNA'. See The Selfish Gene, 2nd edn (Oxford University Press, 1989), pp. 44-5 and 275.
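The orders of magnitude quoted here can be recovered from genome sizes at two bits per DNA letter. In the sketch below the base counts are rough published figures of my own choosing, not numbers taken from this essay:

```python
# Total information capacity at 2 bits per base (log2 of the 4 letters).
genome_bases = {
    "Homo sapiens": 3.1e9,       # ~3.1 billion bases
    "Escherichia coli": 4.6e6,   # ~4.6 million bases
    "crested newt": 2.0e10,      # ~20 billion bases (rough estimate)
}

for species, bases in genome_bases.items():
    gigabits = 2 * bases / 1e9
    print(f"{species}: about {gigabits:.2f} gigabits of total capacity")
# Human: ~6 gigabits; E. coli: ~0.01 gigabits (i.e. megabits); newt: ~40 gigabits.
```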
Evidently the total information capacity of genomes is very variable across the living kingdoms, and it must have changed greatly in evolution, presumably in both directions. Losses of genetic material are called deletions. New genes arise through various kinds of duplication. This is well illustrated by haemoglobin, the complex protein molecule that transports oxygen in the blood.
Human adult haemoglobin is actually a composite of four protein chains called globins, knotted around each other. Their detailed sequences show that the four globin chains are closely related to each other, but they are not identical. Two of them are called alpha globins (each a chain of 141 amino acids), and two are beta globins (each a chain of 146 amino acids). The genes coding for the alpha globins are on chromosome 16; those coding for the beta globins are on chromosome 11. On each of these chromosomes, there is a cluster of globin genes in a row, interspersed with some junk DNA. The alpha cluster, on chromosome 16, contains seven globin genes. Four of these are pseudogenes, versions of alpha disabled by faults in their sequence and not translated into proteins. Two are true alpha globins, used in the adult. The final one is called zeta and is used only in embryos. Similarly the beta cluster, on chromosome 11, has six genes, some of which are disabled, and one of which is used only in the embryo. Adult haemoglobin, as we've seen, contains two alpha and two beta chains.
Never mind all this complexity. Here's the fascinating point. Careful letter-by-letter analysis shows that these different kinds of globin genes are literally cousins of each other, literally members of a family. But these distant cousins still coexist inside our own genome, and that of all vertebrates. On the scale of whole organisms, all vertebrates are our cousins too. The tree of vertebrate evolution is the family tree we are all familiar with, its branch-points representing speciation events - the splitting of species into pairs of daughter species. But there is another family tree occupying the same timescale, whose branches represent not speciation events but gene duplication events within genomes.
The dozen or so different globins inside you are descended from an ancient globin gene which, in a remote ancestor who lived about half a billion years ago, duplicated, after which both copies stayed in the genome. There were then two copies of it, in different parts of the genome of all descendant animals. One copy was destined to give rise to the alpha cluster (on what would eventually become chromosome 16 in our genome), the other to the beta cluster (on chromosome 11). As the aeons passed, there were further duplications (and doubtless some deletions as well). Around 400 million years ago the ancestral alpha gene duplicated again, but this time the two copies remained near neighbours of each other, in a cluster on the same chromosome. One of them was destined to become the zeta used by embryos, the other became the alpha globin genes used by adult humans (other branches gave rise to the nonfunctional pseudogenes I mentioned). It was a similar story along the beta branch of the family, but with duplications at other moments in geological history.
Now here's an equally fascinating point. Given that the split between the alpha cluster and the beta cluster took place 500 million years ago, it will of course not be just our human genomes that show the split - that is, possess alpha genes in a different part of the genome from beta genes. We should see the same within-genome split if we look at any other mammals, at birds, reptiles, amphibians and bony fish, for our common ancestor with all of them lived less than 500 million years ago. Wherever it has been investigated, this expectation has proved correct. Our greatest hope of finding a vertebrate that does not share with us the ancient alpha/beta split would be a jawless fish like a lamprey, for they are our most remote cousins among surviving vertebrates; they are the only surviving vertebrates whose common ancestor with the rest of the vertebrates is sufficiently ancient that it could have predated the alpha/beta split. Sure enough, these jawless fishes are the only known vertebrates that lack the alpha/beta divide.
Gene duplication, within the genome, has a similar historic impact to species duplication ('speciation') in phylogeny. It is responsible for gene diversity, in the same way as speciation is responsible for phyletic diversity. Beginning with a single universal ancestor, the magnificent diversity of life has come about through a series of branchings of new species, which eventually gave rise to the major branches of the living kingdoms and the hundreds of millions of separate species that have graced the Earth. A similar series of branchings, but this time within genomes - gene duplications - has spawned the large and diverse population of clusters of genes that constitutes the modern genome.
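The parallel between the two family trees can be drawn quite literally. Below is a toy rendering of the globin duplication history as a nested tree; the data structure and labels are my own illustrative choices, while the branching order and rough dates follow the account above.

```python
# Gene-duplication history as a branching tree, analogous to a species tree.
globin_tree = {
    "name": "ancestral globin (duplication ~500 Mya)",
    "children": [
        {"name": "beta branch -> beta cluster", "children": []},
        {"name": "alpha branch (duplication ~400 Mya)",
         "children": [
             {"name": "zeta (embryonic)", "children": []},
             {"name": "adult alpha globins", "children": []},
         ]},
    ],
}

def show(node, depth=0):
    """Print the duplication tree, one gene lineage per line."""
    print("  " * depth + node["name"])
    for child in node["children"]:
        show(child, depth + 1)

show(globin_tree)
```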
The story of the globins is just one among many. Gene duplications
and deletions have occurred from time to time throughout genomes. It is by these, and similar means, that genome sizes can increase in evolution. But remember the distinction between the total capacity of the whole genome, and the capacity of the portion that is actually used. Recall that not all the globin genes are used. Some of them, like theta in the alpha cluster of globin genes, are pseudogenes, recognizably kin to functional genes in the same genomes, but never actually translated into the action language of protein. What is true of globins is true of most other genes. Genomes are littered with nonfunctional pseudogenes, faulty duplicates of functional genes that do nothing, while their functional cousins (the word doesn't even need scare quotes) get on with their business in a different part of the same genome. And there's lots more DNA that doesn't even deserve the name pseudogene. It too is derived by duplication, but not duplication of functional genes. It consists of multiple copies of junk, 'tandem repeats', and other nonsense which may be useful for forensic detectives but which doesn't seem to be used in the body itself. Once again, creationists might spend some earnest time speculating on why the Creator should bother to litter genomes with untranslated pseudogenes and junk tandem repeat DNA.
Can we measure the information capacity of that portion of the genome which is actually used? We can at least estimate it. In the case of the human genome it is about 2 per cent - considerably less than the proportion of my hard disk that I have used since I bought it. Presumably the equivalent figure for the crested newt is even smaller, but I don't know if it has been measured. In any case, we mustn't run away with a chauvinistic idea that the human genome somehow ought to have the largest DNA database because we are so wonderful. The great evolutionary biologist George C. Williams has pointed out that animals with complicated life cycles need to code for the development of all stages in the life cycle, but they only have one genome with which to do so. A butterfly's genome has to hold the complete information needed for building a caterpillar as well as a butterfly.
Movement from one point in the landscape to another is mutation, interpreted in its broadest sense to include large-scale changes in the genetic system as well as point mutations at loci within existing genetic systems. In principle, by a sufficiently contrived piece of genetic engineering - artificial mutation - it is possible to move from any point in the landscape to any other. There exists a recipe for transforming the genome of a human into the genome of a hippo or into the genome of any other animal, actual or conceivable. It would normally be a very large recipe, involving changes to many of the genes, deletion of many genes, duplication of many genes, and radical reorganizations of the genetic system. Nevertheless, the recipe is in principle discoverable, and obeying it can be represented as equivalent to taking a single giant leap from one point to another in our mathematical space. In practice, viable mutations are normally relatively small steps in the landscape: children are only slightly different from their parents even if, in principle, they could be as different as a hippo is from a human. Evolution consists of step-by-step trajectories through the genetic space, not large leaps.
*I find this image, which is modified from the venerable American population geneticist Sewall Wright, a helpful way to think about evolution. I first made use of it in The Blind Watchmaker and gave it two chapters in Climbing Mount Improbable, where I called it a 'museum' of all possible animals. Museum is superficially better than landscape because it is three- dimensional, although actually, of course, we are usually dealing with many more than three dimensions. Daniel Dennett's version, in Darwin's Dangerous Idea, is a library, the vividly named 'Library of Mendel'.
DARWIN TRIUMPHANT
83
? LIGHT WILL BE THROWN
Evolution, in other words, is gradualistic. There is a general reason why this has to be so, a reason that I shall now develop.
Even without formal mathematical treatment, we can make some statistical statements about our landscape. First, in the landscape of all possible genetic combinations and the 'organisms' that they might generate, the proportion of viable organisms to nonviable organisms is very small. 'However many ways there may be of being alive, it is certain
61
that there are vastly more ways of being dead. ' Second, taking any
given starting point in the landscape, however many ways there may be of being slightly different, it is obvious that there are vastly more ways of being very different. The number of near neighbours in the landscape may be large, but it is dwarfed by the number of distant neighbours. As we consider hyperspheres of ever increasing size, the number of progres- sively more distant genetic neighbours that the spheres envelop mounts as a power function and rapidly becomes for practical purposes infinite.
The statistical nature of this argument points up an irony in the claim, frequently made by lay opponents of evolution, that the theory of evolution violates the Second Law of thermodynamics, the law of increasing entropy or chaos* within any closed system. The truth is opposite. If anything appeared to violate the law (nothing really does), it would be the factst, not any particular explanation of those facts! The Darwinian explanation, indeed, is the only viable explanation we have for those facts that shows us how they could have come into being without violating the laws of physics. The law of increasing entropy is, in any case, subject to an interesting misunderstanding, which is worthy of a brief digression because it has helped to foster the mistaken claim that the idea of evolution violates the law.
62
The Second Law originated in the theory of heat engines, but the
form of it that is relevant to the evolutionary argument can be stated in more general statistical terms. Entropy was characterized by the physicist Willard Gibbs as the 'mixed-upness' of a system. The law states that the total entropy of a system and its surroundings will not decrease. Left to itself, without work being contributed from outside, any closed system (life is not a closed system) will tend to become more mixed-up, less orderly. Homely analogies - or they may be more than analogies - abound. If there is not constant work being put in by a librarian, the orderly shelving of books in a library will suffer relentless degradation due to the inevitable if low probability that borrowers will return them
"Chaos here has its original and still colloquial meaning, not the technical meaning which it has recently acquired.
tAbout life's functional complexity or high 'information content'.
84
? to the wrong shelf. We have to import a hard-working librarian into the system from outside, who, Maxwell's-Demon-like, methodically and energetically restores order to the shelves.
The common error to which I referred is to personify the Second Law: to invest the universe with an inner urge or drive towards chaos; a positive striving towards an ultimate nirvana of perfect disorder. It is partly this error that has led people to accept the foolish notion that evolution is a mysterious exception to the law. The error can most simply be exposed by reference to the library analogy. When we say that an unattended library tends to approach chaos as time proceeds, we do not mean that any particular state of the shelves is being approached, as though the library were striving towards a goal from afar. Quite the contrary. The number of possible ways of shelving the N books in a library can be calculated, and for any nontrivial library it is a very, very large number indeed. Of these ways, only one, or a very few, would be recognized by us as a state of order. That is all there is to it. Far from there being any mystical urge towards disorder, it is just that there are vastly more ways of being recognized as disorderly than of being recognized as orderly. So, if a system wanders anywhere in the space of all possible arrangements, it is almost certain - unless special, librarian-like steps are taken - that we shall perceive the change as an increase in disorder. In the present context of evolutionary biology, the particular kind of order that is relevant is adaptation, the state of being equipped to survive and reproduce.
Returning to the general argument in favour of gradualism, to find viable life forms in the space of all possible forms is like searching for a modest number of needles in an extremely large haystack. The chance of happening to land on one of the needles if we take a large random mutational leap to another place in our multidimensional haystack is very small indeed. But one thing we can say is that the starting point of any mutational leap has to be a viable organism - one of the rare and precious needles in the haystack. This is because only organisms good enough to survive to reproductive age can have offspring of any kind, including mutant offspring. Finding a viable body-form by random mutation may be like finding a needle in a haystack, but given that you have already found one viable body-form, it is certain that you can hugely increase your chances of finding another viable one if you search in the immediate neighbourhood rather than more distantly.
The same goes for finding an improved body-form. As we consider mutational leaps of decreasing magnitude, the absolute number of destinations decreases but the proportion of destinations that are
DARWIN TRIUMPHANT
85
? LIGHT WILL BE THROWN
improvements increases. Fisher gave an elegantly simple argument to ] show that this increase tends towards 50 per cent for mutational changes of very small magnitude. * His argument seems inescapable for any single dimension of variation considered on its own. Whether his precise conclusion (50 per cent) generalizes to the multidimensional case I shall not discuss, but the direction of the argument is surely indisputable. The larger the leap through genetic space, the lower is the probability that the resulting change will be viable, let alone an improve- ment. Gradualistic, step-by-step walking in the immediate vicinity of already discovered needles in the haystack seems to be the only way to find other and better needles. Adaptive evolution must in general be a crawl through genetic space, not a series of leaps.
But are there any special occasions when macromutations areI incorporated into evolution? Macromutations certainly occur in the laboratory,t Our theoretical considerations say only that viable macromutations should be exceedingly rare in comparison with viable micromutations. But even if the occasions when major saltations are viable and incorporated into evolution are exceedingly rare, even if they have occurred only once or twice in the whole history of a lineage from Precambrian to present, that is enough to transform the entire course of evolution. I find it plausible, for instance, that the invention of segmentation occurred in a single macromutational leap, once during the history of our own vertebrate ancestors and again once in the ancestry of arthropods and annelids. Once this had happened, in each of these two lineages, it changed the entire climate in which ordinary cumulative selection of micromutations went on. It must have resembled, indeed, a sudden catastrophic change in the external climate. Just as a lineage can, after appalling loss of life, recover and adapt to a catastro- phic change in the external climate, so a lineage might, by subsequent micromutational selection, adapt to the catastrophe of a macromutation as large as the first segmentation.
In the landscape of all possible animals, our segmentation example might look like this. A wild macromutational leap from a perfectly viable parent lands in a remote part of the haystack, far from any needle of viability. The first segmented animal is born: a freak; a monster none of whose detailed bodily features equip it to survive its new, segmented architecture. It should die. But by chance the leap in genetic space has coincided with a leap in geographical space. The segmented monster finds itself in a virgin part of the world where the living is easy and competition is light. What can happen when any ordinary animal finds itself in a strange place, a new continent, say, is that, although ill-adapted to the new conditions, it survives by the skin of its teeth. In the competition vacuum, its descendants survive for enough generations to adapt, by normal, cumulative natural selection of micromutations, to the alien conditions. So it might have been with our segmented monster. It survived by the skin of its teeth, and its descendants adapted, by ordinary micromutational cumulative selection, to the radically new conditions imposed by the macromutation. Though the macromutational leap landed far from any needle in the haystack, the competition vacuum enabled the monster's descendants subsequently to inch their way towards the nearest needle. As it turned out, when all the compensating evolution at other genetic loci had been completed, the body plan represented by that nearest needle eventually emerged as superior to the ancestral unsegmented body plan. The new local optimum, into whose vicinity the lineage wildly leapt, eventually turned out superior to the local optimum on which it had previously been trapped.
This is the kind of speculation in which we should indulge only as a last resort. The argument stands that only gradualistic, inch-by-inch walking through the genetic landscape is compatible with the sort of cumulative evolution that can build up complex and detailed adaptation. Even if segmentation, in our example, ended up as a superior body form, it began as a catastrophe that had to be weathered, just like a climatic or volcanic catastrophe in the external environment. It was gradualistic, cumulative selection that engineered the step-by-step recovery from the segmentation catastrophe, just as it engineers recoveries from external climatic catastrophes. Segmentation, according to the speculation I have just given, survived not because natural selection favoured it but because natural selection found compensatory ways of survival in spite of it. The fact that advantages in the segmented body plan eventually emerged is an irrelevant bonus. The segmented body plan was incorporated into evolution, but it may never have been favoured by natural selection.
But in any case gradualism is only a part of core Darwinism. A belief in the ubiquity of gradualistic evolution does not necessarily commit us to Darwinian natural selection as the steering mechanism guiding the search through genetic space. It is highly probable that Motoo Kimura is right to insist that most of the evolutionary steps taken through genetic space are unsteered steps. To a large extent the trajectory of small, gradualistic steps actually taken may constitute a random walk rather than a walk guided by selection. But this is irrelevant if - for the reasons given above - our concern is with adaptive evolution as opposed to evolutionary change per se. Kimura himself rightly insists* that his 'neutral theory is not antagonistic to the cherished view that evolution of form and function is guided by Darwinian selection'. Further,
the theory does not deny the role of natural selection in determining the course of adaptive evolution, but it assumes that only a minute fraction of DNA changes in evolution are adaptive in nature, while the great majority of phenotypically silent molecular substitutions exert no significant influence on survival and reproduction and drift randomly through the species.

*'Insists' may be putting it a bit strongly. Now that Professor Kimura is dead, the rather endearing story told by John Maynard Smith can be included. It is true that Kimura's book includes the statement that natural selection must be involved in adaptive evolution but, according to Maynard Smith, Kimura could not bear to write the sentence himself and he asked his friend, the distinguished American geneticist James Crow, to write it for him. The book is M. Kimura, The Neutral Theory of Molecular Evolution (Cambridge, Cambridge University Press, 1983).
The facts of adaptation compel us to the conclusion that evolutionary trajectories are not all random. There has to be some nonrandom guidance towards adaptive solutions because nonrandom is what adaptive solutions precisely are. Neither random walk nor random saltation can do the trick on its own. But does the guiding mechanism necessarily have to be the Darwinian one of nonrandom survival of random spontaneous variation? The obvious alternative class of theory postulates some form of nonrandom, i.e. directed, variation.
Nonrandom, in this context, means directed towards adaptation. It does not mean causeless. Mutations are, of course, caused by physical events, for instance, cosmic ray bombardment. When we call them random, we mean only that they are random with respect to adaptive improvement.63 It could be said, therefore, that, as a matter of logic, some kind of theory of directed variation is the only alternative to natural selection as an explanation for adaptation. Obviously, combinations of the two kinds of theory are possible.
The theory nowadays attributed to Lamarck is typical of a theory of directed variation. It is normally expressed as two main principles. First, organisms improve during their own lifetime by means of the principle of use and disuse; muscles that are exercised as the animal strives for a particular kind of food enlarge, for instance, and the animal is consequently better equipped to procure that food in the future. Second, acquired characteristics - in this case acquired improvements due to use - are inherited so, as the generations go by, the lineage improves. Arguments offered against Lamarckian theories are usually factual. Acquired characteristics are not, as a matter of fact, inherited. The implication, often made explicit, is that if only they were inherited, Lamarckism would be a tenable theory of evolution.64 Ernst Mayr, for instance, wrote,

Accepting his premises, Lamarck's theory was as legitimate a theory of adaptation as that of Darwin. Unfortunately, these premises turned out to be invalid.65
Francis Crick showed an awareness of the possibility that general a priori arguments might be given, when he wrote,

As far as I know, no one has given general theoretical reasons why such a mechanism must be less efficient than natural selection.

I have since offered two such reasons, following an argument that the inheritance of acquired characteristics is in principle incompatible with embryology as we know it.66
First, acquired improvements could in principle be inherited only if embryology were preformationistic rather than epigenetic. Preformationistic embryology is blueprint embryology. The alternative is recipe, or computer-program, embryology. The important point about blueprint embryology is that it is reversible. If you have a house, you can, by following simple rules, reconstruct its blueprint. But if you have a cake, there is no set of simple rules that enables you to reconstruct its recipe.

All living things on this planet grow by recipe embryology, not blueprint embryology. The rules of development work only in the forward direction, like the rules in a recipe or computer program. You cannot, by inspecting an animal, reconstruct its genes. Acquired characteristics are attributes of the animal. In order for them to be inherited, the animal would have to be scanned and its attributes reverse-transcribed into the genes. There may be planets whose animals develop by blueprint embryology. If so, acquired characteristics might there be inherited. This argument says that if you want to find a Lamarckian form of life, don't bother to look on any planet whose life forms develop by epigenesis rather than preformationism. I have an intuitive hunch that there may be a general, a priori argument against preformationistic, blueprint embryology, but I have not developed it yet.
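The point about forward-only rules can be made vivid with a toy computation. Everything below is invented for illustration - the 'genomes', the development rule, the traits - and the only claim is that a many-to-one, forward-only process cannot be run backwards:

    def develop(genome):
        # 'Recipe' development: only the dose of each gene matters,
        # so the genome's ordering information is irreversibly lost.
        traits = []
        for gene in "abc":
            dose = genome.count(gene)
            traits.append(gene + ("-big" if dose > 1 else "-small"))
        return ", ".join(traits)

    print(develop("abca"))   # a-big, b-small, c-small
    print(develop("aacb"))   # the identical phenotype from a different genome
    # Because develop() is many-to-one, no scan of the adult phenotype
    # could ever reconstruct the genome that built it.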
Second, most acquired characteristics are not improvements. There is no general reason why they should be, and use and disuse does not really help here. Indeed, by analogy with wear and tear on machines, we might expect use and disuse to be positively counterproductive. If acquired characteristics were indiscriminately inherited, organisms would be walking museums of ancestral decrepitude, pock-marked from ancestral plagues, limping relics of ancestral misfortune. How is the organism supposed to 'know' how to respond to the environment in such a way as to improve itself? If there is a minority of acquired characteristics that are improvements, the organism would have to have some way of selecting these to pass on to the next generation, avoiding the much more numerous acquired characteristics that are deleterious. Selecting, here, really means that some form of Darwinian process must be smuggled in. Lamarckism cannot work unless it has a Darwinian underpinning.
Third, even if there were some means of choosing which acquired characteristics should be inherited, which discarded at the current generation, the principle of use and disuse is not powerful enough to fashion adaptations as subtle and intricate as we know them to be. A human eye, for instance, works well because of countless pernickety adjustments of detail. Natural selection can fine-tune these adjustments because any improvement, however slight and however deeply buried in internal architecture, can have a direct effect upon survival and reproduction. The principle of use and disuse, on the other hand, is in principle incapable of such fine-tuning. This is because it relies upon the coarse and crude rule that the more an animal uses a bit of itself, the bigger that bit ought to be. Such a rule might tune the blacksmith's arms to his trade, or the giraffe's neck to the tall trees. But it could hardly be responsible for improving the lucidity of a lens or the reaction time of an iris diaphragm. The correlation between use and size is too loose to be responsible for fine-grained adaptation.
I shall refer to these three arguments as the 'Universal Darwinism' arguments. I am confident that they are arguments of the kind that Crick was calling for, although whether he or anyone else accepts these three particular arguments is another matter. If they are correct, the case for Darwinism, in its most general form, is enormously strengthened.
I suspect that other armchair arguments about the nature of life all over the universe, more powerful and watertight than mine, are waiting to be discovered by those better equipped than I am. But I cannot forget that Darwin's own triumph, for all that it could have been launched from any armchair in the universe, was in fact the spin-off of a five-year circumnavigation of this particular planet.
The 'Information Challenge'
In September 1997, I allowed an Australian film crew into my house in Oxford without realizing that their purpose was creationist propaganda. In the course of a suspiciously amateurish interview, they issued a truculent challenge to me to 'give an example of a genetic mutation or an evolutionary process which can be seen to increase the information in the genome'. It is the kind of question only a creationist would ask in that way, and it was at this point I tumbled to the fact that I had been duped into granting an interview to creationists - a thing I normally don't do, for good reasons.* In my anger I refused to discuss the question further, and told them to stop the camera. However, I eventually withdrew my peremptory termination of the interview, because they pleaded with me that they had come all the way from Australia specifically in order to interview me. Even if this was a considerable exaggeration, it seemed, on reflection, ungenerous to tear up the legal release form and throw them out. I therefore relented.
My generosity was rewarded in a fashion that anyone familiar with fundamentalist tactics might have predicted. When I eventually saw the film a year later,† I found that it had been edited to give the false impression that I was incapable of answering the question about information content.‡ In fairness, this may not have been quite as intentionally deceitful as it sounds. You have to understand that these people really believe that their question cannot be answered! Pathetic as it sounds, their entire journey from Australia seems to have been a quest to film an evolutionist failing to answer it.
*See 'Unfinished Correspondence with a Darwinian Heavyweight' (pp. 218-22).
†The producers never deigned to send me a copy: I completely forgot about it until an American colleague called it to my attention.
‡See Barry Williams, 'Creationist deception exposed', The Skeptic, 18, 3 (1998), pp. 7-10, for an account of how my long pause (trying to decide whether to throw them out) was made to look like hesitant inability to answer the question, followed by an apparently evasive answer to a completely different question.
With hindsight - given that I had been suckered into admitting them into my house in the first place - it might have been wiser simply to answer the question. But I like to be understood whenever I open my mouth - I have a horror of blinding people with science - and this was not a question that could be answered in a soundbite. First you have to explain the technical meaning of 'information'. Then the relevance to evolution, too, is complicated - not really difficult but it takes time. Rather than engage in further recriminations and disputes about exactly what happened at the time of the interview, I shall try to redress the matter now in constructive fashion by answering the original question, the 'Information Challenge', at adequate length - the sort of length you can achieve in a proper article.
The technical definition of 'information' was introduced by the American engineer Claude Shannon in 1948. An employee of the Bell Telephone Company, Shannon was concerned to measure information as an economic commodity. It is costly to send messages along a telephone line. Much of what passes in a message is not information: it is redundant.
You could save money by recoding the message to remove the redundancy. Redundancy was a second technical term introduced by Shannon, as the inverse of information. Both definitions are mathematical, but we can convey Shannon's intuitive meaning in words.* Redundancy is any part of a message that is not informative, either because the recipient already knows it (is not surprised by it) or because it duplicates other parts of the message. In the sentence 'Rover is a poodle dog', the word 'dog' is redundant because 'poodle' already tells us that Rover is a dog. An economical telegram would omit it, thereby increasing the informative proportion of the message. 'Arr JFK Fri pm pls mt BA Cncrd flt' carries the same information as the much longer, but more redundant, 'I'll be arriving at John F Kennedy airport on Friday evening; please meet the British Airways Concorde flight'. Obviously the brief, telegraphic message is cheaper to send (although the recipient may have to work harder to decipher it - redundancy has its virtues if we forget economics). Shannon wanted to find a mathematical way to capture the idea that any message could be broken into the information (which is worth paying for), the redundancy (which can, with economic advantage, be deleted from the message because, in effect, it can be reconstructed by the recipient) and the noise (which is just random rubbish).

*It is important not to blame Shannon for my verbal and intuitive way of expressing what I think of as the essence of his idea. Mathematical readers should go straight to the original, C. Shannon and W. Weaver, The Mathematical Theory of Communication (University of Illinois Press, 1949). Claude Shannon, by the way, had an imaginative sense of humour. He once built a box with a single switch on the outside. If you threw the switch, the lid of the box slowly opened, a mechanical hand appeared, reached down and switched off the box. It then put itself away and the lid closed. As Arthur C. Clarke said, 'There is something unspeakably sinister about a machine that does nothing - absolutely nothing - except switch itself off.'
'It rained in Oxford every day this week' carries relatively little information, because the receiver is not surprised by it. On the other hand, 'It rained in the Sahara desert every day this week' would be a message with high information content, well worth paying extra to send. Shannon wanted to capture this sense of information content as 'surprise value'. It is related to the other sense - 'that which is not duplicated in other parts of the message' - because repetitions lose their power to surprise. Note that Shannon's definition of the quantity of information is independent of whether it is true. The measure he came up with was ingenious and intuitively satisfying. Let's estimate, he suggested, the receiver's ignorance or uncertainty before receiving the message, and then compare it with the receiver's remaining ignorance after receiving the message. The quantity of ignorance-reduction is the information content. Shannon's unit of information is the bit, short for 'binary digit'. One bit is defined as the amount of information needed to halve the receiver's prior uncertainty, however great that prior uncertainty was (mathematical readers will notice that the bit is, therefore, a logarithmic measure).
In practice, you first have to find a way of measuring the prior uncertainty - that which is reduced by the information when it comes. For particular kinds of simple message, this is easily done in terms of probabilities. An expectant father watches the birth of his child through a window. He can't see any details, so a nurse has agreed to hold up a pink card if it is a girl, blue for a boy. How much information is conveyed when, say, the nurse flourishes the pink card to the delighted father? The answer is one bit - the prior uncertainty is halved. The father knows that a baby of some kind has been born, so his uncertainty amounts to just two possibilities - boy and girl - and they are (for purposes of this discussion) equiprobable. The pink card halves the father's prior uncertainty from two possibilities to one (girl). If there'd been no pink card but a doctor walked out of the room, shook the father's hand and said, 'Congratulations old chap, I'm delighted to be the first to tell you that you have a daughter', the information conveyed by the 17-word message would still be only one bit.
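The bookkeeping in that example can be written out as a tiny function - a minimal sketch, with the scenarios exactly as in the text:

    import math

    def info_bits(prior, posterior):
        # Ignorance-reduction in bits: log2(prior) - log2(posterior).
        return math.log2(prior) - math.log2(posterior)

    print(info_bits(2, 1))   # the pink card: 1.0 bit
    print(info_bits(2, 1))   # the doctor's 17-word speech: still 1.0 bit
    print(info_bits(8, 1))   # three successive halvings: 3.0 bits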
Computer information is held in a sequence of noughts and ones. There are only two possibilities, so each 0 or 1 can hold one bit. The memory capacity of a computer, or the storage capacity of a disk or tape, is often measured in bits, and this is the total number of 0s or 1s that it can hold. For some purposes, more convenient units of measurement are the byte (8 bits), the kilobyte (1000 bytes), the megabyte (a million bytes) or the gigabyte (1000 million bytes).* Notice that these figures refer to the total available capacity. This is the maximum quantity of information that the device is capable of storing. The actual amount of information stored is something else. The capacity of my hard disk happens to be 4.2 gigabytes. Of this, about 1.4 gigabytes are actually being used to store data at present. But even this is not the true information content of the disk in Shannon's sense. The true information content is smaller, because the information could be more economically stored. You can get some idea of the true information content by using one of those ingenious compression programs like 'Stuffit'. Stuffit looks for redundancy in the sequence of 0s and 1s, and removes a hefty proportion of it by recoding - stripping out internal predictability. Maximum information content would be achieved (probably never in practice) only if every 1 or 0 surprised us equally. Before data is transmitted in bulk around the internet, it is routinely compressed to reduce redundancy.†
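Anyone with a computer can watch redundancy being stripped out. The sketch below uses Python's standard zlib compressor (standing in for Stuffit, which is not assumed here) to compare a highly redundant message with random noise of the same length:

    import random
    import zlib

    # A highly redundant message versus random noise of equal length.
    redundant = b"It rained in Oxford today. " * 400
    noisy = bytes(random.getrandbits(8) for _ in range(len(redundant)))

    for label, data in [("redundant", redundant), ("random", noisy)]:
        packed = zlib.compress(data, 9)   # recode to remove predictability
        print(f"{label:>9}: {len(data)} bytes -> {len(packed)} bytes")

The repetitive text shrinks dramatically, because most of it can be reconstructed by the recipient; the random bytes barely compress at all, since nearly every byte is already a 'surprise'.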
That's good economics. But on the other hand it is also a good idea to keep some redundancy in messages, to help correct errors. In a message that is totally free of redundancy, after there's been an error there is no means of reconstructing what was intended. Computer codes often incorporate deliberately redundant 'parity bits' to aid in error detection. DNA, too, has various error-correcting procedures which depend upon redundancy. When I come on to talk of genomes, I'll return to the three-way distinction between total information capacity, information capacity actually used, and true information content.
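Here is a minimal sketch of the parity-bit idea just mentioned. The even-parity scheme shown is the textbook one; nothing beyond it is assumed:

    def add_parity(bits):
        # Append a bit chosen so the total number of 1s is even.
        return bits + [sum(bits) % 2]

    def parity_ok(bits):
        # An odd count of 1s means at least one bit was corrupted.
        return sum(bits) % 2 == 0

    message = add_parity([1, 0, 1, 1, 0, 0, 1])
    print(parity_ok(message))   # True - arrived intact

    message[3] ^= 1             # flip a single bit in transit
    print(parity_ok(message))   # False - the redundant bit exposes the error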
*These round figures are all decimal approximations. In the world of computers, the standard metric prefixes, 'kilo', 'giga' etc. are borrowed for the nearest convenient power of 2. Thus a kilobyte is not 1000 bytes but 2¹⁰ or 1024 bytes; a megabyte is not a million bytes but 2²⁰ or 1,048,576 bytes. If we had evolved with 8 fingers or 16, instead of 10, the computer might have been invented a century earlier. Theoretically, we could now decide to teach all children octal instead of decimal arithmetic. I'd love to give it a go, but realistically I recognize that the immense short-term costs of the transition would outweigh the undoubted long-term benefits of the change. For a start, we'd all have to learn our multiplication tables again from scratch.
†A powerful application of this aspect of information theory is Horace Barlow's idea that sensory systems are wired up to remove massive amounts of redundancy before passing their messages on to the brain. One way they do this is by signalling change in the world (what mathematicians would call differentiating) rather than continuously reporting the current state of the world (which is highly redundant because it doesn't fluctuate rapidly and randomly). I discussed Barlow's idea in Unweaving the Rainbow (London, Penguin, 1998; Boston, Houghton Mifflin, 1998), pp. 257-66.
It was Shannon's insight that information of any kind, no matter what it means, no matter whether it is true or false, and no matter by what physical medium it is carried, can be measured in bits, and is translatable into any other medium of information. The great biologist J. B. S. Haldane used Shannon's theory to compute the number of bits of information conveyed by a worker bee to her hivemates when she 'dances' the location of a food source (about 3 bits to tell about the direction of the food and another 3 bits for the distance of the food). In the same units, I recently calculated that I'd need to set aside 120 megabits of laptop computer memory to store the triumphal opening chords of Richard Strauss's Also Sprach Zarathustra (the '2001 theme'), which I wanted to play in the middle of a lecture about evolution. Shannon's economics enable you to calculate how much modem time it'll cost you to email the complete text of a book to a publisher in another land. Fifty years after Shannon, the idea of information as a commodity, as measurable and interconvertible as money or energy, has come into its own.
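That last calculation is elementary arithmetic. Every figure below - page count, characters per page, modem speed - is an illustrative assumption rather than a measurement:

    book_chars = 500 * 2000        # ~500 pages of ~2,000 characters each
    bits_to_send = book_chars * 8  # 8 bits per character, uncompressed
    modem_bps = 56_000             # nominal speed of a 56k modem

    minutes = bits_to_send / modem_bps / 60
    print(f"about {minutes:.1f} minutes of modem time")   # ~2.4 minutes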
DNA carries information in a very computer-like way, and we can measure the genome's capacity in bits too, if we wish. DNA doesn't use a binary code, but a quaternary one. Whereas the unit of information in the computer is a 1 or a 0, the unit in DNA can be T, A, C or G. If I tell you that a particular location in a DNA sequence is a T, how much information is conveyed from me to you? Begin by measuring the prior uncertainty. How many possibilities are open before the message T arrives? Four. How many possibilities remain after it has arrived? One. So you might think the information transferred is four bits, but actually it is two. Here's why (assuming that the four letters are equally probable, like the four suits in a pack of cards). Remember that Shannon's metric is concerned with the most economical way of conveying the message. Think of it as the number of yes/no questions that you'd have to ask in order to narrow down to certainty, from an initial uncertainty of four possibilities, assuming that you planned your questions in the most economical way. 'Is the mystery letter before D in the alphabet?'* No. That narrows it down to T or G, and now we need only one more question to clinch it. So, by this method of measuring, each 'letter' of the DNA has an information capacity of 2 bits.

*A chemist would more naturally ask, 'Is it a pyrimidine?', but that sends the wrong signal for my purposes. It is only incidentally true that the four letters of the DNA alphabet fall naturally into two chemical families, purines and pyrimidines.
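The two-question scheme amounts to giving each letter a fixed two-bit code. A minimal sketch (the particular pairing of letters to bits below is an arbitrary choice of mine; any fixed pairing would serve equally well):

    import math

    print(math.log2(4))   # 2.0 - two yes/no questions per DNA letter

    code = {"A": "00", "C": "01", "T": "10", "G": "11"}

    def encode(dna):
        # Spell out a sequence at 2 bits per letter.
        return "".join(code[letter] for letter in dna)

    print(encode("GATTACA"))   # '11001010000100' - 14 bits for 7 letters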
Whenever the prior uncertainty of the recipient can be expressed as a number of equiprobable alternatives N, the information content of a message which narrows those alternatives down to one is log₂N (the power to which 2 must be raised in order to yield the number of alternatives N). If you pick a card, any card, from a normal pack, a statement of the identity of the card carries log₂52, or 5.7 bits of information. In other words, given a large number of guessing games, it would take 5.7 yes/no questions on average to guess the card, provided the questions are asked in the most economical way. The first two questions might establish the suit (Is it red? Is it a diamond?); the remaining three or four questions would successively divide and conquer the suit (Is it a 7 or higher? etc.), finally homing in on the chosen card. When the prior uncertainty is some mixture of alternatives that are not equiprobable, Shannon's formula becomes a slightly more elaborate weighted average, but it is essentially similar. By the way, Shannon's weighted average is the same formula as physicists have used, since the nineteenth century, for entropy. The point has interesting implications but I shall not pursue them here.*

*Ecologists also use the formula as an index of diversity.
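For the mathematically curious, that weighted average can be written down and computed directly: H = -Σ p·log₂p, summed over the alternatives. A minimal sketch, using the card example from the text plus one biased case of my own invention:

    import math

    def entropy_bits(probabilities):
        # Shannon's weighted average: H = -sum(p * log2(p)).
        return -sum(p * math.log2(p) for p in probabilities if p > 0)

    print(entropy_bits([1 / 52] * 52))   # ~5.70 bits - agrees with log2(52)
    print(entropy_bits([0.5, 0.5]))      # 1.0 bit - a fair coin
    print(entropy_bits([0.9, 0.1]))      # ~0.47 bits - bias rarely surprises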
That's enough background on information theory. It is a theory which has long held a fascination for me, and I have used it in several of my research papers over the years. Let's now think how we might use it to ask whether the information content of genomes increases in evolution. First, recall the three-way distinction between total information capacity, the capacity that is actually used, and the true information content when stored in the most economical way possible. The total information capacity of the human genome is measured in gigabits. That of the common gut bacterium Escherichia coli is measured in megabits. We, like all other animals, are descended from an ancestor which, were it available for our study today, we'd classify as a bacterium. So during the billions of years of evolution since that ancestor lived, the information capacity of our genome has gone up perhaps three orders of magnitude (powers of ten) - about a thousandfold. This is satisfyingly plausible and comforting to human dignity.
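As a back-of-envelope check on that thousandfold figure, multiply commonly quoted genome lengths by the 2 bits per letter derived earlier. The base-pair counts below are rough textbook values, assumed here purely for illustration:

    import math

    human_bp = 3.1e9    # roughly 3.1 billion base pairs (assumed)
    e_coli_bp = 4.6e6   # roughly 4.6 million base pairs (assumed)

    human_bits = human_bp * 2     # ~6.2 gigabits of capacity
    e_coli_bits = e_coli_bp * 2   # ~9.2 megabits of capacity

    ratio = human_bits / e_coli_bits
    print(f"~{ratio:.0f}-fold, i.e. ~{math.log10(ratio):.1f} orders of magnitude")

The answer comes out near 700-fold - in line with the 'perhaps three orders of magnitude' of the text.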
Should human dignity feel wounded, then, by the fact that the crested newt, Triturus cristatus, has a genome capacity estimated at 40 gigabits, an order of magnitude larger than the human genome? No, because, in any case, most of the capacity of the genome of any animal is not used to store useful information. There are many nonfunctional pseudogenes (see below) and lots of repetitive nonsense, useful for forensic detectives but not translated into protein in the living cells. The crested newt has a bigger 'hard disk' than we have, but since the great bulk of both our hard disks is unused, we needn't feel insulted. Related species of newt have much smaller genomes. Why the Creator should have played fast and loose with the genome sizes of newts in such a capricious way is a problem that creationists might like to ponder. From an evolutionary point of view the explanation is simple.*

*My suggestion (The Selfish Gene, 1976) that surplus DNA is parasitic was later taken up and developed by others under the catch-phrase 'Selfish DNA'. See The Selfish Gene, 2nd edn (Oxford University Press, 1989), pp. 44-5 and 275.
Evidently the total information capacity of genomes is very variable across the living kingdoms, and it must have changed greatly in evolution, presumably in both directions. Losses of genetic material are called deletions. New genes arise through various kinds of duplication. This is well illustrated by haemoglobin, the complex protein molecule that transports oxygen in the blood.
Human adult haemoglobin is actually a composite of four protein chains called globins, knotted around each other. Their detailed sequences show that the four globin chains are closely related to each other, but they are not identical. Two of them are called alpha globins (each a chain of 141 amino acids), and two are beta globins (each a chain of 146 amino acids). The genes coding for the alpha globins are on chromosome 16; those coding for the beta globins are on chromosome 11. On each of these chromosomes, there is a cluster of globin genes in a row, interspersed with some junk DNA. The alpha cluster, on chromosome 16, contains seven globin genes. Four of these are pseudogenes, versions of alpha disabled by faults in their sequence and not translated into proteins. Two are true alpha globins, used in the adult. The final one is called zeta and is used only in embryos. Similarly the beta cluster, on chromosome 11, has six genes, some of which are disabled, and one of which is used only in the embryo. Adult haemoglobin, as we've seen, contains two alpha and two beta chains.
Never mind all this complexity. Here's the fascinating point. Careful letter-by-letter analysis shows that these different kinds of globin genes are literally cousins of each other, literally members of a family. But these distant cousins still coexist inside our own genome, and that of all vertebrates. On the scale of whole organisms, all vertebrates are our cousins too. The tree of vertebrate evolution is the family tree we are all familiar with, its branch-points representing speciation events - the splitting of species into pairs of daughter species. But there is another family tree occupying the same timescale, whose branches represent not speciation events but gene duplication events within genomes.
The dozen or so different globins inside you are descended from an ancient globin gene which, in a remote ancestor who lived about half a billion years ago, duplicated, after which both copies stayed in the genome. There were then two copies of it, in different parts of the genome of all descendant animals. One copy was destined to give rise to the alpha cluster (on what would eventually become chromosome 16 in our genome), the other to the beta cluster (on chromosome 11). As the aeons passed, there were further duplications (and doubtless some deletions as well). Around 400 million years ago the ancestral alpha gene duplicated again, but this time the two copies remained near neighbours of each other, in a cluster on the same chromosome. One of them was destined to become the zeta used by embryos, the other became the alpha globin genes used by adult humans (other branches gave rise to the nonfunctional pseudogenes I mentioned). It was a similar story along the beta branch of the family, but with duplications at other moments in geological history.
Now here's an equally fascinating point. Given that the split between the alpha cluster and the beta cluster took place 500 million years ago, it will of course not be just our human genomes that show the split - that is, possess alpha genes in a different part of the genome from beta genes. We should see the same within-genome split if we look at any other mammals, at birds, reptiles, amphibians and bony fish, for our common ancestor with all of them lived less than 500 million years ago. Wherever it has been investigated, this expectation has proved correct. Our greatest hope of finding a vertebrate that does not share with us the ancient alpha/beta split would be a jawless fish like a lamprey, for they are our most remote cousins among surviving vertebrates; they are the only surviving vertebrates whose common ancestor with the rest of the vertebrates is sufficiently ancient that it could have predated the alpha/beta split. Sure enough, these jawless fishes are the only known vertebrates that lack the alpha/beta divide.
Gene duplication, within the genome, has a similar historic impact to species duplication ('speciation') in phylogeny. It is responsible for gene diversity, in the same way as speciation is responsible for phyletic diversity. Beginning with a single universal ancestor, the magnificent diversity of life has come about through a series of branchings of new species, which eventually gave rise to the major branches of the living kingdoms and the hundreds of millions of separate species that have graced the Earth. A similar series of branchings, but this time within genomes - gene duplications - has spawned the large and diverse population of clusters of genes that constitutes the modern genome.
The story of the globins is just one among many. Gene duplications and deletions have occurred from time to time throughout genomes. It is by these, and similar means, that genome sizes can increase in evolution. But remember the distinction between the total capacity of the whole genome, and the capacity of the portion that is actually used. Recall that not all the globin genes are used. Some of them, like theta in the alpha cluster of globin genes, are pseudogenes, recognizably kin to functional genes in the same genomes, but never actually translated into the action language of protein. What is true of globins is true of most other genes. Genomes are littered with nonfunctional pseudogenes, faulty duplicates of functional genes that do nothing, while their functional cousins (the word doesn't even need scare quotes) get on with their business in a different part of the same genome. And there's lots more DNA that doesn't even deserve the name pseudogene. It too is derived by duplication, but not duplication of functional genes. It consists of multiple copies of junk, 'tandem repeats', and other nonsense which may be useful for forensic detectives but which doesn't seem to be used in the body itself. Once again, creationists might spend some earnest time speculating on why the Creator should bother to litter genomes with untranslated pseudogenes and junk tandem repeat DNA.
Can we measure the information capacity of that portion of the genome which is actually used? We can at least estimate it. In the case of the human genome it is about 2 per cent - considerably less than the proportion of my hard disk that I have used since I bought it. Presumably the equivalent figure for the crested newt is even smaller, but I don't know if it has been measured. In any case, we mustn't run away with a chauvinistic idea that the human genome somehow ought to have the largest DNA database because we are so wonderful. The great evolutionary biologist George C. Williams has pointed out that animals with complicated life cycles need to code for the development of all stages in the life cycle, but they only have one genome with which to do so. A butterfly's genome has to hold the complete information needed for building a caterpillar as well as a butterfly.
