Richard-Dawkins-Unweaving-the-Rainbow
Some talk to the machine.
Others make funny signs to it with their fingers, or stroke it or pat it with their hands.
They once patted it and won the jackpot and they've never forgotten it.
I have watched computer addicts, impatient for a server to respond, behaving in a similar way, say, knocking the terminal with their knuckles.
My informant about Las Vegas has also made an informal study of London betting shops. She reports that one particular gambler habitually runs, after placing his bet, to a certain tile in the floor, where he stands on one leg while watching the race on the bookmaker's television. Presumably he once won while standing on this tile and conceived the notion that there was a causal link. Now, if somebody else stands on 'his' lucky tile (some other sportsmen do this deliberately, perhaps to try to hijack some of his 'luck' or just to annoy him) he dances around it, desperately trying to get a foot on the tile before the race ends. Other gamblers refuse to change their shirt, or to cut their hair, while they are 'on a lucky streak'. In contrast one Irish punter, who had a fine head of hair, shaved it completely bald in a desperate effort to change his luck. His hypothesis was that he was having rotten luck on the horses and he had lots of hair. Perhaps the two were connected somehow; perhaps these facts were all part of a meaningful pattern! Before we feel too superior, let us remember that large numbers of us were brought up to believe that Samson's fortunes changed utterly after Delilah cut off his hair.
How can we tell which apparent patterns are genuine, which random and meaningless? Methods exist, and they belong in the science of statistics and experimental design. I want to spend a little more time explaining a few of the principles, though not the details, of statistics. Statistics can largely be seen as the art of distinguishing pattern from randomness. Randomness means lack of pattern. There are various ways of explaining the ideas of randomness and pattern. Suppose I claim that I can tell girls' handwriting from boys'. If I am right, this would have to mean that there is a real pattern relating sex to handwriting. A sceptic might doubt this, agreeing that handwriting varies from person to person but denying that there is a sex-related pattern to this variation. How should you decide whether my claim, or the sceptic's, is right? It is no use just accepting my word for it. Like a superstitious Las Vegas gambler, I could easily have mistaken a lucky streak for a real, repeatable skill. In any case, you have every right to demand evidence. What evidence should satisfy you? The answer is evidence that is publicly recorded, and properly analysed.
The claim is, in any case, only a statistical claim. I do not maintain (in this hypothetical example - in reality I am not claiming anything) that I can infallibly judge the sex of the author of a given piece of handwriting. I claim only that among the great variation that exists among handwriting, some component of that variation correlates with sex. Therefore, even though I shall often make mistakes, if you give me, say, 100 samples of handwriting I should be able to sort them into boys and girls more accurately than could be achieved purely by guessing at random. It follows that, in order to assess any claim, you are going to have to calculate how likely it is that a given result could have been achieved by guessing at random. Once again, we have an exercise in calculating the odds of coincidence.
Before we get to the statistics, there are some precautions you need to take in designing the experiment. The pattern - the non-randomness we seek - is a pattern relating sex to handwriting. It is important not to confound the issue with extraneous variables. The handwriting samples that you give me should not, for instance, be personal letters. It would be too easy for me to guess the sex of the writer from the content of the letter rather than from the handwriting. Don't choose all the girls from one school and all the boys from another. The pupils from one school might share aspects of their handwriting, learning either from each other or from a teacher. These could result in real differences in handwriting, and they might even be interesting, but they could be representative of different schools, and only incidentally of different sexes. And don't ask the children to write out a passage from a favourite book. I should be influenced by a choice of Black Beauty or Biggles (readers whose childhood culture is different from mine will substitute examples of their own).
Obviously, it is important that the children should all be strangers to me, otherwise I'd recognize their individual writing and hence know their sex. When you hand me the papers they must not have the children's names on them, but you must have some means of keeping track of whose is which. Put secret codes on them for your own benefit, but be careful how you choose the codes. Don't put a green mark on the boys' papers and a yellow mark on the girls'. Admittedly, I won't know which is which, but I'll guess that yellow denotes one sex and green the other, and that would be a big help. It would be a good idea to give every paper a code number. But don't give the boys the numbers 1 to 10 and the girls 11 to 20; that would be just like the yellow and green marks all over again. So would giving the boys odd numbers and the girls even. Instead, give the papers random numbers and keep the crib list locked up where I cannot find it. These precautions are those named 'double blind' in the literature of medical trials.
Let's assume that all the proper double blind precautions have been taken, and that you have assembled 20 anonymous samples of handwriting, shuffled into random order. I go through the papers, sorting them into two piles for suspected boys and suspected girls. I may have some 'don't knows', but let's assume that you compel me to make the best guess I can in such cases. At the end of the experiment I have made two piles and you look through to see how accurate I have been.
Now the statistics. You'd expect me to guess right quite often even if I was guessing purely at random. But how often? If my claim to be able to sex handwriting is unjustified, my guessing rate should be no better than somebody tossing a coin. The question is whether my actual performance is sufficiently different from a coin-tosser's to be impressive. Here is how to set about answering the question.
Think about all possible ways in which I could have guessed the sex of the 20 writers. List them in order of impressiveness, beginning with all 20 correct and going down to completely random (all 20 exactly wrong is nearly as impressive as all 20 exactly right, because it shows that I can discriminate, even though I perversely reverse the sign). Then look at the actual way I sorted them and count up the percentage of all possible sortings that would have been as impressive as the actual one, or more. Here's how to think about all possible sortings. First, note that there is only one way of being 100 per cent right, and one way of being 100 per cent wrong, but there are lots of ways of being 50 per cent right. One could be right on the first paper, wrong on the second, wrong on the third, right on the fourth . . . There are somewhat fewer ways of being 60 per cent right. Fewer ways still of being 70 per cent right, and so on. The number of ways of making a single mistake is sufficiently few that we can write them all down. There were 20 scripts. The mistake could have been made on the first one, or on the second one, or on the third one . . . or on the twentieth one. That is, there are exactly 20 ways of making a single mistake. It is more tedious to write down all the ways of making two mistakes, but we can calculate how many ways there are, easily enough, and it comes to 190. It is harder still to count the ways of making three mistakes, but you can see that it could be done. And so on.
Suppose, in this hypothetical experiment, two mistakes is actually what I did make. We want to know how good my score was, on a spectrum of all possible ways of guessing. What we need to know is how many possible ways of choosing are as good as, or better than, my score. The number as good as my score is 190. The number better than my score is 20 (one mistake) plus 1 (no mistakes). So, the total number as good as or better than my score is 211. It is important to add in the ways of scoring better than my actual score because they properly belong in the petwhac, along with the 190 ways of scoring exactly as well as I did.
We have to set 211 against the total number of ways in which the 20 scripts could have been classified by penny-tossers. This is not difficult to calculate. The first script could have been boy or girl; that is two possibilities. The second script also could have been boy or girl. So, for each of the two possibilities for the first script, there were two possibilities for the second. That is 2 x 2 = 4 possibilities for the first two scripts. The possibilities for the first three scripts are 2 x 2 x 2 = 8. And the possible ways of classifying all 20 scripts are 2 x 2 x 2 . . . 20 times, or 2 to the power 20. This is a pretty big number, 1,048,576.
So, of all possible ways of guessing, the proportion of ways that are as good as or better than my actual score is 211 divided by 1,048,576, which is approximately 0.0002, or 0.02 per cent. To put it another way, if 10,000 people sorted the scripts entirely by tossing pennies, you'd expect only two of them to score as well as I actually did. This means that my score is pretty impressive and, if I performed as well as this, it would be strong evidence that boys and girls differ systematically in their handwriting. Let me repeat that this is all hypothetical. As far as I know, I have no such ability to sex handwriting. I should also add that, even if there was good evidence for a sex difference in handwriting, this would say nothing about whether the difference is innate or learned. The evidence, at least if it came from the kind of experiment just described, would be equally compatible with the idea that girls are systematically taught a different handwriting from boys - perhaps a more 'ladylike' and less 'assertive' fist.
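The arithmetic of the handwriting example can be checked with a few lines of Python. This is only a sketch: the function names are invented for illustration, and it follows the counting used in the text, which tallies the sortings with two or fewer mistakes and does not add in the mirror-image sortings that are almost entirely wrong.

```python
from math import comb


def ways_with_mistakes(n_scripts: int, mistakes: int) -> int:
    """Number of distinct sortings that contain exactly `mistakes` errors."""
    return comb(n_scripts, mistakes)


def count_as_good_or_better(n_scripts: int, max_mistakes: int) -> int:
    """Number of sortings with `max_mistakes` errors or fewer."""
    return sum(ways_with_mistakes(n_scripts, k) for k in range(max_mistakes + 1))


def proportion_as_good_or_better(n_scripts: int, max_mistakes: int) -> float:
    """Share of all 2**n_scripts coin-toss sortings that do at least this well."""
    return count_as_good_or_better(n_scripts, max_mistakes) / 2 ** n_scripts


# The hypothetical experiment from the text: 20 scripts, two mistakes.
print(ways_with_mistakes(20, 2))                      # exactly two mistakes
print(count_as_good_or_better(20, 2))                 # two mistakes or fewer
print(2 ** 20)                                        # all possible sortings
print(f"{proportion_as_good_or_better(20, 2):.6f}")   # the proportion
```

Run as written, it reports 190 ways of making exactly two mistakes, 211 sortings with two mistakes or fewer, 1,048,576 possible sortings in all, and a proportion of about 0.0002.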
We have just performed what is technically called a test of statistical significance. We reasoned from first principles, which made it rather tedious. In practice, research workers can call upon tables of probabilities and distributions that have been previously calculated. We therefore don't literally have to write down all possible ways in which things could have happened. But the underlying theory, the basis upon which the tables were calculated, depends, in essence, upon the same fundamental procedure. Take the events that could have been obtained and throw them down repeatedly at random. Look at the actual way the events occurred and measure how extreme it is, on the spectrum of all possible ways in which they could have been thrown down.
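The 'throw them down repeatedly at random' procedure can also be carried out literally, by simulation. The following is a rough sketch rather than a recommended method: the function name, the number of trials and the seed are all assumptions chosen for illustration, and each simulated guesser simply tosses a fair coin for each of the 20 scripts.

```python
import random


def simulated_proportion(n_scripts: int = 20,
                         max_mistakes: int = 2,
                         n_trials: int = 200_000,
                         seed: int = 0) -> float:
    """Estimate how often pure coin-tossing scores `max_mistakes` errors or fewer."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    successes = 0
    for _ in range(n_trials):
        # One random bit per script; a 1 bit stands for a wrong guess.
        mistakes = bin(rng.getrandbits(n_scripts)).count("1")
        if mistakes <= max_mistakes:
            successes += 1
    return successes / n_trials


estimate = simulated_proportion()
print(f"estimated proportion: {estimate:.5f}")
```

With a few hundred thousand simulated guessers the estimate should land close to the exact figure of roughly 2 in 10,000, though it will wobble from run to run if the seed is changed. The tables that research workers consult spare them both the exhaustive counting and the simulation.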
Notice that a test of statistical significance does not prove anything conclusively. It can't rule out luck as the generator of the result that we observe. The best it can do is place the observed result on a par with a specified amount of luck. In our particular hypothetical example, it was on a par with two out of 10,000 random guessers. When we say that an effect is statistically significant, we must always specify a so-called p-value. This is the probability that a purely random process would have generated a result at least as impressive as the actual result. A p-value of 2 in 10,000 is pretty impressive, but it is still possible that there is no genuine pattern there. The beauty of doing a proper statistical test is that we know how probable it is that there is no genuine pattern there.
Conventionally, scientists allow themselves to be swayed by p-values of 1 in 100, or even as high as 1 in 20: far less impressive than 2 in 10,000. What p-value you accept depends upon how important the result is, and upon what decisions might follow from it. If all you are trying to decide is whether it is worth repeating the experiment with a larger sample, a p-value of 0.05, or 1 in 20, is quite acceptable. Even though there is a 1 in 20 chance that your interesting result would have happened anyway by chance, not much is at stake: the error is not a costly one. If the decision is a life and death matter, as in some medical research, a much lower p-value than 1 in 20 should be sought. The same is true of experiments that purport to show highly controversial results, such as telepathy or 'paranormal' effects.
As we briefly saw in connection with DNA fingerprinting, statisticians distinguish false positive from false negative errors, sometimes called type 1 and type 2 errors respectively. A type 2 error, or false negative, is a failure to detect an effect when there really is one. A type 1 error, or false positive, is the opposite: concluding that there really is something going on when actually there is nothing but randomness. The p-value is the measure of the probability that you have made a type 1 error. Statistical judgement means steering a middle course between the two kinds of error. There is a type 3 error in which your mind goes totally blank whenever you try to remember which is which of type 1 and type 2. I still look them up after a lifetime of use. Where it matters, therefore, I shall use the more easily remembered names, false positive and false negative. I also, by the way, frequently make mistakes in arithmetic. In practice I should never dream of doing a statistical test from first principles as I did for the hypothetical handwriting case. I'd always look up in a table that somebody else - preferably a computer - had calculated.
Skinner's superstitious pigeons made false positive errors. There was in fact no pattern in their world that truly connected their actions to the deliveries of the reward mechanism. But they behaved as if they had detected such a pattern. One pigeon 'thought' (or behaved as if it thought) that left stepping caused the reward mechanism to deliver. Another 'thought' that thrusting its head into the corner had the same beneficial effect. Both were making false positive errors. A false negative error is made by a pigeon in a Skinner box who never notices that a peck at the key yields food if the red light is on, but that a peck when the blue light is on punishes by switching the mechanism off for ten minutes. There is a genuine pattern waiting to be detected in the little world of this Skinner box, but our hypothetical pigeon does not detect it. It pecks indiscriminately to both colours, and therefore gets a reward less frequently than it could.
A false positive error is made by a farmer who thinks that sacrificing to the gods brings longed-for rain. In fact, I presume (although I haven't investigated the matter experimentally), there is no such pattern in his world, but he does not discover this and persists in his useless and wasteful sacrifices. A false negative error is made by a farmer who fails to notice that there is a pattern in the world relating manuring of a field to the subsequent crop yield of that field. Good farmers steer a middle way between type 1 and type 2 errors.
It is my thesis that all animals, to a greater or lesser extent, behave as intuitive statisticians, choosing a middle course between type 1 and type 2 errors. Natural selection penalizes both type 1 and type 2 errors, but the penalties are not symmetrical and no doubt vary with the different ways of life of species. A stick caterpillar looks so like the twig it is sitting on that we cannot doubt that natural selection has shaped it to resemble a twig. Many caterpillars died to produce this beautiful result. They died because they did not sufficiently resemble a twig. Birds, or other predators, found them out. Even some very good twig mimics must have been found out. How else did natural selection push evolution towards the pitch of perfection that we see? But, equally, birds must many times have missed caterpillars because they resembled twigs, in some cases only slightly. Any prey animal, no matter how well camouflaged, can be detected by predators under ideal seeing conditions. Equally, any prey animal, no matter how poorly camouflaged, can be missed by predators under bad seeing conditions. Seeing conditions can vary with angle (a predator may spot a well-camouflaged animal when looking straight at it, but will miss a poorly camouflaged animal out of the corner of its eye). They can vary with light intensity (a prey may be overlooked at twilight, whereas it would be seen at noon). They can vary with distance (a prey which would be seen at six inches range may be overlooked at a range of 100 yards).
Imagine a bird cruising around a wood, looking for prey. It is surrounded by twigs, a very few of which might be edible caterpillars. The problem is to decide. We can assume that the bird could guarantee to tell whether an apparent twig was actually a caterpillar if it approached the twig really close and subjected it to a minute, concentrated examination in a good light. But there isn't time to do that for all twigs. Small birds with high turnover metabolism have to find food alarmingly often in order to stay alive. Any bird that scanned every individual twig with the equivalent of a magnifying glass would die of starvation before it found its first caterpillar. Efficient searching demands a faster, more cursory and rapid scanning, even though this carries a risk of missing some food. The bird has to strike a balance. Too cursory and it will never find anything. Too detailed and it will detect every caterpillar it looks at, but it will look at too few, and starve.
It is easy to apply the language of type 1 and type 2 errors. A false negative is committed by a bird that sails by a caterpillar without giving it a closer look. A false positive is committed by a bird that zooms in on a suspected caterpillar, only to discover that it is really a twig. The penalty for a false positive is the time and energy wasted flying in for the close inspection: not serious on any one occasion, but it could mount up fatally. The penalty for a false negative is missing a meal. No bird outside Cloud Cuckooland can hope to be free of all type 1 and type 2 errors. Individual birds will be programmed by natural selection to adopt some compromise policy calculated to achieve an optimum intermediate level of false positives and false negatives. Some birds may be biased towards type 1 errors, others towards the opposite extreme. There will be some intermediate setting which is best, and natural selection will steer evolution towards it.
Which intermediate setting is best will vary from species to species. In our example it will also depend upon conditions in the wood, for example, the size of the caterpillar population in relation to the number of twigs. These conditions may change from week to week. Or they may vary from wood to wood. Birds may be programmed to learn to adjust their policy as a result of their statistical experience. Whether they learn or not, successfully hunting animals must usually behave as if they are good statisticians. (I hope it is not necessary, by the way, to plod through the usual disclaimer: No, no, the birds aren't consciously working it out with calculator and probability tables. They are behaving as if they were calculating p-values. They are no more aware of what a p-value means than you are aware of the equation for a parabolic trajectory when you catch a cricket ball or baseball in the outfield.)
Angler fish take advantage of the gullibility of little fish such as gobies. But that is an unfairly value-laden way of putting it. It would be better not to speak of gullibility and say that they exploit the inevitable difficulty the little fish have in steering between type 1 and type 2 errors. The little fish themselves need to eat. What they eat varies, but it often includes small wriggling objects such as worms or shrimps. Their eyes and nervous systems are tuned to wriggling things. They look for wriggling movement and if they see it they pounce. The angler fish exploits this tendency. It has a long fishing rod, evolved from a modified spine, commandeered by natural selection from its original location at the front of the dorsal fin. The angler fish itself is highly camouflaged and it sits motionless on the sea bottom for hours at a time, blending perfectly with the weeds and rocks. The only part of it which is conspicuous is a 'bait', which looks like a worm, a shrimp or a small fish, at the end of its fishing rod. In some deep-sea species the bait is even luminous. In any case, it seems to wriggle like something worth eating when the angler waves its rod. A possible prey fish, say a goby, is attracted. The angler 'plays' its prey for a little while to hook its attention, then casts the bait down into the still unsuspected region in front of its own invisible mouth, and the little fish often follows. Suddenly that huge mouth is invisible no longer. It gapes massively, there is a violent inrushing of water, engulfing every floating object in the vicinity, and the little fish has pursued its last worm.
From the point of view of a hunting goby, any worm may be overlooked or it may be seen. Once the worm has been detected, it may turn out to be a real worm or an angler fish's lure, and the unfortunate fish is faced with a dilemma. A false negative error would be to refrain from attacking a perfectly good worm for fear that it might be an angler fish lure. A false positive error would be to attack a worm, only to discover that it is really a lure. Once again, it is impracticable in the real world to get it right all the time. A fish that is too risk-averse will starve because it never attacks worms. A fish that is too foolhardy won't starve but it may be eaten. The optimum in this case may not be halfway between. More surprisingly, the optimum may be one of the extremes. It is possible that angler fish are sufficiently rare that natural selection favours the extreme policy of attacking all apparent worms. I am fond of a remark of the philosopher and psychologist William James on human angling:
There are more worms unattached to hooks than impaled upon them; therefore, on the whole, says Nature to her fishy children, bite at every worm and take your chances. (1910)
Like all other animals, and even plants, humans can and must behave as intuitive statisticians. The difference with us is that we can do our calculations twice over. The first time intuitively, as though we were birds or fish. And then again explicitly, with pencil and paper or computer. It is tempting to say that the pencil and paper way gets the right answer, so long as we don't make some publicly detectable blunder like adding in the date, whereas the intuitive way may yield the wrong answer. But there strictly is no 'right' answer, even in the case of pencil and paper statistics. There may be a right way to do the sums, to calculate the p-value, but the criterion, or threshold p-value, that we demand before choosing a particular action is still our decision and it depends upon our aversion to risk. If the penalty for making a false positive error is much greater than the penalty for making a false negative error, we should adopt a cautious, conservative threshold; almost never try a 'worm' for fear of the consequences. Conversely, if the risk-asymmetry is opposite, we should rush in and try every 'worm' that is going: it is unlikely to matter if we keep tasting false worms so we may as well have a go.
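The dependence of the threshold on the two penalties can be made concrete with a toy calculation. This is an illustrative sketch only: the function name, the probabilities and the penalty figures are all hypothetical, and the rule it applies (act when the expected penalty of acting is smaller than the expected penalty of holding back) is one simple way of formalizing the point, not the only one.

```python
def should_bite(p_real: float,
                cost_false_positive: float,
                cost_false_negative: float) -> bool:
    """Decide whether to try a 'worm', given how likely it is to be real.

    p_real               -- assumed probability that the 'worm' is genuine
    cost_false_positive  -- penalty for biting something that is not a worm
    cost_false_negative  -- penalty for passing up a genuine worm
    """
    expected_cost_of_biting = (1 - p_real) * cost_false_positive
    expected_cost_of_refraining = p_real * cost_false_negative
    return expected_cost_of_biting < expected_cost_of_refraining


# Hypothetical numbers: a lure is 100 times as costly as a missed meal.
print(should_bite(p_real=0.999, cost_false_positive=100, cost_false_negative=1))
print(should_bite(p_real=0.5, cost_false_positive=100, cost_false_negative=1))
# Opposite asymmetry: false worms are nearly harmless, missed meals are costly.
print(should_bite(p_real=0.5, cost_false_positive=1, cost_false_negative=100))
```

With these made-up figures the first and third calls return True and the second returns False. When lures are rare enough, biting pays even though a lure is very costly, which is the substance of William James's remark; when half of all apparent worms are lures, the same penalties counsel restraint.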
Taking on board the need to steer between false positive and false negative errors, let me return to uncanny coincidence and the calculation of the probability that it would have happened anyway. If I dream of a long-forgotten friend who dies the same night, I am tempted, like anyone else, to see meaning or pattern in the coincidence. I really have to force myself to remember that quite a few people die every night, masses of people dream every night, they quite often dream that people die, and coincidences like this are probably happening to several hundred people in the world every night. Even as I think this through, my own intuition cries out that there must be meaning in the coincidence because it has happened to me. If it is true that intuition is, in this case, making a false positive error, we need to come up with a satisfactory explanation for why human intuition errs in this direction. As Darwinians, we should be alive to the possible pressures towards erring on the type 1 or the type 2 side of the divide.
As a Darwinian, I want to suggest that our willingness to be impressed at apparently uncanny coincidence (which is a case of our willingness to see pattern where there is none) is related to the typical population size of our ancestors and the relative poverty of their everyday experience. Anthropology, fossil evidence and the study of other apes all suggest that our ancestors, for much of the past few million years, probably lived in either small roving bands or small villages. Either of these would mean that the number of friends and acquaintances that our ancestors would ordinarily meet and talk to with any frequency was not more than a few dozen. A prehistoric villager could expect to hear stories of startling coincidence in proportion to this small number of acquaintances. If the coincidence happened to somebody not in his village, he wouldn't hear the story. So our brains became calibrated to detect pattern and gasp with astonishment at a level of coincidence which would actually be quite modest if our catchment area of friends and acquaintances had been large.
Nowadays, our catchment area is large, especially because of newspapers, radio and other vehicles of mass news circulation. I've already spelled out the argument. The very best and most spine-creeping coincidences have the opportunity to circulate, in the form of bated-breath stories, over a far wider audience than was ever possible in ancestral times. But, I am now conjecturing, our brains are calibrated by ancestral natural selection to expect a much more modest level of coincidence, calibrated under small village conditions. So we are impressed by coincidences because of a miscalibrated gasp threshold. Our subjective petwhacs have been calibrated by natural selection in small villages, and, as is the case with so much of modern life, the calibration is now out of date.
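The effect of catchment size can be illustrated with a toy calculation. Everything numerical here is an assumption made up for the purpose: the chance that any one person has a startling coincidence to report in a given year, the size of a village and the size of a modern audience are not estimates of anything real, and the sketch treats people's experiences as independent of one another.

```python
def chance_of_hearing_a_story(per_person_chance: float, catchment_size: int) -> float:
    """Probability that at least one person in the catchment area has a
    startling coincidence to report, assuming independence between people."""
    return 1 - (1 - per_person_chance) ** catchment_size


PER_PERSON_CHANCE = 1 / 10_000  # hypothetical yearly chance for one person
VILLAGE = 50                    # hypothetical ancestral circle of acquaintances
MODERN_AUDIENCE = 1_000_000     # hypothetical reach of newspapers and radio

print(f"village:         {chance_of_hearing_a_story(PER_PERSON_CHANCE, VILLAGE):.4f}")
print(f"modern audience: {chance_of_hearing_a_story(PER_PERSON_CHANCE, MODERN_AUDIENCE):.4f}")
```

On these invented figures a villager would hear such a story in well under one year in a hundred, whereas somewhere in a modern audience of a million the same kind of story is all but certain to turn up every year. A gasp threshold set for the first situation is bound to be triggered constantly in the second.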
A similar argument could be used to explain why we are so hysterically risk-averse to hazards that are much publicized in the newspapers - perhaps anxious parents who imagine ravening paedophiles lurking behind every lamp post on their children's walk from school are 'miscalibrated'.
I guess that there may be another, particular effect pushing in the same direction. I suspect that our individual lives under modern conditions are richer in experiences per hour than were ancestral lives. We don't just get up in the morning, scratch a living in the same way as yesterday, eat a meal or two and go to sleep again. We read books and magazines, we watch television, we travel at high speed to new places, we pass thousands of people in the street as we walk to work. The number of faces we see, the number of different situations we are exposed to, the number of separate things that happen to us, is much greater than for our village ancestors. This means that the number of opportunities for coincidence is greater for each one of us than it would have been for our ancestors, and consequently greater than our brains are calibrated to assess. This is an additional effect, over and above the population size effect that I have already noted.
With respect to both these effects, it is theoretically possible for us to recalibrate ourselves, learn to adjust our gasp threshold to a level more appropriate to modern populations and modern richnesses of experience. But this seems to be revealingly difficult even for sophisticated scientists and mathematicians. The fact that we still do gasp when we do, that clairvoyants and mediums and psychics and astrologers manage to make such a nice living out of us, all suggests that we do not, on the whole, learn to recalibrate ourselves. It suggests that the parts of our brains responsible for doing intuitive statistics are still back in the stone age.
The same may be true of intuition generally. In The Unnatural Nature of Science (1992), the distinguished embryologist Lewis Wolpert has argued that science is difficult because it is more or less systematically counter-intuitive. This is contrary to the view of T. H. Huxley (Darwin's Bulldog) who saw science as 'nothing but trained and organized common sense, differing from the latter only as a veteran may differ from a raw recruit'. For Huxley, the methods of science 'differ from those of common sense only as far as the guardsman's cut and thrust differ from the manner in which a savage wields his club'. Wolpert insists that science is deeply paradoxical and surprising, an affront to common sense rather than an extension of it, and he makes a good case. For example, every time you drink a glass of water you are imbibing at least one molecule that passed through the bladder of Oliver Cromwell. This follows by extrapolation from Wolpert's observation that 'there are many more molecules in a glass of water than there are glasses of water in the sea'. Newton's law that objects stay in motion unless positively stopped is counter-intuitive. So is Galileo's discovery that, when there is no air resistance, light objects fall at the same rate as heavy objects. So is the fact that solid matter, even a hard diamond, consists almost entirely of empty space. Steven Pinker gives an illuminating discussion of the evolutionary origins of our physical intuitions in How the Mind Works (1998).
More profoundly difficult are the conclusions of quantum theory, overwhelmingly supported by experimental evidence to a stupefyingly convincing number of decimal places, yet so alien to the evolved human mind that even professional physicists don't understand them in their intuitive thoughts. It seems to be not just our intuitive statistics but our very minds themselves that are back in the stone age.
8
HUGE CLOUDY SYMBOLS OF A HIGH ROMANCE
To gild refined gold, to paint the lily,
To throw a perfume on the violet,
To smooth the ice, or add another hue
Unto the rainbow, or with taper-light
To seek the beauteous eye of heaven to garnish,
Is wasteful and ridiculous excess.
WILLIAM SHAKESPEARE,
King John, Act IV, scene ii
It is a central tenet of this book that science, at its best, should leave room for poetry. It should note helpful analogies and metaphors that stimulate the imagination, conjure in the mind images and allusions that go beyond the needs of straightforward understanding. But there's bad poetry as well as good, and bad poetic science can lead the imagination along false trails. That danger is the subject of this chapter. By bad poetic science I mean something other than incompetent or graceless writing. I am talking about almost the opposite: about the power of poetic imagery and metaphor to inspire bad science, even if it is good poetry, perhaps especially if it is good poetry, for that gives it the greater power to mislead.
Bad poetry in the form of an over-indulgent eye for poetic allegory, or the inflation of casual and meaningless resemblances into huge cloudy symbols of a high romance (Keats's phrase), lurks behind many magical and religious customs. Sir James Frazer, in The Golden Bough (1922), recognizes a major category of magic which he calls homeopathic or imitative magic. The imitation varies from the literal to the symbolic. The Dyaks of Sarawak would eat the hands and knees of the slain in order to steady their own hands and strengthen their own knees. The bad poetic idea here is the notion that there is some essence of hand or essence of knee which can be transmitted from person to person. Frazer notes that, before the Spanish conquest, the Aztecs of Mexico believed that by consecrating bread their priests could turn it into the very body of their god, so that all who thereupon partook of the consecrated bread entered into a mystic communion with the deity by receiving a portion of his divine substance into themselves. The doctrine of transubstantiation, or the magical conversion of bread into flesh, was also familiar to the Aryans of ancient India long before the spread and even the rise of Christianity.
Frazer later generalizes the theme:
It is now easy to understand why a savage should desire to partake of the flesh of an animal or man whom he regards as divine. By eating the body of the god he shares in the god's attributes and when he is a vine-god the juice of the grape is his blood; and so by eating the bread and drinking the wine the worshipper partakes of the real body and blood of his god. Thus the drinking of wine in the rites of a vine-god like Dionysus is not an act of revelry, it is a solemn sacrament.
All over the world, ceremonies are based upon an obsession with things representing other things that they slightly resemble, or resemble in one respect. Powdered rhinoceros horn is, with tragic consequences, thought to be aphrodisiac, apparently for no better reason than the superficial resemblance of the horn itself to an erect penis. To take another common practice, professional rainmakers frequently imitate thunder or lightning, or they conjure a miniature 'homeopathic dose' of rain by sprinkling water from a bundle of twigs. Such rituals can become elaborate and costly in time and effort.
Among the Dieri of central Australia, rainmaking wizards, symbolically representative of ancestor gods, were bled (dripping blood represents the longed-for rain) into a large hole inside a hut especially built for the purpose. Two rocks, intended to stand for clouds and presage rain, were then carried by the two wizards some 10 or 15 miles away, where they were placed atop a tall tree, to symbolize the height of the clouds. Meanwhile, back at the hut, the men of the tribe would stoop low and, without using their hands, charge at the walls and butt their way through with their heads. They continued butting back and forth until the hut was destroyed. The piercing of the walls with their heads symbolized the piercing of the clouds and, they believed, released rain from real clouds. As an added precaution, the Great Council of the Dieri would also keep a stockpile of boys' foreskins in constant readiness, because of their homeopathic power to produce rain (do penises not 'rain' urine - surely eloquent evidence of their power?).
Another homeopathic theme is the 'scapegoat' (so-called because a particular Jewish version of the rite involved a goat), in which a victim is chosen to embody, signify, or be loaded up with, all the sins and misfortunes of the village. The scapegoat is then driven out, or in some cases killed, carrying the evils of the people with him. Among the Garos people of Assam, near the foothills of the eastern Himalayas, a langur monkey (or sometimes a bamboo rat) used to be captured, led to every house in the village to soak up their evil spirits and then crucified on a bamboo scaffold. In Frazer's words, the monkey is the public scapegoat, which by its vicarious sufferings and death relieves the people from all sickness and mishap in the coming year.
In many cultures the scapegoat is a human victim, and often he is identified with a god. The symbolic notion of water 'washing' away sins is another common theme, sometimes combined with the idea of the scapegoat. In one New Zealand tribe, a service was performed over an individual, by which all the sins of the tribe were supposed to be transferred to him, a fern stalk was previously tied to his person with which he jumped into the river and there unbinding, allowed it to float away to the sea, bearing their sins with it.
Frazer also reports that water was used by the rajah of Manipur as a vehicle to transfer his sins to a human scapegoat, who crouched under a scaffold on which the rajah took his bath, dripping water (and washed-away sins) on to the scapegoat below.
Condescension towards 'primitive' cultures is not admirable, so I have carefully chosen examples to remind us that theologies closer to home are not immune to homeopathic or imitative magic. The water of baptism 'washes' away sins. Jesus himself is a stand-in for humanity (in some versions via a symbolic standing in for Adam) in his crucifixion, which homoeopathically atones for our sins. Whole schools of Mariology discern a symbolic virtue in the 'feminine principle'.
Sophisticated theologians who do not literally believe in the Virgin Birth, the Six Day Creation, the Miracles, the Transubstantiation or the Easter Resurrection are nevertheless fond of dreaming up what these events might symbolically mean. It is as if the double helix model of DNA were one day to be disproved and scientists, instead of accepting that they had simply got it wrong, sought desperately for a symbolic meaning so deep as to transcend mere factual refutation. 'Of course,' one can hear them saying, 'we don't literally believe factually in the double helix any more. That would indeed be crudely simplistic. It was a story that was right for its own time, but we've moved on. Today, the double helix has a new meaning for us. The compatibility of guanine with cytosine, the glove-like fit of adenine with thymine, and especially the intimate mutual twining of the left spiral around the right, all speak to us of loving, caring, nurturing relationships . . .' Well, I'd be surprised if it quite came to that, and not only because the double helix model is now very unlikely to be disproved. But in science, as in any other field, there really are dangers of becoming intoxicated by symbolism, by meaningless resemblances, and led farther and farther from the truth, rather than towards it. Steven Pinker reports that he is troubled by correspondents who have discovered that everything in the universe comes in threes:
the Father, the Son, and the Holy Ghost; protons, neutrons and electrons; masculine, feminine and neuter; Huey, Dewey, and Louie; and so on, for page after page.
How the Mind Works (1998)
Slightly more seriously, Sir Peter Medawar, the distinguished British zoologist and polymath whom I quoted before, invents a great new universal principle of complementarity (not Bohr's) according to which there is an essential inner similarity in the relationships that hold between antigen and antibody, male and female, electropositive and electronegative, thesis and antithesis, and so on. These pairs have indeed a certain 'matching oppositeness' in common, but that is all they have in common. The similarity between them is not the taxonomic key to some other, deeper affinity, and our recognizing its existence marks the end, not the inauguration, of a train of thought.
Pluto's Republic (1982)
While I am quoting Medawar in the context of becoming intoxicated by symbolism, I cannot resist mentioning his devastating review of The Phenomenon of Man (1959), in which Teilhard de Chardin 'resorts to that tipsy, euphoristic prose poetry which is one of the more tiresome manifestations of the French spirit'. This book is, for Medawar (and for me now, although I confess that I was captivated when I read it as an over-romantic undergraduate), the quintessence of bad poetic science. One of the topics Teilhard covers is the evolution of consciousness, and Medawar quotes him as follows, again in Pluto's Republic:
By the end of the Tertiary era, the psychical temperature in the cellular world had been rising for more than 500 million years . . . When the anthropoid, so to speak, had been brought 'mentally' to boiling-point some further calories were added . . . No more was needed for the whole inner equilibrium to be upset . . . By a tiny 'tangential' increase, the 'radial' was turned back on itself and so to speak took an infinite leap forward. Outwardly, almost nothing in the organs had changed. But in depth, a great revolution had taken place; consciousness was now leaping and boiling in a space of super-sensory relationships and representations . . .
Medawar drily comments:
The analogy, it should be explained, is with the vaporization of water when it is brought to boiling-point, and the image of hot vapour remains when all else is forgotten.
Medawar also calls attention to the notorious fondness of mystics for 'energy' and 'vibrations', technical terms misused to create the illusion of scientific content where there is no content of any kind. Astrologers, too, think that each planet exudes its own, qualitatively distinct 'energy', which affects human life and has affinities with some human emotion; love in the case of Venus, aggression for Mars, intelligence for Mercury. These planetary qualities are based on - what else? - the characters of the Roman gods after whom the planets are named. In a style reminiscent of the aboriginal rainmakers, the Zodiacal signs are further identified with the four alchemical 'elements': earth, air, fire and water. People born under earth signs like Taurus are, to quote an astrological page chosen at random from the worldwide web, dependable, realistic, down to earth . . . People with water in their chart are sympathetic, compassionate, nurturing, sensitive, psychic, mysterious and possess an intuitive awareness . . . Those who lack water may be unsympathetic and cold.
Pisces is a water sign (I wonder why) and the element of water 'represents unconscious force's energy and power motivating us . . . '
Though Teilhard's book purports to be a work of science, his psychical 'temperature' and 'calories' seem approximately as meaningless as astrological planetary energies. The metaphorical usages are not usefully connected to their real-world equivalents. There is either no resemblance at all, or what resemblance there is impedes understanding rather than aids it.
With all this negativity, we mustn't forget that it is precisely the use of symbolic intuition to uncover genuine patterns of resemblance that leads scientists to their greatest contributions. Thomas Hobbes went too far when he concluded, in chapter 5 of Leviathan (1651), that
Reason is the pace; Encrease of Science, the way; and the Benefit of man-kind, the end. And, on the contrary, Metaphors, and senselesse and ambiguous words, are like ignes fatui; and reasoning upon them, is wandering amongst innumerable absurdities; and their end, contention, and sedition, or contempt.
Skill in wielding metaphors and symbols is one of the hallmarks of scientific genius.
The literary scholar, theologian and children's author C. S. Lewis, in a 1959 essay, made a distinction between magisterial poetry (in which scientists, say, use metaphoric and poetic language to explain to the rest of us something that they already understand) and pupillary poetry (in which scientists use poetic imagery to assist themselves in their own thinking). Important as both are, it is the second usage that I am emphasizing here. Michael Faraday's invention of magnetic 'lines of force', which we can think of as made of springy materials under tension, eager to release their energy (in the sense carefully defined by physicists) was vital to his own understanding of electromagnetism. I've already made use of the physicist's poetic image of inanimate entities - electrons, say, or light waves - striving to minimize their travel time. This is an easy way to get the right answer, and it is surprising how far it can be taken. I once heard Jacques Monod, the great French molecular biologist, say that he gained chemical insight by imagining how it would feel to be an electron at a particular molecular juncture. The German organic chemist Kekulé reported that he dreamed of the benzene ring in the form of a snake devouring its tail. Einstein was forever imagining: his extraordinary mind led by poetic thought-experiments through seas of thought stranger than even Newton voyaged.
But this chapter is about bad poetic science and we come down with a bump in the following example, sent me by a correspondent:
I consider our cosmic environment has a tremendous influence on the course of evolution. How else do we account for the helical structure of DNA which may be either due to the helical path of incoming solar radiation or the path of Earth orbiting the Sun which, due to its magnetic axis, tilted at 23.5° from the perpendicular, is helical, hence the solstices and equinoxes?
Realistically, there is not the smallest connection between the helical structure of DNA and the helical path of radiation or the planet's orbit. The association is superficial and meaningless. None of the three assists our understanding of any of the others. The author is drunk on metaphor, captivated by the idea of the helix, which misleads him into seeing connections which do not illuminate the truth in any way. Calling it poetic science is too kind: it is more like theological science.
Recently my incoming mail has registered a sharp rise in the normal load of 'chaos theory', 'complexity theory', 'non-linear criticality' and similar phrases. Now I'm not saying that these correspondents lack the faintest, foggiest clue what they are talking about. But I will say it's hard to discover whether they do. New Age cults of all kinds are swimming in bogus scientific language, regurgitated, half-understood (no, less than half) jargon: energy fields, vibration, chaos theory, catastrophe theory, quantum consciousness. Michael Shermer, in Why People Believe Weird Things (1997), quotes a typical example:
This planet has been slumbering for eons and with the inception of higher energy frequencies is about to awaken in terms of consciousness and spirituality. Masters of limitation and masters of divination use the same creative force to manifest their realities, however, one moves in a downward spiral and the latter moves in an upward spiral, each increasing the resonant vibration inherent in them.
My informant about Las Vegas has also made an informal study of London betting shops. She reports that one particular gambler habitually runs, after placing his bet, to a certain tile in the floor, where he stands on one leg while watching the race on the bookmaker's television. Presumably he once won while standing on this tile and conceived the notion that there was a causal link. Now, if somebody else stands on 'his' lucky tile (some other sportsmen do this deliberately, perhaps to try to hijack some of his 'luck' or just to annoy him) he dances around it, desperately trying to get a foot on the tile before the race ends. Other gamblers refuse to change their shirt, or to cut their hair, while they are 'on a lucky streak'. In contrast one Irish punter, who had a fine head of hair, shaved it completely bald in a desperate effort to change his luck. His hypothesis was that he was having rotten luck on the horses and he had lots of hair. Perhaps the two were connected somehow; perhaps these facts were all part of a meaningful pattern! Before we feel too superior, let us remember that large numbers of us were brought up to believe that Samson's fortunes changed utterly after Delilah cut off his hair.
How can we tell which apparent patterns are genuine, which random and meaningless? Methods exist, and they belong in the science of statistics and experimental design. I want to spend a little more time explaining a few of the principles, though not the details, of statistics. Statistics can largely be seen as the art of distinguishing pattern from randomness. Randomness means lack of pattern. There are various ways of explaining the ideas of randomness and pattern. Suppose I claim that I can tell girls' handwriting from boys'. If I am right, this would have to mean that there is a real pattern relating sex to handwriting. A sceptic might doubt this, agreeing that handwriting varies from person to person but denying that there is a sex-related pattern to this variation. How should you decide whether my claim, or the sceptic's, is right? It is no use just accepting my word for it. Like a superstitious Las Vegas gambler, I could easily have mistaken a lucky streak for a real, repeatable skill. In any case, you have every right to demand evidence. What evidence should satisfy you? The answer is evidence that is publicly recorded, and properly analysed.
The claim is, in any case, only a statistical claim. I do not maintain (in this hypothetical example - in reality I am not claiming anything) that I can infallibly judge the sex of the author of a given piece of handwriting. I claim only that among the great variation that exists among handwriting, some component of that variation correlates with sex. Therefore, even though I shall often make mistakes, if you give me, say, 100 samples of handwriting I should be able to sort them into boys and girls more accurately than could be achieved purely by guessing at random. It follows that, in order to assess any claim, you are going to have to calculate how likely it is that a given result could have been achieved by guessing at random. Once again, we have an exercise in calculating the odds of coincidence.
Before we get to the statistics, there are some precautions you need to take in designing the experiment. The pattern - the non-randomness we seek - is a pattern relating sex to handwriting. It is important not to confound the issue with extraneous variables. The handwriting samples that you give me should not, for instance, be personal letters. It would be too easy for me to guess the sex of the writer from the content of the letter rather than from the handwriting. Don't choose all the girls from one school and all the boys from another. The pupils from one school might share aspects of their handwriting, learning either from each other or from a teacher. These could result in real differences in handwriting, and they might even be interesting, but they could be representative of different schools, and only incidentally of different sexes. And don't ask the children to write out a passage from a favourite book. I should be influenced by a choice of Black Beauty or Biggles (readers whose childhood culture is different from mine will substitute examples of their own).
Obviously, it is important that the children should all be strangers to me, otherwise I'd recognize their individual writing and hence know their sex. When you hand me the papers they must not have the children's names on them, but you must have some means of keeping track of whose is which. Put secret codes on them for your own benefit, but be careful how you choose the codes. Don't put a green mark on the boys' papers and a yellow mark on the girls'. Admittedly, I won't know which is which, but I'll guess that yellow denotes one sex and green the other, and that would be a big help. It would be a good idea to give every paper a code number. But don't give the boys the numbers 1 to 10 and the girls 11 to 20; that would be just like the yellow and green marks all over again. So would giving the boys odd numbers and the girls even. Instead, give the papers random numbers and keep the crib list locked up where I cannot find it. These precautions are those named 'double blind' in the literature of medical trials.
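For readers who like to see such precautions made concrete, here is a minimal sketch in Python of one way the coding could be done. The function name `assign_blind_codes`, the four-digit code range and the made-up list of children are all assumptions chosen for illustration; nothing here is prescribed by the experiment itself.

```python
import random

def assign_blind_codes(writers, seed=None):
    """Give each handwriting sample a random code that carries no hint of sex.

    `writers` is a list of (name, sex) pairs.
    Returns (coded_samples, crib):
      coded_samples - the shuffled codes, which is all the judge ever sees;
      crib          - a dict mapping code -> (name, sex), to be kept locked away.
    """
    rng = random.Random(seed)
    # Draw distinct codes from a range far larger than the number of samples,
    # so that neither order, parity nor magnitude can correlate with sex.
    codes = rng.sample(range(1000, 10000), len(writers))
    crib = dict(zip(codes, writers))
    coded_samples = list(crib)
    rng.shuffle(coded_samples)  # presentation order is random as well
    return coded_samples, crib

# Hypothetical class list: ten boys and ten girls.
writers = [(f"child{i}", "boy" if i < 10 else "girl") for i in range(20)]
coded_samples, crib = assign_blind_codes(writers, seed=1)
```

The judge is handed only `coded_samples`; the experimenter consults `crib` after the sorting is finished.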
Let's assume that all the proper double blind precautions have been taken, and that you have assembled 20 anonymous samples of handwriting, shuffled into random order. I go through the papers, sorting them into two piles for suspected boys and suspected girls. I may have some 'don't knows', but let's assume that you compel me to make the best guess I can in such cases. At the end of the experiment I have made two piles and you look through to see how accurate I have been.
Now the statistics. You'd expect me to guess right quite often even if I was guessing purely at random. But how often? If my claim to be able to sex handwriting is unjustified, my guessing rate should be no better than somebody tossing a coin. The question is whether my actual performance is sufficiently different from a coin-tosser's to be impressive. Here is how to set about answering the question.
Think about all possible ways in which I could have guessed the sex of the 20 writers. List them in order of impressiveness, beginning with all 20 correct and going down to completely random (all 20 exactly wrong is nearly as impressive as all 20 exactly right, because it shows that I can discriminate, even though I perversely reverse the sign). Then look at the actual way I sorted them and count up the percentage of all possible sortings that would have been as impressive as the actual one, or more. Here's how to think about all possible sortings. First, note that there is only one way of being 100 per cent right, and one way of being 100 per cent wrong, but there are lots of ways of being 50 per cent right. One could be right on the first paper, wrong on the second, wrong on the third, right on the fourth . . . There are somewhat fewer ways of being 60 per cent right. Fewer ways still of being 70 per cent right, and so on. The number of ways of making a single mistake is sufficiently few that we can write them all down. There were 20 scripts. The mistake could have been made on the first one, or on the second one, or on the third one . . . or on the twentieth one. That is, there are exactly 20 ways of making a single mistake. It is more tedious to write down all the ways of making two mistakes, but we can calculate how many ways there are, easily enough, and it comes to 190. It is harder still to count the ways of making three mistakes, but you can see that it could be done. And so on.
Suppose, in this hypothetical experiment, two mistakes is actually what I did make. We want to know how good my score was, on a spectrum of all possible ways of guessing. What we need to know is how many possible ways of choosing are as good as, or better than, my score. The number as good as my score is 190. The number better than my score is 20 (one mistake) plus 1 (no mistakes). So, the total number as good as or better than my score is 211. It is important to add in the ways of scoring better than my actual score because they properly belong in the PETWHAC, along with the 190 ways of scoring exactly as well as I did.
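The counts of 1, 20 and 190 are the numbers of ways of choosing which 0, 1 or 2 of the 20 scripts are the mistaken ones. As a quick check, a short Python sketch can compute them with the standard library's `math.comb`; the variable names are merely illustrative.

```python
from math import comb

n_scripts = 20

# Number of distinct ways of making exactly k mistakes among 20 scripts:
# choose which k of the 20 scripts are the wrongly sorted ones.
ways_by_mistakes = {k: comb(n_scripts, k) for k in range(3)}

# Ways of doing as well as, or better than, two mistakes.
as_good_or_better = sum(ways_by_mistakes.values())

print(ways_by_mistakes)   # {0: 1, 1: 20, 2: 190}
print(as_good_or_better)  # 211
```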
We have to set 211 against the total number of ways in which the 20 scripts could have been classified by penny-tossers. This is not difficult to calculate. The first script could have been boy or girl; that is two possibilities. The second script also could have been boy or girl. So, for each of the two possibilities for the first script, there were two possibilities for the second. That is 2 x 2 = 4 possibilities for the first two scripts. The possibilities for the first three scripts are 2 x 2 x 2 = 8. And the possible ways of classifying all 20 scripts are 2 x 2 x 2 . . . 20 times, or 2 to the power 20. This is a pretty big number, 1,048,576.
So, of all possible ways of guessing, the proportion of ways that are as good as or better than my actual score is 211 divided by 1,048,576, which is approximately 0.0002, or 0.02 per cent. To put it another way, if 10,000 people sorted the scripts entirely by tossing pennies, you'd expect only two of them to score as well as I actually did. This means that my score is pretty impressive and, if I performed as well as this, it would be strong evidence that boys and girls differ systematically in their handwriting. Let me repeat that this is all hypothetical. As far as I know, I have no such ability to sex handwriting. I should also add that, even if there was good evidence for a sex difference in handwriting, this would say nothing about whether the difference is innate or learned. The evidence, at least if it came from the kind of experiment just described, would be equally compatible with the idea that girls are systematically taught a different handwriting from boys - perhaps a more 'ladylike' and less 'assertive' fist.
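The division can be checked in a few lines of Python. This is only a sketch of the arithmetic in the handwriting example; the function name `tail_proportion` is an invention for illustration, and it counts only the high-scoring end of the spectrum, as the calculation above does.

```python
from math import comb

def tail_proportion(n_scripts, mistakes):
    """Proportion of all possible sortings with `mistakes` or fewer errors."""
    # Ways of doing at least this well: 0 mistakes, 1 mistake, ... up to `mistakes`.
    favourable = sum(comb(n_scripts, k) for k in range(mistakes + 1))
    # Each script can be called boy or girl, so there are 2**n sortings in all.
    total = 2 ** n_scripts
    return favourable, total, favourable / total

favourable, total, p = tail_proportion(20, 2)
print(favourable, total)       # 211 1048576
print(f"{p:.6f}")              # 0.000201
print(round(p * 10_000))       # 2 penny-tossers in 10,000
```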
We have just performed what is technically called a test of statistical significance. We reasoned from first principles, which made it rather tedious. In practice, research workers can call upon tables of probabilities and distributions that have been previously calculated. We therefore don't literally have to write down all possible ways in which things could have happened. But the underlying theory, the basis upon which the tables were calculated, depends, in essence, upon the same fundamental procedure. Take the events that could have been obtained and throw them down repeatedly at random. Look at the actual way the events occurred and measure how extreme it is, on the spectrum of all possible ways in which they could have been thrown down.
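The procedure of throwing the events down repeatedly at random can also be carried out literally, by simulation. The following Python sketch lets a large number of imaginary penny-tossers sort 20 scripts and counts how many make two mistakes or fewer. The function name, the number of trials and the fixed seed are arbitrary assumptions, and the answer is only an estimate that wobbles around the exact figure of roughly 2 in 10,000.

```python
import random

def simulate_penny_tossers(n_scripts=20, max_mistakes=2, trials=100_000, seed=0):
    """Estimate how often pure guessing makes `max_mistakes` or fewer errors."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw n_scripts random bits: each bit is one penny toss,
        # and a 1 bit stands for a wrongly sorted script.
        tosses = rng.getrandbits(n_scripts)
        mistakes = bin(tosses).count("1")
        if mistakes <= max_mistakes:
            hits += 1
    return hits / trials

estimate = simulate_penny_tossers()
print(f"estimated proportion: {estimate:.5f}")
```

With more trials the estimate settles closer to the value obtained by counting, which is the sense in which the published tables and the brute-force procedure rest on the same foundation.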
Notice that a test of statistical significance does not prove anything conclusively. It can't rule out luck as the generator of the result that we observe. The best it can do is place the observed result on a par with a specified amount of luck. In our particular hypothetical example, it was on a par with two out of 10,000 random guessers. When we say that an effect is statistically significant, we must always specify a so-called p-value. This is the probability that a purely random process would have generated a result at least as impressive as the actual result. A p-value of 2 in 10,000 is pretty impressive, but it is still possible that there is no genuine pattern there. The beauty of doing a proper statistical test is that we know how probable it is that there is no genuine pattern there.
Conventionally, scientists allow themselves to be swayed by p-values of 1 in 100, or even as high as 1 in 20: far less impressive than 2 in 10,000. What p-value you accept depends upon how important the result is, and upon what decisions might follow from it. If all you are trying to decide is whether it is worth repeating the experiment with a larger sample, a p-value of 0.05, or 1 in 20, is quite acceptable. Even though there is a 1 in 20 chance that your interesting result would have happened anyway by chance, not much is at stake: the error is not a costly one. If the decision is a life and death matter, as in some medical research, a much lower p-value than 1 in 20 should be sought. The same is true of experiments that purport to show highly controversial results, such as telepathy or 'paranormal' effects.
As we briefly saw in connection with DNA fingerprinting, statisticians distinguish false positive from false negative errors, sometimes called type 1 and type 2 errors respectively. A type 2 error, or false negative, is a failure to detect an effect when there really is one. A type 1 error, or false positive, is the opposite: concluding that there really is something going on when actually there is nothing but randomness. The p-value is the measure of the probability that you have made a type 1 error. Statistical judgement means steering a middle course between the two kinds of error. There is a type 3 error in which your mind goes totally blank whenever you try to remember which is which of type 1 and type 2. I still look them up after a lifetime of use. Where it matters, therefore, I shall use the more easily remembered names, false positive and false negative. I also, by the way, frequently make mistakes in arithmetic. In practice I should never dream of doing a statistical test from first principles as I did for the hypothetical handwriting case. I'd always look up in a table that somebody else - preferably a computer - had calculated.
Skinner's superstitious pigeons made false positive errors. There was in fact no pattern in their world that truly connected their actions to the deliveries of the reward mechanism. But they behaved as if they had detected such a pattern. One pigeon 'thought' (or behaved as if it thought) that left stepping caused the reward mechanism to deliver. Another 'thought' that thrusting its head into the corner had the same beneficial effect. Both were making false positive errors. A false negative error is made by a pigeon in a Skinner box who never notices that a peck at the key yields food if the red light is on, but that a peck when the blue light is on punishes by switching the mechanism off for ten minutes. There is a genuine pattern waiting to be detected in the little world of this Skinner box, but our hypothetical pigeon does not detect it. It pecks indiscriminately to both colours, and therefore gets a reward less frequently than it could.
A false positive error is made by a farmer who thinks that sacrificing to the gods brings longed-for rain. In fact, I presume (although I haven't investigated the matter experimentally), there is no such pattern in his world, but he does not discover this and persists in his useless and wasteful sacrifices. A false negative error is made by a farmer who fails to notice that there is a pattern in the world relating manuring of a field to the subsequent crop yield of that field. Good farmers steer a middle way between type 1 and type 2 errors.
It is my thesis that all animals, to a greater or lesser extent, behave as intuitive statisticians, choosing a middle course between type 1 and type 2 errors. Natural selection penalizes both type 1 and type 2 errors, but the penalties are not symmetrical and no doubt vary with the different ways of life of species. A stick caterpillar looks so like the twig it is sitting on that we cannot doubt that natural selection has shaped it to resemble a twig. Many caterpillars died to produce this beautiful result. They died because they did not sufficiently resemble a twig. Birds, or other predators, found them out. Even some very good twig mimics must have been found out. How else did natural selection push evolution towards the pitch of perfection that we see? But, equally, birds must many times have missed caterpillars because they resembled twigs, in some cases only slightly. Any prey animal, no matter how well camouflaged, can be detected by predators under ideal seeing conditions. Equally, any prey animal, no matter how poorly camouflaged, can be missed by predators under bad seeing conditions. Seeing conditions can vary with angle (a predator may spot a well-camouflaged animal when looking straight at it, but will miss a poorly camouflaged animal out of the corner of its eye). They can vary with light intensity (a prey may be overlooked at twilight, whereas it would be seen at noon). They can vary with distance (a prey which would be seen at six inches range may be overlooked at a range of 100 yards).
Imagine a bird cruising around a wood, looking for prey. It is surrounded by twigs, a very few of which might be edible caterpillars. The problem is to decide. We can assume that the bird could guarantee to tell whether an apparent twig was actually a caterpillar if it approached the twig really close and subjected it to a minute, concentrated examination in a good light. But there isn't time to do that for all twigs. Small birds with high turnover metabolism have to find food alarmingly often in order to stay alive. Any bird that scanned every individual twig with the equivalent of a magnifying glass would die of starvation before it found its first caterpillar. Efficient searching demands a faster, more cursory and rapid scanning, even though this carries a risk of missing some food. The bird has to strike a balance. Too cursory and it will never find anything. Too detailed and it will detect every caterpillar it looks at, but it will look at too few, and starve.
It is easy to apply the language of type 1 and type 2 errors. A false negative is committed by a bird that sails by a caterpillar without giving it a closer look. A false positive is committed by a bird that zooms in on a suspected caterpillar, only to discover that it is really a twig. The penalty for a false positive is the time and energy wasted flying in for the close inspection: not serious on any one occasion, but it could mount up fatally. The penalty for a false negative is missing a meal. No bird outside Cloud Cuckooland can hope to be free of all type 1 and type 2 errors. Individual birds will be programmed by natural selection to adopt some compromise policy calculated to achieve an optimum intermediate level of false positives and false negatives. Some birds may be biased towards type 1 errors, others towards the opposite extreme. There will be some intermediate setting which is best, and natural selection will steer evolution towards it.
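The compromise described above can be sketched numerically. What follows is a toy signal-detection model, not anything the text itself specifies: it assumes each object gives the bird a 'caterpillar-likeness' score, that twigs and caterpillars differ in average score by a hypothetical `separation`, and that the two kinds of error carry the invented costs `cost_fp` and `cost_fn`. The bird inspects anything scoring above a threshold, and the sketch searches for the threshold with the lowest average cost.

```python
import math


def normal_cdf(x, mean=0.0):
    """Cumulative probability of a unit-variance normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0)))


def expected_cost(threshold, prevalence, separation, cost_fp, cost_fn):
    """Average cost per object scanned when the bird inspects anything
    whose 'caterpillar-likeness' score exceeds `threshold`."""
    # Twigs score around 0; a twig above the threshold is a false positive.
    p_false_positive = 1.0 - normal_cdf(threshold, mean=0.0)
    # Caterpillars score around `separation`; one below the threshold is missed.
    p_false_negative = normal_cdf(threshold, mean=separation)
    return ((1.0 - prevalence) * cost_fp * p_false_positive
            + prevalence * cost_fn * p_false_negative)


def best_threshold(prevalence, separation=1.5, cost_fp=1.0, cost_fn=20.0):
    """Grid-search the threshold with the lowest expected cost.
    All default values are illustrative assumptions."""
    candidates = [i / 100.0 for i in range(-400, 601)]
    return min(candidates,
               key=lambda t: expected_cost(t, prevalence, separation,
                                           cost_fp, cost_fn))


scarce = best_threshold(prevalence=0.01)     # few caterpillars among the twigs
plentiful = best_threshold(prevalence=0.10)  # ten times as many
print(scarce, plentiful)
```

Under these assumptions the best threshold lies strictly between 'inspect everything' and 'inspect nothing', and it drops when caterpillars become more plentiful: the bird should then tolerate more false positives.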
Which intermediate setting is best will vary from species to species. In our example it will also depend upon conditions in the wood, for example, the size of the caterpillar population in relation to the number of twigs. These conditions may change from week to week. Or they may vary from wood to wood. Birds may be programmed to learn to adjust their policy as a result of their statistical experience. Whether they learn or not, successfully hunting animals must usually behave as if they are good statisticians. (I hope it is not necessary, by the way, to plod through the usual disclaimer: No, no, the birds aren't consciously working it out with calculator and probability tables. They are behaving as if they were calculating p-values. They are no more aware of what a p-value means than you are aware of the equation for a parabolic trajectory when you catch a cricket ball or baseball in the outfield.)
Angler fish take advantage of the gullibility of little fish such as gobies. But that is an unfairly value-laden way of putting it. It would be better not to speak of gullibility and say that they exploit the inevitable difficulty the little fish have in steering between type 1 and type 2 errors. The little fish themselves need to eat. What they eat varies, but it often includes small wriggling objects such as worms or shrimps. Their eyes and nervous systems are tuned to wriggling things. They look for wriggling movement and if they see it they pounce. The angler fish exploits this tendency. It has a long fishing rod, evolved from a modified spine, commandeered by natural selection from its original location at the front of the dorsal fin. The angler fish itself is highly camouflaged and it sits motionless on the sea bottom for hours at a time, blending perfectly with the weeds and rocks. The only part of it which is conspicuous is a 'bait', which looks like a worm, a shrimp or a small fish, at the end of its fishing rod. In some deep-sea species the bait is even luminous. In any case, it seems to wriggle like something worth eating when the angler waves its rod. A possible prey fish, say a goby, is attracted. The angler 'plays' its prey for a little while to hook its attention, then casts the bait down into the still unsuspected region in front of its own invisible mouth, and the little fish often follows. Suddenly that huge mouth is invisible no longer. It gapes massively, there is a violent inrushing of water, engulfing every floating object in the vicinity, and the little fish has pursued its last worm.
From the point of view of a hunting goby, any worm may be overlooked or it may be seen. Once the worm has been detected, it may turn out to be a real worm or an angler fish's lure, and the unfortunate fish is faced with a dilemma. A false negative error would be to refrain from attacking a perfectly good worm for fear that it might be an angler fish lure. A false positive error would be to attack a worm, only to discover that it is really a lure. Once again, it is impracticable in the real world to get it right all the time. A fish that is too risk-averse will starve because it never attacks worms. A fish that is too foolhardy won't starve but it may be eaten. The optimum in this case may not be halfway between. More surprisingly, the optimum may be one of the extremes. It is possible that angler fish are sufficiently rare that natural selection favours the extreme policy of attacking all apparent worms. I am fond of a remark of the philosopher and psychologist William James on human angling:
There are more worms unattached to hooks than impaled upon them; therefore, on the whole, says Nature to her fishy children, bite at every worm and take your chances. (1910)
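Why the optimum can sit at an extreme is easy to see in a deliberately crude sketch. It assumes that a goby cannot tell lures from worms at all, so its only choice is what fraction of apparent worms to attack. The payoff numbers (`meal_gain`, `death_cost`) and the lure fractions are invented for illustration, and the starvation cost of never feeding is left out of the model.

```python
def expected_payoff(attack_rate, lure_fraction, meal_gain, death_cost):
    """Average payoff per apparent worm for a goby that attacks a fixed
    fraction (`attack_rate`) of everything that looks like a worm."""
    per_attack = (1.0 - lure_fraction) * meal_gain - lure_fraction * death_cost
    return attack_rate * per_attack


def best_attack_rate(lure_fraction, meal_gain=1.0, death_cost=500.0):
    """Pick the best rate from 0.0, 0.1, ..., 1.0 (values are hypothetical)."""
    rates = [i / 10 for i in range(11)]
    return max(rates,
               key=lambda r: expected_payoff(r, lure_fraction,
                                             meal_gain, death_cost))


print(best_attack_rate(lure_fraction=0.001))  # anglers rare
print(best_attack_rate(lure_fraction=0.01))   # anglers ten times commoner
```

Because the payoff is a straight-line function of the attack rate, no halfway policy can beat both extremes: with these made-up numbers the break-even lure fraction is `meal_gain / (meal_gain + death_cost)`, and below it 'bite at every worm' is the best policy in the model.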
Like all other animals, and even plants, humans can and must behave as intuitive statisticians. The difference with us is that we can do our calculations twice over. The first time intuitively, as though we were birds or fish. And then again explicitly, with pencil and paper or computer. It is tempting to say that the pencil and paper way gets the right answer, so long as we don't make some publicly detectable blunder like adding in the date, whereas the intuitive way may yield the wrong answer. But there strictly is no 'right' answer, even in the case of pencil and paper statistics. There may be a right way to do the sums, to calculate the p-value, but the criterion, or threshold p-value, that we demand before choosing a particular action is still our decision and it depends upon our aversion to risk. If the penalty for making a false positive error is much greater than the penalty for making a false negative error, we should adopt a cautious, conservative threshold; almost never try a 'worm' for fear of the consequences. Conversely, if the risk-asymmetry is opposite, we should rush in and try every 'worm' that is going: it is unlikely to matter if we keep tasting false worms so we may as well have a go.
Taking on board the need to steer between false positive and false negative errors, let me return to uncanny coincidence and the calculation of the probability that it would have happened anyway. If I dream of a long-forgotten friend who dies the same night, I am tempted, like anyone else, to see meaning or pattern in the coincidence. I really have to force myself to remember that quite a few people die every night, masses of people dream every night, they quite often dream that people die, and coincidences like this are probably happening to several hundred people in the world every night. Even as I think this through, my own intuition cries out that there must be meaning in the coincidence because it has happened to me. If it is true that intuition is, in this case, making a false positive error, we need to come up with a satisfactory explanation for why human intuition errs in this direction. As Darwinians, we should be alive to the possible pressures towards erring on the type 1 or the type 2 side of the divide.
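The arithmetic behind 'several hundred people every night' can be sketched as follows. Every number here is a placeholder assumption chosen only to show the shape of the calculation, not a figure from the text: a world population of six billion, one night in a thousand on which a given person dreams of the death of some particular living acquaintance, and a 70-year average lifespan to set the nightly chance that the acquaintance really dies.

```python
import math


def expected_per_night(population, p_death_dream, p_death):
    """Expected number of people worldwide whose death-dream 'comes true'
    on a single night, treating dreams and deaths as independent."""
    return population * p_death_dream * p_death


def chance_in_lifetime(p_per_night, nights):
    """Probability that it happens at least once to one given person."""
    return -math.expm1(nights * math.log1p(-p_per_night))


# All three values are illustrative assumptions.
WORLD_POPULATION = 6e9
P_DEATH_DREAM = 1 / 1000     # chance per night of dreaming a particular person dies
P_DEATH = 1 / (70 * 365)     # chance per night that this person actually dies

world = expected_per_night(WORLD_POPULATION, P_DEATH_DREAM, P_DEATH)
personal = chance_in_lifetime(P_DEATH_DREAM * P_DEATH, nights=50 * 365)
print(round(world), personal)
```

With these inputs the coincidence strikes a few hundred dreamers somewhere in the world each night, while the chance that it happens to any one person in fifty years of dreaming stays below one in a thousand. Both statements are true at once, which is why the event feels so uncanny to the person it happens to.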
As a Darwinian, I want to suggest that our willingness to be impressed at apparently uncanny coincidence (which is a case of our willingness to see pattern where there is none) is related to the typical population size of our ancestors and the relative poverty of their everyday experience. Anthropology, fossil evidence and the study of other apes all suggest that our ancestors, for much of the past few million years, probably lived in either small roving bands or small villages. Either of these would mean that the number of friends and acquaintances that our ancestors would ordinarily meet and talk to with any frequency was not more than a few dozen. A prehistoric villager could expect to hear stories of startling coincidence in proportion to this small number of acquaintances. If the coincidence happened to somebody not in his village, he wouldn't hear the story. So our brains became calibrated to detect pattern and gasp with astonishment at a level of coincidence which would actually be quite modest if our catchment area of friends and acquaintances had been large.
Nowadays, our catchment area is large, especially because of newspapers, radio and other vehicles of mass news circulation. I've already spelled out the argument. The very best and most spine-creeping coincidences have the opportunity to circulate, in the form of bated-breath stories, over a far wider audience than was ever possible in ancestral times. But, I am now conjecturing, our brains are calibrated by ancestral natural selection to expect a much more modest level of coincidence, calibrated under small village conditions. So we are impressed by coincidences because of a miscalibrated gasp threshold. Our subjective PETWHACs have been calibrated by natural selection in small villages, and, as is the case with so much of modern life, the calibration is now out of date.
A similar argument could be used to explain why we are so hysterically risk-averse to hazards that are much publicized in the newspapers - perhaps anxious parents who imagine ravening paedophiles lurking behind every lamp post on their children's walk from school are 'miscalibrated'.
I guess that there may be another, particular effect pushing in the same direction. I suspect that our individual lives under modern conditions are richer in experiences per hour than were ancestral lives. We don't just get up in the morning, scratch a living in the same way as yesterday, eat a meal or two and go to sleep again. We read books and magazines, we watch television, we travel at high speed to new places, we pass thousands of people in the street as we walk to work. The number of faces we see, the number of different situations we are exposed to, the number of separate things that happen to us, is much greater than for our village ancestors. This means that the number of opportunities for coincidence is greater for each one of us than it would have been for our ancestors, and consequently greater than our brains are calibrated to assess. This is an additional effect, over and above the population size effect that I have already noted.
With respect to both these effects, it is theoretically possible for us to recalibrate ourselves, learn to adjust our gasp threshold to a level more appropriate to modern populations and modern richnesses of experience. But this seems to be revealingly difficult even for sophisticated scientists and mathematicians. The fact that we still do gasp when we do, that clairvoyants and mediums and psychics and astrologers manage to make such a nice living out of us, all suggests that we do not, on the whole, learn to recalibrate ourselves. It suggests that the parts of our brains responsible for doing intuitive statistics are still back in the stone age.
The same may be true of intuition generally. In The Unnatural Nature of Science (1992), the distinguished embryologist Lewis Wolpert has argued that science is difficult because it is more or less systematically counter-intuitive. This is contrary to the view of T. H. Huxley (Darwin's Bulldog) who saw science as 'nothing but trained and organized common sense, differing from the latter only as a veteran may differ from a raw recruit'. For Huxley, the methods of science 'differ from those of common sense only as far as the guardsman's cut and thrust differ from the manner in which a savage wields his club'. Wolpert insists that science is deeply paradoxical and surprising, an affront to common sense rather than an extension of it, and he makes a good case. For example, every time you drink a glass of water you are imbibing at least one molecule that passed through the bladder of Oliver Cromwell. This follows by extrapolation from Wolpert's observation that 'there are many more molecules in a glass of water than there are glasses of water in the sea'. Newton's law that objects stay in motion unless positively stopped is counter-intuitive. So is Galileo's discovery that, when there is no air resistance, light objects fall at the same rate as heavy objects. So is the fact that solid matter, even a hard diamond, consists almost entirely of empty space. Steven Pinker gives an illuminating discussion of the evolutionary origins of our physical intuitions in How the Mind Works (1998).
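Wolpert's glass-of-water observation can be checked with a back-of-envelope calculation. The glass size of 250 ml and the ocean volume of roughly 1.335 billion cubic kilometres are assumptions brought in for the sketch (the second is a commonly quoted approximate estimate), and the Cromwell extrapolation further assumes that the water has been thoroughly mixed since his day.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
MOLAR_MASS_WATER_G = 18.015     # grams per mole of water
GLASS_ML = 250.0                # assumed glass size; 1 ml of water is about 1 g
OCEAN_VOLUME_KM3 = 1.335e9      # approximate estimate, used as an assumption
LITRES_PER_KM3 = 1e12

# Molecules in one glass: grams -> moles -> molecules.
molecules_per_glass = GLASS_ML / MOLAR_MASS_WATER_G * AVOGADRO

# Glasses in the sea: km^3 -> litres -> millilitres -> glasses.
glasses_in_sea = OCEAN_VOLUME_KM3 * LITRES_PER_KM3 * 1000.0 / GLASS_ML

ratio = molecules_per_glass / glasses_in_sea
print(f"{molecules_per_glass:.2e} molecules per glass")
print(f"{glasses_in_sea:.2e} glasses in the sea")
print(f"about {ratio:.0f} molecules per glass for every glass in the sea")
```

On these figures a single glassful, once evenly dispersed through the sea, would contribute on the order of a thousand molecules to every glass drawn from it later, which is the extrapolation the Cromwell example relies on.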
More profoundly difficult are the conclusions of quantum theory, overwhelmingly supported by experimental evidence to a stupefyingly convincing number of decimal places, yet so alien to the evolved human mind that even professional physicists don't understand them in their intuitive thoughts. It seems to be not just our intuitive statistics but our very minds themselves that are back in the stone age.
8
HUGE CLOUDY SYMBOLS OF A HIGH ROMANCE
To gild refined gold, to paint the lily,
To throw a perfume on the violet,
To smooth the ice, or add another hue
Unto the rainbow, or with taper-light
To seek the beauteous eye of heaven to garnish,
Is wasteful and ridiculous excess.
WILLIAM SHAKESPEARE,
King John, Act IV, scene ii
It is a central tenet of this book that science, at its best, should leave room for poetry. It should note helpful analogies and metaphors that stimulate the imagination, conjure in the mind images and allusions that go beyond the needs of straightforward understanding. But there's bad poetry as well as good, and bad poetic science can lead the imagination along false trails. That danger is the subject of this chapter. By bad poetic science I mean something other than incompetent or graceless writing. I am talking about almost the opposite: about the power of poetic imagery and metaphor to inspire bad science, even if it is good poetry, perhaps especially if it is good poetry, for that gives it the greater power to mislead.
Bad poetry in the form of an over-indulgent eye for poetic allegory, or the inflation of casual and meaningless resemblances into huge cloudy symbols of a high romance (Keats's phrase), lurks behind many magical and religious customs. Sir James Frazer, in The Golden Bough (1922), recognizes a major category of magic which he calls homeopathic or imitative magic. The imitation varies from the literal to the symbolic. The Dyaks of Sarawak would eat the hands and knees of the slain in order to steady their own hands and strengthen their own knees. The bad poetic idea here is the notion that there is some essence of hand or essence of knee which can be transmitted from person to person. Frazer notes that, before the Spanish conquest, the Aztecs of Mexico believed that by consecrating bread their priests could turn it into the very body of their god, so that all who thereupon partook of the consecrated bread entered into a mystic communion with the deity by receiving a portion of his divine substance into themselves. The doctrine of transubstantiation, or the magical conversion of bread into flesh, was also familiar to the Aryans of ancient India long before the spread and even the rise of Christianity.
Frazer later generalizes the theme:
It is now easy to understand why a savage should desire to partake of the flesh of an animal or man whom he regards as divine. By eating the body of the god he shares in the god's attributes and when he is a vine-god the juice of the grape is his blood; and so by eating the bread and drinking the wine the worshipper partakes of the real body and blood of his god. Thus the drinking of wine in the rites of a vine-god like Dionysus is not an act of revelry, it is a solemn sacrament.
All over the world, ceremonies are based upon an obsession with things representing other things that they slightly resemble, or resemble in one respect. Powdered rhinoceros horn is, with tragic consequences, thought to be aphrodisiac, apparently for no better reason than the superficial resemblance of the horn itself to an erect penis. To take another common practice, professional rainmakers frequently imitate thunder or lightning, or they conjure a miniature 'homeopathic dose' of rain by sprinkling water from a bundle of twigs. Such rituals can become elaborate and costly in time and effort.
Among the Dieri of central Australia, rainmaking wizards, symbolically representative of ancestor gods, were bled (dripping blood represents the longed-for rain) into a large hole inside a hut especially built for the purpose. Two rocks, intended to stand for clouds and presage rain, were then carried by the two wizards some 10 or 15 miles away, where they were placed atop a tall tree, to symbolize the height of the clouds. Meanwhile, back at the hut, the men of the tribe would stoop low and, without using their hands, charge at the walls and butt their way through with their heads. They continued butting back and forth until the hut was destroyed. The piercing of the walls with their heads symbolized the piercing of the clouds and, they believed, released rain from real clouds. As an added precaution, the Great Council of the Dieri would also keep a stockpile of boys' foreskins in constant readiness, because of their homeopathic power to produce rain (do penises not 'rain' urine - surely eloquent evidence of their power?).
Another homeopathic theme is the 'scapegoat' (so-called because a particular Jewish version of the rite involved a goat), in which a victim is chosen to embody, signify, or be loaded up with, all the sins and misfortunes of the village. The scapegoat is then driven out, or in some cases killed, carrying the evils of the people with him. Among the Garos people of Assam, near the foothills of the eastern Himalayas, a langur monkey (or sometimes a bamboo rat) used to be captured, led to every house in the village to soak up their evil spirits and then crucified on a bamboo scaffold. In Frazer's words, the monkey is the public scapegoat, which by its vicarious sufferings and death relieves the people from all sickness and mishap in the coming year.
In many cultures the scapegoat is a human victim, and often he is identified with a god. The symbolic notion of water 'washing' away sins is another common theme, sometimes combined with the idea of the scapegoat. In one New Zealand tribe, a service was performed over an individual, by which all the sins of the tribe were supposed to be transferred to him, a fern stalk was previously tied to his person with which he jumped into the river and there unbinding, allowed it to float away to the sea, bearing their sins with it.
Frazer also reports that water was used by the rajah of Manipur as a vehicle to transfer his sins to a human scapegoat, who crouched under a scaffold on which the rajah took his bath, dripping water (and washed-away sins) on to the scapegoat below.
Condescension towards 'primitive' cultures is not admirable, so I have carefully chosen examples to remind us that theologies closer to home are not immune to homeopathic or imitative magic. The water of baptism 'washes' away sins. Jesus himself is a stand-in for humanity (in some versions via a symbolic standing in for Adam) in his crucifixion, which homoeopathically atones for our sins. Whole schools of Mariology discern a symbolic virtue in the 'feminine principle'.
Sophisticated theologians who do not literally believe in the Virgin Birth, the Six Day Creation, the Miracles, the Transubstantiation or the Easter Resurrection are nevertheless fond of dreaming up what these events might symbolically mean. It is as if the double helix model of DNA were one day to be disproved and scientists, instead of accepting that they had simply got it wrong, sought desperately for a symbolic meaning so deep as to transcend mere factual refutation. 'Of course,' one can hear them saying, 'we don't literally believe factually in the double helix any more. That would indeed be crudely simplistic. It was a story that was right for its own time, but we've moved on. Today, the double helix has a new meaning for us. The compatibility of guanine with cytosine, the glove-like fit of adenine with thymine, and especially the intimate mutual twining of the left spiral around the right, all speak to us of loving, caring, nurturing relationships . . .' Well, I'd be surprised if it quite came to that, and not only because the double helix model is now very unlikely to be disproved. But in science, as in any other field, there really are dangers of becoming intoxicated by symbolism, by meaningless resemblances, and led farther and farther from the truth, rather than towards it. Steven Pinker reports that he is troubled by correspondents who have discovered that everything in the universe comes in threes:
the Father, the Son, and the Holy Ghost; protons, neutrons and electrons; masculine, feminine and neuter; Huey, Dewey, and Louie; and so on, for page after page.
How the Mind Works (1998)
Slightly more seriously, Sir Peter Medawar, the distinguished British zoologist and polymath whom I quoted before, invents a great new universal principle of complementarity (not Bohr's) according to which there is an essential inner similarity in the relationships that hold between antigen and antibody, male and female, electropositive and electronegative, thesis and antithesis, and so on. These pairs have indeed a certain 'matching oppositeness' in common, but that is all they have in common. The similarity between them is not the taxonomic key to some other, deeper affinity, and our recognizing its existence marks the end, not the inauguration, of a train of thought.
Pluto's Republic (1982)
While I am quoting Medawar in the context of becoming intoxicated by symbolism, I cannot resist mentioning his devastating review of The Phenomenon of Man (1959), in which Teilhard de Chardin 'resorts to that tipsy, euphoristic prose poetry which is one of the more tiresome manifestations of the French spirit'. This book is, for Medawar (and for me now, although I confess that I was captivated when I read it as an over-romantic undergraduate), the quintessence of bad poetic science. One of the topics Teilhard covers is the evolution of consciousness, and Medawar quotes him as follows, again in Pluto's Republic:
By the end of the Tertiary era, the psychical temperature in the cellular world had been rising for more than 500 million years . . . When the anthropoid, so to speak, had been brought 'mentally' to boiling-point some further calories were added . . . No more was needed for the whole inner equilibrium to be upset . . . By a tiny 'tangential' increase, the 'radial' was turned back on itself and so to speak took an infinite leap forward. Outwardly, almost nothing in the organs had changed. But in depth, a great revolution had taken place; consciousness was now leaping and boiling in a space of super-sensory relationships and representations . . .
Medawar drily comments:
The analogy, it should be explained, is with the vaporization of water when it is brought to boiling-point, and the image of hot vapour remains when all else is forgotten.
Medawar also calls attention to the notorious fondness of mystics for 'energy' and 'vibrations', technical terms misused to create the illusion of scientific content where there is no content of any kind. Astrologers, too, think that each planet exudes its own, qualitatively distinct 'energy', which affects human life and has affinities with some human emotion; love in the case of Venus, aggression for Mars, intelligence for Mercury. These planetary qualities are based on - what else? - the characters of the Roman gods after whom the planets are named. In a style reminiscent of the aboriginal rainmakers, the Zodiacal signs are further identified with the four alchemical 'elements': earth, air, fire and water. People born under earth signs like Taurus are, to quote an astrological page chosen at random from the worldwide web, dependable, realistic, down to earth . . . People with water in their chart are sympathetic, compassionate, nurturing, sensitive, psychic, mysterious and possess an intuitive awareness . . . Those who lack water may be unsympathetic and cold.
Pisces is a water sign (I wonder why) and the element of water 'represents unconscious force's energy and power motivating us . . . '
Though Teilhard's book purports to be a work of science, his psychical 'temperature' and 'calories' seem approximately as meaningless as astrological planetary energies. The metaphorical usages are not usefully connected to their real-world equivalents. There is either no resemblance at all, or what resemblance there is impedes understanding rather than aids it.
With all this negativity, we mustn't forget that it is precisely the use of symbolic intuition to uncover genuine patterns of resemblance that leads scientists to their greatest contributions. Thomas Hobbes went too far when he concluded, in chapter 5 of Leviathan (1651), that
Reason is the pace; Encrease of Science, the way; and the Benefit of man-kind, the end. And, on the contrary, Metaphors, and senselesse and ambiguous words, are like ignes fatui; and reasoning upon them, is wandering amongst innumerable absurdities; and their end, contention, and sedition, or contempt.
Skill in wielding metaphors and symbols is one of the hallmarks of scientific genius.
The literary scholar, theologian and children's author C. S. Lewis, in a 1959 essay, made a distinction between magisterial poetry (in which scientists, say, use metaphoric and poetic language to explain to the rest of us something that they already understand) and pupillary poetry (in which scientists use poetic imagery to assist themselves in their own thinking). Important as both are, it is the second usage that I am emphasizing here. Michael Faraday's invention of magnetic 'lines of force', which we can think of as made of springy materials under tension, eager to release their energy (in the sense carefully defined by physicists) was vital to his own understanding of electromagnetism. I've already made use of the physicist's poetic image of inanimate entities - electrons, say, or light waves - striving to minimize their travel time. This is an easy way to get the right answer, and it is surprising how far it can be taken. I once heard Jacques Monod, the great French molecular biologist, say that he gained chemical insight by imagining how it would feel to be an electron at a particular molecular juncture. The German organic chemist Kekule reported that he dreamed of the benzene ring in the form of a snake devouring its tail. Einstein was forever imagining: his extraordinary mind led by poetic thought-experiments through seas of thought stranger than even Newton voyaged.
But this chapter is about bad poetic science and we come down with a bump in the following example, sent me by a correspondent:
I consider our cosmic environment has a tremendous influence on the course of evolution. How else do we account for the helical structure of DNA which may be either due to the helical path of incoming solar radiation or the path of Earth orbiting the Sun which, due to its magnetic axis, tilted at 2. ? from the perpendicular, is helical, hence the solstices and equinoxes?
Realistically, there is not the smallest connection between the helical structure of DNA and the helical path of radiation or the planet's orbit. The association is superficial and meaningless. None of the three assists our understanding of any of the others. The author is drunk on metaphor, captivated by the idea of the helix, which misleads him into seeing connections which do not illuminate the truth in any way. Calling it poetic science is too kind: it is more like theological science.
Recently my incoming mail has registered a sharp rise in the normal load of 'chaos theory', 'complexity theory', 'non-linear criticality' and similar phrases. Now I'm not saying that these correspondents lack the faintest, foggiest clue what they are talking about. But I will say it's hard to discover whether they do. New Age cults of all kinds are swimming in bogus scientific language, regurgitated, half-understood (no, less than half) jargon: energy fields, vibration, chaos theory, catastrophe theory, quantum consciousness. Michael Shermer, in Why People Believe Weird Things (1997), quotes a typical example:
This planet has been slumbering for eons and with the inception of higher energy frequencies is about to awaken in terms of consciousness and spirituality. Masters of limitation and masters of divination use the same creative force to manifest their realities, however, one moves in a downward spiral and the latter moves in an upward spiral, each increasing the resonant vibration inherent in them.
