Richard-Dawkins-Unweaving-the-Rainbow
A similar method was used to identify skeletons discovered in Yekaterinburg and suspected of belonging to the executed Russian royal family. Prince Philip, Duke of Edinburgh, whose exact relationship to the Romanovs is known, graciously gave blood, and from this it was possible to establish that the skeletons were indeed those of the Tsar's family. In a more macabre case, a skeleton exhumed in South America was proved to belong to Doctor Josef Mengele, the Nazi war criminal known as the 'Angel of Death'. DNA taken from the bones was compared with blood from Mengele's still-living son, and the identity of the skeleton proved. More recently, a corpse dug up in Berlin has been proved, by the same method, to be that of Martin Bormann, Hitler's deputy, whose disappearance had led to endless legends and rumours and more than 6,000 'sightings' around the world.
Despite the name 'fingerprinting', our DNA, being digital, is even more individually characteristic than the patterns of whorls on our fingers. The name is appropriate because, like true fingerprints, DNA evidence is often inadvertently left behind after a person has departed the scene. DNA can be extracted from a bloodstain on a carpet, from semen inside a rape victim, from a crust of dried nasal mucus on a handkerchief, from sweat or from shed hairs. The DNA in the sample can then be compared with that in the blood taken from a suspect. It is possible to assess, to almost any desired level of probability, whether the sample belongs to a particular person or not.
So, what are the snags? Why is DNA evidence controversial? What is it about this important kind of evidence that makes it possible for lawyers to bamboozle juries into misinterpreting or ignoring it? Why have some courts been moved to the despairing extreme of ruling out this evidence altogether?
There are three major classes of potential problem, one simple, one sophisticated and one silly. I'll come to the silly problem and the more sophisticated difficulties later but first, as with any kind of evidence, there is the simple - and very important - possibility of human error. Possibilities, rather, for there are plenty of opportunities for mistakes and even sabotage. A tube of blood may be mislabelled, either by accident or in a deliberate attempt to frame somebody. A sample from the scene of a crime may be contaminated by sweat from a lab technician or a police officer. The danger of contamination is especially great in those cases where an ingenious technique of amplification called PCR (polymerase chain reaction) is used.
You can easily see why amplification might be desirable. A tiny smear of sweat on a gun butt contains precious little DNA. Sensitive though DNA analysis can be, it needs a certain minimum quantity of material to work on. The technique of PCR, invented in 1983 by the American biochemist Kary B. Mullis, is the dramatically successful answer. PCR takes what little DNA there is and produces millions of copies, multiplying again and again whatever code sequences are there. But, as always with amplification, errors are amplified along with the true signal. Stray scraps of DNA contamination from a technician's sweat are amplified as effectively as the specimen from the scene of the crime, with obvious possibilities for injustice.
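The arithmetic behind that warning can be sketched in a few lines of Python. This is only an idealised model, assuming every molecule is copied exactly once per cycle; the function name `pcr_copies` and the starting counts are invented for illustration.

```python
def pcr_copies(starting_molecules: int, cycles: int) -> int:
    """Idealised PCR: every molecule present is doubled on each cycle."""
    return starting_molecules * 2 ** cycles


# Hypothetical starting material: 100 molecules from the crime scene
# and 5 stray molecules from a technician's sweat.
specimen = pcr_copies(100, 30)
contaminant = pcr_copies(5, 30)

# Amplification multiplies both alike, so the contaminant's share
# of the total is exactly what it was before.
share_before = 5 / (100 + 5)
share_after = contaminant / (specimen + contaminant)
```

Under these assumptions the contamination is neither diluted nor filtered out: it simply becomes abundant enough to be read.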
But human error is not peculiar to DNA evidence. All kinds of evidence are vulnerable to bungling and sabotage, and must be handled with scrupulous care. The files in a conventional fingerprint library may be mislabelled. The murder weapon may have been touched by innocent people as well as the murderer, and their fingerprints have to be taken, along with the suspect's, for elimination purposes. Courts of law are already accustomed to the need to take all possible precautions against mistakes and they still, sometimes tragically, happen. DNA evidence is not immune to human bungling but nor is it particularly vulnerable, except in so far as PCR amplifies error. If all DNA evidence were to be thrown out because of occasional mistakes, the precedent should rule out most other kinds of evidence, too. We have to suppose that codes of practice and rigorous precautions can be developed to guard against human error in the presentation of all kinds of legal evidence.
The more sophisticated difficulties that bedevil DNA evidence will take longer to explain. They, too, have their precedents in conventional types of evidence, although this point often does not seem to be understood in law courts.
Where identification evidence of any kind is concerned, there are two types of error which correspond to the two types of error in any statistical evidence. In another chapter, we shall call them Type 1 and Type 2 errors, but it is easier to think of them as false positive and false negative. A guilty suspect may escape, through not being recognized - false negative. And - false positive (which most people would see as the more dangerous error) - an innocent suspect may be convicted because he happens, by ill luck, to resemble the genuinely guilty party. In the case of ordinary eye-witness identification, an innocent bystander who happens to look a bit like the real criminal could consequently be arrested - false positive. Identity parades are designed to make this less probable. The chance of a miscarriage of justice is inversely related to the number of people standing in the line-up. The danger can be increased in the ways we have already considered - the line-up being unfairly stacked with clean-shaven men for example.
In the case of DNA evidence the danger of a false positive conviction is theoretically very low indeed. We have a blood sample from a suspect, and we have a specimen from the scene of the crime. If the entire set of genes in both these samples could be written down, the probability of a false conviction is one in billions and billions. Identical twins apart, the chance that any two humans would match all their DNA is tantamount to zero. But unfortunately it is not practical to work out the complete gene sequence of a human being. Even after the Human Genome Project is completed, to attempt the equivalent in the solution of each crime is unrealistic. In practice, forensic detectives concentrate on small sections of the genome, preferably sections that are known to vary in the population. And now our fear must be that, although we could safely rule out mis-identification if the whole genome were considered, there might be a danger of two individuals' being identical with respect to the small portion of DNA that we have time to analyse.
The probability that this would happen ought to be measurable for any particular section of the genome; we could then decide whether it was an acceptable risk. The larger the section of DNA, the smaller the probability of error, just as, in an identity parade, the longer the line-up the safer the conviction. The difference is that an identity parade, in order to compete with the DNA equivalent, would need to contain not a couple of dozen people but thousands, millions or even billions in the line. Apart from this quantitative difference, the analogy with the identity parade continues. We shall see that there is a DNA equivalent of our hypothetical line-up of clean-shaven men with one bearded suspect. But first, a little more background on DNA fingerprinting.
Obviously we sample the equivalent parts of the genome in both suspect and specimen. These parts of the genome are chosen for their tendency to vary widely in the population. A Darwinian would note that the parts that don't vary are often the parts that have an important role to play in the survival of the organism. Any substantial variations in these important genes are likely to have been removed from the population by the death of their possessors - Darwinian natural selection. But there are other parts of the genome that are very variable, perhaps because they are not important for survival. This isn't the whole story because in fact some useful genes are quite variable. The reasons for this are controversial. It's a bit of a digression but . . . What is this life if, full of stress, we have no freedom to digress?
The 'neutralist' school of thought, associated with the distinguished Japanese geneticist Motoo Kimura, believes that useful genes are equally
useful in a variety of different forms. This emphatically does not mean that they are useless, only that the different forms are equally good at what they do. If you think of genes as writing out their recipes in words, the alternative forms of a gene can be thought of as the very same words written in different typefaces: the meaning is the same, and the product of the recipe will come out the same. Genetic changes, 'mutations', that make no difference are not 'seen' by natural selection. They aren't mutations at all, for all the difference they make to the life of the animal, but they are potentially useful mutations from the point of view of the forensic scientist. The population ends up with lots of variety at such a locus (position in a chromosome), and this kind of variety could in principle be used for fingerprinting.
The other theory of variation, opposed to Kimura's neutral theory, believes that the different versions of the genes really do different things and that there is some special reason why both are preserved by natural selection in the population. For example, there might be two alternative forms of a blood protein, α and β, which are susceptible to two infectious diseases called alfluenza and betaccosis respectively, each being immune to the other disease. Typically, an infectious disease needs a critical density of susceptible victims in a population, otherwise an epidemic can't get going. In a population dominated by α types, there are frequent epidemics of alfluenza but not of betaccosis. So natural selection favours the β types who are immune to alfluenza. It favours them so much that after a while they come to dominate the population. Now the tables are turned. There are epidemics of betaccosis, but not of alfluenza. The α types now are favoured by natural selection because they are immune to betaccosis. The population may keep oscillating between α dominance and β dominance, or it may settle down to an intermediate mixture, an 'equilibrium'. Either way, we'll see plenty of variation at the gene locus concerned, and this is good news for the finger-printers. The phenomenon is called 'frequency dependent selection' and it is one suggested reason for high levels of genetic variation in the population. There are others.
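For readers who like to see such an argument run, here is a toy Python model of the second outcome, the settling down to an equilibrium. Everything in it is an assumption made for illustration: the rule that each type's fitness falls in proportion to its own frequency, the strength `s`, and the function names. It is not a model of any real disease, and it does not attempt the oscillating case.

```python
def next_generation(p: float, s: float = 0.5) -> float:
    """One generation of a toy frequency-dependent model.

    p is the share of the population susceptible to alfluenza; the rest
    are susceptible to betaccosis. The commoner a type is, the more its
    own disease spreads, so the lower its fitness.
    """
    fitness_alfluenza_prone = 1.0 - s * p
    fitness_betaccosis_prone = 1.0 - s * (1.0 - p)
    mean_fitness = (p * fitness_alfluenza_prone
                    + (1.0 - p) * fitness_betaccosis_prone)
    return p * fitness_alfluenza_prone / mean_fitness


def simulate(p: float, generations: int) -> float:
    """Apply next_generation repeatedly and return the final share."""
    for _ in range(generations):
        p = next_generation(p)
    return p


# Start with the alfluenza-prone type dominating, nine to one.
final_share = simulate(0.9, 100)
```

In this sketch the common type always loses ground to the rare one, so the population drifts to an even mixture and stays there, with both forms preserved.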
However, for our forensic purposes, it matters only that there are variable sections of the genome. Whatever the verdict in the controversy over whether the useful bits of the genome are variable, there are in any case lots of other regions of the genome which are never even read, or never translated into their protein equivalents. Indeed, an astonishingly high proportion of our genes seem to be doing nothing whatsoever. They are therefore free to vary, which makes them excellent DNA fingerprinting material.
As if to confirm the fact that a great deal of DNA is doing nothing useful, the sheer quantity of DNA in the cells of different kinds of organisms is wildly variable. Since DNA information is digital, we can measure it in the same kind of units as we measure computer information. One bit of information is enough to specify one yes/no decision: a 1 or a 0, a true or a false. The computer on which I am writing this has 256 megabits (32 megabytes) of core memory. (The first computer that I owned was a bigger box but had less than one five thousandth of the memory capacity.) The equivalent fundamental unit in DNA is the nucleotide base. Since there are 4 possible bases, the information content of each base is equivalent to 2 bits. The common gut bacterium Escherichia coli has a genome of 4 mega-bases or 8 megabits. The crested newt, Triturus cristatus, has 40,000 megabits. The 5,000-fold ratio between crested newt and bacterium is about the same as that between my present computer and my first one. We humans have 3,000 mega-bases or 6,000 megabits. This is 750 times as great as the bacterium (which satisfies our vanity), but what are we to make of the newt trumping us sixfold? We'd like to think that genome size is not strictly proportional to what it does: presumably quite a lot of that newt DNA isn't doing anything. This is certainly true. It is also true of most of our DNA. We know from other evidence that, of the 3,000 mega-base human genome, only about 2 per cent is actually used for coding protein synthesis. The rest is often called junk DNA. Presumably the crested newt has an even higher percentage of junk DNA. Other newts have not.
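The sums in that paragraph are easy to check. The short Python sketch below uses only the figures quoted in the text (2 bits per base, a 4 mega-base bacterium, a 3,000 mega-base human genome, a 40,000 megabit newt); the variable and function names are my own.

```python
BITS_PER_BASE = 2  # four possible bases, so each base carries 2 bits


def megabits(megabases: float) -> float:
    """Convert a genome size in mega-bases to megabits."""
    return megabases * BITS_PER_BASE


e_coli = megabits(4)       # bacterium
human = megabits(3_000)    # human genome
newt = 40_000              # crested newt, already quoted in megabits

human_vs_bacterium = human / e_coli   # the ratio that satisfies our vanity
newt_vs_bacterium = newt / e_coli     # the 5,000-fold ratio
newt_vs_human = newt / human          # the newt trumping us roughly sixfold

# About 2 per cent of the human genome codes for protein.
coding_megabases = 3_000 * 0.02
```

On these figures the protein-coding part of the human genome comes to about 60 mega-bases, with everything else falling under the heading of junk.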
The surplus of unused DNA falls into various categories. Some of it looks like real genetic information, and probably represents old, defunct genes, or out-of-date copies of genes that are still in use. These pseudo-genes would make sense if they were read and translated. But they are not read and translated. Hard disks on computers usually contain comparable junk: old copies of work in progress, scratchpad space used by the computer for interim operations, and so on. We users don't see this junk, because our computers only show us those parts of the disk that we need to know about. But if you get right down and read the actual information on the disk, byte by byte, you'll see the junk, and much of it will make some sort of sense. There are probably dozens of disjointed fragments of this very chapter peppered around my hard disk at present, although there is only one 'official' copy that the computer tells me about (plus a prudent back-up).
In addition to the junk DNA which could be read but isn't, there is plenty of junk DNA which not only isn't read but wouldn't make any sense if it were. There are huge stretches of repeated nonsense, perhaps repeats of one base, or alternations of the same two bases, or repeats of a more complicated pattern. Unlike the other class of junk DNA, we cannot account for these 'tandem repeats' as outdated copies of useful genes. This repetitive DNA has never been decoded, and presumably has never been of any use. (Never useful for the animal's survival, anyway. From the point of view of the selfish gene, as I explained in another book, we could say that any kind of junk DNA is 'useful' to itself if it just keeps surviving and making more copies of itself. This suggestion has come to be known by the catch-phrase 'selfish DNA', although this is a little unfortunate because, in my original sense, working DNA is selfish too. For this reason, some people have taken to calling it 'ultra-selfish DNA'.)
Anyway, whatever the reason, junk DNA is there, and there in prodigious quantities. Because it is not used, it is free to vary. Useful genes, as we have seen, are severely constrained in their freedom to change. Most changes (mutations) make a gene work less effectively, the animal dies and the change is not passed on. This is what Darwinian natural selection is all about. But mutations in junk DNA (mostly changes in the number of repeats in a given region) are not noticed by natural selection. So, as we look around the population, we find most of the variation that is useful for fingerprinting in the junk regions. As we shall now see, tandem repeats are particularly useful because they vary with respect to number of repeats, a gross feature which is easy to measure.
If it wasn't for this, the forensic geneticist would need to look at the exact sequence of bases in our sample region. This can be done, but sequencing DNA is time-consuming. The tandem repeats allow us to use cunning short-cuts, as discovered by Alec Jeffreys of the University of Leicester, rightly regarded as the father of DNA fingerprinting (and now Sir Alec). Different people have different numbers of tandem repeats in particular places. I might have 147 repeats of a particular piece of nonsense, where you have 84 repeats of the same piece of nonsense in the corresponding place in your genome. In another region, I might have 24 repeats of a particular piece of nonsense to your 38 repeats. Each of us has a characteristic fingerprint consisting of a set of numbers. Each of these numbers in our fingerprint is the number of times a particular piece of nonsense is repeated in our genome.
We get our tandem repeats from our parents. We each have 46 chromosomes, 23 from our father and 23 homologous, or corresponding, chromosomes from our mother. These chromosomes come complete with tandem repeats. Your father got his 46 chromosomes from your paternal grandparents, but he didn't pass them on to you in their entirety. Each of his mother's chromosomes was lined up with its paternal opposite number and bits were exchanged before a composite chromosome was put into the sperm that helped to make you. Every sperm and every egg is unique because it is a different mix of maternal and paternal chromosomes. The mixing process affects the tandem repeat sections as well as the meaningful sections of the chromosomes. So our characteristic numbers of tandem repeats are inherited, in much the same way as our eye colour and hair curliness are inherited. With the difference that, whereas our eye colour results from some kind of joint
verdict of our paternal and our maternal genes, our tandem repeat numbers are properties of the chromosomes themselves and can therefore be measured separately for paternal and maternal chromosomes. At any particular tandem repeat region, each of us has two readings: a paternal chromosome repeat number and a maternal chromosome repeat number. From time to time, chromosomes mutate - suffer a random change - in their tandem repeat numbers. Or a particular tandem region may be split by chromosomal crossing over. This is why there is variation in tandem repeat numbers in the population. The beauty of tandem repeat numbers is that they are easy to measure. You don't have to get embroiled in detailed sequencing of coded DNA bases. You do something a bit like weighing them. Or, to take another equally apt analogy, you spread them out like coloured bands from a prism. I'll explain one way of doing this.
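Before turning to the laboratory method, it may help to see what such a fingerprint looks like as data. The Python sketch below stores two readings per region, one per chromosome. The counts 147, 84, 24 and 38 are the examples used earlier; the region names, the second reading in each pair and the function name are invented. I have also assumed that a measurement cannot tell us which reading came from which parent, so the comparison ignores order.

```python
# One pair of repeat counts per tandem repeat region.
# Region names and the second number in each pair are hypothetical.
me = {"region_1": (147, 92), "region_2": (24, 31)}
you = {"region_1": (84, 110), "region_2": (38, 31)}


def same_fingerprint(a: dict, b: dict) -> bool:
    """True if both profiles show the same two counts at every region."""
    if a.keys() != b.keys():
        return False
    # sorted() makes (147, 92) and (92, 147) count as the same reading.
    return all(sorted(a[region]) == sorted(b[region]) for region in a)


profiles_match = same_fingerprint(me, you)
```

A single differing region is enough to make the comparison fail, which is the logic behind clearing a suspect on one mismatch.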
First you need to make some preparations. You make a so-called DNA probe, which is a short sequence of DNA that exactly matches the nonsense sequence in question - up to about 20 nucleotide bases long. This is not difficult to do nowadays. There are several methods. You can even buy a machine off the shelf which makes short DNA sequences to any specification, just as you can buy a keyboard to punch any desired string of letters on a paper tape. By supplying the synthesizing machine with radioactive raw materials, you make the probes themselves radioactive, and so 'label' them. This makes the probes easy to find again later, as natural DNA is not radioactive, and so the two are readily distinguishable from each other.
Radioactive probes are a tool of the trade, which you must have ready before you start a Jeffreys fingerprinting exercise. Another essential tool is the 'restriction enzyme'. Restriction enzymes are chemical tools that specialize in cutting DNA, but cutting it only in particular places. For example, one restriction enzyme may search the length of a chromosome until it finds the sequence GAATTC (G, C, T and A are the four letters of the DNA alphabet; all genes, from all species on earth, differ only in consisting of different sequences of these four letters). Another restriction enzyme cuts the DNA wherever it can find the sequence GCGGCCGC. A number of different restriction enzymes are available in the toolbox of the molecular biologist. They originate from bacteria, who use them for their own defensive purposes. Each restriction enzyme has its own unique search string which it homes in on and cuts.
Now, the trick is to choose a restriction enzyme whose specific search string is completely absent from the tandem repeat we are interested in. The whole length of DNA is therefore chopped into short stretches, bounded by the characteristic search string of the restriction enzyme. Of course, not all the stretches will consist of the tandem repeat we are
looking for. All sorts of other stretches of DNA will happen to be bounded by the favoured search string of the restriction enzyme scissors. But some of them will consist of tandem repeats and the length of each scissored stretch will be largely determined by the number of tandem repeats in it. If I have 147 repeats of a particular piece of DNA nonsense, where you have only 84, my snipped fragments will be correspondingly longer than your snipped fragments.
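The scissoring can be imitated on a toy sequence in Python. In this sketch the search string GAATTC is the one quoted above, but the repeat unit `ACAG`, the flanking letters and the function names are made up, and the cut is simplified: the search string is simply removed, whereas a real enzyme cuts at a particular point within it.

```python
SITE = "GAATTC"  # search string of the restriction enzyme
UNIT = "ACAG"    # hypothetical 'nonsense' unit; does not contain SITE


def make_region(repeats: int) -> str:
    """A toy stretch of DNA: a tandem repeat bounded by two cut sites."""
    return "TTGCA" + SITE + "CC" + UNIT * repeats + "GG" + SITE + "ACGTT"


def cut(dna: str, site: str = SITE) -> list:
    """Chop the DNA wherever the search string occurs (simplified)."""
    return dna.split(site)


def repeat_fragment_length(repeats: int) -> int:
    """Length of the longest fragment, which here holds the repeats."""
    return max(len(fragment) for fragment in cut(make_region(repeats)))


my_fragment = repeat_fragment_length(147)
your_fragment = repeat_fragment_length(84)
```

Because the enzyme's search string never occurs inside the repeat, the middle fragment survives intact and its length grows by one unit for every extra repeat.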
We can measure these characteristic lengths using a technique that has been around in molecular biology for quite a while. This is the bit that is rather like spreading them out with a prism, as Newton did for white light. The standard DNA 'prism' is a gel electrophoresis column, that is, a long tube filled with jelly through which an electric current is passed. A solution containing the scissored stretches of DNA, all jumbled together, is poured into one end of the tube. The DNA fragments, which carry a negative charge, are all electrically attracted to the positive end of the column, which is at the other end of the tube, and they move steadily through the jelly. But they don't all move at the same rate. Like light of low vibration frequency moving through glass, small fragments of DNA move faster than large ones. The result is that, if you switch the current off after a suitable interval, the fragments have spread themselves out along the column, just as Newton's colours spread themselves out because light from the blue end of the spectrum is more readily slowed down by glass than light from the red end.
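The sorting effect of the jelly can be mimicked with a toy rule in Python. The logarithmic formula below is an assumption chosen only because it makes shorter fragments travel further; it is not a calibrated model of a real gel, and the fragment lengths are invented.

```python
import math


def migration_distance(length_in_bases: int, run_time: float = 1.0) -> float:
    """Toy rule: the shorter the fragment, the further it travels."""
    return run_time * (10.0 - math.log10(length_in_bases))


# A jumble of hypothetical fragment lengths poured in together.
fragments = [592, 5, 340, 1200, 60]

# Read the column from the far end back towards the start.
bands = sorted(fragments, key=migration_distance, reverse=True)
```

Whatever the exact rule, the outcome is the same: position along the column becomes a stand-in for fragment length, which is why the method feels a bit like weighing.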
But so far we can't see the fragments. The jelly column looks uniform all the way down. There is nothing to show that DNA fragments of different size are lurking in discrete bands along its length, and nothing to show which bands contain which variety of tandem repeat. How do we make them visible?
This is where the radioactive probes come in.
To make them visible you can use another cunning technique, the Southern blot, named after its inventor, Edwin Southern. (Slightly confusingly, there are other techniques called the Northern blot and the Western blot, but no Mr Northern or Mr Western.) The jelly column is removed from the tube and laid out on blotting paper. The liquid in the jelly, including the DNA fragments, seeps out of the jelly into the blotting paper. The blotting paper has previously been laced with quantities of the radioactive probe for the particular tandem repeat that we are interested in. The probe molecules line up along the blotting paper, pairing precisely, by the ordinary rules of DNA, with their opposite numbers in the tandem repeats. Surplus probe molecules are washed away. Now the only radioactive probe molecules left in the blotting paper are those bound to their exact opposite numbers that seeped out of the jelly. The blotting paper is now placed on a piece of X-ray film, which is then marked by the radioactivity. So, what you see when you develop the
film is a set of dark bands - another barcode. The final barcode pattern that we read on the Southern blot is a fingerprint for a person, in very much the same way as the Fraunhofer lines are a fingerprint for a star, or the formant lines are the fingerprint for a vowel sound. Indeed, the barcode from the blood looks very like Fraunhofer lines or formant lines.
The details of DNA fingerprinting techniques get quite complicated and I won't go much further. For instance, one strategy is to hit the DNA with lots of probes all at the same time. What you get then is a mixed bag of barcode stripes simultaneously. In extreme cases, the stripes merge into each other and all you get is one big smear with all possible sizes of DNA fragment represented somewhere in the genome. This is no good for identification purposes. At the other extreme, people use only one probe at a time looking at one genetic 'locus'. This 'single-locus fingerprinting' gives you nice clean bars like Fraunhofer lines. But only one or two bars per person. Even so, the chances of confusing people are small. This is because the characteristics we are talking about are not like 'brown eyes versus blue eyes', in which case lots of people would be the same. The characteristics we are measuring, remember, are lengths of tandem repeat fragments. The number of possible lengths is very large, so even single-locus fingerprinting is pretty good for identification purposes. Not quite good enough, however, so in practice forensic DNA finger-printers usually use half a dozen separate probes. Now the chances of error are very low indeed. But we still need to talk about exactly how low, because people's lives or liberties might depend upon it.
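To see why half a dozen probes help so much, here is a minimal Python sketch. It assumes that the loci are statistically independent, so that the chances multiply, and it uses an invented figure of one in a hundred for a coincidental match at each locus. Neither assumption comes from real forensic data.

```python
from fractions import Fraction
from math import prod


def coincidental_match_probability(per_locus) -> Fraction:
    """Chance that two unrelated people match at every locus,
    assuming the loci are independent of one another."""
    return prod(per_locus, start=Fraction(1))


# Hypothetical: a 1 in 100 chance of a coincidental match per locus.
one_probe = coincidental_match_probability([Fraction(1, 100)])
six_probes = coincidental_match_probability([Fraction(1, 100)] * 6)
```

Under these assumptions one probe leaves a one in a hundred chance of confusion, while six leave one in a million million. How far the independence assumption can be trusted is exactly the kind of question taken up next.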
First, we must return to our distinction between false positives and false negatives. DNA evidence can be used to clear an innocent suspect, or it can be made to point the finger at a guilty one. Suppose semen is recovered from the vagina of a rape victim. Circumstantial evidence leads the police to arrest a man, suspect A. Suspect A gives a blood sample and it is compared to the semen sample, using a single DNA probe to look at one tandem repeat locus. If the two are different, suspect A is in the clear. We don't even need to look at a second locus.
But what if suspect A's blood matches the semen sample at this locus? Suppose they both share the same barcode pattern, which we shall call pattern P. This is compatible with the suspect's being guilty, but it doesn't prove it. He could just happen to share pattern P with the real rapist. We must now look at some more loci. If the samples still match, what are the odds against such a match being coincidental - a false positive mis-identification? This is where we have to start thinking statistically about the population at large. In theory, by taking blood from a sample of men in the population at large, we should be able to calculate the likelihood that any two men will be identical at each locus
concerned. But from which section of the population do we draw our sample?
Remember our lone bearded man in the old-fashioned line-up identity parade? Here's the molecular equivalent. Suppose that, in the world at large, only one in a million men has pattern P. Does this mean that there is a million to one chance against a wrongful conviction of suspect A? No. Suspect A may belong to a minority group of people whose ancestors immigrated from a particular part of the world. Local populations often share genetic peculiarities, for the simple reason that they are descended from the same ancestors. Of the 2.5 million South African Dutch, or Afrikaners, most are descended from one shipload of immigrants who arrived from the Netherlands in 1652. As an indicator of the narrowness of this genetic bottleneck, about a million still bear the surnames of 20 of these original settlers. The Afrikaners have a much higher frequency of certain genetic diseases than the population of the world in general. According to one estimate, about 8,000 (one in 300) have the blood condition porphyria variegata, which is much rarer in the rest of the world. This is apparently because they are descended from one particular couple on the ship, Gerrit Jansz and Ariaantje Jacobs, although it is not known which one was the carrier of the (dominant) gene for the condition. (She was one of eight Rotterdam orphanage girls put on the ship to provide wives for the settlers.) In fact, the condition wasn't noticed at all before modern medicine, because its most marked symptom is a lethal reaction to certain modern anaesthetics (South African hospitals now routinely test for the gene before administering anaesthetic). Other populations often have locally high frequencies of other particular genes, for the same kind of reason. If, to return to our hypothetical court case, suspect A and the real criminal both belong to the same minority group, the likelihood of chance confusion could be dramatically greater than you'd think if you based your estimates on the population at large.
The point is that the frequency of pattern P in humans at large is no longer relevant. We need to know the frequency of pattern P in the group to which the suspect belongs.
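A small worked example in Python shows how much the choice of reference group can matter. The one in a million figure is the hypothetical worldwide frequency used above; the frequency within the minority group (one in a thousand) and the size of that group (50,000 men) are numbers invented purely for illustration.

```python
from fractions import Fraction


def expected_matches(frequency: Fraction, men_in_group: int) -> Fraction:
    """Expected number of men in the group who carry pattern P."""
    return frequency * men_in_group


MEN_IN_GROUP = 50_000                  # hypothetical size of the local group
world_frequency = Fraction(1, 10 ** 6)  # pattern P in the world at large
group_frequency = Fraction(1, 1_000)    # hypothetical frequency in the group

using_world_figure = expected_matches(world_frequency, MEN_IN_GROUP)
using_group_figure = expected_matches(group_frequency, MEN_IN_GROUP)
```

With the worldwide figure we would expect about a twentieth of a man to share pattern P by chance, which sounds like none at all. With the group's own figure we would expect fifty.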
This need is nothing new. We've already seen the equivalent danger in an ordinary line-up identity parade. If the prime suspect is Chinese, it doesn't do to stand him in a line-up largely consisting of westerners. And the same kind of statistical reasoning about the background population is needed in identifying stolen goods, as well as individual suspects. I have already mentioned my jury service in the Oxford Court. In one of the three cases I sat on, a man was accused of stealing three coins from a rival numismatist. The accused had been caught with three coins in his possession which matched those lost. Counsel for the prosecution was eloquent.
Ladies and gentlemen of the jury, are we really supposed to believe that three coins, of exactly the same type as the three missing coins, would just happen to be present in the house of a rival collector? I put it to you that such a coincidence is too much to stomach.
Jurymen are not permitted to cross-examine. That was the duty of counsel for the defence, and he, though doubtless learned in the law and also eloquent, had no more clue about probability theory than the prosecutor. I wish he'd said something like this:
M'Lud, we don't know whether the coincidence is too much to stomach, because m'learned friend has not presented us with any evidence at all as to the rarity or commonness of these three coins in the population at large. If these coins are so rare that only one in a hundred collectors in the country has any one of them, the prosecution has a good case, since the defendant was caught with three of them. If on the other hand, these coins are as common as dirt, there is not enough evidence to convict. (To push to the extreme, three coins that I have in my pocket today, all current legal tender, are very probably the same as three coins in Your Lordship's pocket.)
My point is that it simply never occurred to any of the legally trained minds in the court that it was relevant even to ask how rare these three coins were in the population at large. Lawyers can certainly add up (I once received a lawyer's bill, the last item of which was 'Time spent making out this bill') but probability theory is another matter.
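The calculation that nobody in court thought to ask for might have looked something like this Python sketch. It rests on two assumptions of mine: that each type of coin is held by the same share of collectors, and that owning one type is independent of owning another. The share of nine in ten for 'common as dirt' coins is invented.

```python
from fractions import Fraction


def chance_of_holding_all(share_of_collectors: Fraction, coins: int = 3) -> Fraction:
    """Chance that an innocent collector owns every one of the coin
    types, assuming ownership of each is independent of the others."""
    return share_of_collectors ** coins


rare_coins = chance_of_holding_all(Fraction(1, 100))   # one in a hundred each
common_coins = chance_of_holding_all(Fraction(9, 10))  # hypothetical common coins
```

On the first figure an innocent coincidence is a one in a million affair. On the second it is more likely than not, and the same three coins prove nothing.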
I expect the coins were actually rare. If they hadn't been, the theft would not have been such a serious matter, and the prosecution presumably would never have been brought. But the jury should have been told explicitly. I remember that the question came up in the jury room, and we wished that we were allowed to go back into the court to seek clarification. The equivalent question is equally relevant in the case of DNA evidence, and it is most certainly being asked. Fortunately, provided a sufficient number of separate genetic loci are examined, the chances of mis-identification - even among members of minority groups, even among family members (except identical twins) - can be reduced to genuinely very small levels, far smaller than can be achieved by any other method of identification, including eye-witness evidence.
Exactly how small the residual possibility of error is may still be open to dispute. And this is where we come to the third category of objection to DNA evidence, the just plain silly. Lawyers are accustomed to pouncing when expert witnesses seem to disagree. If two geneticists are summoned to the stand and are asked to estimate the probability of a mis-identification with DNA evidence, the first may say 1,000,000 to one while the second may say only 100,000 to one. Pounce. 'Aha! AHA! The experts disagree! Ladies and gentlemen of the jury, what confidence can we place in a scientific method if the experts themselves can't get within a factor of ten of one another? Obviously the only thing to do is throw the entire evidence out, lock, stock and barrel.'
But, in these cases, although geneticists may be inclined to give different weightings to imponderables such as the racial subgroup effect, any disagreement between them is only over whether the odds against a wrongful identification are hyper-mega-astronomical or just plain astronomical. The odds cannot normally be lower than thousands to one, and they may well be up in the billions. Even on the most conservative estimate, the odds against wrongful identification are hugely greater than they are in an ordinary identity parade. 'M'lud, an identity parade of only 20 men is grossly unfair on my client. I demand a line-up of at least a million men!' Expert statisticians called to give evidence on the likelihood that a conventional 20-man identity parade could yield a false identification would also disagree among themselves. Some would give the simple answer, one in 20. Under cross-examination they would then agree that it could be one in less than 20, depending upon the nature of the variation in the line-up in relation to the features of the suspect (this was the point about the lone bearded man in the line-up). But the one thing all the statisticians would agree upon is that the odds of mis-identification by sheer chance are at least one in 20. Yet lawyers and judges are normally happy to go along with ordinary identity parades in which the suspect stands in a line of only 20 men.
After reporting the throwing out of DNA evidence in a case at London's Central Criminal Court, the Old Bailey, the Independent newspaper of 12 December 1992 predicted a consequent flood of appeals. The idea is that everybody at present languishing in jail, as a result of DNA identification evidence, will now be able to appeal, citing the precedent. But the flood may be even greater than the Independent imagines because, if this throwing out of DNA evidence is really a serious precedent for anything, it will cast doubt on all cases in which the odds against a chance mistake are less than thousands to one. If a witness says she 'saw' somebody and identified him in a line-up, lawyers and juries are satisfied. But the odds of mistaken identity when the human eye is involved are far greater than when the identification is done by DNA fingerprinting. If we take the precedent seriously, it ought to mean that every convicted criminal in the country will have excellent cause to appeal on grounds of mistaken identity. Even where a suspect was seen by dozens of witnesses with a smoking gun in his hand, the odds of injustice must be greater than one in 1,000,000.
A recent highly publicized case in America, where the jury were systematically confused about DNA evidence, has also become notorious for another piece of bungled probability theory. The defendant, who was known to have beaten his wife, was on trial for finally murdering her. One of the high-profile defence team, a Harvard professor of law, advanced the following argument: Statistics show that of men who beat their wives, only one in 1,000 go on to kill them. The inference that any jury might be expected to draw (indeed, were intended to draw) is that the defendant's beating of his wife should be discounted in the murder trial. Doesn't the evidence show overwhelmingly that a wife-beater is unlikely to turn into a wife murderer? Wrong. Doctor I. J. Good, a professor of statistics, wrote to the scientific journal Nature in June 1995 to explode the fallacy. The defence lawyer's argument overlooks the additional fact that wife-killing is rare compared with wife-beating. Good calculated that if you take that minority of wives who are both beaten by their husbands and murdered by somebody, it is very likely indeed that the murderer will be the husband. This is the relevant way to calculate the odds because, in the case under discussion, the unfortunate wife had been murdered by somebody, after being beaten by her husband.
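The shape of Good's correction can be sketched as a small conditional-probability calculation. The one-in-1,000 figure is the defence's own number quoted above. The rate at which a beaten wife is murdered by somebody other than her husband is a purely illustrative assumption (one in 10,000 here); it is not the figure Good used, and the variable names are invented for this sketch.

```python
# Figure quoted by the defence: a wife-beater goes on to kill his wife.
p_killed_by_husband = 1 / 1_000

# Hypothetical assumption for illustration only: a beaten wife is
# murdered by someone other than her husband.
p_killed_by_other = 1 / 10_000

# Condition on what is actually known: she WAS murdered by somebody.
p_husband_did_it = p_killed_by_husband
p_other_did_it = (1 - p_killed_by_husband) * p_killed_by_other
p_murdered = p_husband_did_it + p_other_did_it

posterior_husband = p_husband_did_it / p_murdered
print(f"P(husband is the murderer | beaten and murdered) = {posterior_husband:.2f}")
```

Under these assumed numbers the husband is the murderer in roughly nine cases out of ten. The exact value depends on the assumed rate of murder by others, but the direction of the correction does not: the one-in-1,000 figure answers the wrong question.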
No doubt there are lawyers, judges and coroners who could benefit from a better understanding of the theory of probability. On some occasions, however, one cannot help suspecting that they understand very well and are feigning incompetence. I do not know if this was so in the case just quoted. The same suspicion is raised by Doctor Theodore Dalrymple, the (London) Spectator's acerbic medical raconteur, in this typically sardonic account, from 7 January 1995, of his being called as an expert witness in a coroner's court:
. . . a wealthy and successful man I knew swallowed 200 tablets and a bottle of rum. The coroner asked me whether I thought he might have taken them by accident. I was about to answer with a ringing and confident no, when the coroner made himself a little clearer: was there even a one in a million chance he had taken them by accident?
'Err, well, I suppose so,' I replied. The coroner (and the man's family) relaxed, an open verdict was returned, the family was £750,000 the richer and an insurance company the poorer by an equivalent sum, at least until it put my premium up.
The power of DNA fingerprinting is an aspect of the general power of science that makes some people fear it. It is important not to exacerbate such fears by claiming too much or trying to move too fast. Let me end this rather technical chapter by returning to society and an important and difficult decision that we must collectively make. I would normally fight shy of discussing a topical issue for fear of going out of date, or a local one for fear of being parochial, but the question of a national DNA database is starting to preoccupy most nations in their different ways, and it is bound to become more pressing in the future.
It would in theory be possible to keep a national database of DNA sequences from every man, woman and child in the country. Then, whenever a sample of blood, semen, saliva, skin or hair was found at the scene of a crime, the police would not have to locate a suspect by other means before comparing his DNA with the sample. They could simply do a computer search of the national database. The very suggestion elicits howls of protest. It would be an infringement of individual liberty. It's the thin end of the wedge. A giant step towards a police state. I have always been a little puzzled about why people automatically react so strongly against suggestions such as these. If I examine the matter dispassionately, I think that, on balance, I come out against it. But it is not something to condemn out of hand without even looking at the pros and cons. So let us do so.
If the information is guaranteed to be used only for catching criminals, it is hard to see why anybody who is not a criminal should object. I am aware that plenty of activists for civil liberties will still object in principle. But I genuinely don't understand why, unless we want to protect the rights of criminals to perform crimes without detection. I also see no good reason against a national database of conventional, ink-pad fingerprints (except the practical one that, unlike with DNA, it is hard to do an automatic computer search of conventional fingerprints). Crime is a serious problem which diminishes the quality of life for everybody except the criminals (perhaps even them: presumably there is nothing to stop a burglar's house being burgled). If a national DNA database would significantly help the police to catch criminals, the objections had better be good ones to outweigh the benefits.
Here's an important caution, though, to begin with. It's one thing to use DNA evidence, or mass-screening identification evidence of any kind, to corroborate a suspicion that the police have already reached on other grounds. It's quite another matter to use it to arrest anybody in the country who matches the sample. If there is a certain low probability of coincidental resemblance between, say, a semen sample and the blood of an innocent individual, the probability that that individual will also be falsely suspected on independent grounds is obviously far lower. So the technique of simply searching the database and arresting the one person who matches the sample is significantly more likely to lead to injustice than a system which requires other grounds for suspicion first. If a sample from the scene of a crime in Edinburgh happens to match my DNA, should the police be allowed to hammer on my door in Oxford and arrest me on no other evidence? I think not, but it is worth remarking that the police already do something equivalent with facial features, when they release to the national newspapers an Identikit picture, or a snapshot taken by a witness, and invite people from all over the country to telephone them if they 'recognize' the face. Once again, we must beware of our natural tendency to trust facial recognition above all other kinds of individual identification.
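The difference between searching a whole database and testing an existing suspect can be put into rough numbers. The sketch below rests on simplifying assumptions that are mine, not the text's: the culprit is somewhere in the pool, everybody in the pool is equally likely to be the culprit before the DNA is examined, and each innocent person matches by coincidence with the same small probability. The population size, the pool of ten suspects and the one-in-a-million match probability are all hypothetical.

```python
def probability_match_is_culprit(pool_size, coincidental_match_prob):
    """Probability that a person who matches the sample is the true culprit.

    Assumes the culprit is in the pool, a uniform prior over the pool,
    and independent coincidental matches among the innocent.
    """
    expected_innocent_matches = (pool_size - 1) * coincidental_match_prob
    return 1 / (1 + expected_innocent_matches)

match_prob = 1e-6  # hypothetical chance of a coincidental match

# Trawling a national database of 50 million people (hypothetical size).
trawl = probability_match_is_culprit(50_000_000, match_prob)

# Testing one of ten people already suspected on other grounds.
suspects = probability_match_is_culprit(10, match_prob)

print(f"database trawl:     {trawl:.3f}")
print(f"ten prior suspects: {suspects:.6f}")
```

On these assumptions a trawl of 50 million people would be expected to turn up about fifty innocent matches alongside the culprit, so a match on its own is weak evidence. The same match against a short list of independently suspected people is close to conclusive.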
Setting crime aside, there is a real danger of the information in the national DNA database falling into the wrong hands. I mean into the hands of those who wish to use it not for catching criminals but for other purposes, perhaps connected with medical insurance or blackmail. There are respectable reasons why people with no criminal intent at all might not wish their DNA profile to be known, and it seems to me that their privacy should be respected. For instance, a significant number of individuals who believe they are the father of a particular child are not. Equally, a significant number of children believe somebody to be their real father who is not. Anyone with access to the national DNA database might discover the truth, and the result could be huge emotional distress, marital breakdown, nervous breakdown, blackmail, or worse. There may be some who feel that the truth should always out, however painful, but I think a good case could be made that the sum total of human happiness would not be enhanced by a sudden outburst of revelations about everybody's true paternity.
Then there are the medical and insurance issues. The whole life insurance business depends upon the inability to forecast exactly when somebody will die. As Sir Arthur Eddington said: 'Human life is proverbially uncertain; few things are more certain than the solvency of a life-insurance company.' We all pay our premiums. Those of us who die later than expected subsidize (the heirs of) those who die earlier than expected. Insurance companies already make statistical guesses which partially subvert the system by enabling them to charge high-risk clients larger premiums. They send a doctor to listen to our hearts, take our blood pressure and investigate our smoking and drinking habits. If actuaries knew exactly when we were all going to die, life insurance would become impossible. In principle, a national DNA database, if actuaries could get their hands on it, might lead us closer to this unfortunate outcome. An extreme could be reached where the only kind of death risk that could be insured against would be pure accident.
Similarly, people screening job applicants, or applicants for places at university, could use DNA information in ways that many of us might find undesirable. Some employers already use dubious methods such as graphology (analysis of handwriting as a supposed guide to character or aptitude). Unlike the case of graphology, there is good reason to think that DNA information might be genuinely useful for judging abilities. But still, I would be one of many who would be disturbed if selection panels made use of DNA information, at least if they did so secretly.
One of the general arguments against national databases of any kind is the 'What if it fell into the hands of a Hitler?' argument. On the face of it, it is not clear how an evil government would benefit from a database of true information about people. They are so adept at using false information, one might say, why should they bother to abuse true information? In the case of Hitler, however, there is the point about his campaign against Jews and others. Although it is not true that you can recognize a Jew from his DNA, there are particular genes which are characteristic of people whose ancestors come from certain regions of, say, central Europe, and there are statistical correlations between possession of certain genes and being Jewish. It seems undeniable that, if Hitler's regime had had a national DNA database at their disposal, they would have found terrible ways to abuse it.
Are there ways to safeguard society from these potential ills, while retaining the benefit of helping to catch criminals? I'm not sure. I think it might be difficult. You could protect honest citizens against insurance companies and employers by restricting the national database to non-coding regions of the genome. The database would refer only to tandem repeat areas of the genome, not genes that actually do anything. This would prevent actuaries working out our life expectancy and talent scouts second-guessing our abilities. But it would do nothing to protect us against discovering (or against blackmailers discovering) truths about paternity that we might prefer not to know. Quite the contrary. The identification of Josef Mengele's bones from his son's blood was entirely based upon tandem repeat DNA.
