Bruce Schneier
Schneier on Security

A blog covering security and security technology.

September 11, 2008

DNA Matching and the Birthday Paradox

Is it possible that the F.B.I. is right about the statistics it cites, and that there could be 122 nine-out-of-13 matches in Arizona's database?

EDITED TO ADD (9/14): The FBI is trying to suppress the analysis.

Posted on September 11, 2008 at 6:21 AM • 30 Comments

Now add in the odds that more than one suspect in a given case happens to match DNA found at the crime scene and you are back to 1 in a trillion odds. Matthew
Posted by: Matthew at September 11, 2008 6:41 AM

I guess that I am going to have to RTFA because I don't know what it is about just from the extract.
Posted by: Bernie at September 11, 2008 7:11 AM

They keep leaving out that 1 mismatch in N loci is proof of a mismatch. Somehow a few failures to match get ignored and the innocent get convicted.
Posted by: Roy at September 11, 2008 7:12 AM

Of course, these odds aren't really relevant if the DNA evidence is used to prove the guilt or innocence of a suspect for whom there already is other (circumstantial) evidence. When you start using DNA to *find* the suspect *and* prove his guilt, things start to go wrong. BTW, the 1.4 trillion searches are based on a very naive algorithm. With some clever form of representation and list sorting, you would probably be able to reduce it to a search for matches between consecutive items.
Posted by: Sparky at September 11, 2008 7:17 AM

Unfortunately, all of the math is based on the false assumption that the loci are statistically independent. I realize that the assumption makes the math easier, but that's no excuse. Within genetically similar populations (races, tribes, families, etc.) all of the odds change.
While I support this use of DNA, people shouldn't be citing odds that have no basis in reality. It's a misuse of mathematics. Jeff
Posted by: jeff at September 11, 2008 7:19 AM

@Roy: that might be because you might get mismatched loci because of damaged crime scene samples. It's a bit like regular fingerprinting: you don't have to match every single line, because you're always comparing two imperfect prints.
Posted by: Sparky at September 11, 2008 7:20 AM

They should be using loci chosen by population geneticists, in other words specific loci within the general population, and there's a reason they use 13 markers as a minimum number to match an individual. One of the regions is likely a microsatellite. Also, if one thinks about it, one is studying a unique population: those human beings who are incarcerated in prison for various reasons. There may be an unconscious sample selection bias going on within the criminal justice labs due to the fact that these people are already previous or past convicts. If this subpopulation has many markers in common, such as SNPs, and these markers aren't ignored, then one would get a very high frequency of matches, but the dataset would still be biased.

Are you screening this individual to see if he committed a current crime or whether he has been incarcerated at some point for a past crime? Since the data gathered is mostly from convicts, one is working with a biased data set from the start. If one looks for common patterns within the biased data set, one proves what one is looking for. This is pointed out in The Mismeasure of Man by Gould, and it's bad science.
Posted by: John Moore at September 11, 2008 7:43 AM

> While I support this use of DNA, people
Which is exactly the point being made -- the FBI regularly misuses mathematics to misrepresent to juries what exactly "defendant's DNA was found at the crime scene" actually means.
They're trying *hard* to give the impression that DNA matching is infallible in practice, which is simply not true.
Posted by: xxx at September 11, 2008 7:48 AM

Kary Mullis, who won the Nobel for inventing the polymerase chain reaction that made DNA analysis practical, insisted on a large number of loci -- 24, if memory serves -- for forensic identification.
Posted by: Roy at September 11, 2008 8:06 AM

So what are the odds of a false negative? So many criminals convicted of serious crimes have been exonerated based on DNA evidence; could any of them have been guilty?
Posted by: Garbage In at September 11, 2008 8:09 AM

@xxx but the odds of someone becoming a suspect AND matching on 13 loci is still so small as to not be relevant. If you just go looking through the whole population to find a match to DNA found at a crime scene, and have no corroborating evidence, then that's another story. I doubt that will ever happen, as the odds are still good that any match would have a good alibi, e.g., living in another region from where the crime occurred, back in prison, dead, etc.
Posted by: noah at September 11, 2008 8:23 AM

noah, if they test precisely 13 loci and someone (who is already a suspect for other reasons, including actual evidence) matches on all, that's a strong indicator. If they test 200 loci and someone matches on 13, that's rather weak evidence. Bruce, this is the reason people like us don't often get to serve on juries. I've been excluded for knowing arithmetic.
Posted by: Seth at September 11, 2008 8:37 AM

> So what are the odds of a false negative?
Pretty good. I seem to recall a case where identical twins *both* lost ("won"?) a paternity case.
Posted by: M Welinder at September 11, 2008 8:39 AM

@ Garbage In
> So what are the odds of a false negative? So many
Considering that one way this could happen is if the criminals have intentionally planted DNA evidence from an unrelated party, I'd guess that mathematical ponderings aren't going to help you calculate the odds here.
Posted by: RonK at September 11, 2008 8:46 AM

@noah LOL! You have nothing to fear as long as you have an alibi like you're dead or already in prison - too funny. I hope that was meant as a joke anyhow.
Posted by: rich at September 11, 2008 9:05 AM

The problem with the logic is this statement: "aren't just looking for a match on 9 specific loci, but rather on any 9 of 13 loci". In criminal court the loci match has to be exact. In the examples provided, if locus 1 matched locus 9 then it would be considered a successful match. For it to be used for identification purposes, locus 1 has to match locus 1 and so on.
Posted by: Chris at September 11, 2008 9:22 AM

Note that by the standards of the typical criminal judicial process, even a 1% false positive rate is more than acceptable. Compare that rate with what one would expect from tests such as witnesses picking suspects out of a 6-person line-up, or from a stack of 20 photographs, and you get an idea of why the FP rate doesn't really matter. We're just used to trusting live witnesses more than inanimate ones, so we demand a higher standard of fingerprints, DNA, and the like. Not really justifiably so, in my opinion.
Posted by: Carlo Graziani at September 11, 2008 9:36 AM

(note: where I live, there is no such thing as a jury trial) Why is it that the attorneys get to exclude people from a jury anyway? Wasn't the whole idea that a jury is a random sample from the population?
Posted by: Sparky at September 11, 2008 10:12 AM

Generally, attorneys can exclude anybody who they convince the judge is likely to be biased; in addition, each side usually gets some number of "peremptory" challenges where they can dismiss people without stating a reason.
The first makes sense; the second avoids a lot of hassle over whether someone is sufficiently prejudiced to exclude.
Posted by: Seth at September 11, 2008 10:25 AM

I was actually on a murder jury years ago when I lived in Minnesota, and I remember two things about the DNA evidence: 1. Listening to cross-examination of the DNA sequencing technician is one of the most boring things I've ever experienced, and 2. The false positive rate they quoted along with the DNA evidence was "approximately 1 in 250", consistent with what they say in this article. I remember killing a lot of time in my sequestered hotel room playing around with the probabilities to convince myself that the 1 in 250 number was reasonable, and what that implied.
Posted by: kaszeta at September 11, 2008 10:45 AM

The more I read the comments, the more I believe that none of us know what we are talking about. DNA testing has been highly scrutinized in courts and the way the evidence has to be presented is very specific. This is a huge improvement over fingerprints, which still have an air of infallibility around them. At least with DNA evidence the odds are presented.
Posted by: Derick at September 11, 2008 12:12 PM

@M Welinder Actually, that was a false positive, not a false negative. Both twins were identified as being the father. One is the father and the other isn't, creating a false positive.
Posted by: NickFadz at September 11, 2008 1:16 PM

@rich: noah's comment must've been a joke. I mean, when was the last time that already being in jail was a credible alibi?
Posted by: kiwano at September 11, 2008 1:55 PM

I hang out with way too many molecular geneticists. I've always enjoyed getting their goat by proposing that the only appropriate use of genetics in the courtroom is to prove the negative. I never bother to try this with pop-gen kids because they tend to know the difference.
Posted by: TB at September 11, 2008 11:35 PM

Surely the odds of a particular individual matching some human DNA profile cannot be narrower than 1 in the-total-population-of-history, which is probably on the order of 10^10 people. So saying anything stronger than "chance is 1 in 10 billion" when referring to the human population is a fallacy. *Someone* had to match, and there are only on the order of 10 billion someones to choose from!
Posted by: Craigh Hughes at September 12, 2008 1:09 PM

I am in fact working as a population genetics postdoc. First, one must not assume that courts are up with the facts. They are run by lawyers, not scientists, and really have little to no idea when it comes to numbers, especially statistics. You only have to look at some of the lead evidence used by the FBI that is now pretty well debunked to see how courts don't deal well with this.

The problem often comes down to "the probability of what?" Note that the probability of a match is incomplete. What's the null? What matches what? My DNA to DNA found at the crime scene... where I work? This is the problem of proper hypothesis testing. A good example of this is how loci vary with ethnic origin. It's quite different, and as such members of a particular race (say Maori people in NZ) have a much higher chance of matching another person from that same race than two Europeans have of matching each other (I can go the other way round too, like for some African groups).

The second is what kind of judicial bias you prefer. Is it better to let 100 guilty people go free rather than one innocent person be locked up? Most western countries are biased *against* locking up or mistreating innocents in principle (perhaps not in practice). I will reiterate what some have said above. If you data mine a DNA database, the current loci are not good enough since this is not what they are designed for.
But as weight of evidence from a small list of suspects (selected *without* knowledge of the DNA) it works well and perhaps even intuitively. A suspect with other evidence that was found *after* the DNA "match" was discovered is less concrete, and giving odds like 1:1000000 is complete rubbish in this case. At least without a proper statistical hypothesis.
Posted by: greg at September 15, 2008 7:13 AM

@Chris:
Posted by: Filias Cupio at September 15, 2008 10:40 PM

@noah: "but the odds of someone becoming a suspect AND matching on 13 loci is still so small as to not be relevant. If you just go looking through the whole population to find a match to DNA found at a crime scene, and have no corroborating evidence, then that's another story. I doubt that will ever happen, as the odds are still good that any match would have a good alibi, e.g., living in another region from where the crime occurred"
http://www.forensic-evidence.com/site/EVID/EL_DNAerror.html Here the guy had an alibi, lived 200 miles away, was too ill to have committed the crime (it doesn't go into details in the article, but if it's the case I'm thinking of, the burglar broke in through a small, high window, whereas this guy couldn't walk in through his own front door on a bad day), and was picked out of a DNA database with no other evidence against him. None of this counted in his favour, though -- he was in jail for months until his lawyer got another DNA test done, on more loci, which failed to match. (In this case, the original match was only 6 loci, but since this was "a 1 in 37 million probability", obviously "it had to be him".)
Posted by: wm at September 17, 2008 4:02 AM
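
For readers who want to play with the numbers in the post's question, the birthday-paradox arithmetic can be sketched roughly as below. This is only an illustration under loud assumptions: the database size `n` and the per-locus match probability `p` are made-up placeholder values, not figures from the post or the FBI, and the model treats all 13 loci as independent with the same match probability, which is exactly the assumption several commenters above object to. The function names are hypothetical.

```python
from math import comb


def pair_count(n: int) -> int:
    """Number of distinct pairs of profiles in a database of n profiles."""
    return n * (n - 1) // 2


def prob_at_least_k_of_m(p: float, k: int, m: int = 13) -> float:
    """Probability that two profiles match on at least k of m loci.

    ASSUMPTION: every locus matches independently with the same
    probability p (a simple binomial model, not a forensic one).
    """
    return sum(comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(k, m + 1))


def expected_matching_pairs(n: int, p: float, k: int = 9, m: int = 13) -> float:
    """Expected number of pairs in the database matching on at least k of m loci."""
    return pair_count(n) * prob_at_least_k_of_m(p, k, m)


if __name__ == "__main__":
    n = 65_000  # ASSUMED database size, for illustration only
    p = 0.075   # ASSUMED per-locus match probability, for illustration only

    print("pairs compared:", pair_count(n))
    print("ways to pick 9 of 13 loci:", comb(13, 9))
    print("P(one pair matches on >= 9 of 13):", prob_at_least_k_of_m(p, 9))
    print("expected matching pairs:", expected_matching_pairs(n, p))
```

The point of the sketch is the shape of the calculation rather than the output: a tiny per-pair probability gets multiplied first by the number of ways to choose which 9 loci match and then by the number of pairs in the database, so partial matches somewhere in a large database are far less surprising than the odds for one specific pair would suggest.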