DNA Matching and the Birthday Paradox
Is it possible that the F.B.I. is right about the statistics it cites, and that there could be 122 nine-out-of-13 matches in Arizona’s database?
Perhaps surprisingly, the answer turns out to be yes. Let’s say that the chance of any two individuals matching at any one locus is 7.5 percent. In reality, the frequency of a match varies from locus to locus, but I think 7.5 percent is pretty reasonable. For instance, with a 7.5 percent chance of matching at each locus, the chance that any 2 random people would match at all 13 loci is about 1 in 400 trillion. If you choose exactly 9 loci for 2 random people, the chance that they will match all 9 is 1 in 13 billion. Those are the sorts of numbers the F.B.I. tosses around, I think.
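The two odds quoted above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming (as the text does) a flat 7.5 percent match chance at every locus and independence between loci; the variable names are illustrative only.

```python
# Assumption from the text: every locus has the same 7.5% match chance,
# and loci are treated as independent of one another.
p = 0.075

# Chance that two random people match at all 13 loci.
p_all_13 = p ** 13

# Chance that two random people match at 9 loci chosen in advance.
p_fixed_9 = p ** 9

print(f"all 13 loci:   about 1 in {1 / p_all_13:,.0f}")
print(f"9 chosen loci: about 1 in {1 / p_fixed_9:,.0f}")
```

Under these assumptions the first figure comes out a little above 400 trillion and the second a little above 13 billion, in line with the rounded numbers in the text.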
So under these same assumptions, how many pairs would we expect to find matching on at least 9 of 13 loci in the Arizona database? Remarkably, about 100. If you start with 65,000 people and do a pairwise match of all of them, you are actually making over 2 billion separate comparisons (65,000 * 64,999/2). And if you aren’t just looking for a match on 9 specific loci, but rather on any 9 of 13 loci, then for each of those pairs of people there are 715 different combinations (13 choose 9) that are being searched.
So all told, you end up doing about 1.5 trillion searches! If 1 in 13 billion searches yields a positive match as noted above, this leads to roughly 100 expected matches on 9 of 13 loci in a database the size of Arizona’s. (The way I did the calculations, I am allowing for 2 individuals to match on different sets of loci; so to get 100 different pairs of people who match, I need a match rate of slightly higher than 7.5 percent per locus.)
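The whole estimate can be reproduced in a short script. This is a sketch under the text's assumptions (65,000 profiles, 13 loci, a flat 7.5 percent match chance per locus, independent loci). The second calculation, a binomial tail that counts each pair only once, is not from the original analysis; it is added here only to illustrate the caveat in the parenthetical about pairs being counted on more than one set of loci.

```python
from math import comb

# Assumptions taken from the text.
p = 0.075          # per-locus match chance
n_people = 65_000  # profiles in the database
n_loci = 13
k = 9              # loci required for a "match"

# Every profile compared against every other profile.
pairs = comb(n_people, 2)

# Number of ways to pick which 9 of the 13 loci must match.
subsets = comb(n_loci, k)

# The text's method: count every (pair, 9-locus subset) as one search.
searches = pairs * subsets
expected_by_search = searches * p ** k

# Illustrative alternative (not in the original): the chance that a pair
# matches on at least 9 loci, so each pair is counted at most once.
p_at_least_k = sum(
    comb(n_loci, j) * p ** j * (1 - p) ** (n_loci - j)
    for j in range(k, n_loci + 1)
)
expected_distinct_pairs = pairs * p_at_least_k

print(f"pairwise comparisons:      {pairs:,}")
print(f"9-of-13 combinations:      {subsets}")
print(f"total searches:            {searches:,}")
print(f"expected matches (search): {expected_by_search:.0f}")
print(f"expected distinct pairs:   {expected_distinct_pairs:.0f}")
```

The search-based count lands somewhat above 100 and the distinct-pair count somewhat below it, which is consistent with the remark that a slightly higher per-locus rate would be needed to reach 100 different pairs.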
EDITED TO ADD (9/14): The FBI is trying to suppress the analysis.