## Racial Profiling No Better than Random Screening

Not that this is any news, but there’s some new research to back it up:

The study was performed by William Press, who does bioinformatics research at the University of Texas, Austin, with a joint appointment at Los Alamos National Labs. His background in statistics is apparent in his ability to handle various mathematical formulae with aplomb, but he’s apparently used to explaining his work to biologists, since the descriptions that surround those formulae make the general outlines of the paper fairly accessible.

Press starts by examining what could be viewed as an idealized situation, at least from the screening perspective: a single perpetrator living under an authoritarian government that has perfect records on its citizens. Applying a profile to those records should allow the government to rank those citizens in order of risk, and it can screen them one-by-one until it identifies the actual perpetrator. Those circumstances lead to a pretty rapid screening process, and they can be generalized out to a situation where there are multiple likely perpetrators.

Things go rapidly sour for this system, however, as soon as you have an imperfect profile. In that case, which is more likely to reflect reality, there’s a finite chance that the screening process misses a likely security risk. Since it works its way through the list of individuals iteratively, it never goes back to rescreen someone that’s made it through the first pass. The impact of this flaw grows rapidly as the ability to accurately match the profile to the data available on an individual gets worse. Since we’ve already said that making a profile is challenging, and we know that even authoritarian governments don’t have perfect information on their citizens, this system is probably worse than random screening in the real world.

In the real world, of course, most of us aren’t going through security checks run by authoritarian governments. In Press’ phrasing, democracies resample with replacement, in that they don’t keep records of who goes through careful security screening at places like airports, so people get placed back on the list to go through the screening process again. One consequence of this is that, since screening resources are never infinite, we can only resample a small subset of the total population at any given moment.

Press then examines the effect of what he terms a strong profiling strategy, one in which a limited set of screening resources is deployed solely based on the risk probabilities identified through profiling. It turns out that this also works poorly as the population size goes up. “The reason that this strong profiling strategy is inefficient,” Press writes, “is that, on average, it keeps retesting the same innocent individuals who happen to have large pj [risk profile match] values.”

According to Press, the solution is something that’s widely recognized by the statistics community: identify individuals for robust screening based on the square root of their risk value. That gives the profile some weight, but distributes the screening much more broadly through the population, and uses limited resources more effectively. It’s so widely used in mathematical circles that Press concludes his paper by writing, “It seems peculiar that the method is not better known.”
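
To make the comparison concrete, here is a rough sketch of the arithmetic under simplifying assumptions that are mine, not the paper's exact setup: one perpetrator, screening with replacement, and a screening that always catches the perpetrator when it picks them. If person j is the perpetrator with probability p_j and is picked on each draw with probability q_j, the expected number of draws is the sum of p_j / q_j. The population, the risk weights, and the function names below are hypothetical.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def expected_screenings(p, q):
    """Mean number of draws (with replacement) until the perpetrator is picked.

    p[j]: prior probability that person j is the perpetrator.
    q[j]: probability that person j is selected on any single draw.
    Assumes a screening always identifies the perpetrator once selected.
    """
    return sum(pj / qj for pj, qj in zip(p, q) if pj > 0)


# Hypothetical population: 10 people who match the profile strongly,
# 990 who match it weakly.
p = normalize([50.0] * 10 + [1.0] * 990)

strategies = {
    "uniform": [1.0 / len(p)] * len(p),                   # ignore the profile
    "strong": p,                                          # screen in proportion to risk
    "sqrt": normalize([math.sqrt(pj) for pj in p]),       # square-root sampling
}

results = {name: expected_screenings(p, q) for name, q in strategies.items()}

for name, mean_draws in results.items():
    print(f"{name:8s} {mean_draws:8.1f} expected screenings")
```

Under these assumptions, uniform screening and strong profiling both come out to exactly the population size, while square-root sampling comes out lower, at the square of the sum of the sqrt(p_j) values.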

Other articles on the research here, here, and here. Me on profiling.

Carlo Graziani • February 4, 2009 2:44 PM

Interesting paper. I like the idea of separating the sampling probability from the prior probability, and optimizing the former. I’d never heard of the square-root sampling strategy either, so I guess I learned something today.

I don’t care for the section on “Probabilistic Recognition”, though. The idea that multiple looks at an individual are probabilistically independent, identically distributed (iid) is very naive. I can believe that the advantage of the Optimal Authoritarian strategy over the Optimal Democratic strategy is less in this circumstance, but not in the way inferred from those curves.