We're Giving Up Privacy and Getting Little in Return
Better to Put People, Not Computers, in Charge of Investigating Potential Plots
Collecting information about every American’s phone calls is an example of data mining. The basic idea is to collect as much information as possible on everyone, sift through it with massive computers, and uncover terrorist plots. It’s a compelling idea, and convinces many. But it’s wrong. We’re not going to find terrorist plots through systems like this, and we’re going to waste valuable resources chasing down false alarms. To understand why, we have to look at the economics of the system.
Data mining works best when there’s a well-defined profile you’re searching for, a reasonable number of attacks per year, and a low cost of false alarms. Credit-card fraud is one of data mining’s success stories: All credit-card companies mine their transaction databases for spending patterns that indicate a stolen card.
Many credit-card thieves share a pattern — purchase expensive luxury goods, purchase things that can be easily fenced, etc. — and data mining systems can minimize the losses in many cases by shutting down the card. In addition, the cost of false alarms is only a phone call to the cardholder asking him to verify a couple of purchases. The cardholders don’t even resent these phone calls — as long as they’re infrequent — so the cost is just a few minutes of operator time.
Terrorist plots are different; there is no well-defined profile and attacks are very rare. This means that data-mining systems won’t uncover any terrorist plots until they are very accurate, and that even very accurate systems will be so flooded with false alarms that they will be useless.
Just in the United States, there are trillions of connections between people and events — things that the data-mining system will have to “look at” — and very few plots. This rarity makes even accurate identification systems useless.
Let’s look at some numbers. We’ll be optimistic: we’ll assume the system has a one in 100 false-positive rate (99 percent accurate), and a one in 1,000 false-negative rate (99.9 percent accurate). Assume 1 trillion possible indicators to sift through each year: that’s about 10 events (e-mails, phone calls, purchases, Web destinations, whatever) per person in the United States per day. Also assume that 10 of them actually indicate terrorists plotting.
This unrealistically accurate system will generate 1 billion false alarms for every real terrorist plot it uncovers. Every day, the police will have to investigate 27 million potential plots in order to find the one real terrorist plot per month. Clearly ridiculous.
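For readers who want to check the arithmetic, here is a minimal sketch in Python. It uses only the figures assumed above; the variable and function names are made up for illustration, and treating the 1 trillion indicators as a yearly total is an assumption that matches the per-day figure.

```python
# Inputs taken from the assumptions in the text. The yearly time frame
# is an assumption that matches "10 events per person per day".
INDICATORS_PER_YEAR = 1_000_000_000_000  # 1 trillion events to sift through
REAL_PLOT_INDICATORS = 10                # events that actually indicate a plot
FALSE_POSITIVE_RATE = 1 / 100            # innocent event wrongly flagged
FALSE_NEGATIVE_RATE = 1 / 1_000          # real plot indicator missed


def alarm_counts(total, real, fp_rate, fn_rate):
    """Return (false_alarms, true_alarms) expected over the whole period."""
    innocent = total - real
    false_alarms = innocent * fp_rate
    true_alarms = real * (1 - fn_rate)
    return false_alarms, true_alarms


false_alarms, true_alarms = alarm_counts(
    INDICATORS_PER_YEAR,
    REAL_PLOT_INDICATORS,
    FALSE_POSITIVE_RATE,
    FALSE_NEGATIVE_RATE,
)

# False alarms raised for every real plot the system catches.
false_alarms_per_plot = false_alarms / true_alarms

# False alarms investigators would face on an average day.
false_alarms_per_day = false_alarms / 365

# Chance that any single alarm points at a real plot.
chance_alarm_is_real = true_alarms / (true_alarms + false_alarms)

print(f"False alarms per real plot found: {false_alarms_per_plot:,.0f}")
print(f"False alarms per day: {false_alarms_per_day:,.0f}")
print(f"Chance a given alarm is real: {chance_alarm_is_real:.2e}")
```

The last figure is the one that matters: under these assumptions, roughly one alarm in a billion is real. Lowering the false-positive rate in the sketch shows how far the system would have to improve before that changes.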
This isn’t anything new. In statistics, it’s called the “base rate fallacy,” and it applies in other domains as well. And this is exactly the sort of thing we saw with the National Security Agency (NSA) eavesdropping program: The New York Times reported that the computers spat out thousands of tips per month. Every one of them turned out to be a false alarm, at enormous cost in money and civil liberties.
Finding terrorism plots is not a problem that lends itself to data mining. It’s a needle-in-a-haystack problem, and throwing more hay on the pile doesn’t make that problem any easier. We’d be far better off putting people in charge of investigating potential plots and letting them direct the computers, instead of putting the computers in charge and letting them decide who should be investigated.
By allowing the NSA to eavesdrop on us all, we’re not trading privacy for security. We’re giving up privacy without getting any security in return.