Data Mining for Terrorists Doesn't Work

According to a massive report from the National Research Council, data mining for terrorists doesn’t work. Here’s a good summary:

The report was written by a committee whose members include William Perry, a professor at Stanford University; Charles Vest, the former president of MIT; W. Earl Boebert, a retired senior scientist at Sandia National Laboratories; Cynthia Dwork of Microsoft Research; R. Gil Kerlikowske, Seattle’s police chief; and Daryl Pregibon, a research scientist at Google.

They admit that far more Americans live their lives online, using everything from VoIP phones to Facebook to RFID tags in automobiles, than a decade ago, and the databases created by those activities are tempting targets for federal agencies. And they draw a distinction between subject-based data mining (starting with one individual and looking for connections) and pattern-based data mining (looking for anomalous activities that could show illegal activities).

But the authors conclude the type of data mining that government bureaucrats would like to do—perhaps inspired by watching too many episodes of the Fox series 24—can’t work. “If it were possible to automatically find the digital tracks of terrorists and automatically monitor only the communications of terrorists, public policy choices in this domain would be much simpler. But it is not possible to do so.”

A summary of the recommendations:

  • U.S. government agencies should be required to follow a systematic process to evaluate the effectiveness, lawfulness, and consistency with U.S. values of every information-based program, whether classified or unclassified, for detecting and countering terrorists before it can be deployed, and periodically thereafter.
  • Periodically after a program has been operationally deployed, and in particular before a program enters a new phase in its life cycle, policy makers should (carefully review) the program before allowing it to continue operations or to proceed to the next phase.
  • To protect the privacy of innocent people, the research and development of any information-based counterterrorism program should be conducted with synthetic population data… At all stages of a phased deployment, data about individuals should be rigorously subjected to the full safeguards of the framework.
  • Any information-based counterterrorism program of the U.S. government should be subjected to robust, independent oversight of the operations of that program, a part of which would entail a practice of using the same data mining technologies to “mine the miners and track the trackers.”
  • Counterterrorism programs should provide meaningful redress to any individuals inappropriately harmed by their operation.
  • The U.S. government should periodically review the nation’s laws, policies, and procedures that protect individuals’ private information for relevance and effectiveness in light of changing technologies and circumstances. In particular, Congress should re-examine existing law to consider how privacy should be protected in the context of information-based programs (e.g., data mining) for counterterrorism.

Here are more news articles on the report. I explained why data mining wouldn’t find terrorists back in 2005.

EDITED TO ADD (10/10): More commentary:

As the NRC report points out, not only is the training data lacking, but the input data that you’d actually be mining has been purposely corrupted by the terrorists themselves. Terrorist plotters actively disguise their activities using operational security measures (opsec) like code words, encryption, and other forms of covert communication. So, even if we had access to a copious and pristine body of training data that we could use to generalize about the “typical terrorist,” the new data that’s coming into the data mining system is suspect.

To return to the credit reporting analogy, credit scores would be worthless to lenders if everyone could manipulate their credit history (e.g., hide past delinquencies) the way that terrorists can manipulate the data trails that they leave as they buy gas, enter buildings, make phone calls, surf the Internet, etc.

So this application of data mining bumps up against the classic GIGO (garbage in, garbage out) problem in computing, with the terrorists deliberately feeding the system garbage. What this means in real-world terms is that the success of our counter-terrorism data mining efforts is completely dependent on the failure of terrorist cells to maintain operational security.

The GIGO problem and the lack of suitable training data combine to make big investments in automated terrorist identification a futile and wasteful effort. Furthermore, these two problems are structural, so they’re not going away. All legitimate concerns about false positives and corrosive effects on civil liberties aside, data mining will never give authorities the ability to identify terrorists or terrorist networks with any degree of confidence.

Posted on October 10, 2008 at 6:35 AM • 22 Comments

Comments

John October 10, 2008 6:56 AM

No worries, the database is still perfectly useful for finding dirt on specific persons. Instead of trawling facts for suspicious persons, they’ll trawl to find suspicious facts for persons. And as earlier intelligence agencies have demonstrated, that works just fine.

Pete Austin October 10, 2008 8:52 AM

Data Mining is valuable when you already have a group of potential terrorists, but not when it’s used to scan the whole population.

For example, suppose there’s a good, successful test for terrorist tendencies (1% false positives and 1% false negatives).

If you already know a terrorist and want to filter the 1000 people they have any contact with, to find the 1 who is another terrorist, this test will probably flag that one person plus 10 innocents. This ratio is a good starting point for more police work.

But if you just run the test on all 300 million US citizens, to search for 100 terrorists, this test will flag most of the terrorists, plus 3 million innocents. A worthless starting point for police work, because you have 30,000 innocents per terrorist.

The same logic applies in many other fields. For example in medicine, because a drug or a screening test is valuable to treat people who are already ill, this does not mean it should be applied to the whole population to treat the undiagnosed ill, because any side effects are multiplied in the same way by the large proportion of well people who cannot benefit but might suffer. Or in welfare, because money donations help disaster victims, this does not necessarily mean that hand-outs should be given to the whole population, because side-effects such as a disincentive to work are multiplied. Whenever you apply a technique more widely, it is likely to be less effective.
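The arithmetic in the comment above can be checked with a short sketch. It assumes the commenter's hypothetical figures (1% false positives, 1% false negatives, 1 terrorist among 1000 contacts, 100 terrorists among 300 million people); the function name `screen` and its parameters are illustrative only and do not come from the report.

```python
def screen(population, terrorists, false_positive_rate, false_negative_rate):
    """Expected results of running a screening test over a population.

    All inputs are hypothetical; the rates are the ones assumed in the comment.
    """
    innocents = population - terrorists
    # Terrorists the test correctly flags.
    true_hits = terrorists * (1 - false_negative_rate)
    # Innocent people the test wrongly flags.
    false_alarms = innocents * false_positive_rate
    flagged = true_hits + false_alarms
    return {
        "true_hits": true_hits,
        "false_alarms": false_alarms,
        # Fraction of flagged people who really are terrorists.
        "precision": true_hits / flagged,
        "innocents_per_hit": false_alarms / true_hits,
    }


# Scenario 1: screen the 1000 contacts of a known terrorist.
contacts = screen(1_000, 1, 0.01, 0.01)

# Scenario 2: screen the whole population.
nationwide = screen(300_000_000, 100, 0.01, 0.01)

for name, result in (("contacts", contacts), ("nationwide", nationwide)):
    print(name, {key: round(value, 5) for key, value in result.items()})
```

With these assumed rates, the contact-list scenario yields roughly 10 flagged innocents per real hit, while the nationwide scenario yields roughly 30,000. The test is the same in both cases; only the base rate of terrorists in the screened group changes.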

Carlo Graziani October 10, 2008 8:57 AM

I don’t know whether to laugh or cry. This farcical exercise is re-made every few years, with the same results.

The NRC conducts this sort of exercise periodically, calling out the Federal government for the shamefully stupid magical thinking that underlies its bad-guy detection programs. The report is typically scientifically careful, and marshals the available evidence to produce an unarguably sensible set of judgments and recommendations.

The report is then promptly round-filed by the securocracy, which merrily carries on, safe in the knowledge that this sort of technomagical bullshit unlocks all kinds of budgetary treasure-troves in Congress and within the Executive. We’ve seen the exact same script play out with polygraphs as loyalty-screening tools, despite the fact that their effectiveness has been demonstrated to be about comparable to that of ouija boards.

The past is prologue: not one of these recommendations will be implemented in any meaningful way. This bullshit will continue to reap budgetary rewards from credulous appropriators and government accountants.

sooth sayer October 10, 2008 9:23 AM

Maybe the title of the item should be
“Current data mining programs for terrorists don’t work”

The current title is trying to prove something patently unprovable i.e. that data mining can’t work.

This, it appears to me, is an opinion based on social cost/benefit “feelings” rather than true scientific underpinnings.

paul October 10, 2008 10:10 AM

I wonder what constitutes inappropriate harm. Do you actually have to be arrested or disappeared, or does it suffice that (as with the new NSA disclosures) your private conversations are passed around the surveillance office for amusement?

Clive Robinson October 10, 2008 10:19 AM

I was going to bang on about “follow the money” and “pork” but not only have I been beaten to it, I noticed something really worrying,

“…meaningful redress to any individuals inappropriately harmed by their operation.”

It sounds very laudable except for the “inappropriately” bit.

Basically, who decides what is and is not “inappropriate”?

You obviously cannot use judges as it will get appealed up to the SCourt by the Gov.

And based on some of the ludi-crass findings they have made in recent times I would not be surprised if they sentenced the “harmed party” to life for “wasting police time” or “Government resources” or some such…

MarcoVincenzo October 10, 2008 11:37 AM

from boingboing, a little more info on William Perry.

“That’s Bill Perry, former SecDef from 93-97! It’s not just some ivory tower analysis then….”

George October 10, 2008 11:39 AM

The National Research Council obviously lacks anyone on its staff with serious Security credentials. Anyone involved in planning and fighting the Global War On Terror knows that all aspects of any widespread surveillance or data-mining operation need to be classified at the highest level to ensure its effectiveness against the enemy.

(The enemy, of course, includes the many Liberal members of the public, press, and Congress who hate America and will pounce on any rumors of “ineffectiveness,” “abuse,” or “waste” to undermine the effort and aid terrorists. Loyal, patriotic Americans will naturally be grateful and completely supportive of anything the Unitary Executive does to protect the Homeland and our children from Unspeakable Evil. They know that if they have nothing to hide, they have nothing to worry about if the government sweeps up their data.)

Silly Ratfaced Git October 11, 2008 12:04 AM

@George

If you have nothing to hide then please give me your name, debit card number and its PIN.

Everyone has things to hide. Those that think they don’t are idiots and are a danger to themselves.

neill October 11, 2008 12:57 AM

Can data mining be done with the IP addresses encrypted?
That way the identity of innocent people would be protected, but IF there is probable cause to believe the IP address belongs to criminals, a court order could allow/enable decryption of the IP address.

PLS October 11, 2008 1:14 AM

Documents like this are political, not technical. Someone who used to be president of anything is a long way from technical work.

Such panels come around for briefings to our labs but usually can only make metaphors about what they see, usually incorrect ones.

Heavy on Seattle, low on recent knowledge. I’m not an expert on this subject but I’ve seen their kind on the subjects where I am expert and it wasn’t pretty.

Clive Robinson October 11, 2008 7:54 AM

@ Silly Ratfaced Git,

Not sure if you or George are attempting to take the **** more 😉

However you did leave out one point in your,

“Everyone has things to hide. Those that think they don’t are idiots and are a danger to themselves.”

Which is they are even more of a danger to others…

Which being the slightly selfish g** I am worries me the most…

Michael October 12, 2008 10:54 AM

The 9/11 Commission took a hard look at this and related issues. Notably, widespread data mining did not come up as a recommendation. However, information sharing between domestic and foreign intelligence, network analysis of associations with known enemy actors, and better deployment of human intelligence assets were all incorporated in their recommendations.

fairb October 13, 2008 8:28 AM

The GIGO comment was interesting. I’m as concerned by the prospect of anyone using data mining techniques as the next man. But surely the argument about OpSec applies to any and all methods of policing / counter-terrorism. If all criminals/terrorists had perfect OpSec they would never be caught by any strategy, from data mining to walking the beat with your eyes open. We rely on THEM being as human as US, always have. By that argument we should just give up and wait for anarchy to bloom.

John Scholes October 13, 2008 10:43 AM

@fairb

Your perfect OpSec comment has to be right.

The real fallacy is the belief that something has to be done about terrorism. It doesn’t. The terrorist problem is too small to bother about.

Unfortunately, life is full of unpleasant twists of fate. They cannot all be avoided. Resources are limited. Efforts have to be focussed where they can do the most good.

The snag about most anti-terrorist efforts is that they don’t do much good (because the problem is so small), whereas they do substantial harm (loss of civil liberties).

Unfortunately, all discussion of this is hopelessly skewed because Joe SixPack thinks risk/harm is proportional to amount of media coverage.

Of course, various groups in society have much to gain from scaring Joe SixPack in this way.

I have a great faith in the ill-educated masses, and in the positive side of technical developments like the internet. I also tend to think that more information, more discussion is the right way forward.

But at the moment we are not doing a terribly good job. It took years for the public in the UK and the US to realize they had been conned on Iraq; even now substantial minorities wrap themselves in the flag and think it their duty to “support the government”. It looks as though the 42-day detention may finally be sunk in the UK, but it has been a long tough struggle.

Clive Robinson October 13, 2008 1:51 PM

@ John Scholes

“It looks as though the 42-day detention may finally be sunk in the UK, but it has been a long tough struggle.”

Only for now…

Bruce has a saying about cryptography and the way attacks against a system only get stronger with time.

I think it is perhaps time he re-worked it for civil liberties…

