Schneier on Security
A blog covering security and security technology.
August 7, 2006
Data Mining and Terrorism
Nice article from CIO Magazine about data mining and terrorism.
Posted on August 7, 2006 at 6:48 AM
• 8 Comments
So... now we know whose fault it is! :)
"One subject-based data mining technique gaining traction among government practitioners and academics is called link analysis. Link analysis uses data to make connections between seemingly unconnected people or events. If you know someone is a terrorist, you can use link analysis software to uncover other people with whom the suspect may be interacting."
If you know someone is a terrorist, why wouldn't you be tapping their phone, apartment, car, shoes and dog?
What's the need for "software" here? Cops have been doing this for years.
"Many experts believe that the NSA project analyzing millions of domestic phone records is this kind of link analysis system."
I'm not an expert, but I don't believe that. You wouldn't need to check "millions of domestic phone records". You'd just need to check the phone records of the known or suspected terrorists.
"These patterns might include purchases that are out of line with someone's pay grade, unreported foreign travel or e-mail exchanges with a person known to work for a foreign government, says a counterintelligence official involved with the project who requested anonymity."
Once again, you're starting with known individuals. It seems that their rationalizations keep coming back to that.
Unfortunately, we are dealing with the needle in the haystack. As Bruce has written, lots of looking at data will only be able to identify potential targets. For instance, credit card expenditures that exceed paygrade, or lots of out of state purchases for someone who doesn't make enough money for such a lifestyle will trigger a deeper investigation. There will be a lot of false leads with this type of work, and the question remains are those resources better spent searching out threats through more traditional means? Or alternatively, do we indeed have so many resources that this type of goose chasing is an effective use of them?
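To put rough numbers on the needle-in-the-haystack problem, here is a minimal sketch of the base-rate arithmetic. Every figure in it (population size, number of real targets, hit rate, false alarm rate) is a hypothetical assumption chosen for illustration, not a number from the article or from any real program.

```python
def screening_outcomes(population, true_targets, hit_rate, false_alarm_rate):
    """Return (true_positives, false_positives, precision) for a screening rule.

    hit_rate is the fraction of real targets the rule flags;
    false_alarm_rate is the fraction of innocent people it flags.
    """
    innocents = population - true_targets
    true_positives = true_targets * hit_rate
    false_positives = innocents * false_alarm_rate
    flagged = true_positives + false_positives
    precision = true_positives / flagged if flagged else 0.0
    return true_positives, false_positives, precision


# Hypothetical inputs: a very accurate rule applied to a very large population.
tp, fp, precision = screening_outcomes(
    population=300_000_000,
    true_targets=1_000,
    hit_rate=0.99,
    false_alarm_rate=0.001,
)
print(f"true leads: {tp:.0f}, false leads: {fp:.0f}, precision: {precision:.2%}")
```

Under these assumed inputs, a rule that is wrong about only one innocent person in a thousand still produces roughly 300,000 false leads for about 990 real ones, so well under 1% of flagged people are actual targets. Whether that is a good use of investigators depends on how cheaply each lead can be ruled out.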
I would like to ask the believers if we should put ALL our efforts into datamining, would they feel safe then? I doubt it...
> If you know someone is a terrorist, why wouldn't you be tapping their phone, apartment, car, shoes and dog?
You do do that, of course (well, a pen register, anyway). The problem is that 99% of the people they phone/email/shop with/whatever are innocent. But with link analysis, you can identify particular clusters and even types of interaction which will help to distinguish between casual contacts and additional suspects.
Naturally the suspects will practice tradecraft to try to disguise these sorts of interactions, but -- as the CIA recently demonstrated in Italy -- that is probably a lot harder than it looks.
> the question remains are those resources better spent searching out threats through more traditional means?
One great thing about data mining is that, unlike almost any other kind of investigative technique, it answers this question at the same time that it finds the inferences. That is, it doesn't just say "X knows Y", but provides enough additional information that you can work out if the lead is actually worth following up, and potentially even in what order leads should be investigated for maximum likelihood of success at minimum cost.
In contrast the traditional model is to try every lead you can find, one after another, until you run out of time or money.
> I would like to ask the believers if we should put ALL our efforts into datamining, would they feel safe then? I doubt it...
Of course not. That would be foolish. It happens to be a useful investigative technique, and often a surprisingly cheap and powerful one; but it is just one more tool on the belt. The odd thing, though, is that people who can see that one option (putting ALL our resources into this one tool) would be a ridiculous extreme seem to think that the only logical alternative is NONE!
"The problem is that 99% of the people they phone/email/shop with/whatever are innocent."
"But with link analysis, you can identify particular clusters and even types of interaction which will help to distinguish between casual contacts and additional suspects."
No, what you end up with is the old "7 degrees of Kevin Bacon" situation.
Known Terrorist ("KT") #1 talks to Innocent Person ("IP") #1, who talks to IP #2, ... to IP #6, who talks to another KT #2.
So, those 6 innocent people are now "suspected terrorists".
And their contacts are mapped the same way ... which leads to other contacts with other Known Terrorists.
Eventually, EVERYONE is linked to a Known Terrorist.
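The chain described above can be sketched as a breadth-first search over a toy contact graph. This is only an illustration of the naive "within N links" rule: the graph, the three extra contacts per person, and the names `add_link` and `within_hops` are all hypothetical.

```python
from collections import deque


def add_link(graph, a, b):
    """Record an undirected contact between a and b."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def within_hops(graph, seeds, max_hops):
    """Breadth-first search: map each person within max_hops of any seed to their distance."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for contact in graph.get(person, ()):
            if contact not in distance:
                distance[contact] = distance[person] + 1
                queue.append(contact)
    return distance


# Toy graph: KT1 - IP1 - ... - IP6 - KT2, plus three unrelated contacts per IP.
graph = {}
chain = ["KT1"] + [f"IP{i}" for i in range(1, 7)] + ["KT2"]
for a, b in zip(chain, chain[1:]):
    add_link(graph, a, b)
for i in range(1, 7):
    for j in range(1, 4):
        add_link(graph, f"IP{i}", f"IP{i}-contact{j}")

flagged = within_hops(graph, ["KT1", "KT2"], max_hops=6)
suspects = [p for p in flagged if not p.startswith("KT")]
print(f"{len(suspects)} of {len(graph) - 2} innocent people flagged")
```

In this toy graph every one of the 24 innocent people ends up within six links of a known terrorist, which is the objection in miniature: a bare hop count does not discriminate.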
> No, what you end up with is the old "7 degrees of Kevin Bacon" situation.
You might, if you approached it naively.
If you actually used social link analysis -- which is a well studied, formal discipline in sociology and network theory -- you do much better, even with rules only slightly more complicated than "can be joined with 6 or fewer links".
For example, see the URL linked through my name. In that real world case study, the author started with two suspects identified in January 2000, and by applying quite simple rules in mapping their social network, developed a map of 43 nodes in which:
* every person on the map turns out to have been proven, or strongly suspected, of being a terrorist for reasons unrelated to the map;
* all of the 9/11 hijackers and both of the USS Cole bombing suspects are on the map; and
* Mohammed Atta is identified as the probable leader.
The data to generate this analysis was all available to the US government before the 9/11 attacks. Hence the perhaps belated interest in this sort of analysis.
Incidentally, the original two suspects on this map (Nawaf al-Hazmi and Khalid al-Mihdhar) were initially identified through bugs in the house of Sameer Mohammed Ahmed al-Hada (Khalid's brother-in-law) in Yemen, and then followed to a terrorist conference in Malaysia. Ahmed's house had been identified as an Al-Qaeda safe house in investigations of the Nairobi embassy bombing, and tracing calls made to it was also the method by which Osama bin Laden's sat phone was identified. It is claimed that this house continued to be a valuable source of intelligence until it was raided by Yemeni police in 2002 -- for non-payment of rent!
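To make the earlier point about "rules only slightly more complicated" concrete, here is a minimal sketch of one such rule: a person is added to the map only if they have direct ties to at least two people already on it. This corroboration rule is an assumption picked for illustration, not the method used in the linked case study, and the graph and names (`expand_by_corroboration`, `S1`, `P1`, `N1`, ...) are hypothetical.

```python
def add_link(graph, a, b):
    """Record an undirected contact between a and b."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def expand_by_corroboration(graph, seeds, min_ties=2):
    """Grow the suspect set, adding only people tied to >= min_ties current suspects."""
    suspects = set(seeds)
    changed = True
    while changed:  # repeat until no one new qualifies
        changed = False
        for person, contacts in graph.items():
            if person not in suspects and len(contacts & suspects) >= min_ties:
                suspects.add(person)
                changed = True
    return suspects


# Toy graph: S* are the starting suspects, P* have multiple ties, N* have one.
graph = {}
for a, b in [
    ("S1", "P1"), ("S2", "P1"),   # P1 knows both seeds
    ("S1", "P2"), ("P1", "P2"),   # P2 knows S1 and P1
    ("P1", "P3"), ("P2", "P3"),   # P3 knows P1 and P2
    ("S1", "N1"), ("N1", "N2"),   # N1 is a single casual contact of S1
    ("S2", "N3"),                 # N3 is a single casual contact of S2
]:
    add_link(graph, a, b)

mapped = expand_by_corroboration(graph, ["S1", "S2"])
print(sorted(mapped))
```

On this toy graph the rule maps S1, S2, P1, P2 and P3 and leaves out N1, N2 and N3, whereas a plain six-link search from the same seeds would have swept in everyone. It says nothing about how such a rule performs on real data, which is exactly what the rest of the thread argues about.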
"For example, see the URL linked through my name."
Okay, I looked at that. And I see that he has decided what criteria to use after the fact.
And he's left out all the other names that would also be within "one step" of them using those same criteria.
I can easily get you the name of every "sleeper" terrorist currently living in the USofA.
The problem is, the list would include every non-terrorist currently living in the USofA also.
You're taking a pre-filtered list, constructed after the fact as "proof" that the process works.
It does not.