President Obama and the Intelligence Community
Really good article from the New Yorker.
Two new stories based on the Snowden documents.
This is getting silly. General Alexander just lied about this to Congress last week. The old NSA tactic of hiding behind a shell game of different code names is failing. It used to be they could get away with saying “Project X doesn’t do that,” knowing full well that Projects Y and Z did and that no one would call them on it. Now they’re just looking shiftier and shiftier.
The program the New York Times exposed is basically Total Information Awareness, which Congress defunded in 2003 because it was just too damned creepy. Now it’s back. (Actually, it never really went away. It just changed code names.)
I’m also curious how all those PRISM-era denials from Internet companies about the NSA not having “direct access” to their servers jibe with this paragraph:
The overall volume of metadata collected by the N.S.A. is reflected in the agency’s secret 2013 budget request to Congress. The budget document, disclosed by Mr. Snowden, shows that the agency is pouring money and manpower into creating a metadata repository capable of taking in 20 billion “record events” daily and making them available to N.S.A. analysts within 60 minutes.
Honestly, I think the details matter less and less. We have to assume that the NSA has everyone who uses electronic communications under constant surveillance. New details about hows and whys will continue to emerge—for example, now we know the NSA’s repository contains travel data—but the big picture will remain the same.
Related: I’ve said before that the NSA now seems to have a PR firm advising it on how to respond. It’s trying to teach General Alexander how to handle questioning better.
Also related: A cute flowchart on how to avoid NSA surveillance.
This Wall Street Journal investigative piece is a month old, but well worth reading. Basically, the Total Information Awareness program is back with a different name:
The rules now allow the little-known National Counterterrorism Center to examine the government files of U.S. citizens for possible criminal behavior, even if there is no reason to suspect them. That is a departure from past practice, which barred the agency from storing information about ordinary Americans unless a person was a terror suspect or related to an investigation.
Now, NCTC can copy entire government databases—flight records, casino-employee lists, the names of Americans hosting foreign-exchange students and many others. The agency has new authority to keep data about innocent U.S. citizens for up to five years, and to analyze it for suspicious patterns of behavior. Previously, both were prohibited. Data about Americans “reasonably believed to constitute terrorism information” may be permanently retained.
Note that this is government data only, not commercial data. So while it includes “almost any government database, from financial forms submitted by people seeking federally backed mortgages to the health records of people who sought treatment at Veterans Administration hospitals” as well as lots of commercial data, it’s data the corporations have already given to the government. It doesn’t include, for example, your detailed cell phone bills or your tweets.
See also this supplementary blog post to the article.
Details are here. What’s troubling to me is that even though Congress pulled funding for the program, it was developed elsewhere and now may be sold back to the U.S.
If you’ve traveled abroad recently, you’ve been investigated. You’ve been assigned a score indicating what kind of terrorist threat you pose. That score is used by the government to determine the treatment you receive when you return to the U.S. and for other purposes as well.
Curious about your score? You can’t see it. Interested in what information was used? You can’t know that. Want to clear your name if you’ve been wrongly categorized? You can’t challenge it. Want to know what kind of rules the computer is using to judge you? That’s secret, too. So is when and how the score will be used.
U.S. customs agencies have been quietly operating this system for several years. Called Automated Targeting System, it assigns a “risk assessment” score to people entering or leaving the country, or engaging in import or export activity. This score, and the information used to derive it, can be shared with federal, state, local and even foreign governments. It can be used if you apply for a government job, grant, license, contract or other benefit. It can be shared with nongovernmental organizations and individuals in the course of an investigation. In some circumstances private contractors can get it, even those outside the country. And it will be saved for 40 years.
Little is known about this program. Its bare outlines were disclosed in the Federal Register in October. We do know that the score is partially based on details of your flight record—where you’re from, how you bought your ticket, where you’re sitting, any special meal requests—or on motor vehicle records, as well as on information from crime, watch-list and other databases.
Civil liberties groups have called the program Kafkaesque. But I have an even bigger problem with it. It’s a waste of money.
The idea of feeding a limited set of characteristics into a computer, which then somehow divines a person’s terrorist leanings, is farcical. Uncovering terrorist plots requires intelligence and investigation, not large-scale processing of everyone.
Additionally, any system like this will generate so many false alarms as to be completely unusable. In 2005 Customs & Border Protection processed 431 million people. Assuming an unrealistic model that identifies terrorists (and innocents) with 99.9% accuracy, that’s still 431,000 false alarms annually.
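To make the scale concrete, here is the arithmetic behind that figure as a minimal sketch; the 431 million crossings and the hypothetical 99.9% accuracy are the numbers assumed above:

```python
# Back-of-the-envelope false-alarm count for the hypothetical screener above:
# 431 million border crossings a year, and a 99.9%-accurate system that still
# misfires on 1 traveler in 1,000.
travelers_per_year = 431_000_000
false_positive_rate = 0.001

false_alarms_per_year = travelers_per_year * false_positive_rate
print(f"{false_alarms_per_year:,.0f} false alarms per year")
# -> 431,000 false alarms per year
```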
The number of false alarms will be much higher than that. The no-fly list is filled with inaccuracies; we’ve all read about innocent people named David Nelson who can’t fly without hours-long harassment. Airline data, too, are riddled with errors.
The odds of this program’s being implemented securely, with adequate privacy protections, are not good. Last year I participated in a government working group to assess the security and privacy of a similar program developed by the Transportation Security Administration, called Secure Flight. After five years and $100 million spent, the program still can’t achieve the simple task of matching airline passengers against terrorist watch lists.
In 2002 we learned about yet another program, called Total Information Awareness, for which the government would collect information on every American and assign him or her a terrorist risk score. Congress found the idea so abhorrent that it halted funding for the program. Two years ago, and again this year, Secure Flight was also banned by Congress until it could pass a series of tests for accuracy and privacy protection.
In fact, the Automated Targeting System is arguably illegal, as well (a point several congressmen made recently); all recent Department of Homeland Security appropriations bills specifically prohibit the department from using profiling systems against persons not on a watch list.
There is something un-American about a government program that uses secret criteria to collect dossiers on innocent people and shares that information with various agencies, all without any oversight. It’s the sort of thing you’d expect from the former Soviet Union or East Germany or China. And it doesn’t make us any safer from terrorism.
This essay, without the links, was published in Forbes. They also published a rebuttal by William Baldwin, although it doesn’t seem to rebut any of the actual points.
Here’s an odd division of labor: a corporate data consultant argues for more openness, while a journalist favors more secrecy.
It’s only odd if you don’t understand security.
Remember Total Information Awareness?
In November 2002, the New York Times reported that the Defense Advanced Research Projects Agency (DARPA) was developing a tracking system called “Total Information Awareness” (TIA), which was intended to detect terrorists through analyzing troves of information. The system, developed under the direction of John Poindexter, then-director of DARPA’s Information Awareness Office, was envisioned to give law enforcement access to private data without suspicion of wrongdoing or a warrant.
TIA purported to capture the “information signature” of people so that the government could track potential terrorists and criminals involved in “low-intensity/low-density” forms of warfare and crime. The goal was to track individuals through collecting as much information about them as possible and using computer algorithms and human analysis to detect potential activity.
The project called for the development of “revolutionary technology for ultra-large all-source information repositories,” which would contain information from multiple sources to create a “virtual, centralized, grand database.” This database would be populated by transaction data contained in current databases such as financial records, medical records, communication records, and travel records as well as new sources of information. Also fed into the database would be intelligence data.
The public found it so abhorrent, and objected so forcefully, that Congress killed funding for the program in September 2003.
None of us thought that meant the end of TIA, only that it would turn into a classified program and be renamed. Well, the program is now called Tangram, and it is classified:
The government’s top intelligence agency is building a computerized system to search very large stores of information for patterns of activity that look like terrorist planning. The system, which is run by the Office of the Director of National Intelligence, is in the early research phases and is being tested, in part, with government intelligence that may contain information on U.S. citizens and other people inside the country.
It encompasses existing profiling and detection systems, including those that create “suspicion scores” for suspected terrorists by analyzing very large databases of government intelligence, as well as records of individuals’ private communications, financial transactions, and other everyday activities.
The information about Tangram comes from a government document looking for contractors to help design and build the system.
DefenseTech writes:
The document, which is a description of the Tangram program for potential contractors, describes other, existing profiling and detection systems that haven’t moved beyond so-called “guilt-by-association models,” which link suspected terrorists to potential associates, but apparently don’t tell analysts much about why those links are significant. Tangram wants to improve upon these methods, as well as investigate the effectiveness of other detection links such as “collective inferencing,” which attempt to create suspicion scores of entire networks of people simultaneously.
Data mining for terrorists has always been a dumb idea. And the existence of Tangram illustrates the problem with Congress trying to stop a program by killing its funding; it just comes back under a different name.
Technology Review has an interesting article discussing some of the technologies used by the NSA in its warrantless wiretapping program, some of them from the killed Total Information Awareness (TIA) program.
Washington’s lawmakers ostensibly killed the TIA project in Section 8131 of the Department of Defense Appropriations Act for fiscal 2004. But legislators wrote a classified annex to that document which preserved funding for TIA’s component technologies, if they were transferred to other government agencies, say sources who have seen the document, according to reports first published in The National Journal. Congress did stipulate that those technologies should only be used for military or foreign intelligence purposes against non-U.S. citizens. Still, while those component projects’ names were changed, their funding remained intact, sometimes under the same contracts.
Thus, two principal components of the overall TIA project have migrated to the Advanced Research and Development Activity (ARDA), which is housed somewhere among the 60-odd buildings of “Crypto City,” as NSA headquarters in Fort Meade, MD, is nicknamed. One of the TIA components that ARDA acquired, the Information Awareness Prototype System, was the core architecture that would have integrated all the information extraction, analysis, and dissemination tools developed under TIA. According to The National Journal, it was renamed “Basketball.” The other, Genoa II, used information technologies to help analysts and decision makers anticipate and pre-empt terrorist attacks. It was renamed “Topsail.”
In the post 9/11 world, there’s much focus on connecting the dots. Many believe that data mining is the crystal ball that will enable us to uncover future terrorist plots. But even in the most wildly optimistic projections, data mining isn’t tenable for that purpose. We’re not trading privacy for security; we’re giving up privacy and getting no security in return.
Most people first learned about data mining in November 2002, when news broke about a massive government data mining program called Total Information Awareness. The basic idea was as audacious as it was repellent: suck up as much data as possible about everyone, sift through it with massive computers, and investigate patterns that might indicate terrorist plots. Americans across the political spectrum denounced the program, and in September 2003, Congress eliminated its funding and closed its offices.
But TIA didn’t die. According to The National Journal, it just changed its name and moved inside the Defense Department.
This shouldn’t be a surprise. In May 2004, the General Accounting Office published a report that listed 122 different federal government data mining programs that used people’s personal information. This list didn’t include classified programs, like the NSA’s eavesdropping effort, or state-run programs like MATRIX.
The promise of data mining is compelling, and convinces many. But it’s wrong. We’re not going to find terrorist plots through systems like this, and we’re going to waste valuable resources chasing down false alarms. To understand why, we have to look at the economics of the system.
Security is always a trade-off, and for a system to be worthwhile, the advantages have to be greater than the disadvantages. A national security data mining program is going to find some percentage of real attacks, and some percentage of false alarms. If the benefits of finding and stopping those attacks outweigh the cost—in money, liberties, etc.—then the system is a good one. If not, then you’d be better off spending that cost elsewhere.
Data mining works best when there’s a well-defined profile you’re searching for, a reasonable number of attacks per year, and a low cost of false alarms. Credit card fraud is one of data mining’s success stories: all credit card companies data mine their transaction databases, looking for spending patterns that indicate a stolen card. Many credit card thieves share a pattern—purchase expensive luxury goods, purchase things that can be easily fenced, etc.—and data mining systems can minimize the losses in many cases by shutting down the card. In addition, the cost of false alarms is only a phone call to the cardholder asking him to verify a couple of purchases. The cardholders don’t even resent these phone calls—as long as they’re infrequent—so the cost is just a few minutes of operator time.
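As a rough illustration of what such a well-defined profile looks like in practice, here is a toy sketch; the categories, dollar threshold, and rule are invented for this example, and real issuers use far richer models:

```python
# Toy illustration of a stolen-card spending profile. The point is that the
# profile is well defined and a false alarm costs only a verification call.
from dataclasses import dataclass

@dataclass
class Transaction:
    amount: float    # dollars
    category: str    # e.g. "electronics", "jewelry", "groceries"

EASILY_FENCED = {"electronics", "jewelry", "gift_cards"}  # hypothetical list

def looks_stolen(recent: list[Transaction]) -> bool:
    """Flag a burst of high-value, easily fenced purchases."""
    suspicious = [t for t in recent if t.amount > 500 and t.category in EASILY_FENCED]
    return len(suspicious) >= 3

if looks_stolen([Transaction(899.0, "electronics"),
                 Transaction(1200.0, "jewelry"),
                 Transaction(650.0, "gift_cards")]):
    print("Call the cardholder to verify recent purchases")
```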
Terrorist plots are different. There is no well-defined profile, and attacks are very rare. Taken together, these facts mean that data mining systems won’t uncover any terrorist plots until they are very accurate, and that even very accurate systems will be so flooded with false alarms that they will be useless.
All data mining systems fail in two different ways: false positives and false negatives. A false positive is when the system identifies a terrorist plot that really isn’t one. A false negative is when the system misses an actual terrorist plot. Depending on how you “tune” your detection algorithms, you can err on one side or the other: you can increase the number of false positives to ensure that you are less likely to miss an actual terrorist plot, or you can reduce the number of false positives at the expense of missing terrorist plots.
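A minimal sketch of that tuning knob, with invented suspicion scores and thresholds, shows how the two error rates trade off against each other:

```python
# Every detector reduces a case to a suspicion score and compares it to a
# threshold. Lowering the threshold misses fewer real plots but flags more
# innocents; raising it does the reverse. Scores here are made up.
def flagged(score: float, threshold: float) -> bool:
    return score >= threshold

cases = [  # (suspicion score, is actually a plot)
    (0.2, False), (0.4, False), (0.55, False), (0.6, True), (0.9, True),
]

for threshold in (0.5, 0.7):
    false_positives = sum(1 for s, real in cases if flagged(s, threshold) and not real)
    false_negatives = sum(1 for s, real in cases if not flagged(s, threshold) and real)
    print(f"threshold={threshold}: {false_positives} false positives, "
          f"{false_negatives} false negatives")
# threshold=0.5 -> 1 false positive, 0 false negatives
# threshold=0.7 -> 0 false positives, 1 false negative
```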
To reduce both those numbers, you need a well-defined profile. And that’s a problem when it comes to terrorism. In hindsight, it was really easy to connect the 9/11 dots and point to the warning signs, but it’s much harder before the fact. Certainly, there are common warning signs that many terrorist plots share, but each is unique, as well. The better you can define what you’re looking for, the better your results will be. Data mining for terrorist plots is going to be sloppy, and it’s going to be hard to find anything useful.
Data mining is like searching for a needle in a haystack. There are 900 million credit cards in circulation in the United States. According to the FTC September 2003 Identity Theft Survey Report, about 1% (10 million) cards are stolen and fraudulently used each year. Terrorism is different. There are trillions of connections between people and events—things that the data mining system will have to “look at”—and very few plots. This rarity makes even accurate identification systems useless.
Let’s look at some numbers. We’ll be optimistic. We’ll assume the system has a 1 in 100 false positive rate (99% accurate), and a 1 in 1,000 false negative rate (99.9% accurate).
Assume one trillion possible indicators to sift through: that’s about ten events—e-mails, phone calls, purchases, web surfings, whatever—per person in the U.S. per day. Also assume that 10 of them are actually terrorists plotting.
This unrealistically-accurate system will generate one billion false alarms for every real terrorist plot it uncovers. Every day of every year, the police will have to investigate 27 million potential plots in order to find the one real terrorist plot per month. Raise that false-positive accuracy to an absurd 99.9999% and you’re still chasing 2,750 false alarms per day—but that will inevitably raise your false negatives, and you’re going to miss some of those ten real plots.
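Those figures follow directly from the assumptions above; here is a short sketch reproducing the arithmetic with one trillion indicators a year, ten real plots, and the two false-positive rates discussed:

```python
# Reproducing the essay's arithmetic under its stated assumptions.
indicators_per_year = 1_000_000_000_000   # ~10 events per person per day
real_plots = 10

for fp_rate in (1 / 100, 1 / 1_000_000):
    false_alarms_per_year = indicators_per_year * fp_rate
    print(f"false-positive rate 1 in {int(1 / fp_rate):,}: "
          f"{false_alarms_per_year / 365:,.0f} alarms per day, "
          f"{false_alarms_per_year / real_plots:,.0f} per real plot")
# 1 in 100       -> ~27 million false alarms a day, one billion per real plot
# 1 in 1,000,000 -> ~2,740 false alarms a day
```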
This isn’t anything new. In statistics, it’s called the “base rate fallacy,” and it applies in other domains as well. For example, even highly accurate medical tests are useless as diagnostic tools if the incidence of the disease is rare in the general population. Terrorist attacks are also rare, so any “test” is going to result in an endless stream of false alarms.
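Put as a conditional probability, the question an analyst cares about is not how accurate the test is but how likely any given alarm is to be real, and with a tiny base rate that likelihood collapses. A sketch using the same assumed numbers:

```python
# Base rate fallacy in one calculation: P(real plot | alarm) via Bayes' theorem,
# using the essay's assumptions (10 plots per trillion indicators, a 1-in-100
# false positive rate, a 1-in-1,000 false negative rate).
base_rate = 10 / 1_000_000_000_000   # P(a given indicator is part of a plot)
sensitivity = 0.999                  # P(alarm | real plot)
false_positive_rate = 0.01           # P(alarm | innocent)

p_alarm = sensitivity * base_rate + false_positive_rate * (1 - base_rate)
p_real_given_alarm = sensitivity * base_rate / p_alarm
print(f"Chance an alarm is a real plot: about 1 in {1 / p_real_given_alarm:,.0f}")
# -> roughly 1 in a billion
```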
This is exactly the sort of thing we saw with the NSA’s eavesdropping program: the New York Times reported that the computers spat out thousands of tips per month. Every one of them turned out to be a false alarm.
And the cost was enormous: not just the cost of the FBI agents running around chasing dead-end leads instead of doing things that might actually make us safer, but also the cost in civil liberties. The fundamental freedoms that make our country the envy of the world are valuable, and not something that we should throw away lightly.
Data mining can work. It helps Visa keep the costs of fraud down, just as it helps Amazon.com show me books that I might want to buy, and Google show me advertising I’m more likely to be interested in. But these are all instances where the cost of false positives is low—a phone call from a Visa operator, or an uninteresting ad—and in systems that have value even if there is a high number of false negatives.
Finding terrorism plots is not a problem that lends itself to data mining. It’s a needle-in-a-haystack problem, and throwing more hay on the pile doesn’t make that problem any easier. We’d be far better off putting people in charge of investigating potential plots and letting them direct the computers, instead of putting the computers in charge and letting them decide who should be investigated.
This essay originally appeared on Wired.com.