Entries Tagged "NSA"

Page 51 of 53

The Problems with Data Mining

Great op-ed in The New York Times on why the NSA’s data mining efforts won’t work, by Jonathan Farley, math professor at Harvard.

The simplest reason is that we’re all connected. Not in the Haight-Ashbury/Timothy Leary/late-period Beatles kind of way, but in the sense of the Kevin Bacon game. The sociologist Stanley Milgram made this clear in the 1960’s when he took pairs of people unknown to each other, separated by a continent, and asked one of the pair to send a package to the other — but only by passing the package to a person he knew, who could then send the package only to someone he knew, and so on. On average, it took only six mailings — the famous six degrees of separation — for the package to reach its intended destination.

Looked at this way, President Bush is only a few steps away from Osama bin Laden (in the 1970’s he ran a company partly financed by the American representative for one of the Qaeda leader’s brothers). And terrorist hermits like the Unabomber are connected to only a very few people. So much for finding the guilty by association.

A second problem with the spy agency’s apparent methodology lies in the way terrorist groups operate and what scientists call the “strength of weak ties.” As the military scientist Robert Spulak has described it to me, you might not see your college roommate for 10 years, but if he were to call you up and ask to stay in your apartment, you’d let him. This is the principle under which sleeper cells operate: there is no communication for years. Thus for the most dangerous threats, the links between nodes that the agency is looking for simply might not exist.

(This, by him, is also worth reading.)

Posted on May 24, 2006 at 7:44 AMView Comments

Computer Problems at the NSA

Interesting:

Computers are integral to everything NSA does, yet it is not uncommon for the agency’s unstable computer system to freeze for hours, unlike the previous system, which had a backup mechanism that enabled analysts to continue their work, said Matthew Aid, a former NSA analyst and congressional intelligence staff member.

When the agency’s communications lines become overloaded, the Groundbreaker system has been known to deliver garbled intelligence reports, Aid said. Some analysts and managers have said their productivity is half of what it used to be because the new system requires them to perform many more steps to accomplish what a few keystrokes used to, he said. They also report being locked out of their computers without warning.

Similarly, agency linguists say the number of conversation segments they can translate in a day has dropped significantly under Groundbreaker, according to another former NSA employee.

Under Groundbreaker, employees get new computers every three years on a rotating schedule, so some analysts always have computers as much as three years older than their colleagues’, often with incompatible software, the former employee said.

As a result of compatibility problems, e-mail attachments can get lost in the system. An internal incident report, obtained by The Sun, states that when an employee inquired about what had happened to missing attachments, the Eagle Alliance administrator said only that “they must have fallen out.”

Posted on May 11, 2006 at 7:31 AMView Comments

NSA Warrantless Wiretapping and Total Information Awareness

Technology Review has an interesting article discussing some of the technologies used by the NSA in its warrantless wiretapping program, some of them from the killed Total Information Awareness (TIA) program.

Washington’s lawmakers ostensibly killed the TIA project in Section 8131 of the Department of Defense Appropriations Act for fiscal 2004. But legislators wrote a classified annex to that document which preserved funding for TIA’s component technologies, if they were transferred to other government agencies, say sources who have seen the document, according to reports first published in The National Journal. Congress did stipulate that those technologies should only be used for military or foreign intelligence purposes against non-U.S. citizens. Still, while those component projects’ names were changed, their funding remained intact, sometimes under the same contracts.

Thus, two principal components of the overall TIA project have migrated to the Advanced Research and Development Activity (ARDA), which is housed somewhere among the 60-odd buildings of “Crypto City,” as NSA headquarters in Fort Meade, MD, is nicknamed. One of the TIA components that ARDA acquired, the Information Awareness Prototype System, was the core architecture that would have integrated all the information extraction, analysis, and dissemination tools developed under TIA. According to The National Journal, it was renamed “Basketball.” The other, Genoa II, used information technologies to help analysts and decision makers anticipate and pre-empt terrorist attacks. It was renamed “Topsail.”

Posted on April 28, 2006 at 8:01 AMView Comments

AT&T Assisting NSA Surveillance

Interesting details emerging from EFF’s lawsuit:

According to a statement released by Klein’s attorney, an NSA agent showed up at the San Francisco switching center in 2002 to interview a management-level technician for a special job. In January 2003, Klein observed a new room being built adjacent to the room housing AT&T’s #4ESS switching equipment, which is responsible for routing long distance and international calls.

“I learned that the person whom the NSA interviewed for the secret job was the person working to install equipment in this room,” Klein wrote. “The regular technician work force was not allowed in the room.”

Klein’s job eventually included connecting internet circuits to a splitting cabinet that led to the secret room. During the course of that work, he learned from a co-worker that similar cabinets were being installed in other cities, including Seattle, San Jose, Los Angeles and San Diego.

“While doing my job, I learned that fiber optic cables from the secret room were tapping into the Worldnet (AT&T’s internet service) circuits by splitting off a portion of the light signal,” Klein wrote.

The split circuits included traffic from peering links connecting to other internet backbone providers, meaning that AT&T was also diverting traffic routed from its network to or from other domestic and international providers, according to Klein’s statement.

The secret room also included data-mining equipment called a Narus STA 6400, “known to be used particularly by government intelligence agencies because of its ability to sift through large amounts of data looking for preprogrammed targets,” according to Klein’s statement.

Narus, whose website touts AT&T as a client, sells software to help internet service providers and telecoms monitor and manage their networks, look for intrusions, and wiretap phone calls as mandated by federal law.

More about what the Narus box can do.

EDITED TO ADD (4/14): More about Narus.

Posted on April 14, 2006 at 7:58 AMView Comments

Data Mining for Terrorists

In the post 9/11 world, there’s much focus on connecting the dots. Many believe that data mining is the crystal ball that will enable us to uncover future terrorist plots. But even in the most wildly optimistic projections, data mining isn’t tenable for that purpose. We’re not trading privacy for security; we’re giving up privacy and getting no security in return.

Most people first learned about data mining in November 2002, when news broke about a massive government data mining program called Total Information Awareness. The basic idea was as audacious as it was repellent: suck up as much data as possible about everyone, sift through it with massive computers, and investigate patterns that might indicate terrorist plots. Americans across the political spectrum denounced the program, and in September 2003, Congress eliminated its funding and closed its offices.

But TIA didn’t die. According to The National Journal, it just changed its name and moved inside the Defense Department.

This shouldn’t be a surprise. In May 2004, the General Accounting Office published a report that listed 122 different federal government data mining programs that used people’s personal information. This list didn’t include classified programs, like the NSA’s eavesdropping effort, or state-run programs like MATRIX.

The promise of data mining is compelling, and convinces many. But it’s wrong. We’re not going to find terrorist plots through systems like this, and we’re going to waste valuable resources chasing down false alarms. To understand why, we have to look at the economics of the system.

Security is always a trade-off, and for a system to be worthwhile, the advantages have to be greater than the disadvantages. A national security data mining program is going to find some percentage of real attacks, and some percentage of false alarms. If the benefits of finding and stopping those attacks outweigh the cost — in money, liberties, etc. — then the system is a good one. If not, then you’d be better off spending that cost elsewhere.

Data mining works best when there’s a well-defined profile you’re searching for, a reasonable number of attacks per year, and a low cost of false alarms. Credit card fraud is one of data mining’s success stories: all credit card companies data mine their transaction databases, looking for spending patterns that indicate a stolen card. Many credit card thieves share a pattern — purchase expensive luxury goods, purchase things that can be easily fenced, etc. — and data mining systems can minimize the losses in many cases by shutting down the card. In addition, the cost of false alarms is only a phone call to the cardholder asking him to verify a couple of purchases. The cardholders don’t even resent these phone calls — as long as they’re infrequent — so the cost is just a few minutes of operator time.

Terrorist plots are different. There is no well-defined profile, and attacks are very rare. Taken together, these facts mean that data mining systems won’t uncover any terrorist plots until they are very accurate, and that even very accurate systems will be so flooded with false alarms that they will be useless.

All data mining systems fail in two different ways: false positives and false negatives. A false positive is when the system identifies a terrorist plot that really isn’t one. A false negative is when the system misses an actual terrorist plot. Depending on how you “tune” your detection algorithms, you can err on one side or the other: you can increase the number of false positives to ensure that you are less likely to miss an actual terrorist plot, or you can reduce the number of false positives at the expense of missing terrorist plots.

To reduce both those numbers, you need a well-defined profile. And that’s a problem when it comes to terrorism. In hindsight, it was really easy to connect the 9/11 dots and point to the warning signs, but it’s much harder before the fact. Certainly, there are common warning signs that many terrorist plots share, but each is unique, as well. The better you can define what you’re looking for, the better your results will be. Data mining for terrorist plots is going to be sloppy, and it’s going to be hard to find anything useful.

Data mining is like searching for a needle in a haystack. There are 900 million credit cards in circulation in the United States. According to the FTC September 2003 Identity Theft Survey Report, about 1% (10 million) cards are stolen and fraudulently used each year. Terrorism is different. There are trillions of connections between people and events — things that the data mining system will have to “look at” — and very few plots. This rarity makes even accurate identification systems useless.

Let’s look at some numbers. We’ll be optimistic. We’ll assume the system has a 1 in 100 false positive rate (99% accurate), and a 1 in 1,000 false negative rate (99.9% accurate).

Assume one trillion possible indicators to sift through: that’s about ten events — e-mails, phone calls, purchases, web surfings, whatever — per person in the U.S. per day. Also assume that 10 of them are actually terrorists plotting.

This unrealistically-accurate system will generate one billion false alarms for every real terrorist plot it uncovers. Every day of every year, the police will have to investigate 27 million potential plots in order to find the one real terrorist plot per month. Raise that false-positive accuracy to an absurd 99.9999% and you’re still chasing 2,750 false alarms per day — but that will inevitably raise your false negatives, and you’re going to miss some of those ten real plots.

This isn’t anything new. In statistics, it’s called the “base rate fallacy,” and it applies in other domains as well. For example, even highly accurate medical tests are useless as diagnostic tools if the incidence of the disease is rare in the general population. Terrorist attacks are also rare, any “test” is going to result in an endless stream of false alarms.

This is exactly the sort of thing we saw with the NSA’s eavesdropping program: the New York Times reported that the computers spat out thousands of tips per month. Every one of them turned out to be a false alarm.

And the cost was enormous: not just the cost of the FBI agents running around chasing dead-end leads instead of doing things that might actually make us safer, but also the cost in civil liberties. The fundamental freedoms that make our country the envy of the world are valuable, and not something that we should throw away lightly.

Data mining can work. It helps Visa keep the costs of fraud down, just as it helps Amazon.com show me books that I might want to buy, and Google show me advertising I’m more likely to be interested in. But these are all instances where the cost of false positives is low — a phone call from a Visa operator, or an uninteresting ad — and in systems that have value even if there is a high number of false negatives.

Finding terrorism plots is not a problem that lends itself to data mining. It’s a needle-in-a-haystack problem, and throwing more hay on the pile doesn’t make that problem any easier. We’d be far better off putting people in charge of investigating potential plots and letting them direct the computers, instead of putting the computers in charge and letting them decide who should be investigated.

This essay originally appeared on Wired.com.

Posted on March 9, 2006 at 7:44 AMView Comments

The Future of Privacy

Over the past 20 years, there’s been a sea change in the battle for personal privacy.

The pervasiveness of computers has resulted in the almost constant surveillance of everyone, with profound implications for our society and our freedoms. Corporations and the police are both using this new trove of surveillance data. We as a society need to understand the technological trends and discuss their implications. If we ignore the problem and leave it to the “market,” we’ll all find that we have almost no privacy left.

Most people think of surveillance in terms of police procedure: Follow that car, watch that person, listen in on his phone conversations. This kind of surveillance still occurs. But today’s surveillance is more like the NSA’s model, recently turned against Americans: Eavesdrop on every phone call, listening for certain keywords. It’s still surveillance, but it’s wholesale surveillance.

Wholesale surveillance is a whole new world. It’s not “follow that car,” it’s “follow every car.” The National Security Agency can eavesdrop on every phone call, looking for patterns of communication or keywords that might indicate a conversation between terrorists. Many airports collect the license plates of every car in their parking lots, and can use that database to locate suspicious or abandoned cars. Several cities have stationary or car-mounted license-plate scanners that keep records of every car that passes, and save that data for later analysis.

More and more, we leave a trail of electronic footprints as we go through our daily lives. We used to walk into a bookstore, browse, and buy a book with cash. Now we visit Amazon, and all of our browsing and purchases are recorded. We used to throw a quarter in a toll booth; now EZ Pass records the date and time our car passed through the booth. Data about us are collected when we make a phone call, send an e-mail message, make a purchase with our credit card, or visit a website.

Much has been written about RFID chips and how they can be used to track people. People can also be tracked by their cell phones, their Bluetooth devices, and their WiFi-enabled computers. In some cities, video cameras capture our image hundreds of times a day.

The common thread here is computers. Computers are involved more and more in our transactions, and data are byproducts of these transactions. As computer memory becomes cheaper, more and more of these electronic footprints are being saved. And as processing becomes cheaper, more and more of it is being cross-indexed and correlated, and then used for secondary purposes.

Information about us has value. It has value to the police, but it also has value to corporations. The Justice Department wants details of Google searches, so they can look for patterns that might help find child pornographers. Google uses that same data so it can deliver context-sensitive advertising messages. The city of Baltimore uses aerial photography to surveil every house, looking for building permit violations. A national lawn-care company uses the same data to better market its services. The phone company keeps detailed call records for billing purposes; the police use them to catch bad guys.

In the dot-com bust, the customer database was often the only salable asset a company had. Companies like Experian and Acxiom are in the business of buying and reselling this sort of data, and their customers are both corporate and government.

Computers are getting smaller and cheaper every year, and these trends will continue. Here’s just one example of the digital footprints we leave:

It would take about 100 megabytes of storage to record everything the fastest typist input to his computer in a year. That’s a single flash memory chip today, and one could imagine computer manufacturers offering this as a reliability feature. Recording everything the average user does on the Internet requires more memory: 4 to 8 gigabytes a year. That’s a lot, but “record everything” is Gmail’s model, and it’s probably only a few years before ISPs offer this service.

The typical person uses 500 cell phone minutes a month; that translates to 5 gigabytes a year to save it all. My iPod can store 12 times that data. A “life recorder” you can wear on your lapel that constantly records is still a few generations off: 200 gigabytes/year for audio and 700 gigabytes/year for video. It’ll be sold as a security device, so that no one can attack you without being recorded. When that happens, will not wearing a life recorder be used as evidence that someone is up to no good, just as prosecutors today use the fact that someone left his cell phone at home as evidence that he didn’t want to be tracked?

In a sense, we’re living in a unique time in history. Identification checks are common, but they still require us to whip out our ID. Soon it’ll happen automatically, either through an RFID chip in our wallet or face-recognition from cameras. And those cameras, now visible, will shrink to the point where we won’t even see them.

We’re never going to stop the march of technology, but we can enact legislation to protect our privacy: comprehensive laws regulating what can be done with personal information about us, and more privacy protection from the police. Today, personal information about you is not yours; it’s owned by the collector. There are laws protecting specific pieces of personal data — videotape rental records, health care information — but nothing like the broad privacy protection laws you find in European countries. That’s really the only solution; leaving the market to sort this out will result in even more invasive wholesale surveillance.

Most of us are happy to give out personal information in exchange for specific services. What we object to is the surreptitious collection of personal information, and the secondary use of information once it’s collected: the buying and selling of our information behind our back.

In some ways, this tidal wave of data is the pollution problem of the information age. All information processes produce it. If we ignore the problem, it will stay around forever. And the only way to successfully deal with it is to pass laws regulating its generation, use and eventual disposal.

This essay was originally published in the Minneapolis Star-Tribune.

Posted on March 6, 2006 at 5:41 AMView Comments

Sidebar photo of Bruce Schneier by Joe MacInnis.