Entries Tagged "data mining"

"Data Mining and the Security-Liberty Debate"

Good paper: “Data Mining and the Security-Liberty Debate,” by Daniel J. Solove.

Abstract: In this essay, written for a symposium on surveillance for the University of Chicago Law Review, I examine some common difficulties in the way that liberty is balanced against security in the context of data mining. Countless discussions about the trade-offs between security and liberty begin by taking a security proposal and then weighing it against what it would cost our civil liberties. Often, the liberty interests are cast as individual rights and balanced against the security interests, which are cast in terms of the safety of society as a whole. Courts and commentators defer to the government’s assertions about the effectiveness of the security interest. In the context of data mining, the liberty interest is limited by narrow understandings of privacy that neglect to account for many privacy problems. As a result, the balancing concludes with a victory in favor of the security interest. But as I argue, important dimensions of data mining’s security benefits require more scrutiny, and the privacy concerns are significantly greater than currently acknowledged. These problems have undermined the balancing process and skewed the results toward the security side of the scale.

My only complaint: it’s not a liberty vs. security debate. Liberty is security. It’s a liberty vs. control debate.

Posted on June 12, 2007 at 7:11 AM

Is Big Brother a Big Deal?

Big Brother isn’t what he used to be. George Orwell extrapolated his totalitarian state from the 1940s. Today’s information society looks nothing like Orwell’s world, and watching and intimidating a population today isn’t anything like what Winston Smith experienced.

Data collection in 1984 was deliberate; today’s is inadvertent. In the information society, we generate data naturally. In Orwell’s world, people were naturally anonymous; today, we leave digital footprints everywhere.

1984’s police state was centralized; today’s is decentralized. Your phone company knows who you talk to, your credit card company knows where you shop and Netflix knows what you watch. Your ISP can read your email, your cell phone can track your movements and your supermarket can monitor your purchasing patterns. There’s no single government entity bringing this together, but there doesn’t have to be. As Neal Stephenson said, the threat is no longer Big Brother, but instead thousands of Little Brothers.

1984’s Big Brother was run by the state; today’s Big Brother is market driven. Data brokers like ChoicePoint and credit bureaus like Experian aren’t trying to build a police state; they’re just trying to turn a profit. Of course these companies will take advantage of a national ID; they’d be stupid not to. And the correlations, data mining and precise categorizing they can do are why the U.S. government buys commercial data from them.

1984-style police states required lots of people. East Germany employed one informant for every 66 citizens. Today, there’s no reason to have anyone watch anyone else; computers can do the work of people.

1984-style police states were expensive. Today, data storage is constantly getting cheaper. If some data is too expensive to save today, it’ll be affordable in a few years.

And finally, the police state of 1984 was deliberately constructed, while today’s is naturally emergent. There’s no reason to postulate a malicious police force and a government trying to subvert our freedoms. Computerized processes naturally throw off personalized data; companies save it for marketing purposes, and even the most well-intentioned law enforcement agency will make use of it.

Of course, Orwell’s Big Brother had a ruthless efficiency that’s hard to imagine in a government today. But that completely misses the point. A sloppy and inefficient police state is no reason to cheer; watch the movie Brazil and see how scary it can be. You can also see hints of what it might look like in our completely dysfunctional “no-fly” list and useless projects to secretly categorize people according to potential terrorist risk. Police states are inherently inefficient. There’s no reason to assume today’s will be any more effective.

The fear isn’t an Orwellian government deliberately creating the ultimate totalitarian state, although with the U.S.’s programs of phone-record surveillance, illegal wiretapping, massive data mining, a national ID card no one wants and Patriot Act abuses, one can make that case. It’s that we’re doing it ourselves, as a natural byproduct of the information society. We’re building the computer infrastructure that makes it easy for governments, corporations, criminal organizations and even teenage hackers to record everything we do, and, yes, even change our votes. And we will continue to do so unless we pass laws regulating the creation, use, protection, resale and disposal of personal data. It’s precisely the attitude that trivializes the problem that creates it.

This essay appeared in the May issue of Information Security, as the second half of a point/counterpoint with Marcus Ranum. Here’s his half.

Posted on May 11, 2007 at 9:19 AM

NSA Hiring Data Miners

Certainly looks that way:

The Algorithm Developer will work with massive amounts of inter-related data and develop and implement algorithms to search, sort and find patterns and hidden relationships in the data. The preferred candidate would be required to be able to work closely with Analysts to develop Rapid Operational Prototypes. The candidate would have the availability of existing algorithms as a model to begin.

Posted on January 24, 2007 at 2:57 PM

DHS Privacy Office Report on MATRIX

The Privacy Office of the Department of Homeland Security has issued a report on MATRIX: The Multistate Anti-Terrorism Information Exchange. MATRIX is a now-defunct data mining and data sharing program among federal, state, and local law enforcement agencies, one of the many data-mining programs going on in government (TIA — Total Information Awareness — being the most famous, and Tangram being the newest).

The report is short, and very critical of the program’s inattention to privacy and lack of transparency. That’s probably why it was released to the public just before Christmas, burying it in the media.

Posted on January 3, 2007 at 11:58 AM

CATO Report on Data Mining and Terrorism

Definitely worth reading:

Though data mining has many valuable uses, it is not well suited to the terrorist discovery problem. It would be unfortunate if data mining for terrorism discovery had currency within national security, law enforcement, and technology circles because pursuing this use of data mining would waste taxpayer dollars, needlessly infringe on privacy and civil liberties, and misdirect the valuable time and energy of the men and women in the national security community.

Posted on December 13, 2006 at 1:38 PM

New U.S. Customs Database on Trucks and Travellers

It’s yet another massive government surveillance program:

US Customs and Border Protection issued a notice in the Federal Register yesterday which detailed the agency’s massive database that keeps risk assessments on every traveler entering or leaving the country. Citizens who are concerned that their information is inaccurate are all but out of luck: the system “may not be accessed under the Privacy Act for the purpose of contesting the content of the record.”

The system in question is the Automated Targeting System, which is associated with the previously-existing Treasury Enforcement Communications System. TECS was built to screen people and assets that moved in and out of the US, and its database contains more than one billion records that are accessible by more than 30,000 users at 1,800 sites around the country. Customs has adapted parts of the TECS system to its own use and now plans to screen all passengers, inbound and outbound cargo, and ships.

The system creates a risk assessment for each person or item in the database. The assessment is generated from information gleaned from federal and commercial databases, provided by people themselves as they cross the border, and the Passenger Name Record information recorded by airlines. This risk assessment will be maintained for up to 40 years and can be pulled up by agents at a moment’s notice in order to evaluate potential threats against the US.

If you leave the country, the government will suddenly know a lot about you. The Passenger Name Record alone contains names, addresses, telephone numbers, itineraries, frequent-flier information, e-mail addresses — even the name of your travel agent. And this information can be shared with plenty of people:

  • Federal, state, local, tribal, or foreign governments
  • A court, magistrate, or administrative tribunal
  • Third parties during the course of a law enforcement investigation
  • Congressional office in response to an inquiry
  • Contractors, grantees, experts, consultants, students, and others performing or working on a contract, service, or grant
  • Any organization or person who might be a target of terrorist activity or conspiracy
  • The United States Department of Justice
  • The National Archives and Records Administration
  • Federal or foreign government intelligence or counterterrorism agencies
  • Agencies or people when it appears that the security or confidentiality of their information has been compromised.

That’s a lot of people who could be looking at your information and your government-designed risk assessment. The one person who won’t be looking at that information is you. The entire system is exempt from inspection and correction under provision 552a (j)(2) and (k)(2) of US Code Title 5, which allows such exemptions when the data in question involves law enforcement or intelligence information.

This means you can’t review your data for accuracy, and you can’t correct any errors.

But the system can be used to give you a risk assessment score, which presumably will affect how you’re treated when you return to the U.S.

I’ve already explained why data mining does not find terrorists or terrorist plots. So have actual math professors. And we’ve seen this kind of “risk assessment score” idea and the problems it causes with Secure Flight.
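The arithmetic behind that claim is the base-rate problem, and it is easy to sketch. Every number below is hypothetical, picked only to show the shape of the problem, and the function name is mine; none of it comes from the ATS notice or from any real system.

```python
def flagged_breakdown(population, true_targets, hit_rate, false_alarm_rate):
    """Return (true positives, false positives, precision) for a screening test.

    hit_rate is the fraction of real targets the test flags;
    false_alarm_rate is the fraction of innocent people it flags anyway.
    """
    true_pos = true_targets * hit_rate
    false_pos = (population - true_targets) * false_alarm_rate
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision


# Hypothetical inputs: 300 million people screened, 1,000 real targets,
# a test that catches 99% of them and wrongly flags 1% of everyone else.
tp, fp, precision = flagged_breakdown(
    population=300_000_000,
    true_targets=1_000,
    hit_rate=0.99,
    false_alarm_rate=0.01,
)
print(f"correct hits: {tp:,.0f}  false alarms: {fp:,.0f}  precision: {precision:.4%}")
```

Under these assumed rates, the 990 correct hits are buried under roughly three million false alarms, so well under one flagged person in a thousand is a real target. A more accurate test shifts the numbers, but as long as real targets are this rare the false alarms dominate.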

This needs some mainstream press attention.

EDITED TO ADD (11/4): More commentary here, here, and here.

EDITED TO ADD (11/5): It’s buried in the back pages, but at least The Washington Post wrote about it.

Posted on November 4, 2006 at 9:19 AM

Total Information Awareness Is Back

Remember Total Information Awareness?

In November 2002, the New York Times reported that the Defense Advanced Research Projects Agency (DARPA) was developing a tracking system called “Total Information Awareness” (TIA), which was intended to detect terrorists through analyzing troves of information. The system, developed under the direction of John Poindexter, then-director of DARPA’s Information Awareness Office, was envisioned to give law enforcement access to private data without suspicion of wrongdoing or a warrant.

TIA purported to capture the “information signature” of people so that the government could track potential terrorists and criminals involved in “low-intensity/low-density” forms of warfare and crime. The goal was to track individuals through collecting as much information about them as possible and using computer algorithms and human analysis to detect potential activity.

The project called for the development of “revolutionary technology for ultra-large all-source information repositories,” which would contain information from multiple sources to create a “virtual, centralized, grand database.” This database would be populated by transaction data contained in current databases such as financial records, medical records, communication records, and travel records as well as new sources of information. Also fed into the database would be intelligence data.

The public found it so abhorrent, and objected so forcefully, that Congress killed funding for the program in September 2003.

None of us thought that meant the end of TIA, only that it would turn into a classified program and be renamed. Well, the program is now called Tangram, and it is classified:

The government’s top intelligence agency is building a computerized system to search very large stores of information for patterns of activity that look like terrorist planning. The system, which is run by the Office of the Director of National Intelligence, is in the early research phases and is being tested, in part, with government intelligence that may contain information on U.S. citizens and other people inside the country.

It encompasses existing profiling and detection systems, including those that create “suspicion scores” for suspected terrorists by analyzing very large databases of government intelligence, as well as records of individuals’ private communications, financial transactions, and other everyday activities.

The information about Tangram comes from a government document looking for contractors to help design and build the system.

DefenseTech writes:

The document, which is a description of the Tangram program for potential contractors, describes other, existing profiling and detection systems that haven’t moved beyond so-called “guilt-by-association models,” which link suspected terrorists to potential associates, but apparently don’t tell analysts much about why those links are significant. Tangram wants to improve upon these methods, as well as investigate the effectiveness of other detection links such as “collective inferencing,” which attempt to create suspicion scores of entire networks of people simultaneously.

Data mining for terrorists has always been a dumb idea. And the existence of Tangram illustrates the problem with Congress trying to stop a program by killing its funding; it just comes back under a different name.

Posted on October 31, 2006 at 6:59 AM

AOL Releases Massive Amount of Search Data

From TechCrunch:

AOL has released very private data about its users without their permission. While the AOL username has been changed to a random ID number, the ability to analyze all searches by a single user will often lead people to easily determine who the user is, and what they are up to. The data includes personal names, addresses, social security numbers and everything else someone might type into a search box.

The most serious problem is the fact that many people often search on their own name, or those of their friends and family, to see what information is available about them on the net. Combine these ego searches with porn queries and you have a serious embarrassment. Combine them with “buy ecstasy” and you have evidence of a crime. Combine it with an address, social security number, etc., and you have an identity theft waiting to happen. The possibilities are endless.

This is search data for roughly 658,000 anonymized users over a three month period from March to May — about 1/3 of 1 per cent of their total data for that period.

Now AOL says it was all a mistake. They pulled the data, but it’s still out there, and probably will be forever. And there’s some pretty scary stuff in it.

You can read more on Slashdot and elsewhere.

Anyone who wants to play NSA can start datamining for terrorists. Let us know if you find anything.

EDITED TO ADD (8/9): The New York Times:

And search by search, click by click, the identity of AOL user No. 4417749 became easier to discern. There are queries for “landscapers in Lilburn, Ga,” several people with the last name Arnold and “homes sold in shadow lake subdivision gwinnett county georgia.”

It did not take much investigating to follow that data trail to Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga., frequently researches her friends’ medical ailments and loves her three dogs. “Those are my searches,” she said, after a reporter read part of the list to her.
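The linkage the Times describes takes very little code. What follows is a minimal sketch, not a description of how anyone actually processed the AOL file: the row layout and both function names are assumptions, and the only real data is user No. 4417749 and the two queries quoted above. The other row is invented.

```python
from collections import defaultdict

# Hypothetical log rows: (pseudonymous user ID, query text).
# ID 4417749 and its two queries are quoted from the Times story;
# the row for user 1000001 is made up for illustration.
LOG = [
    (4417749, "landscapers in Lilburn, Ga"),
    (1000001, "cheap flights to denver"),
    (4417749, "homes sold in shadow lake subdivision gwinnett county georgia"),
]


def group_by_user(rows):
    """Collect every query issued under the same pseudonymous ID."""
    profiles = defaultdict(list)
    for user_id, query in rows:
        profiles[user_id].append(query)
    return dict(profiles)


def mentions(queries, terms):
    """Return the terms that appear, case-insensitively, in any of the queries."""
    text = " ".join(queries).lower()
    return [term for term in terms if term.lower() in text]


profiles = group_by_user(LOG)
clues = mentions(profiles[4417749], ["Lilburn", "Gwinnett", "Denver"])
print(clues)
```

Replacing the username with a random number hides the name but keeps every search by one person tied together. Once the queries are grouped by that number, the place names point to a town and a county, and the rest is ordinary legwork.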

Posted on August 8, 2006 at 11:02 AM
