Entries Tagged "databases"


Government Employee Uses DHS Database to Track Ex-Girlfriend

When you build a surveillance system, you invite trusted insiders to abuse that system:

According to the indictment, Robinson began a relationship with an unidentified woman in 2002 that ended acrimoniously seven months later. After the breakup, federal authorities allege Robinson accessed a government database known as the TECS (Treasury Enforcement Communications System) at least 163 times to track the travel patterns of the woman and her family.

What I want to know is how he got caught. It can be very hard to catch insiders like this; good audit systems are essential, but often overlooked in the design process.

Posted on October 3, 2007 at 3:02 PM

Risks of Data Reuse

We learned the news in March: Contrary to decades of denials, the U.S. Census Bureau used individual records to round up Japanese-Americans during World War II.

The Census Bureau normally is prohibited by law from revealing data that could be linked to specific individuals; the law exists to encourage people to answer census questions accurately and without fear. And while the Second War Powers Act of 1942 temporarily suspended that protection in order to locate Japanese-Americans, the Census Bureau had maintained that it only provided general information about neighborhoods.

New research proves they were lying.

The whole incident serves as a poignant illustration of one of the thorniest problems of the information age: data collected for one purpose and then used for another, or “data reuse.”

When we think about our personal data, what bothers us most is generally not the initial collection and use, but the secondary uses. I personally appreciate it when Amazon.com suggests books that might interest me, based on books I have already bought. I like it that my airline knows what type of seat and meal I prefer, and my hotel chain keeps records of my room preferences. I don’t mind that my automatic road-toll collection tag is tied to my credit card, and that I get billed automatically. I even like the detailed summary of my purchases that my credit card company sends me at the end of every year. What I don’t want, though, is any of these companies selling that data to brokers, or for law enforcement to be allowed to paw through those records without a warrant.

There are two bothersome issues about data reuse. First, we lose control of our data. In all of the examples above, there is an implied agreement between the data collector and me: It gets the data in order to provide me with some sort of service. Once the data collector sells it to a broker, though, it’s out of my hands. It might show up on some telemarketer’s screen, or in a detailed report to a potential employer, or as part of a data-mining system to evaluate my personal terrorism risk. It becomes part of my data shadow, which always follows me around but I can never see.

This, of course, affects our willingness to give up personal data in the first place. The reason U.S. census data was declared off-limits for other uses was to placate Americans’ fears and assure them that they could answer questions truthfully. How accurate would you be in filling out your census forms if you knew the FBI would be mining the data, looking for terrorists? How would it affect your supermarket purchases if you knew people were examining them and making judgments about your lifestyle? I know many people who engage in data poisoning: deliberately lying on forms in order to propagate erroneous data. I’m sure many of them would stop that practice if they could be sure that the data was only used for the purpose for which it was collected.

The second issue about data reuse is error rates. All data has errors, and different uses can tolerate different amounts of error. The sorts of marketing databases you can buy on the web, for example, are notoriously error-filled. That’s OK; if the database of ultra-affluent Americans of a particular ethnicity you just bought has a 10 percent error rate, you can factor that cost into your marketing campaign. But that same database, with that same error rate, might be useless for law enforcement purposes.
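The arithmetic behind this point is worth making concrete. The sketch below is a back-of-envelope illustration, not real figures: the 10 percent error rate comes from the paragraph above, while the database size and mailing cost are hypothetical numbers chosen for the example.

```python
# Back-of-envelope: the same 10% error rate, two very different costs.
db_size = 1_000_000    # hypothetical: records in a purchased marketing database
error_rate = 0.10      # fraction of records that are simply wrong

bad_records = int(db_size * error_rate)  # 100,000 erroneous entries

# Marketing use: each bad record wastes one piece of mail.
cost_per_mailing = 0.50  # hypothetical cost in dollars
wasted_marketing_dollars = bad_records * cost_per_mailing  # a budget line item

# Law enforcement use: each bad record is a person wrongly flagged.
wrongly_flagged_people = bad_records  # 100,000 false suspects, not a line item
```

The error rate is identical in both cases; what changes is what an error costs, and to whom.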

Understanding error rates and how they propagate is vital when evaluating any system that reuses data, especially for law enforcement purposes. A few years ago, the Transportation Security Administration’s follow-on watch list system, Secure Flight, was going to use commercial data to give people a terrorism risk score and determine how much they were going to be questioned or searched at the airport. People rightly rebelled against the thought of being judged in secret, but there was much less discussion about whether the commercial data from credit bureaus was accurate enough for this application.

An even more egregious example of error-rate problems occurred in 2000, when the Florida Division of Elections contracted with Database Technologies (since merged with ChoicePoint) to remove convicted felons from the voting rolls. The databases used were filled with errors and the matching procedures were sloppy, which resulted in thousands of disenfranchised voters—mostly black—and almost certainly changed a presidential election result.

Of course, there are beneficial uses of secondary data. Take, for example, personal medical data. It’s personal and intimate, yet valuable to society in aggregate. Think of what we could do with a database of everyone’s health information: massive studies examining the long-term effects of different drugs and treatment options, different environmental factors, different lifestyle choices. There’s an enormous amount of important research potential hidden in that data, and it’s worth figuring out how to get at it without compromising individual privacy.

This is largely a matter of legislation. Technology alone can never protect our rights. There are just too many reasons not to trust it, and too many ways to subvert it. Data privacy ultimately stems from our laws, and strong legal protections are fundamental to protecting our information against abuse. But at the same time, technology is still vital.

Both the Japanese internment and the Florida voting-roll purge demonstrate that laws can change … and sometimes change quickly. We need to build systems with privacy-enhancing technologies that limit data collection wherever possible. Data that is never collected cannot be reused. Data that is collected anonymously, or deleted immediately after it is used, is much harder to reuse. It’s easy to build systems that collect data on everything—it’s what computers naturally do—but it’s far better to take the time to understand what data is needed and why, and only collect that.

History will record what we, here in the early decades of the information age, did to foster freedom, liberty and democracy. Did we build information technologies that protected people’s freedoms even during times when society tried to subvert them? Or did we build technologies that could easily be modified to watch and control? It’s bad civic hygiene to build an infrastructure that can be used to facilitate a police state.

This article originally appeared on Wired.com

Posted on June 28, 2007 at 8:34 AM

Teaching Computers How to Forget

I’ve written about the death of ephemeral conversation, the rise of wholesale surveillance, and the electronic audit trail that now follows us through life. Viktor Mayer-Schönberger, a professor in Harvard’s JFK School of Government, has noticed this too, and believes that computers need to forget.

Why would we want our machines to “forget”? Mayer-Schönberger suggests that we are creating a Benthamist panopticon by archiving so many bits of knowledge for so long. The accumulated weight of stored Google searches, thousands of family photographs, millions of books, credit bureau information, air travel reservations, massive government databases, archived e-mail, etc., can actually be a detriment to speech and action, he argues.

“If whatever we do can be held against us years later, if all our impulsive comments are preserved, they can easily be combined into a composite picture of ourselves,” he writes in the paper. “Afraid how our words and actions may be perceived years later and taken out of context, the lack of forgetting may prompt us to speak less freely and openly.”

In other words, it threatens to make us all politicians.

In contrast to omnibus data protection legislation, Mayer-Schönberger proposes a combination of law and software to ensure that most data is “forgotten” by default. A law would decree that “those who create software that collects and stores data build into their code not only the ability to forget with time, but make such forgetting the default.” Essentially, this means that all collected data is tagged with a new piece of metadata that defines when the information should expire.

In practice, this would mean that iTunes could only store buying data for a limited time, a time defined by law. Should customers explicitly want this time extended, that would be fine, but people must be given a choice. Even data created by users—digital pictures, for example—would be tagged by the cameras that create them to expire in a year or two; pictures that people want to keep could simply be given a date 10,000 years in the future.
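Mayer-Schönberger's proposal is straightforward to sketch in code. The names below (`Record`, `new_record`, `purge`, the one-year default) are my own illustrative choices, not anything from his paper; the point is only that expiry metadata is attached at creation and forgetting is the default:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Record:
    payload: str
    expires_at: datetime  # mandatory expiry metadata, set at creation

def new_record(payload: str, ttl_days: int = 365) -> Record:
    # Forgetting is the default: every record carries an expiry date.
    # A user who wants to keep something longer sets ttl_days explicitly.
    return Record(payload, datetime.now() + timedelta(days=ttl_days))

def purge(store: list[Record]) -> list[Record]:
    # A periodic sweep drops anything past its expiry date.
    now = datetime.now()
    return [r for r in store if r.expires_at > now]
```

A photo the owner wants forever simply gets a far-future `expires_at`; everything else quietly ages out.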

Frank Pasquale also comments on the legal implications of this issue. And Paul Ohm wrote a note titled “The Fourth Amendment Right to Delete”:

For years the police have entered homes and offices, hauled away filing cabinets full of records, and searched them back at the police station for evidence. In Fourth Amendment terms, these actions are entry, seizure, and search, respectively, and usually require the police to obtain a warrant. Modern-day police can avoid some of these messy steps with the help of technology: They have tools that duplicate stored records and collect evidence of behavior, all from a distance and without the need for physical entry. These tools generate huge amounts of data that may be searched immediately or stored indefinitely for later analysis. Meanwhile, it is unclear whether the Fourth Amendment’s restrictions apply to these technologies: Are the acts of duplication and collection themselves seizure? Before the data are analyzed, has a search occurred?

EDITED TO ADD (6/14): Interesting presentation earlier this year by Dr. Radia Perlman that represents some work toward this problem. And a counterpoint.

Posted on May 16, 2007 at 6:19 AM

Is Big Brother a Big Deal?

Big Brother isn’t what he used to be. George Orwell extrapolated his totalitarian state from the 1940s. Today’s information society looks nothing like Orwell’s world, and watching and intimidating a population today isn’t anything like what Winston Smith experienced.

Data collection in 1984 was deliberate; today’s is inadvertent. In the information society, we generate data naturally. In Orwell’s world, people were naturally anonymous; today, we leave digital footprints everywhere.

1984’s police state was centralized; today’s is decentralized. Your phone company knows who you talk to, your credit card company knows where you shop and Netflix knows what you watch. Your ISP can read your email, your cell phone can track your movements and your supermarket can monitor your purchasing patterns. There’s no single government entity bringing this together, but there doesn’t have to be. As Neal Stephenson said, the threat is no longer Big Brother, but instead thousands of Little Brothers.

1984’s Big Brother was run by the state; today’s Big Brother is market driven. Data brokers like ChoicePoint and credit bureaus like Experian aren’t trying to build a police state; they’re just trying to turn a profit. Of course these companies will take advantage of a national ID; they’d be stupid not to. And the correlations, data mining and precise categorizing they can do is why the U.S. government buys commercial data from them.

1984-style police states required lots of people. East Germany employed one informant for every 66 citizens. Today, there’s no reason to have anyone watch anyone else; computers can do the work of people.

1984-style police states were expensive. Today, data storage is constantly getting cheaper. If some data is too expensive to save today, it’ll be affordable in a few years.

And finally, the police state of 1984 was deliberately constructed, while today’s is naturally emergent. There’s no reason to postulate a malicious police force and a government trying to subvert our freedoms. Computerized processes naturally throw off personalized data; companies save it for marketing purposes, and even the most well-intentioned law enforcement agency will make use of it.

Of course, Orwell’s Big Brother had a ruthless efficiency that’s hard to imagine in a government today. But that completely misses the point. A sloppy and inefficient police state is no reason to cheer; watch the movie Brazil and see how scary it can be. You can also see hints of what it might look like in our completely dysfunctional “no-fly” list and useless projects to secretly categorize people according to potential terrorist risk. Police states are inherently inefficient. There’s no reason to assume today’s will be any more effective.

The fear isn’t an Orwellian government deliberately creating the ultimate totalitarian state, although with the U.S.’s programs of phone-record surveillance, illegal wiretapping, massive data mining, a national ID card no one wants and Patriot Act abuses, one can make that case. It’s that we’re doing it ourselves, as a natural byproduct of the information society. We’re building the computer infrastructure that makes it easy for governments, corporations, criminal organizations and even teenage hackers to record everything we do, and—yes—even change our votes. And we will continue to do so unless we pass laws regulating the creation, use, protection, resale and disposal of personal data. It’s precisely the attitude that trivializes the problem that creates it.

This essay appeared in the May issue of Information Security, as the second half of a point/counterpoint with Marcus Ranum. Here’s his half.

Posted on May 11, 2007 at 9:19 AM

Ordinary People Being Labeled as Terrorists

By law, every business has to check their customers against a list of “specially designated nationals,” and not do business with anyone on that list.

Of course, the list is riddled with bad names and many innocents get caught up in the net. And many businesses decide that it’s easier to turn away potential customers whose names are on the list, creating—well—a shunned class:

Tom Kubbany is neither a terrorist nor a drug trafficker, has average credit and has owned homes in the past, so the Northern California mental-health worker was baffled when his mortgage broker said lenders were not interested in him. Reviewing his loan file, he discovered something shocking. At the top of his credit report was an OFAC alert provided by credit bureau TransUnion that showed that his middle name, Hassan, is an alias for Ali Saddam Hussein, purportedly a “son of Saddam Hussein.”

The record is not clear on whether Ali Saddam Hussein was a Hussein offspring, but the OFAC list stated he was born in 1980 or 1983. Kubbany was born in Detroit in 1949.

Under OFAC guidance, the date discrepancy signals a false match. Still, Kubbany said, the broker decided not to proceed. “She just talked with a bunch of lenders over the phone and they said, ‘No,’ ” he said. “So we said, ‘The heck with it. We’ll just go somewhere else.’ ”

Kubbany and his wife are applying for another loan, though he worries that the stigma lingers. “There’s a dark cloud over us,” he said. “We will never know. If we had qualified for the mortgage last summer, then we might have been in a house now.”

Saad Ali Muhammad is an African American who was born in Chicago and converted to Islam in 1980. When he tried to buy a used car from a Chevrolet dealership three years ago, a salesman ran his credit report and at the top saw a reference to “OFAC search,” followed by the names of terrorists including Osama bin Laden. The only apparent connection was the name Muhammad. The credit report, also by TransUnion, did not explain what OFAC was or what the credit report user should do with the information. Muhammad wrote to TransUnion and filed a complaint with a state human rights agency, but the alert remains on his report, Sinnar said.

Colleen Tunney-Ryan, a TransUnion spokeswoman, said in an e-mail that clients using the firm’s credit reports are solely responsible for any action required by federal law as a result of a potential match and that they must agree they will not take any adverse action against a consumer based solely on the report.

The lawyers’ committee documented other cases, including that of a couple in Phoenix who were about to close on their first home, only to be told the sale could not proceed because the husband’s first and last names—common Hispanic names—matched an entry on the OFAC list. The entry did not include a date or place of birth, which could have helped distinguish the individuals.

In another case, a Roseville, Calif., couple wanted to buy a treadmill from a home fitness store on a financing plan. A bank representative told the salesperson that because the husband’s first name was Hussein, the couple would have to wait 72 hours while they were investigated. Though the couple eventually received the treadmill, they were so embarrassed by the incident they did not want their names in the report, Sinnar said.

This is the same problem as the no-fly list, only in a larger context. And it’s no way to combat terrorism. Thankfully, many businesses don’t know to check this list and people whose names are similar to suspected terrorists’ can still lead mostly normal lives. But the trend here is not good.

Posted on April 10, 2007 at 6:23 AM

The U.S. Terrorist Database

Interesting article about the terrorist database: Terrorist Identities Datamart Environment (TIDE).

It’s huge:

Ballooning from fewer than 100,000 files in 2003 to about 435,000, the growing database threatens to overwhelm the people who manage it. “The single biggest worry that I have is long-term quality control,” said Russ Travers, in charge of TIDE at the National Counterterrorism Center in McLean. “Where am I going to be, where is my successor going to be, five years down the road?”

TIDE has also created concerns about secrecy, errors and privacy. The list marks the first time foreigners and U.S. citizens are combined in an intelligence database. The bar for inclusion is low, and once someone is on the list, it is virtually impossible to get off it. At any stage, the process can lead to “horror stories” of mixed-up names and unconfirmed information, Travers acknowledged.

Mostly the article tells you things you already know: the list is riddled with errors, and there’s no defined process for getting on or off the list. But the most surreal quote is at the end, from Rick Kopel, the center’s acting director:

The center came in for ridicule last year when CBS’s “60 Minutes” noted that 14 of the 19 Sept. 11 hijackers were listed—five years after their deaths. Kopel defended the listings, saying that “we know for a fact that these people will use names that they believe we are not going to list because they’re out of circulation—either because they’re dead or incarcerated. . . . It’s not willy-nilly. Every name on the list, there’s a reason that it’s on there.”

Get that? There’s someone who deliberately puts wrong names on the list because they think the terrorists might use aliases, and they want to catch them. Given that reasoning, wouldn’t you want to put the entire phone book on the list?

Posted on March 26, 2007 at 2:05 PM

Find Out if You're on the "No Fly List"

I’m not. Are you?

Soundex works, generally, by removing vowels from names and then assigning numerical values to the remaining consonants.

This has been the basis for the Computer Assisted Passenger Pre-Screening System (CAPPS) and it is horrendously inadequate and matches far too many names. To see just how poorly Soundex performs, visit nofly.s3.com and type in your name to assess your chances of being on the No Fly or Watch List. This is the only known publicly available site for checking your name against potential terrorist identities and databases. It was developed by S3 Matching Technologies of Austin, Texas. The company’s database technicians merged the best known data on terrorists with the Soundex system to create the site.
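To see why Soundex over-matches, here is a minimal sketch of the classic Soundex algorithm as described above (this is the standard public algorithm, not necessarily the exact variant any screening system uses): keep the first letter, drop vowels and h/w, map the remaining consonants to digit classes, collapse adjacent duplicates, and pad to four characters.

```python
def soundex(name: str) -> str:
    """Classic Soundex: first letter plus three digits."""
    # Consonant classes: b/f/p/v -> 1, c/g/j/k/q/s/x/z -> 2, d/t -> 3,
    # l -> 4, m/n -> 5, r -> 6. Vowels and y code to 0; h/w are ignored.
    codes = {c: d for d, letters in
             enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1)
             for c in letters}
    name = name.lower()
    first = name[0].upper()
    digits = []
    prev = codes.get(name[0], 0)
    for ch in name[1:]:
        if ch in "hw":
            continue  # h and w don't separate duplicate codes
        d = codes.get(ch, 0)
        if d != 0 and d != prev:
            digits.append(str(d))
        prev = d  # vowels reset prev to 0, so repeats after a vowel count
    return (first + "".join(digits) + "000")[:4]

# Distinct names collapse to the same code, which is exactly the problem:
# "Robert" and "Rupert" are both R163; "Muhammad" and "Mohamed" are both M530.
```

Any matching system built on four characters of phonetic residue will conflate vast numbers of unrelated people; that is the false-positive engine behind the watch-list horror stories.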

Posted on March 14, 2007 at 7:51 AM

OneDOJ

Yet another massive U.S. government database—OneDOJ:

The Justice Department is building a massive database that allows state and local police officers around the country to search millions of case files from the FBI, Drug Enforcement Administration and other federal law enforcement agencies, according to Justice officials.

The system, known as “OneDOJ,” already holds approximately 1 million case records and is projected to triple in size over the next three years, Justice officials said. The files include investigative reports, criminal-history information, details of offenses, and the names, addresses and other information of criminal suspects or targets, officials said.

The database is billed by its supporters as a much-needed step toward better information-sharing with local law enforcement agencies, which have long complained about a lack of cooperation from the federal government.

But civil-liberties and privacy advocates say the scale and contents of such a database raise immediate privacy and civil rights concerns, in part because tens of thousands of local police officers could gain access to personal details about people who have not been arrested or charged with crimes.

The little-noticed program has been coming together over the past year and a half. It already is in use in pilot projects with local police in Seattle, San Diego and a handful of other areas, officials said. About 150 separate police agencies have access, officials said.

But in a memorandum sent last week to the FBI, U.S. attorneys and other senior Justice officials, Deputy Attorney General Paul J. McNulty announced that the program will be expanded immediately to 15 additional regions and that federal authorities will “accelerate . . . efforts to share information from both open and closed cases.”

Eventually, the department hopes, the database will be a central mechanism for sharing federal law enforcement information with local and state investigators, who now run checks individually, and often manually, with Justice’s five main law enforcement agencies: the FBI, the DEA, the U.S. Marshals Service, the Bureau of Prisons and the Bureau of Alcohol, Tobacco, Firearms and Explosives.

Within three years, officials said, about 750 law enforcement agencies nationwide will have access.

Computerizing this stuff is a good idea, but any new systems need privacy safeguards built in. We need to ensure that:

  • Inaccurate data can be corrected.
  • Data is deleted when it is no longer needed, especially investigative data on people who have turned out to be innocent.
  • Protections are in place to prevent abuse of the data, both by people in their official capacity and people acting unofficially or fraudulently.

In our rush to computerize these records, we’re ignoring these safeguards and building systems that will make us all less secure.

Posted on January 2, 2007 at 11:55 AM

CATO Report on Data Mining and Terrorism

Definitely worth reading:

Though data mining has many valuable uses, it is not well suited to the terrorist discovery problem. It would be unfortunate if data mining for terrorism discovery had currency within national security, law enforcement, and technology circles because pursuing this use of data mining would waste taxpayer dollars, needlessly infringe on privacy and civil liberties, and misdirect the valuable time and energy of the men and women in the national security community.

Posted on December 13, 2006 at 1:38 PM

