Entries Tagged "anonymity"


The NSA Teams Up with the Chinese Government to Limit Internet Anonymity

Definitely strange bedfellows:

A United Nations agency is quietly drafting technical standards, proposed by the Chinese government, to define methods of tracing the original source of Internet communications and potentially curbing the ability of users to remain anonymous.

The U.S. National Security Agency is also participating in the “IP Traceback” drafting group, named Q6/17, which is meeting next week in Geneva to work on the traceback proposal. Members of Q6/17 have declined to release key documents, and meetings are closed to the public.

[…]

A second, apparently leaked ITU document offers surveillance and monitoring justifications that seem well-suited to repressive regimes:

A political opponent to a government publishes articles putting the government in an unfavorable light. The government, having a law against any opposition, tries to identify the source of the negative articles but the articles having been published via a proxy server, is unable to do so protecting the anonymity of the author.

This is being sold as a way to go after the bad guys, but it won’t help. Here’s Steve Bellovin on that issue:

First, very few attacks these days use spoofed source addresses; the real IP address already tells you where the attack is coming from. Second, in case of a DDoS attack, there are too many sources; you can’t do anything with the information. Third, the machine attacking you is almost certainly someone else’s hacked machine and tracking them down (and getting them to clean it up) is itself time-consuming.

Traceback is most useful in monitoring the activities of large masses of people. But of course, that’s why the Chinese and the NSA are so interested in this proposal in the first place.

It’s hard to figure out what the endgame is; the U.N. doesn’t have the authority to impose Internet standards on anyone. In any case, this idea is counter to the U.N. Universal Declaration of Human Rights, Article 19: “Everyone has the right to freedom of opinion and expression; this right includes freedom to hold opinions without interference and to seek, receive and impart information and ideas through any media and regardless of frontiers.” In the U.S., it’s counter to the First Amendment, which has long permitted anonymous speech. On the other hand, basic human and constitutional rights have been jettisoned left and right in the years after 9/11; why should this be any different?

But when the Chinese government and the NSA get together to enhance their ability to spy on us all, you have to wonder what’s gone wrong with the world.

Posted on September 18, 2008 at 6:34 AM

Anonymity and the Netflix Dataset

Last year, Netflix published 10 million movie rankings by 500,000 customers, as part of a challenge for people to come up with better recommendation systems than the one the company was using. The data was anonymized by removing personal details and replacing names with random numbers, to protect the privacy of the recommenders.

Arvind Narayanan and Vitaly Shmatikov, researchers at the University of Texas at Austin, de-anonymized some of the Netflix data by comparing rankings and timestamps with public information in the Internet Movie Database, or IMDb.

Their research (.pdf) illustrates some inherent security problems with anonymous data, but first it’s important to explain what they did and did not do.

They did not reverse the anonymity of the entire Netflix dataset. What they did was reverse the anonymity of the Netflix dataset for those sampled users who also entered some movie rankings, under their own names, in the IMDb. (While IMDb’s records are public, crawling the site to get them is against the IMDb’s terms of service, so the researchers used a representative few to prove their algorithm.)

The point of the research was to demonstrate how little information is required to de-anonymize information in the Netflix dataset.

On one hand, isn’t that sort of obvious? The risks of anonymous databases have been written about before, such as in this 2001 paper published in an IEEE journal. The researchers working with the anonymous Netflix data didn’t painstakingly figure out people’s identities—as others did with the AOL search database last year—they just compared it with an already identified subset of similar data: a standard data-mining technique.

But as opportunities for this kind of analysis pop up more frequently, lots of anonymous data could end up at risk.

Someone with access to an anonymous dataset of telephone records, for example, might partially de-anonymize it by correlating it with a catalog merchant’s telephone order database. Or Amazon’s online book reviews could be the key to partially de-anonymizing a public database of credit card purchases, or a larger database of anonymous book reviews.

Google, with its database of users’ internet searches, could easily de-anonymize a public database of internet purchases, or zero in on searches of medical terms to de-anonymize a public health database. Merchants who maintain detailed customer and purchase information could use their data to partially de-anonymize any large search engine’s data, if it were released in an anonymized form. A data broker holding databases of several companies might be able to de-anonymize most of the records in those databases.

What the University of Texas researchers demonstrate is that this process isn’t hard, and doesn’t require a lot of data. It turns out that if you eliminate the top 100 movies everyone watches, our movie-watching habits are all pretty individual. This would certainly hold true for our book reading habits, our internet shopping habits, our telephone habits and our web searching habits.

The obvious countermeasures for this are, sadly, inadequate. Netflix could have randomized its dataset by removing a subset of the data, changing the timestamps or adding deliberate errors into the unique ID numbers it used to replace the names. It turns out, though, that this only makes the problem slightly harder. Narayanan and Shmatikov’s de-anonymization algorithm is surprisingly robust, and works with partial data, perturbed data, even data with errors in it.

With only eight movie ratings (of which two may be completely wrong), and dates that may be up to two weeks in error, they can uniquely identify 99 percent of the records in the dataset. After that, all they need is a little bit of identifiable data: from the IMDb, from your blog, from anywhere. The moral is that it takes only a small named database for someone to pry the anonymity off a much larger anonymous database.
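To make that linkage concrete, here is a minimal sketch in the spirit of their algorithm. The scoring weights, the runner-up threshold and the toy data are my own illustrative assumptions, not the paper’s actual parameters:

```python
def match_score(aux, record, date_slack_days=14):
    # aux and record each map movie title -> (rating, day_number)
    score = 0.0
    for movie, (rating, day) in aux.items():
        if movie in record:
            r2, d2 = record[movie]
            if abs(day - d2) <= date_slack_days:
                # a date match alone earns partial credit, so a couple of
                # wrong ratings in the auxiliary data don't break the attack
                score += 1.0 if rating == r2 else 0.5
    return score

def best_match(aux, dataset, eccentricity=1.5):
    # accept a match only if the best score clearly stands out from the
    # runner-up; this guards against false positives on sparse data
    scored = sorted(((match_score(aux, rec), rid)
                     for rid, rec in dataset.items()), reverse=True)
    (s1, rid1), (s2, _rid2) = scored[0], scored[1]
    return rid1 if s1 >= eccentricity * max(s2, 0.1) else None

# toy "anonymized" dataset, plus a two-movie auxiliary trace from IMDb
dataset = {
    "user-1": {"A": (5, 10), "B": (3, 20), "C": (4, 30)},
    "user-2": {"A": (1, 200), "D": (2, 50)},
}
aux = {"A": (5, 12), "B": (3, 25)}
print(best_match(aux, dataset))  # user-1
```

The design point worth noticing is the runner-up comparison: a match is accepted only when it stands out from the second-best candidate, which is what lets this style of attack tolerate wrong ratings and fuzzy dates without producing false positives.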

Other research reaches the same conclusion. Using public anonymous data from the 1990 census, Latanya Sweeney found that 87 percent of the population in the United States, 216 million of 248 million, could likely be uniquely identified by their five-digit ZIP code, combined with their gender and date of birth. About half of the U.S. population is likely identifiable by gender, date of birth and the city, town or municipality in which the person resides. Expanding the geographic scope to an entire county reduces that to a still-significant 18 percent. “In general,” the researchers wrote, “few characteristics are needed to uniquely identify a person.”

Stanford University researchers reported similar results using 2000 census data. It turns out that date of birth, which (unlike birthday month and day alone) sorts people into thousands of different buckets, is incredibly valuable in disambiguating people.

This has profound implications for releasing anonymous data. On one hand, anonymous data is an enormous boon for researchers—AOL did a good thing when it released its anonymous dataset for research purposes, and it’s sad that the CTO resigned and an entire research team was fired after the public outcry. Large anonymous databases of medical data are enormously valuable to society: for large-scale pharmacology studies, long-term follow-up studies and so on. Even anonymous telephone data makes for fascinating research.

On the other hand, in the age of wholesale surveillance, where everyone collects data on us all the time, anonymization is very fragile and riskier than it initially seems.

Like everything else in security, anonymity systems shouldn’t be fielded before being subjected to adversarial attacks. We all know that it’s folly to implement a cryptographic system before it’s rigorously attacked; why should we expect anonymity systems to be any different? And, like everything else in security, anonymity is a trade-off. There are benefits, and there are corresponding risks.

Narayanan and Shmatikov are currently working on developing algorithms and techniques that enable the secure release of anonymous datasets like Netflix’s. That’s a research result we can all benefit from.

This essay originally appeared on Wired.com.

Posted on December 18, 2007 at 5:53 AM

Dan Egerstad Arrested

I previously wrote about Dan Egerstad, a security researcher who ran exit nodes on the Tor anonymity network and was able to sniff some pretty impressive usernames and passwords.

Swedish police arrested him:

About 9am Egerstad walked downstairs to move his car when he was accosted by the officers in a scene “taken out of a bad movie”, he said in an email interview.

“I got a couple of police IDs in my face while told that they are taking me in for questioning,” he said.

But not before the agents, who had staked out his house in undercover blue and grey Saabs (“something that screams cop to every person in Sweden from miles away”), searched his apartment and confiscated computers, CDs and portable hard drives.

“They broke my wardrobe, short cutted my electricity, pulled out my speakers, phone and other cables having nothing to do with this and been touching my bookkeeping, which they have no right to do,” he said.

While questioning Egerstad at the station, the police “played every trick in the book, good cop, bad cop and crazy mysterious guy in the corner not wanting to tell his name and just staring at me”.

“Well, if they want to try to manipulate, I can play that game too. [I] gave every known body signal there is telling of lies … covered my mouth, scratched my elbow, looked away and so on.”

No charges have been filed. I’m not sure there’s anything wrong with what he did.

Here’s a good article on what he did; it was published just before the arrest.

Posted on November 16, 2007 at 2:27 PM

Redefining Privacy

This kind of thinking can do enormous damage to a free society:

As Congress debates new rules for government eavesdropping, a top intelligence official says it is time that people in the United States change their definition of privacy.

Privacy no longer can mean anonymity, says Donald Kerr, the principal deputy director of national intelligence. Instead, it should mean that government and businesses properly safeguard people’s private communications and financial information.

[…]

“Our job now is to engage in a productive debate, which focuses on privacy as a component of appropriate levels of security and public safety,” Kerr said. “I think all of us have to really take stock of what we already are willing to give up, in terms of anonymity, but [also] what safeguards we want in place to be sure that giving that doesn’t empty our bank account or do something equally bad elsewhere.”

Anonymity, privacy, and security are intertwined; you can’t just separate them out like that. And privacy isn’t opposed to security; privacy is part of security. And the value of privacy in a free society is enormous.

Other comments.

EDITED TO ADD (11/15): His actual comments are more nuanced. Steve Bellovin has some comments.

Posted on November 14, 2007 at 12:51 PM

Anonymity and the Tor Network

As the name implies, Alcoholics Anonymous meetings are anonymous. You don’t have to sign anything, show ID or even reveal your real name. But the meetings are not private. Anyone is free to attend. And anyone is free to recognize you: by your face, by your voice, by the stories you tell. Anonymity is not the same as privacy.

That’s obvious and uninteresting, but many of us seem to forget it when we’re on a computer. We think “it’s secure,” and forget that secure can mean many different things.

Tor is a free tool that allows people to use the internet anonymously. Basically, by joining Tor you join a network of computers around the world that pass internet traffic randomly amongst each other before sending it out to wherever it is going. Imagine a tight huddle of people passing letters around. Once in a while a letter leaves the huddle, sent off to some destination. If you can’t see what’s going on inside the huddle, you can’t tell who sent what letter based on watching letters leave the huddle.

I’ve left out a lot of details, but that’s basically how Tor works. It’s called “onion routing,” and it was first developed at the Naval Research Laboratory. The communications between Tor nodes are encrypted in a layered protocol—hence the onion analogy—but the traffic that leaves the Tor network is in the clear. It has to be.
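The layering can be sketched in a few lines. This toy uses a throwaway XOR keystream purely for illustration; real Tor negotiates proper per-hop keys and uses real ciphers:

```python
import hashlib

def xor_layer(key: bytes, data: bytes) -> bytes:
    # throwaway stream "cipher": XOR with a SHA-256-derived keystream;
    # applying it twice with the same key removes the layer again
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))

def wrap_onion(message: bytes, hop_keys: list[bytes]) -> bytes:
    # the sender adds one layer per hop, innermost first; each relay
    # knows only its own key, so it can remove only its own layer
    for key in reversed(hop_keys):
        message = xor_layer(key, message)
    return message

keys = [b"entry-key", b"middle-key", b"exit-key"]
cell = wrap_onion(b"GET / HTTP/1.1", keys)
for key in keys:                 # entry -> middle -> exit, one layer each
    cell = xor_layer(key, cell)
print(cell)  # b'GET / HTTP/1.1' -- the exit node sees the request in the clear
```

Note where the last print happens: after the final layer comes off, the exit node is holding plaintext, which is exactly the property the rest of this essay turns on.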

If you want your Tor traffic to be private, you need to encrypt it. If you want it to be authenticated, you need to sign it as well. The Tor website even says:

Yes, the guy running the exit node can read the bytes that come in and out there. Tor anonymizes the origin of your traffic, and it makes sure to encrypt everything inside the Tor network, but it does not magically encrypt all traffic throughout the internet.

Tor anonymizes, nothing more.

Dan Egerstad is a Swedish security researcher; he ran five Tor nodes. Last month, he posted a list of 100 e-mail credentials—server IP addresses, e-mail accounts and the corresponding passwords—for embassies and government ministries around the globe, all obtained by sniffing exit traffic for usernames and passwords of e-mail servers.

The list contains mostly third-world embassies: Kazakhstan, Uzbekistan, Tajikistan, India, Iran, Mongolia—but there’s a Japanese embassy on the list, as well as the UK Visa Application Center in Nepal, the Russian Embassy in Sweden, the Office of the Dalai Lama and several Hong Kong human rights groups. And this is just the tip of the iceberg; Egerstad sniffed more than 1,000 corporate accounts this way, too. Scary stuff, indeed.

Presumably, most of these organizations are using Tor to hide their network traffic from their host countries’ spies. But because anyone can join the Tor network, Tor users necessarily pass their traffic to organizations they might not trust: various intelligence agencies, hacker groups, criminal organizations and so on.

It’s simply inconceivable that Egerstad is the first person to do this sort of eavesdropping; Len Sassaman published a paper on this attack earlier this year. The price you pay for anonymity is exposing your traffic to shady people.

We don’t really know whether the Tor users were the accounts’ legitimate owners, or if they were hackers who had broken into the accounts by other means and were now using Tor to avoid being caught. But certainly most of these users didn’t realize that anonymity doesn’t mean privacy. The fact that most of the accounts listed by Egerstad were from small nations is no surprise; that’s where you’d expect weaker security practices.

True anonymity is hard. Just as you could be recognized at an AA meeting, you can be recognized on the internet as well. There’s a lot of research on breaking anonymity in general—and Tor specifically—but sometimes it doesn’t even take much. Last year, AOL made the search queries of some 650,000 users public as a research tool. It wasn’t very hard to identify people from the data.

A research project called Dark Web, funded by the National Science Foundation, even tried to identify anonymous writers by their style:

One of the tools developed by Dark Web is a technique called Writeprint, which automatically extracts thousands of multilingual, structural, and semantic features to determine who is creating “anonymous” content online. Writeprint can look at a posting on an online bulletin board, for example, and compare it with writings found elsewhere on the Internet. By analyzing these certain features, it can determine with more than 95 percent accuracy if the author has produced other content in the past.

And if your name or other identifying information is in just one of those writings, you can be identified.

Like all security tools, Tor is used by both good guys and bad guys. And perversely, the very fact that something is on the Tor network means that someone—for some reason—wants to hide the fact he’s doing it.

As long as Tor is a magnet for “interesting” traffic, Tor will also be a magnet for those who want to eavesdrop on that traffic—especially because more than 90 percent of Tor users don’t encrypt.

This essay previously appeared on Wired.com.

Posted on September 20, 2007 at 5:38 AM

Torpark

Torpark is a free anonymous web browser. It sounds good:

A group of computer hackers and human rights workers have launched a specially-crafted version of Firefox that claims to give users complete anonymity when they surf the Web.

Dubbed “Torpark” and based on a portable version of Firefox 1.5.0.7, the browser will run from a USB drive, so it leaves no installation tracks on the PC. It protects the user’s privacy by encrypting all in- and outbound data, and also anonymizes the connection by passing all data through the TOR network, which masks the true IP address of the machine.

From the website:

Torpark is a program which allows you to surf the internet anonymously. Download Torpark and put it on a USB Flash keychain. Plug it into any internet terminal whether at home, school, work, or in public. Torpark will launch a Tor circuit connection, which creates an encrypted tunnel from your computer indirectly to a Tor exit computer, allowing you to surf the internet anonymously.

More details here.

Posted on September 28, 2006 at 6:51 AM

New Anonymous Browser

According to Computerworld and InfoWorld, there’s a new Web browser specifically designed not to retain information.

Browzar automatically deletes Internet caches, histories, cookies and auto-complete forms. Auto-complete is the feature that anticipates the search term or Web address a user might enter by relying on information previously entered into the browser.

I know nothing else about this. If you want, download it here.

EDITED TO ADD (9/1): This browser seems to be both fake and full of adware.

Posted on September 1, 2006 at 8:23 AM

Skype Call Traced

Kobi Alexander fled the United States ten days ago. He was tracked down in Sri Lanka via a Skype call:

According to the report, Alexander was located after making a one-minute call via the online telephone Skype service. The call, made from the Sri Lankan capital Colombo, alerted intelligence agencies to his presence in the country.

Ars Technica explains:

The fugitive former CEO may have been convinced that using Skype made him safe from tracking, but he—and everyone else that believes VoIP is inherently more secure than a landline—was wrong. Tracking anonymous peer-to-peer VoIP traffic over the Internet is possible (PDF). In fact, it can be done even if the parties have taken some steps to disguise the traffic.

Let this be a warning to all of you who thought Skype was anonymous.

Posted on August 24, 2006 at 1:45 PM

TrackMeNot

In the wake of AOL’s publication of search data, and the New York Times article demonstrating how easy it is to figure out who did the searching, we have TrackMeNot:

TrackMeNot runs in Firefox as a low-priority background process that periodically issues randomized search-queries to popular search engines, e.g., AOL, Yahoo!, Google, and MSN. It hides users’ actual search trails in a cloud of indistinguishable ‘ghost’ queries, making it difficult, if not impossible, to aggregate such data into accurate or identifying user profiles. TrackMeNot integrates into the Firefox ‘Tools’ menu and includes a variety of user-configurable options.

Let’s count the ways this doesn’t work.

One, it doesn’t hide your searches. If the government wants to know who’s been searching on “al Qaeda recruitment centers,” it won’t matter that you’ve made ten thousand other searches as well—you’ll be targeted.

Two, it’s too easy to spot. There are only 1,673 search terms in the program’s dictionary. Here, as a random example, are the program’s “G” words:

gag, gagged, gagging, gags, gas, gaseous, gases, gassed, gasses, gassing, gen, generate, generated, generates, generating, gens, gig, gigs, gillion, gillions, glass, glasses, glitch, glitched, glitches, glitching, glob, globed, globing, globs, glue, glues, gnarlier, gnarliest, gnarly, gobble, gobbled, gobbles, gobbling, golden, goldener, goldenest, gonk, gonked, gonking, gonks, gonzo, gopher, gophers, gorp, gorps, gotcha, gotchas, gribble, gribbles, grind, grinding, grinds, grok, grokked, grokking, groks, ground, grovel, groveled, groveling, grovelled, grovelling, grovels, grue, grues, grunge, grunges, gun, gunned, gunning, guns, guru, gurus

The program’s authors claim that this list is temporary, and that there will eventually be a TrackMeNot server with an ever-changing word list. Of course, that list can be monitored by any analysis program—as could any queries to that server.

In any case, every twelve seconds—exactly—the program picks a random pair of words and sends it to one of AOL, Yahoo, MSN, or Google. My guess is that your searches contain more than two words, that you don’t send them out at precise twelve-second intervals, and that you favor one search engine over the others.
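That timing signature alone is enough to flag the tool. A minimal sketch of the sort of check a profiler could run, with invented timestamps:

```python
import statistics

def interarrival_stdev(timestamps):
    # standard deviation of the gaps between consecutive queries
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return statistics.stdev(gaps)

# invented example logs, in seconds
trackmenot_like = [12.0 * i for i in range(20)]   # a query exactly every 12 s
human_like = [0, 3, 41, 44, 300, 950, 955, 1803]  # bursty and irregular

print(interarrival_stdev(trackmenot_like))  # 0.0 -- metronomic, trivially flagged
print(interarrival_stdev(human_like) > 60)  # True -- human gaps vary widely
```

A real classifier would use more than one feature, of course, but near-zero variance in query timing is about as loud a tell as machine-generated traffic can have.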

Three, some of the program’s searches are worse than yours. The dictionary includes:

HIV, atomic, bomb, bible, bibles, bombing, bombs, boxes, choke, choked, chokes, choking, chain, crackers, empire, evil, erotics, erotices, fingers, knobs, kicking, harier, hamster, hairs, legal, letterbomb, letterbombs, mailbomb, mailbombing, mailbombs, rapes, raping, rape, raper, rapist, virgin, warez, warezes, whack, whacked, whacker, whacking, whackers, whacks, pistols

Does anyone really think that searches on “erotic rape,” “mailbombing bibles,” and “choking virgins” will make their legitimate searches less noteworthy?

And four, it wastes a whole lot of bandwidth. A query every twelve seconds translates into 2,400 queries a day, assuming an eight-hour workday. A typical Google response is about 25K, so we’re talking 60 megabytes of additional traffic daily. Imagine if everyone in the company used it.
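The arithmetic is easy to check:

```python
# one query every 12 seconds over an eight-hour workday
queries_per_day = 8 * 3600 // 12
bytes_per_response = 25_000        # ~25K per typical Google response
daily_mb = queries_per_day * bytes_per_response / 1_000_000

print(queries_per_day)  # 2400
print(daily_mb)         # 60.0 megabytes of extra traffic per user, per day
```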

I suppose this kind of thing would stop someone who has a paper printout of your searches and is looking through them manually, but it’s not going to hamper computer analysis very much: it wouldn’t be hard for a profiling program to recognize and ignore these searches.

As one commentator put it:

Imagine a cop pulls you over for speeding. As he approaches, you realize you left your wallet at home. Without your driver’s license, you could be in a lot of trouble. When he approaches, you roll down your window and shout. “Hello Officer! I don’t have insurance on this vehicle! This car is stolen! I have weed in my glovebox! I don’t have my driver’s license! I just hit an old lady minutes ago! I’ve been running stop lights all morning! I have a dead body in my trunk! This car doesn’t pass the emissions tests! I’m not allowed to drive because I am under house arrest! My gas tank runs on the blood of children!” You stop to catch a breath, confident you have supplied so much information to the cop that you can’t possibly be caught for not having your license now.

Yes, data mining is a signal-to-noise problem. But artificial noise like this isn’t going to help much. If I were going to improve on this idea, I would make the plugin watch the user’s search patterns. I would make it send queries only to the search engines the user does, only when he is actually online doing things. I would randomize the timing. (There’s a comment to that effect in the code, so presumably this will be fixed in a later version of the program.) And I would make it monitor the web pages the user looks at, and send queries based on keywords it finds on those pages. And I would make it send queries in the form the user tends to use, whether it be single words, pairs of words, or whatever.

But honestly, I don’t know that I would use it even then. The way serious people protect their web-searching privacy is through anonymization: use Tor for serious web anonymization, or Black Box Search for simple anonymous searching (here’s a Greasemonkey extension that does that automatically). And set your browser to delete search engine cookies regularly.

Posted on August 23, 2006 at 6:53 AM

Sidebar photo of Bruce Schneier by Joe MacInnis.