Entries Tagged "de-anonymization"


Dan Egerstad Arrested

I previously wrote about Dan Egerstad, a security researcher who ran five Tor exit nodes and was able to sniff some pretty impressive usernames and passwords.

Swedish police arrested him:

About 9am Egerstad walked downstairs to move his car when he was accosted by the officers in a scene “taken out of a bad movie”, he said in an email interview.

“I got a couple of police IDs in my face while told that they are taking me in for questioning,” he said.

But not before the agents, who had staked out his house in undercover blue and grey Saabs (“something that screams cop to every person in Sweden from miles away”), searched his apartment and confiscated computers, CDs and portable hard drives.

“They broke my wardrobe, short cutted my electricity, pulled out my speakers, phone and other cables having nothing to do with this and been touching my bookkeeping, which they have no right to do,” he said.

While questioning Egerstad at the station, the police “played every trick in the book, good cop, bad cop and crazy mysterious guy in the corner not wanting to tell his name and just staring at me”.

“Well, if they want to try to manipulate, I can play that game too. [I] gave every known body signal there is telling of lies … covered my mouth, scratched my elbow, looked away and so on.”

No charges have been filed. I’m not sure there’s anything wrong with what he did.

Here’s a good article on what he did; it was published just before the arrest.

Posted on November 16, 2007 at 2:27 PM

Anonymity and the Tor Network

As the name implies, Alcoholics Anonymous meetings are anonymous. You don’t have to sign anything, show ID or even reveal your real name. But the meetings are not private. Anyone is free to attend. And anyone is free to recognize you: by your face, by your voice, by the stories you tell. Anonymity is not the same as privacy.

That’s obvious and uninteresting, but many of us seem to forget it when we’re on a computer. We think “it’s secure,” and forget that secure can mean many different things.

Tor is a free tool that allows people to use the internet anonymously. Basically, by joining Tor you join a network of computers around the world that pass internet traffic randomly amongst each other before sending it out to wherever it is going. Imagine a tight huddle of people passing letters around. Once in a while a letter leaves the huddle, sent off to some destination. If you can’t see what’s going on inside the huddle, you can’t tell who sent what letter based on watching letters leave the huddle.

I’ve left out a lot of details, but that’s basically how Tor works. It’s called “onion routing,” and it was first developed at the Naval Research Laboratory. The communications between Tor nodes are encrypted in a layered protocol — hence the onion analogy — but the traffic that leaves the Tor network is in the clear. It has to be.

If you want your Tor traffic to be private, you need to encrypt it. If you want it to be authenticated, you need to sign it as well. The Tor website even says:

Yes, the guy running the exit node can read the bytes that come in and out there. Tor anonymizes the origin of your traffic, and it makes sure to encrypt everything inside the Tor network, but it does not magically encrypt all traffic throughout the internet.

Tor anonymizes, nothing more.
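To make the layering concrete, here is a minimal sketch of onion-style encryption in Python. It is a toy, not Tor's real protocol: the relay names, the three-hop circuit and the use of Fernet keys are all illustrative assumptions. But it shows why each relay can peel exactly one layer, and why the exit relay ends up holding the raw bytes.

```python
# A toy sketch of onion-layered encryption, not Tor's actual protocol.
# Assumes the "cryptography" package is installed; relay names and the
# three-hop circuit are illustrative.
from cryptography.fernet import Fernet

# Each relay holds its own key (Tor negotiates these per circuit;
# here we simply generate them for the demo).
relays = ["entry", "middle", "exit"]
keys = {name: Fernet(Fernet.generate_key()) for name in relays}

message = b"GET http://example.com/ HTTP/1.1"  # plaintext unless you add TLS

# The client wraps the message in one layer per relay, innermost first,
# so the exit relay's layer sits at the center of the onion.
onion = message
for name in reversed(relays):
    onion = keys[name].encrypt(onion)

# Each relay peels exactly one layer and forwards the remainder.
for name in relays:
    onion = keys[name].decrypt(onion)

# After the last peel, the exit relay holds the original bytes,
# which is why unencrypted traffic leaving Tor can be sniffed.
assert onion == message
```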

Dan Egerstad is a Swedish security researcher; he ran five Tor nodes. Last month, he posted a list of 100 e-mail credentials — server IP addresses, e-mail accounts and the corresponding passwords — for embassies and government ministries around the globe, all obtained by sniffing exit traffic for usernames and passwords of e-mail servers.

The list contains mostly third-world embassies: Kazakhstan, Uzbekistan, Tajikistan, India, Iran, Mongolia — but there’s a Japanese embassy on the list, as well as the UK Visa Application Center in Nepal, the Russian Embassy in Sweden, the Office of the Dalai Lama and several Hong Kong human rights groups. And this is just the tip of the iceberg; Egerstad sniffed more than 1,000 corporate accounts this way, too. Scary stuff, indeed.

Presumably, most of these organizations are using Tor to hide their network traffic from their host countries’ spies. But because anyone can join the Tor network, Tor users necessarily pass their traffic to organizations they might not trust: various intelligence agencies, hacker groups, criminal organizations and so on.

It’s simply inconceivable that Egerstad is the first person to do this sort of eavesdropping; Len Sassaman published a paper on this attack earlier this year. The price you pay for anonymity is exposing your traffic to shady people.

We don’t really know whether the Tor users were the accounts’ legitimate owners, or if they were hackers who had broken into the accounts by other means and were now using Tor to avoid being caught. But certainly most of these users didn’t realize that anonymity doesn’t mean privacy. The fact that most of the accounts listed by Egerstad were from small nations is no surprise; that’s where you’d expect weaker security practices.

True anonymity is hard. Just as you could be recognized at an AA meeting, you can be recognized on the internet as well. There’s a lot of research on breaking anonymity in general — and Tor specifically — but sometimes it doesn’t even take much. Last year, AOL made public the search queries of roughly 658,000 anonymized users as a research tool. It wasn’t very hard to identify people from the data.

A research project called Dark Web, funded by the National Science Foundation, even tried to identify anonymous writers by their style:

One of the tools developed by Dark Web is a technique called Writeprint, which automatically extracts thousands of multilingual, structural, and semantic features to determine who is creating “anonymous” content online. Writeprint can look at a posting on an online bulletin board, for example, and compare it with writings found elsewhere on the Internet. By analyzing these certain features, it can determine with more than 95 percent accuracy if the author has produced other content in the past.

And if your name or other identifying information is in just one of those writings, you can be identified.
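As a rough illustration of how this kind of stylometric matching works, here is a toy matcher in Python. It is not the Dark Web project’s Writeprint: the feature set (character n-grams) is one of the simplest style signals, far from the thousands of features the article describes, and the texts and authors below are made up.

```python
# A toy stylometric matcher in the spirit of (but much simpler than)
# Writeprint. Assumes scikit-learn is installed; all data is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

known_texts = {
    "author_a": "I keep saying this: anonymity is not privacy. Never was.",
    "author_b": "One must consider, at some length, the nature of trust.",
}
anonymous_post = "Keep saying it: anonymity was never privacy."

# Character 3- to 5-grams capture punctuation habits and word endings.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
vectors = vectorizer.fit_transform(list(known_texts.values()) + [anonymous_post])

known_vecs = vectors[: len(known_texts)]
anon_vec = vectors[len(known_texts) :]

# Higher cosine similarity means a closer stylistic match.
scores = cosine_similarity(anon_vec, known_vecs).ravel()
for author, score in zip(known_texts, scores):
    print(f"{author}: {score:.2f}")
```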

Like all security tools, Tor is used by both good guys and bad guys. And perversely, the very fact that something is on the Tor network means that someone — for some reason — wants to hide the fact he’s doing it.

As long as Tor is a magnet for “interesting” traffic, Tor will also be a magnet for those who want to eavesdrop on that traffic — especially because more than 90 percent of Tor users don’t encrypt.

This essay previously appeared on Wired.com.

Posted on September 20, 2007 at 5:38 AM

Another E-Voting Problem: Not-Secret Ballots

Uh-oh:

Ohio law permits anyone to walk into a county election office and obtain two crucial documents: a list of voters in the order they voted, and a time-stamped list of the actual votes. “We simply take the two pieces of paper together, merge them, and then we have which voter voted and in which way,” said James Moyer, a longtime privacy activist and poll worker who lives in Columbus, Ohio.
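The attack really is nothing more than a join on voting order. A toy sketch, with invented names, times and votes:

```python
# Hypothetical illustration of the Ohio linkage: the two public documents
# line up by voting order, de-anonymizing every ballot.
voters_in_order = ["Alice Adams", "Bob Brown", "Carol Clark"]
timestamped_votes = [("09:01", "Issue 1: YES"),
                     ("09:04", "Issue 1: NO"),
                     ("09:07", "Issue 1: YES")]

for voter, (time, vote) in zip(voters_in_order, timestamped_votes):
    print(f"{time}  {voter}  {vote}")
```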

EDITED TO ADD (9/13): Commentary by Ed Felten.

Posted on August 21, 2007 at 7:01 AM

Privacy Risks of Public Mentions

Interesting paper: “You are what you say: privacy risks of public mentions,” Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006.

Abstract:

In today’s data-rich networked world, people express many aspects of their lives online. It is common to segregate different aspects in different places: you might write opinionated rants about movies in your blog under a pseudonym while participating in a forum or web site for scholarly discussion of medical ethics under your real name. However, it may be possible to link these separate identities, because the movies, journal articles, or authors you mention are from a sparse relation space whose properties (e.g., many items related to by only a few users) allow re-identification. This re-identification violates people’s intentions to separate aspects of their life and can have negative consequences; it also may allow other privacy violations, such as obtaining a stronger identifier like name and address. This paper examines this general problem in a specific setting: re-identification of users from a public web movie forum in a private movie ratings dataset. We present three major results. First, we develop algorithms that can re-identify a large proportion of public users in a sparse relation space. Second, we evaluate whether private dataset owners can protect user privacy by hiding data; we show that this requires extensive and undesirable changes to the dataset, making it impractical. Third, we evaluate two methods for users in a public forum to protect their own privacy, suppression and misdirection. Suppression doesn’t work here either. However, we show that a simple misdirection strategy works well: mention a few popular items that you haven’t rated.
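A minimal sketch of the linkage idea follows. This is not the paper’s actual algorithm: the datasets, names and IDF-style scoring are my own simplification. The point is that rare items carry most of the identifying weight, so you score each private user by the rarity of the items they share with the public mentions.

```python
# A toy sparse-relation re-identification, simplified from the idea in
# the abstract above. All data is hypothetical.
import math
from collections import Counter

# Private ratings dataset: user -> set of items they rated.
private_ratings = {
    "user_17": {"obscure_film_a", "obscure_film_b", "blockbuster"},
    "user_42": {"blockbuster", "popular_sequel"},
}

# Public forum: items a pseudonymous poster mentioned.
mentioned = {"obscure_film_a", "obscure_film_b"}

# Item popularity across the private dataset; rarer items are more telling.
popularity = Counter(item for items in private_ratings.values() for item in items)
n_users = len(private_ratings)

def link_score(rated, mentions):
    """Sum of log(n_users / item frequency) over shared items (IDF-style)."""
    return sum(math.log(n_users / popularity[item]) for item in rated & mentions)

scores = {u: link_score(items, mentioned) for u, items in private_ratings.items()}
print(max(scores, key=scores.get))  # best candidate: "user_17"
```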

Unfortunately, the paper is only available to ACM members.

EDITED TO ADD (8/24): Paper is here.

Posted on August 23, 2006 at 2:11 PM

AOL Releases Massive Amount of Search Data

From TechCrunch:

AOL has released very private data about its users without their permission. While the AOL username has been changed to a random ID number, the ability to analyze all searches by a single user will often lead people to easily determine who the user is, and what they are up to. The data includes personal names, addresses, social security numbers and everything else someone might type into a search box.

The most serious problem is the fact that many people often search on their own name, or those of their friends and family, to see what information is available about them on the net. Combine these ego searches with porn queries and you have a serious embarrassment. Combine them with “buy ecstasy” and you have evidence of a crime. Combine it with an address, social security number, etc., and you have an identity theft waiting to happen. The possibilities are endless.

This is search data for roughly 658,000 anonymized users over a three-month period from March to May — about 1/3 of 1 per cent of their total data for that period.

Now AOL says it was all a mistake. They pulled the data, but it’s still out there — and probably will be forever. And there’s some pretty scary stuff in it.

You can read more on Slashdot and elsewhere.

Anyone who wants to play NSA can start datamining for terrorists. Let us know if you find anything.

EDITED TO ADD (8/9): The New York Times:

And search by search, click by click, the identity of AOL user No. 4417749 became easier to discern. There are queries for “landscapers in Lilburn, Ga,” several people with the last name Arnold and “homes sold in shadow lake subdivision gwinnett county georgia.”

It did not take much investigating to follow that data trail to Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga., frequently researches her friends’ medical ailments and loves her three dogs. “Those are my searches,” she said, after a reporter read part of the list to her.

Posted on August 8, 2006 at 11:02 AM

Digital Cameras Have Unique Fingerprints

Interesting research:

Fridrich’s technique is rooted in the discovery by her research group of this simple fact: Every original digital picture is overlaid by a weak noise-like pattern of pixel-to-pixel non-uniformity.

Although these patterns are invisible to the human eye, the unique reference pattern or “fingerprint” of any camera can be electronically extracted by analyzing a number of images taken by a single camera.

That means that as long as examiners have either the camera that took the image or multiple images they know were taken by the same camera, an algorithm developed by Fridrich and her co-inventors to extract and define the camera’s unique pattern of pixel-to-pixel non-uniformity can be used to provide important information about the origins and authenticity of a single image.

The limitation of the technique is that it requires either the camera or multiple images taken by the same camera, and isn’t informative if only a single image is available for analysis.

Like actual fingerprints, the digital “noise” in original images is stochastic in nature — that is, it contains random variables — which are inevitably created during the manufacturing process of the camera and its sensors. This virtually ensures that the noise imposed on the digital images from any particular camera will be consistent from one image to the next, even while it is distinctly different from that of any other camera.

In preliminary tests, Fridrich’s lab analyzed 2,700 pictures taken by nine digital cameras and with 100 percent accuracy linked individual images with the camera that took them.
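A toy version of the idea (assuming grayscale images as floating-point NumPy arrays; Fridrich’s actual method uses a much better denoising filter and a more careful correlation detector than this) looks like the following:

```python
# A toy sketch of sensor-fingerprint extraction and matching. Assumes
# grayscale images as float NumPy arrays; the denoiser, threshold and
# detector are illustrative, not Fridrich's method.
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_residual(image):
    """Approximate the sensor noise as image minus a denoised version."""
    return image - gaussian_filter(image, sigma=1.5)

def camera_fingerprint(images):
    """Average many residuals from one camera: scene content averages out,
    the fixed pixel-to-pixel non-uniformity remains."""
    return np.mean([noise_residual(img) for img in images], axis=0)

def matches(image, fingerprint, threshold=0.02):
    """Normalized correlation between a test residual and the reference
    fingerprint; the threshold here is purely illustrative."""
    r = noise_residual(image).ravel()
    f = fingerprint.ravel()
    r, f = r - r.mean(), f - f.mean()
    corr = np.dot(r, f) / (np.linalg.norm(r) * np.linalg.norm(f))
    return corr > threshold, corr
```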

There’s one important aspect of this fingerprint that the article did not talk about: how easy is it to forge? Can someone analyze 100 images from a given camera, and then doctor a pre-existing picture so that it appears to come from that camera?

My guess is that it can be done relatively easily.

Posted on April 25, 2006 at 2:09 PM

Remote Physical Device Fingerprinting

Here’s the abstract:

We introduce the area of remote physical device fingerprinting, or fingerprinting a physical device, as opposed to an operating system or class of devices, remotely, and without the fingerprinted device’s known cooperation. We accomplish this goal by exploiting small, microscopic deviations in device hardware: clock skews. Our techniques do not require any modification to the fingerprinted devices. Our techniques report consistent measurements when the measurer is thousands of miles, multiple hops, and tens of milliseconds away from the fingerprinted device, and when the fingerprinted device is connected to the Internet from different locations and via different access technologies. Further, one can apply our passive and semi-passive techniques when the fingerprinted device is behind a NAT or firewall, and also when the device’s system time is maintained via NTP or SNTP. One can use our techniques to obtain information about whether two devices on the Internet, possibly shifted in time or IP addresses, are actually the same physical device. Example applications include: computer forensics; tracking, with some probability, a physical device as it connects to the Internet from different public access points; counting the number of devices behind a NAT even when the devices use constant or random IP IDs; remotely probing a block of addresses to determine if the addresses correspond to virtual hosts, e.g., as part of a virtual honeynet; and unanonymizing anonymized network traces.

And an article. Really nice work.
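The core measurement is simple to sketch. Given pairs of (local receive time, remote TCP timestamp) collected from a device, the remote clock’s offset drifts linearly over time, and the slope of that drift is the skew. The numbers below are invented, and the paper fits the line more robustly than the plain least squares shown here:

```python
# A minimal sketch of clock-skew estimation in the spirit of the paper.
# The observations are hypothetical; real measurements come from the TCP
# timestamp option, and the paper uses a robust linear-programming fit.
import numpy as np

# Local arrival times (seconds) and the remote clock's timestamps,
# converted to seconds.
local_t = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
remote_t = np.array([0.0, 10.0005, 20.0011, 30.0016, 40.0021])

# The remote clock drifts relative to ours; offsets grow linearly.
offsets = remote_t - local_t

# Slope of offset vs. local time is the skew, usually quoted in ppm.
skew, _ = np.polyfit(local_t, offsets, 1)
print(f"estimated skew: {skew * 1e6:.1f} ppm")  # ~52 ppm for this data

# Two hosts showing the same stable skew across sessions are likely the
# same physical device, even if the IP address changed.
```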

Posted on March 7, 2005 at 3:02 PM

