Entries Tagged "de-anonymization"


Privacy Risks of Public Mentions

Interesting paper: “You are what you say: privacy risks of public mentions,” Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006.

Abstract:

In today’s data-rich networked world, people express many aspects of their lives online. It is common to segregate different aspects in different places: you might write opinionated rants about movies in your blog under a pseudonym while participating in a forum or web site for scholarly discussion of medical ethics under your real name. However, it may be possible to link these separate identities, because the movies, journal articles, or authors you mention are from a sparse relation space whose properties (e.g., many items related to by only a few users) allow re-identification. This re-identification violates people’s intentions to separate aspects of their life and can have negative consequences; it also may allow other privacy violations, such as obtaining a stronger identifier like name and address.

This paper examines this general problem in a specific setting: re-identification of users from a public web movie forum in a private movie ratings dataset. We present three major results. First, we develop algorithms that can re-identify a large proportion of public users in a sparse relation space. Second, we evaluate whether private dataset owners can protect user privacy by hiding data; we show that this requires extensive and undesirable changes to the dataset, making it impractical. Third, we evaluate two methods for users in a public forum to protect their own privacy, suppression and misdirection. Suppression doesn’t work here either. However, we show that a simple misdirection strategy works well: mention a few popular items that you haven’t rated.
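To make the sparse-relation-space idea concrete, here is a toy sketch (my illustration, not the paper's actual algorithm) that scores each user in a private ratings dataset by rarity-weighted overlap with a pseudonymous poster's public mentions. The usernames, movie titles, and weighting are all made up:

```python
import math
from collections import Counter

# Private ratings dataset: user -> set of items they rated (invented data).
private_ratings = {
    "alice": {"Primer", "Solaris", "Brazil", "Heat"},
    "bob":   {"Heat", "Titanic", "Armageddon"},
    "carol": {"Primer", "Brazil", "Memento"},
}

# Items a pseudonymous forum poster has publicly mentioned (invented data).
public_mentions = {"Primer", "Brazil", "Heat"}

# Item popularity across the private dataset; rare items are more identifying,
# items everyone rated carry no weight at all.
popularity = Counter(item for items in private_ratings.values() for item in items)
n_users = len(private_ratings)

def match_score(rated_items):
    """Sum of rarity (IDF-style) weights over items both mentioned and rated."""
    overlap = public_mentions & rated_items
    return sum(math.log(n_users / popularity[item]) for item in overlap)

# Rank private users by score; the top candidate is the re-identification guess.
ranking = sorted(private_ratings, key=lambda u: match_score(private_ratings[u]), reverse=True)
for user in ranking:
    print(user, round(match_score(private_ratings[user]), 2))
```

The misdirection defense from the abstract attacks exactly this kind of scoring: popular items you never rated add weak matches to lots of other users while contributing nothing to your own score.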

Unfortunately, the paper is only available to ACM members.

EDITED TO ADD (8/24): Paper is here.

Posted on August 23, 2006 at 2:11 PM

AOL Releases Massive Amount of Search Data

From TechCrunch:

AOL has released very private data about its users without their permission. While the AOL username has been changed to a random ID number, the ability to analyze all searches by a single user will often lead people to easily determine who the user is, and what they are up to. The data includes personal names, addresses, social security numbers and everything else someone might type into a search box.

The most serious problem is the fact that many people often search on their own name, or those of their friends and family, to see what information is available about them on the net. Combine these ego searches with porn queries and you have a serious embarrassment. Combine them with “buy ecstasy” and you have evidence of a crime. Combine it with an address, social security number, etc., and you have an identity theft waiting to happen. The possibilities are endless.

This is search data for roughly 658,000 anonymized users over a three month period from March to May—about 1/3 of 1 per cent of their total data for that period.

Now AOL says it was all a mistake. They pulled the data, but it’s still out there—and probably will be forever. And there’s some pretty scary stuff in it.
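To see why a random ID number buys so little, here is a toy sketch (invented log format, with queries echoing the New York Times example quoted in the update below) that groups queries by anonymized ID and flags any ID whose history combines a name-like query with a location query:

```python
import re
from collections import defaultdict

# Toy search log in an invented format: (anonymized_id, query).
log = [
    (4417749, "thelma arnold"),
    (4417749, "landscapers in lilburn ga"),
    (4417749, "homes sold in shadow lake subdivision gwinnett county georgia"),
    (3302145, "weather boston"),
    (3302145, "red sox schedule"),
]

# Group every query under its pseudonymous user ID.
by_user = defaultdict(list)
for anon_id, query in log:
    by_user[anon_id].append(query)

# Crude heuristics: a bare two-word query that looks like a name, and a query
# that names a place or address.
name_like = re.compile(r"^[a-z]+ [a-z]+$")
place_like = re.compile(r"\b(county|subdivision|street|ave|blvd)\b")

for anon_id, queries in by_user.items():
    if any(name_like.fullmatch(q) for q in queries) and \
       any(place_like.search(q) for q in queries):
        print(f"ID {anon_id} looks identifiable from its own queries: {queries}")
```

Real re-identification is messier than a pair of regexes, of course, but the point is that the grouping by ID does most of the work; the queries themselves supply the identity.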

You can read more on Slashdot and elsewhere.

Anyone who wants to play NSA can start datamining for terrorists. Let us know if you find anything.

EDITED TO ADD (8/9): The New York Times:

And search by search, click by click, the identity of AOL user No. 4417749 became easier to discern. There are queries for “landscapers in Lilburn, Ga,” several people with the last name Arnold and “homes sold in shadow lake subdivision gwinnett county georgia.”

It did not take much investigating to follow that data trail to Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga., frequently researches her friends’ medical ailments and loves her three dogs. “Those are my searches,” she said, after a reporter read part of the list to her.

Posted on August 8, 2006 at 11:02 AM

Digital Cameras Have Unique Fingerprints

Interesting research:

Fridrich’s technique is rooted in the discovery by her research group of this simple fact: Every original digital picture is overlaid by a weak noise-like pattern of pixel-to-pixel non-uniformity.

Although these patterns are invisible to the human eye, the unique reference pattern or “fingerprint” of any camera can be electronically extracted by analyzing a number of images taken by a single camera.

That means that as long as examiners have either the camera that took the image or multiple images they know were taken by the same camera, an algorithm developed by Fridrich and her co-inventors to extract and define the camera’s unique pattern of pixel-to-pixel non-uniformity can be used to provide important information about the origins and authenticity of a single image.

The limitation of the technique is that it requires either the camera or multiple images taken by the same camera, and isn’t informative if only a single image is available for analysis.

Like actual fingerprints, the digital “noise” in original images is stochastic in nature – that is, it contains random variables – which are inevitably created during the manufacturing process of the camera and its sensors. This virtually ensures that the noise imposed on the digital images from any particular camera will be consistent from one image to the next, even while it is distinctly different from the pattern of any other camera.

In preliminary tests, Fridrich’s lab analyzed 2,700 pictures taken by nine digital cameras and with 100 percent accuracy linked individual images with the camera that took them.
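As a very rough sketch of the approach (a simplification, not Fridrich's actual estimator), you can approximate a camera's pattern by averaging the high-frequency residuals of several of its images, then check whether a new image's residual correlates with that reference. The Gaussian denoiser and the synthetic numbers below are arbitrary choices for illustration:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_residual(image):
    """High-frequency residual: the image minus a smoothed copy of itself."""
    return image - gaussian_filter(image, sigma=2)

def camera_fingerprint(images):
    """Approximate the sensor's fixed pattern by averaging many residuals."""
    return np.mean([noise_residual(img) for img in images], axis=0)

def correlation(a, b):
    """Normalized correlation between two residual patterns."""
    a, b = a - a.mean(), b - b.mean()
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic demo: images sharing one fixed pattern correlate with its estimate.
rng = np.random.default_rng(0)
pattern = rng.normal(0, 5, (64, 64))                    # stand-in sensor pattern
shots = [rng.normal(128, 20, (64, 64)) + pattern for _ in range(50)]
fp = camera_fingerprint(shots)

same_camera = rng.normal(128, 20, (64, 64)) + pattern   # new photo, same "camera"
other_camera = rng.normal(128, 20, (64, 64))            # photo from elsewhere
print(correlation(noise_residual(same_camera), fp))     # clearly positive
print(correlation(noise_residual(other_camera), fp))    # near zero
```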

There’s one important aspect of this fingerprint that the article did not talk about: how easy is it to forge? Can someone analyze 100 images from a given camera, and then doctor a pre-existing picture so that it appeared to come from that camera?

My guess is that it can be done relatively easily.
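In the toy model above, at least, the forgery is almost trivial: estimate the fingerprint from a camera's published photos and add it into the image you want to misattribute. Continuing the sketch above (and keeping in mind that real sensor-noise forgery, and detecting it, are harder problems):

```python
# Continuing the previous sketch: splice the estimated fingerprint into an
# unrelated image so its residual matches the "wrong" camera.
forged = other_camera + fp
print(correlation(noise_residual(forged), fp))  # now correlates like a genuine shot
```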

Posted on April 25, 2006 at 2:09 PM

Remote Physical Device Fingerprinting

Here’s the abstract:

We introduce the area of remote physical device fingerprinting, or fingerprinting a physical device, as opposed to an operating system or class of devices, remotely, and without the fingerprinted device’s known cooperation. We accomplish this goal by exploiting small, microscopic deviations in device hardware: clock skews. Our techniques do not require any modification to the fingerprinted devices. Our techniques report consistent measurements when the measurer is thousands of miles, multiple hops, and tens of milliseconds away from the fingerprinted device, and when the fingerprinted device is connected to the Internet from different locations and via different access technologies. Further, one can apply our passive and semi-passive techniques when the fingerprinted device is behind a NAT or firewall, and also when the device’s system time is maintained via NTP or SNTP. One can use our techniques to obtain information about whether two devices on the Internet, possibly shifted in time or IP addresses, are actually the same physical device. Example applications include: computer forensics; tracking, with some probability, a physical device as it connects to the Internet from different public access points; counting the number of devices behind a NAT even when the devices use constant or random IP IDs; remotely probing a block of addresses to determine if the addresses correspond to virtual hosts, e.g., as part of a virtual honeynet; and unanonymizing anonymized network traces.
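The core measurement is easy to illustrate: collect pairs of (local receive time, remote TCP timestamp), fit a line, and the slope's deviation from the nominal tick rate gives the remote clock's skew, which stays roughly constant for a given machine. Here is a minimal sketch of that estimation step on made-up observations (the paper's actual measurement and filtering are considerably more careful):

```python
import numpy as np

def estimate_skew_ppm(recv_times_s, tcp_timestamps, hz=100):
    """Least-squares slope of remote timestamp ticks vs. local time, reported
    as an offset from the nominal tick rate in parts per million."""
    t = np.asarray(recv_times_s, dtype=float)
    ticks = np.asarray(tcp_timestamps, dtype=float)
    slope, _intercept = np.polyfit(t, ticks, 1)   # ticks per local second
    return (slope / hz - 1.0) * 1e6

# Made-up observations: a remote host whose 100 Hz timestamp clock runs about
# 50 ppm fast, sampled over an hour with a little measurement jitter.
rng = np.random.default_rng(1)
t_local = np.sort(rng.uniform(0, 3600, 200))
ticks = 7_000_000 + t_local * 100 * (1 + 50e-6) + rng.normal(0, 0.5, 200)

print(round(estimate_skew_ppm(t_local, ticks), 1), "ppm")
```

Two traces that yield roughly the same ppm value, even if captured at different times or from different IP addresses, are evidence that they came from the same physical device.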

And an article. Really nice work.

Posted on March 7, 2005 at 3:02 PM

