Entries Tagged "anonymity"
Page 2 of 8
Google’s new ways to violate your privacy and — more importantly — how to opt out.
In this article, detailing the Australian and then worldwide investigation of a particularly heinous child-abuse ring, there are a lot of details of the pedophile security practices and the police investigative techniques. The abusers had a detailed manual on how to scrub metadata and avoid detection, but not everyone was perfect. The police used information from a single camera to narrow down the suspects. They also tracked a particular phrase one person used to find him.
This story shows an increasing sophistication of the police using small technical clues combined with standard detective work to investigate crimes on the Internet. A highly painful read, but interesting nonetheless.
Here’s the story of how it was done. First, a fake ad on torrent listings linked the site to a Latvian bank account, an e-mail address, and a Facebook page.
Using basic website-tracking services, Der-Yeghiayan was able to uncover (via a reverse DNS search) the hosts of seven apparent KAT website domains: kickasstorrents.com, kat.cr, kickass.to, kat.ph, kastatic.com, thekat.tv and kickass.cr. This dug up two Chicago IP addresses, which were used as KAT name servers for more than four years. Agents were then able to legally gain a copy of the server’s access logs (explaining why it was federal authorities in Chicago that eventually charged Vaulin with his alleged crimes).
Using similar tools, Homeland Security investigators also performed something called a WHOIS lookup on a domain that redirected people to the main KAT site. A WHOIS search can provide the name, address, email and phone number of a website registrant. In the case of kickasstorrents.biz, that was Artem Vaulin from Kharkiv, Ukraine.
Der-Yeghiayan was able to link the email address found in the WHOIS lookup to an Apple email address that Vaulin purportedly used to operate KAT. It’s this Apple account that appears to tie all of pieces of Vaulin’s alleged involvement together.
On July 31st 2015, records provided by Apple show that the me.com account was used to purchase something on iTunes. The logs show that the same IP address was used on the same day to access the KAT Facebook page. After KAT began accepting Bitcoin donations in 2012, $72,767 was moved into a Coinbase account in Vaulin’s name. That Bitcoin wallet was registered with the same me.com email address.
Interesting paper: “Anonymization and Risk,” by Ira S. Rubinstein and Woodrow Hartzog:
Abstract: Perfect anonymization of data sets has failed. But the process of protecting data subjects in shared information remains integral to privacy practice and policy. While the deidentification debate has been vigorous and productive, there is no clear direction for policy. As a result, the law has been slow to adapt a holistic approach to protecting data subjects when data sets are released to others. Currently, the law is focused on whether an individual can be identified within a given set. We argue that the better locus of data release policy is on the process of minimizing the risk of reidentification and sensitive attribute disclosure. Process-based data release policy, which resembles the law of data security, will help us move past the limitations of focusing on whether data sets have been “anonymized.” It draws upon different tactics to protect the privacy of data subjects, including accurate deidentification rhetoric, contracts prohibiting reidentification and sensitive attribute disclosure, data enclaves, and query-based strategies to match required protections with the level of risk. By focusing on process, data release policy can better balance privacy and utility where nearly all data exchanges carry some risk.
At the Apple Worldwide Developers Conference earlier this week, Apple talked about something called “differential privacy.” We know very little about the details, but it seems to be an anonymization technique designed to collect user data without revealing personal information.
What we know about anonymization is that it’s much harder than people think, and it’s likely that this technique will be full of privacy vulnerabilities. (See, for example, the excellent work of Latanya Sweeney.) As expected, security experts are skeptical. Here’s Matt Green trying to figure it out.
So while I applaud Apple for trying to improve privacy within its business models, I would like some more transparency and some more public scrutiny.
EDITED TO ADD (6/17): Here’s a slide deck on privacy from the WWDC.
Jonathan Mayer, Patrick Mutchler, and John C. Mitchell, “Evaluating the privacy properties of telephone metadata“:
Abstract: Since 2013, a stream of disclosures has prompted reconsideration of surveillance law and policy. One of the most controversial principles, both in the United States and abroad, is that communications metadata receives substantially less protection than communications content. Several nations currently collect telephone metadata in bulk, including on their own citizens. In this paper, we attempt to shed light on the privacy properties of telephone metadata. Using a crowdsourcing methodology, we demonstrate that telephone metadata is densely interconnected, can trivially be reidentified, and can be used to draw sensitive inferences.
New research, but not a new result. There have been several similar studies over the years. This one uses only anonymized call and SMS metadata to identify people who volunteered for the study.
This research shows how to track e-commerce users better across multiple sessions, even when they do not provide unique identifiers such as user IDs or cookies.
Abstract: Targeting individual consumers has become a hallmark of direct and digital marketing, particularly as it has become easier to identify customers as they interact repeatedly with a company. However, across a wide variety of contexts and tracking technologies, companies find that customers can not be consistently identified which leads to a substantial fraction of anonymous visits in any CRM database. We develop a Bayesian imputation approach that allows us to probabilistically assign anonymous sessions to users, while ac- counting for a customer’s demographic information, frequency of interaction with the firm, and activities the customer engages in. Our approach simultaneously estimates a hierarchical model of customer behavior while probabilistically imputing which customers made the anonymous visits. We present both synthetic and real data studies that demonstrate our approach makes more accurate inference about individual customers’ preferences and responsiveness to marketing, relative to common approaches to anonymous visits: nearest- neighbor matching or ignoring the anonymous visits. We show how companies who use the proposed method will be better able to target individual customers, as well as infer how many of the anonymous visits are made by new customers.
Interesting blog post:
We are able to de-anonymize executable binaries of 20 programmers with 96% correct classification accuracy. In the de-anonymization process, the machine learning classifier trains on 8 executable binaries for each programmer to generate numeric representations of their coding styles. Such a high accuracy with this small amount of training data has not been reached in previous attempts. After scaling up the approach by increasing the dataset size, we de-anonymize 600 programmers with 52% accuracy. There has been no previous attempt to de-anonymize such a large binary dataset. The abovementioned executable binaries are compiled without any compiler optimizations, which are options to make binaries smaller and faster while transforming the source code more than plain compilation. As a result, compiler optimizations further normalize authorial style. For the first time in programmer de-anonymization, we show that we can still identify programmers of optimized executable binaries. While we can de-anonymize 100 programmers from unoptimized executable binaries with 78% accuracy, we can de-anonymize them from optimized executable binaries with 64% accuracy. We also show that stripping and removing symbol information from the executable binaries reduces the accuracy to 66%, which is a surprisingly small drop. This suggests that coding style survives complicated transformations.
Here’s the paper.
And here’s their previous paper, de-anonymizing programmers from their source code.
Nicholas Weaver guessed this back in January.
The behavior of the researchers is reprehensible, but the real issue is that CERT Coordination Center (CERT/CC) has lost its credibility as an honest broker. The researchers discovered this vulnerability and submitted it to CERT. Neither the researchers nor CERT disclosed this vulnerability to the Tor Project. Instead, the researchers apparently used this vulnerability to deanonymize a large number of hidden service visitors and provide the information to the FBI.
Does anyone still trust CERT to behave in the Internet’s best interests?
EDITED TO ADD (12/14): I was wrong. CERT did disclose to Tor.
Sidebar photo of Bruce Schneier by Joe MacInnis.