Reidentifying Anonymous Data
Latanya Sweeney has demonstrated how easy it can be to identify people from their birth date, gender, and zip code. The anonymous data she reidentified happened to be DNA data, but that’s not relevant to her methods or results.
Of the 1,130 volunteers Sweeney and her team reviewed, 579 provided zip code, date of birth, and gender: the three pieces of information she needs to identify an anonymous person once they are combined with voter rolls or other public records. Of these 579, Sweeney succeeded in naming 241, or 42%. The Personal Genome Project confirmed that 97% of the names matched those in its database if nicknames and first-name variations were included.
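The linkage idea behind this kind of reidentification is essentially a join on the three quasi-identifiers. Below is a minimal sketch of that general technique, not Sweeney's actual code or data. The records, field names, and the `link` helper are all hypothetical, and only a unique match on (zip, date of birth, sex) is counted as a reidentification. The last line checks the arithmetic behind the 42% figure.

```python
from collections import defaultdict

# Hypothetical "anonymous" research records: no names, but the
# quasi-identifiers (zip code, date of birth, sex) are still present.
research_records = [
    {"id": "P1", "zip": "02138", "dob": "1965-07-31", "sex": "F"},
    {"id": "P2", "zip": "02139", "dob": "1970-01-15", "sex": "M"},
    {"id": "P3", "zip": "02140", "dob": "1982-03-02", "sex": "F"},
]

# Hypothetical public voter roll: names plus the same three fields.
voter_roll = [
    {"name": "Alice Example", "zip": "02138", "dob": "1965-07-31", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "dob": "1970-01-15", "sex": "M"},
    {"name": "Carl Example", "zip": "02139", "dob": "1970-01-15", "sex": "M"},
]


def quasi_id(rec):
    """Key used for linkage: the (zip, dob, sex) triple."""
    return (rec["zip"], rec["dob"], rec["sex"])


def link(records, roll):
    """Return {record id: name} for records matching exactly one voter."""
    index = defaultdict(list)
    for voter in roll:
        index[quasi_id(voter)].append(voter["name"])
    matches = {}
    for rec in records:
        names = index.get(quasi_id(rec), [])
        if len(names) == 1:  # ambiguous or missing matches are skipped
            matches[rec["id"]] = names[0]
    return matches


matches = link(research_records, voter_roll)
print(matches)                                       # {'P1': 'Alice Example'}
print(f"{len(matches) / len(research_records):.0%}")  # 33%
print(f"{241 / 579:.1%}")                             # 41.6%, reported as 42%
```

In this toy data, P2 matches two voters and is left unnamed, and P3 matches none. Only P1 is reidentified, which is why the rarity of the (zip, dob, sex) combination matters so much in real populations.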
Her results are described here.
Madeleine Ball • May 8, 2013 5:38 PM
Unfortunately the article has several inaccuracies, the most important of which is a misrepresentation of the Personal Genome Project: we describe our project as “non-anonymous”, and participants should understand that their data are highly identifiable. See my blog post here:
http://blog.personalgenomes.org/2013/05/02/a-very-personal-genome-project/
We do not encourage participants to scrub data; if they are uncomfortable with being identifiable, we strongly advise them to consider leaving the project. Melissa Gymrek and Yaniv Erlich have demonstrated that genetic data alone is highly identifying. Thus our participants are a select group who are comfortable with the risk of being identified. (Indeed, many skip the suspense and identify themselves from the outset.) It’s a radically different approach to sharing biomedical research data, and I’d be happy to tell you more about it.
Reporting on the Sweeney group’s work was also criticized here: http://blogs.law.harvard.edu/infolaw/2013/05/01/reporting-fail-the-reidentification-of-personal-genome-project-participants/