Schneier on Security
A blog covering security and security technology.
August 23, 2006
Privacy Risks of Public Mentions
Interesting paper: "You are what you say: privacy risks of public mentions," Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006.
In today's data-rich networked world, people express many aspects of their lives online. It is common to segregate different aspects in different places: you might write opinionated rants about movies in your blog under a pseudonym while participating in a forum or web site for scholarly discussion of medical ethics under your real name. However, it may be possible to link these separate identities, because the movies, journal articles, or authors you mention are from a sparse relation space whose properties (e.g., many items related to by only a few users) allow re-identification. This re-identification violates people's intentions to separate aspects of their life and can have negative consequences; it also may allow other privacy violations, such as obtaining a stronger identifier like name and address.
This paper examines this general problem in a specific setting: re-identification of users from a public web movie forum in a private movie ratings dataset. We present three major results. First, we develop algorithms that can re-identify a large proportion of public users in a sparse relation space. Second, we evaluate whether private dataset owners can protect user privacy by hiding data; we show that this requires extensive and undesirable changes to the dataset, making it impractical. Third, we evaluate two methods for users in a public forum to protect their own privacy, suppression and misdirection. Suppression doesn't work here either. However, we show that a simple misdirection strategy works well: mention a few popular items that you haven't rated.
Unfortunately, the paper is only available to ACM members.
EDITED TO ADD (8/24): Paper is here.
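To make the abstract's "sparse relation space" point concrete, here is a minimal sketch of how mentions might be matched against a ratings dataset. It is not the paper's algorithm: the rarity weighting (a log-inverse-frequency score), the function names, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def item_weights(ratings):
    """Weight each item by rarity: items rated by few users score higher."""
    n_users = len(ratings)
    counts = defaultdict(int)
    for items in ratings.values():
        for item in items:
            counts[item] += 1
    # An item rated by everyone gets weight log(1) = 0 and identifies no one.
    return {item: math.log(n_users / c) for item, c in counts.items()}

def rank_candidates(mentions, ratings):
    """Score every dataset user by the total rarity of the items they share
    with the public mentions, highest score first."""
    weights = item_weights(ratings)
    scores = {
        user: sum(weights.get(item, 0.0) for item in mentions & items)
        for user, items in ratings.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical private ratings dataset: user id -> set of rated movies.
ratings = {
    "u1": {"Titanic", "Star Wars", "Obscure Film A", "Obscure Film B"},
    "u2": {"Titanic", "Star Wars", "The Matrix"},
    "u3": {"Titanic", "The Matrix", "Obscure Film C"},
    "u4": {"Titanic", "Star Wars"},
}

# Hypothetical movies mentioned under a pseudonym on a public forum.
mentions = {"Titanic", "Obscure Film A", "Obscure Film B"}

ranking = rank_candidates(mentions, ratings)
print(ranking[0][0])  # the best-matching dataset user
```

In this toy data, the popular title contributes nothing to the score, while the two rarely rated titles single out one user. That is the intuition behind the abstract's remark about items related to by only a few users.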
Posted on August 23, 2006 at 2:11 PM
• 23 Comments
... available only to ACM members that are also either SIGIR members or ACM Digital Library subscribers, that is.
It's hard to tell from the abstract just how the "misdirection strategy" would solve the problem.
just a guess, the misdirection strategy works by populating the sparse relation space with spurious relations.
I'm looking into getting access via my employer; if I get it, I'll try to post my thoughts.
@ underpaid ACM member
Solve the problem? Is that really stated in the paper? I can imagine misdirection making the sparse relational space much more cluttered, thus introducing more variation and lower likelihood of re-identification. Yikes, some ugly terms in there...
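One way to picture the guesses above is the sketch below. It assumes an attacker who uses a strict matcher that only accepts dataset users who have rated every mentioned item. That matcher, the function name, and the toy data are assumptions for illustration, not details taken from the paper.

```python
def strict_candidates(mentions, ratings):
    """Return the users who have rated every mentioned item."""
    return sorted(user for user, items in ratings.items() if mentions <= items)

# Hypothetical private ratings dataset: user id -> set of rated movies.
ratings = {
    "u1": {"Titanic", "Star Wars", "Obscure Film A", "Obscure Film B"},
    "u2": {"Titanic", "Star Wars", "The Matrix"},
    "u3": {"Titanic", "The Matrix", "Obscure Film C"},
    "u4": {"Titanic", "Star Wars"},
}

honest = {"Obscure Film A", "Obscure Film B"}
# Misdirection: also mention a popular movie that u1 never rated.
decoy = honest | {"The Matrix"}

before = strict_candidates(honest, ratings)
after = strict_candidates(decoy, ratings)
print(before, after)
```

Under this assumption, the honest mentions match exactly one user, and a single spurious mention leaves no candidates at all. A more tolerant matcher would be harder to fool, which fits the hedged wording in the comments above.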
I was wondering if you could take this a step further and make the links between the separate identities based on idiosyncrasies of the person's grammar, vocabulary, spelling errors, terminology, etc. Most people tend to speak and write in distinct patterns, but are those patterns unique enough to derive any relationships from one blog to another?
This posting is under my "standard pseudonym", which I use here and several other places. I have wondered at various times how hard it would be to connect this to my real name - I have at various points left clues* as to my real identity - which country I live in, information about my job etc. Access to private data from some websites would lead directly to my real-world e-mail address, so it would pose little challenge if (e.g.) the CIA for some reason wanted to find out who I was.
* Not deliberately left as clues, but because it was relevant to some point I was making.
Solve the problem? Strike no man, do no man wrong, be content with your wages. Problem solved.
how is this news?
it is blindingly obvious that using a similar name in various places will allow you to be linked.
or the fact that you discuss your school in one place, and your teachers in other, and then classmates in one more. that can all be linked? what a unique and interesting discovery... wow.
You're forgetting one thing: be innocent of thoughtcrime.
The misdirection strategy has an obvious downside, though: Suddenly Netflix is recommending movies you can't stand.
... oh and Britney Spears is the best!
"it is blindingly obvious that using a similar name in various places will allow you to be linked."
Yes, this is true but surely some pseudonym names will be easier to work with than others. If you have a highly unique online name then anybody could Google it to find your postings. If the name has lots of unrelated, irrelevant Google hits then picking out related information becomes rather harder.
To me, it seems that using online names that are common should be a little safer.
Perhaps I should use a one-shot disposable ID for my future comments like umhh ... http 404.
Why not modify this site to offer the option of a one-off, machine generated, random ID for truly anonymous comments that avoids the problems of multiple "anonymous" authors?
Yes, I know the idea is a bit of a gimmick but simply offering the choice might make people think a bit more about what exactly they are doing when they express opinions online.
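A minimal sketch of what such a one-off ID generator could look like, using Python's standard `secrets` module. The `anon-` prefix, the function name, and the ID length are assumptions for illustration; nothing here describes how this site actually works.

```python
import secrets

def one_off_id(nbytes=8):
    """Return a random, single-use commenter ID such as 'anon-3f9c0a1b2d4e5f60'.

    The ID comes from the OS's secure random source and is not derived from
    the commenter, so two comments cannot be linked through it.
    """
    return "anon-" + secrets.token_hex(nbytes)

first = one_off_id()
second = one_off_id()
print(first, second)
```

A fresh ID per comment only removes the name as a linking key. As the post points out, what a commenter mentions can still tie their comments together.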
Summary of paper:
In attempting to obfuscate your identity on ratings forums, if you recommend *unpopular crap*, you stand out like a sore thumb.
And Britney Spears Rocks!
Given the subject, isn't this only a case of cleverly avoiding shame? I guess everybody has done some rambling somewhere; it is what makes us human, you know: make mistakes and learn from them. I have posted under many names, even under my own name, on forums and boards.
If you are a politician and on some board you ramble about your private ideas and such, which will be used against you, I think this says more about our society as a whole than about the person.
Why not make mistakes, it proves you're human, not a machine.
So the obvious tactic is to avoid making posts. No posts, no cross-references, no trail of breadcrumbs back to you.
I may have forgotten more than one thing, but that's life.
"This posting is under my "standard pseudonym", which I use here and several other places. I have wondered at various times how hard it would be to connect this to my real name..."
Dunno how hard it is, but I had lots of fun trying to do it "(s)low-tech". Even if one of the possible names that came out of the attempts isn't you, the process still generated a list of readers of this blog (who might be put off by seeing their real names appear) that are close to your "profile". Great fun nonetheless. Unless you are a well-known blogger yourself, I don't think I zeroed in on you.
I thought of this too.
We can't hear it anymore, but it's a tradeoff.
Avoiding every hint to my real interests would be too hard.
Better a Hieronymus Bosch face than none. ;)
I have no blog of my own, but write comments on those of others.
After my previous post, I tried Googling "Filias Cupio", and I was surprised at how many hits there were (540, I think). For some of the hits, it looks like people have randomly harvested text (including my pseudonym) from blogs and put it into spam as noise to confuse filters.
Hi, I'm the author. Thought I'd respond a little for fun.
"I was wondering if you could take this a step further and make the links between the separate identities based on idiosyncrasies of the person's grammar, vocabulary, spelling errors, terminology, etc."
These people did some work on that, and it did pretty well:
Novak, J., Raghavan, P., and Tomkins, A. 2004. Anti-Aliasing on the Web. In Proc. WWW04, pp. 30-39.
"how is this news?
it is blindingly obvious that using a similar name in various places will allow you to be linked."
True. However, the paper is about being identified by what you mention, not by what your name is.
"The misdirection strategy has an obvious downside, though: Suddenly Netflix is recommending movies you can't stand."
I agree. Personally, I don't like the misdirection strategy, and it didn't work 100%. It just seemed reasonable and interesting to study it.
I live in a giant bucket.