Bruce Schneier

 
 

Schneier on Security

A blog covering security and security technology.


August 23, 2006

Privacy Risks of Public Mentions

Interesting paper: "You are what you say: privacy risks of public mentions," Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006.

Abstract:

In today's data-rich networked world, people express many aspects of their lives online. It is common to segregate different aspects in different places: you might write opinionated rants about movies in your blog under a pseudonym while participating in a forum or web site for scholarly discussion of medical ethics under your real name. However, it may be possible to link these separate identities, because the movies, journal articles, or authors you mention are from a sparse relation space whose properties (e.g., many items related to by only a few users) allow re-identification. This re-identification violates people's intentions to separate aspects of their life and can have negative consequences; it also may allow other privacy violations, such as obtaining a stronger identifier like name and address.

This paper examines this general problem in a specific setting: re-identification of users from a public web movie forum in a private movie ratings dataset. We present three major results. First, we develop algorithms that can re-identify a large proportion of public users in a sparse relation space. Second, we evaluate whether private dataset owners can protect user privacy by hiding data; we show that this requires extensive and undesirable changes to the dataset, making it impractical. Third, we evaluate two methods for users in a public forum to protect their own privacy, suppression and misdirection. Suppression doesn't work here either. However, we show that a simple misdirection strategy works well: mention a few popular items that you haven't rated.

Unfortunately, the paper is only available to ACM members.

EDITED TO ADD (8/24): Paper is here.
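
To make the abstract's "sparse relation space" point concrete, here is a minimal sketch of how mentions could be matched against a ratings dataset. It is not the paper's algorithm: the rarity weighting (log of total users over users who rated the item), the function names, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter

def rarity_weights(private_ratings):
    """Weight each item by log(N / number of users who rated it).

    Items rated by everyone get weight 0; rarely rated items get high weight.
    This weighting is an illustrative assumption, not the paper's method.
    """
    n_users = len(private_ratings)
    counts = Counter(item for items in private_ratings.values() for item in items)
    return {item: math.log(n_users / c) for item, c in counts.items()}

def rank_candidates(mentions, private_ratings):
    """Rank private users by the summed rarity of items they share with the mentions."""
    weights = rarity_weights(private_ratings)
    scores = {
        user: sum(weights[item] for item in mentions & items)
        for user, items in private_ratings.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical private dataset: user id -> set of rated movies.
private_ratings = {
    "u1": {"Star Wars", "Titanic", "Obscure Film A", "Obscure Film B"},
    "u2": {"Star Wars", "Titanic"},
    "u3": {"Star Wars", "Obscure Film C"},
    "u4": {"Star Wars", "Titanic", "Obscure Film C"},
}

# Hypothetical public forum mentions made under a pseudonym.
mentions = {"Star Wars", "Obscure Film A", "Obscure Film B"}

ranking = rank_candidates(mentions, private_ratings)
print(ranking[0][0])  # u1 ranks first: only u1 rated the two rarely rated films
```

In this toy data the popular title contributes nothing to the score, because everyone rated it. The two obscure titles do all the work, which is the property the abstract describes.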

Posted on August 23, 2006 at 02:11 PM | 23 Comments


Comments

... available only to ACM members that are also either SIGIR members or ACM Digital Library subscribers, that is.

It's hard to tell from the abstract just how the "misdirection strategy" would solve the problem.

Posted by: underpaid ACM member at August 23, 2006 04:04 PM


just a guess, the misdirection strategy works by populating the sparse relation space with spurious relations.

Posted by: another_bruce at August 23, 2006 04:16 PM
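
One way to picture the guess above is a sketch in which the attacker uses a strict matcher that keeps only dataset users who rated every mentioned item. That matcher is an assumption for illustration; the abstract does not say which matching rules the paper evaluates. All names and data below are hypothetical.

```python
def strict_candidates(mentions, private_ratings):
    """Return the users whose rated items include every mentioned item."""
    return sorted(
        user for user, items in private_ratings.items() if mentions <= items
    )

# Hypothetical private dataset: user id -> set of rated movies.
private_ratings = {
    "alice": {"Obscure Film A", "Obscure Film B", "Titanic"},
    "bob": {"Star Wars", "Titanic"},
    "carol": {"Star Wars", "Obscure Film A"},
}

# Alice's honest forum mentions, all of which she has also rated.
honest_mentions = {"Obscure Film A", "Obscure Film B"}

# Misdirection: add a popular film that Alice has NOT rated.
decoy_mentions = honest_mentions | {"Star Wars"}

before = strict_candidates(honest_mentions, private_ratings)
after = strict_candidates(decoy_mentions, private_ratings)
print(before)  # ['alice']
print(after)   # []
```

A single spurious mention is enough to knock the true user out of this strict candidate set. A softer, score-based matcher would degrade more gradually, so this shows only the simplest case.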


I'm looking into getting access via my employer; if I get it, I'll try to post my thoughts.

@ underpaid ACM member
Solve the problem? Is that really stated in the paper? I can imagine misdirection making the sparse relational space much more cluttered, thus introducing more variation and lower likelihood of re-identification. Yikes, some ugly terms in there...

Posted by: zencoder at August 23, 2006 04:24 PM


I was wondering if you could take this a step further and make the links between the separate identities based on idiosyncrasies of the person's grammar, vocabulary, spelling errors, terminology, etc. Most people tend to speak and write in distinct patterns, but are those patterns unique enough to derive any relationships from one blog to another?

Posted by: Anonymous at August 23, 2006 04:45 PM
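
A rough sketch of the kind of comparison the comment above asks about: build a character-trigram profile of each text and compare the profiles with cosine similarity. This is a generic stylometry baseline chosen for illustration, not a method taken from the post or the paper, and the sample texts are made up.

```python
import math
from collections import Counter

def trigram_profile(text):
    """Count overlapping character trigrams after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))

def cosine_similarity(a, b):
    """Cosine similarity between two trigram count profiles (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical writing samples from two pseudonymous accounts.
sample_a = "Dunno how hard it is, but I had lots of fun trying."
sample_b = "Dunno if it matters, but I had fun reading this."

score = cosine_similarity(trigram_profile(sample_a), trigram_profile(sample_b))
print(round(score, 3))
```

A high score between two short samples proves little on its own. Whether such patterns are distinctive enough to link real accounts is an empirical question that this sketch does not answer.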


Or you could just download the PDF from the author's web site: http://www-users.cs.umn.edu/~dfrankow/pubs.htm

Posted by: Evan at August 23, 2006 04:51 PM


or get the video of his talk at http://tinyurl.com/l3r2u (google vid 78mb avi)

Posted by: Savek at August 23, 2006 05:27 PM


This posting is under my "standard pseudonym", which I use here and several other places. I have wondered at various times how hard it would be to connect this to my real name - I have at various points left clues* as to my real identity - which country I live in, information about my job etc. Access to private data from some websites would lead directly to my real-world e-mail address, so it would pose little challenge if (e.g.) the CIA for some reason wanted to find out who I was.

* Not deliberately left as clues, but because it was relevant to some point I was making.

Posted by: Filias Cupio at August 23, 2006 06:22 PM


Solve the problem? Strike no man, do no man wrong, be content with your wages. Problem solved.

Posted by: Jim at August 23, 2006 06:47 PM


how is this news?

it is blindingly obvious that using a similar name in various places will allow you to be linked.

or the fact that you discuss your school in one place, and your teachers in another, and then classmates in one more. that can all be linked? what a unique and interesting discovery... wow.

Posted by: silkio at August 23, 2006 07:16 PM


@Jim

You're forgetting one thing: be innocent of thoughtcrime.

Posted by: Dido Sevilla at August 23, 2006 10:12 PM


The misdirection strategy has an obvious downside, though: Suddenly Netflix is recommending movies you can't stand.

Posted by: Michael Hampton at August 24, 2006 01:27 AM


... oh and Britney Spears is the best!

Posted by: Savek at August 24, 2006 01:52 AM


@silkio
"it is blindingly obvious that using a similar name in various places will allow you to be linked."
Yes, this is true but surely some pseudonym names will be easier to work with than others. If you have a highly unique online name then anybody could Google it to find your postings. If the name has lots of unrelated, irrelevant Google hits then picking out related information becomes rather harder.
To me, it seems that using online names that are common should be a little safer.

@Bruce
Perhaps I should use a one-shot disposable ID for my future comments like umhh ... http 404.
Why not modify this site to offer the option of a one-off, machine generated, random ID for truly anonymous comments that avoids the problems of multiple "anonymous" authors?
Yes, I know the idea is a bit of a gimmick but simply offering the choice might make people think a bit more about what exactly they are doing when they express opinions online.

Posted by: http 404 at August 24, 2006 02:46 AM
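
The one-off ID suggested above could be generated along these lines. This is a sketch only: the `anon-` prefix, the ID length, and the function name are assumptions, and nothing here reflects how this site actually works.

```python
import secrets

def one_off_id(prefix="anon", n_bytes=4):
    """Return a random, single-use display name such as 'anon-' plus 8 hex digits."""
    # secrets.token_hex draws from the OS random source, so IDs are not
    # derived from the commenter's name, e-mail, or earlier comments.
    return f"{prefix}-{secrets.token_hex(n_bytes)}"

comment_id = one_off_id()
print(comment_id)
```

Because each ID is drawn fresh, two comments by the same person cannot be linked through their names. They can still be linked by what they mention, which is the paper's point.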


Summary of paper:

In attempting to obfuscate your identity on ratings forums, if you recommend *unpopular crap*, you stand out like a sore thumb.

And Britney Spears Rocks!

Posted by: news@11 at August 24, 2006 02:46 AM


Given the subject, isn't this only a case of cleverly avoiding shame? I guess everybody did some rambling somewhere; it is what makes us human, you know: make mistakes and learn from them. I posted under many names, even under my own name, on forums and boards.

If you are a politician and on some board you ramble about your private ideas and such, which will be used against you, I think this says more about our society as a whole than about the person.

Why not make mistakes, it proves you're human, not a machine.

Posted by: Jungsonn at August 24, 2006 04:39 AM


So the obvious tactic is to avoid making posts. No posts, no cross-references, no trail of breadcrumbs back to you.

Oh, crud...

Posted by: Panzerkirchetort at August 24, 2006 08:52 AM


I may have forgotten more than one thing but that's life.

Posted by: Jim at August 24, 2006 08:55 AM


@Filias
"This posting is under my "standard pseudonym", which I use here and several other places. I have wondered at various times how hard it would be to connect this to my real name..."

Dunno how hard it is, but I had lots of fun trying to do it "(s)low-tech". Even if one of the possible names that came out of the attempts isn't you, the process still generated a list of readers of this blog (that might be put off by seeing their real names appear) that are close to your "profile". Great fun nonetheless. Unless you are a well-known blogger yourself, I don't think I zeroed in on you.

Posted by: Bloodhound at August 24, 2006 04:01 PM


I thought of this too.
We can't hear it anymore, but it's a tradeoff.
Avoiding every hint to my real interests would be too hard.
Better an Hieronymus Bosch face than none. ;)

Posted by: Stefan Wagner at August 24, 2006 11:00 PM


@Bloodhound:
I have no blog of my own, but write comments on those of others.

After my previous post, I tried Googling "Filias Cupio", and I was surprised at how many hits there were (540, I think). For some of the hits, it looks like people have randomly harvested text (including my pseudonym) from blogs and put it into spam as noise to confuse filters.

Posted by: Filias Cupio at August 24, 2006 11:18 PM


Hi, I'm the author. Thought I'd respond a little for fun.

"I was wondering if you could take this a step further and make the links between the separate identities based on idiosyncrasies of the person's grammar, vocabulary, spelling errors, terminology, etc."

These people did some work on that, and it did pretty well:

Novak, J., Raghavan, P., and Tomkins, A. 2004. Anti-Aliasing on the Web. In Proc. WWW04, pp. 30-39.


"how is this news?

it is blindingly obvious that using a similar name in various places will allow you to be linked."

True. However, the paper is about being identified by what you mention, not by what your name is.

"The misdirection strategy has an obvious downside, though: Suddenly Netflix is recommending movies you can't stand."

I agree. Personally, I don't like the misdirection strategy, and it didn't work 100%. It just seemed reasonable and interesting to study it.

Dan

Posted by: Anonymous at September 15, 2006 09:45 AM


I live in a giant bucket.

Posted by: Iamabanana at October 22, 2007 11:10 PM




Schneier.com is a personal website. Opinions expressed are not necessarily those of BT Counterpane.

 