De-Anonymizing Social Network Users

Interesting paper: “A Practical Attack to De-Anonymize Social Network Users.”

Abstract. Social networking sites such as Facebook, LinkedIn, and Xing have been reporting exponential growth rates. These sites have millions of registered users, and they are interesting from a security and privacy point of view because they store large amounts of sensitive personal user data.

In this paper, we introduce a novel de-anonymization attack that exploits group membership information that is available on social networking sites. More precisely, we show that information about the group memberships of a user (i.e., the groups of a social network to which a user belongs) is often sufficient to uniquely identify this user, or, at least, to significantly reduce the set of possible candidates. To determine the group membership of a user, we leverage well-known web browser history stealing attacks. Thus, whenever a social network user visits a malicious website, this website can launch our de-anonymization attack and learn the identity of its visitors.

The implications of our attack are manifold, since it requires a low effort and has the potential to affect millions of social networking users. We perform both a theoretical analysis and empirical measurements to demonstrate the feasibility of our attack against Xing, a medium-sized social network with more than eight million members that is mainly used for business relationships. Our analysis suggests that about 42% of the users that use groups can be uniquely identified, while for 90%, we can reduce the candidate set to less than 2,912 persons. Furthermore, we explored other, larger social networks and performed experiments that suggest that users of Facebook and LinkedIn are equally vulnerable (although attacks would require more resources on the side of the attacker). An analysis of an additional five social networks indicates that they are also prone to our attack.
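To make the abstract’s core idea concrete, here is a minimal Python sketch. It is not the paper’s implementation. It assumes a toy `group_members` table (made-up data) and a list of groups that a history-stealing probe suggests the visitor belongs to; the helper name `candidate_set` is hypothetical.

```python
def candidate_set(group_members, observed_groups):
    """Intersect the member lists of every group the visitor seems to belong to."""
    candidates = None
    for group in observed_groups:
        members = set(group_members.get(group, set()))
        # First group seeds the candidate set; each later group narrows it.
        candidates = members if candidates is None else candidates & members
    return candidates if candidates is not None else set()


# Hypothetical public group directory: group name -> set of member names.
group_members = {
    "sailing": {"alice", "bob", "carol"},
    "python-users": {"bob", "carol", "dave"},
    "chess-club": {"carol", "erin"},
}

two_groups = candidate_set(group_members, ["sailing", "python-users"])
three_groups = candidate_set(group_members, ["sailing", "python-users", "chess-club"])
print(sorted(two_groups))    # two candidates remain
print(sorted(three_groups))  # one candidate remains
```

Each extra group can only shrink the candidate set, which is why a handful of memberships is often enough to single someone out.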

News article. Moral: anonymity is really, really hard—but we knew that already.

Tags: academic papers, anonymity, de-anonymization, Facebook, identification, LinkedIn, privacy, social media, web privacy

Posted on March 8, 2010 at 6:13 AM • 30 Comments

Comments

Randal • March 8, 2010 6:56 AM

No Facebook or other such account. History stealing attacks protected against (NoScript, SafeHistory, common sense in avoiding malicious sites, updated software). Ever-changing, deliberately common nicknames firewalled from each other. Group memberships that do not have anyone in common (and few such things to begin with).

Good luck. I think they’ll have a hard time figuring out who I am. Fortunately, nobody has much of a reason to care.

Winter • March 8, 2010 7:29 AM

@Randal:
“I think they’ll have a hard time figuring out who I am.”

I think they are only interested in people with a social life. Maybe (generic) you do not qualify as interesting.

But in general, if you want to compartmentalize your online life, that is not that difficult. It simply requires consistency. Oh wait, that is difficult 🙂

But a browser that leaks browser history to attack sites is bad on all levels.

Winter

BF Skinner • March 8, 2010 7:35 AM

We’ve got an election coming up and if history gives us any baseline then the presidential campaign starts 1/1/11.

The Obama campaign was very successful in exploiting information given to them.

FB is forever seeking new revenue. They might tie something up in a nice neat app.

Even Randal may be of interest to these folks.

AlanS • March 8, 2010 8:30 AM

As you say “we knew that already”. Let’s move on to what this means.

Here’s an important paper from last year that looks at the larger implications and possible responses to the failure of anonymization:
Ohm, Paul. 2009. “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization.” SSRN eLibrary. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006

AlanS • March 8, 2010 8:42 AM

Ohm’s summary of what’s happened:

“Reidentification science disrupts the privacy policy landscape by undermining the faith that we have placed in anonymization. This is no small faith, for technologists rely on it to justify sharing data indiscriminately and storing data perpetually, all while promising their users (and the world) that they are protecting privacy. Advances in reidentification expose these promises as too often illusory. These advances should trigger a sea change in the law, because nearly every information privacy law or regulation grants a get-out-of-jail free card to those who anonymize their data….Yet reidentification science exposes the underlying promise made by these laws—that anonymization protects privacy—as an empty one, as broken as the technologists’ promises. At the very least, lawmakers must reexamine every privacy law, asking whether the power of reidentification and fragility of anonymization have thwarted their original designs.

The power of reidentification also transforms the public policy debate over information privacy. Today, this debate centers almost entirely on squabbles over magical phrases like “personally identifiable information” (PII) or “personal data.” Advances in reidentification expose how thoroughly these phrases miss the point. Although it is true that a malicious adversary can use PII like a name or social security number to link data to identity, as it turns out, the adversary can do the same thing using information that nobody would classify as personally identifiable…. These studies [Sweeny, Netflix study, etc.] and others like them sound the death knell for the idea that we protect privacy when we remove PII from our databases. This idea, which has served as the central focus of information privacy law for almost forty years, is a fallacy that has run its course and must now yield to something else.”

David • March 8, 2010 8:54 AM

My Facebook policy is that I make no attempt to hide who I am, and nothing goes on FB that I don’t want the whole world to know (if it cares). This limits its usefulness a touch, as there are things I happily discuss with friends and family that I don’t want the whole planet to know about, but it’s still useful to me.

Sometimes the best way to hide a secret is to talk a lot, in public, about everything else.

GregW • March 8, 2010 9:05 AM

I was talking to a mental health professional this weekend who was extremely concerned about the sensitivity of data being required for them to put into online computer systems and she asked me if it can be kept securely.

I had to say no (actually “we’re doomed!”), and I didn’t even get into the perils of de-anonymization (Walgreens knows you bought/took drug X which often indicates the diagnosis or condition involved) and difficulties of securing information.

How do we protect the vast volumes of healthcare data that are starting to go online, e.g. about mental healthcare drugs or prescriptions taken? (The HIPPA regulations seem even less prescriptive than the CHIP&PIN spec, so I haven’t put much trust in them.) How do we make sure health insurance companies or other interested parties don’t get visibility into this information and abuse that? (Via denial or rate-setting for preexisting conditions, etc.) Are there any good answers here? Am I being too skeptical/cynical?

John • March 8, 2010 9:31 AM

@Randal

“Good luck. I think they’ll have a hard time figuring out who I am. Fortunately, nobody has much of a reason to care.”

Oddly enough, if you put “Randal”, “cryptography”, and “privacy” into Google, a large portion of the hits point to a “Randall Atkinson” who seems to be a top-level researcher in the field (who evidently had a lot to do with IPv6). I’m not suggesting that you are actually the same person since you obviously could have chosen the handle “Randal” at random. However, it is weird how just a few terms pointed to a single individual.

Winter • March 8, 2010 9:33 AM

@GregW
“My Facebook policy is that I make no attempt to hide who I am, and nothing goes on FB that I don’t want the whole world to know (if it cares).”

You missed the point. This is definitely not about what you post on social sites.

This is about identifying you, who read this comment, by name, by only using your browsing habits.

The point is that Bruce would be able to determine who you are on return visits by extracting information from your browser (eg, browser id string, fonts loaded etc) and link this unique visitor to the real world by browser history (what sites did you visit).

If you have visited any social sites, Bruce might get information on the individuals whose pages you read. From knowing (some of) your friends, you could be identified from your FB friends list.

Solutions: Forge your browser ID string and referer, randomly. Clean out your history every half hour. Use NoScript, RedirectCleaner, and Tor whenever you can. Use separate browsers for separate tasks.

In short, if you want to remain anonymous, you have to make your online life miserable. 😉

Winter

Winter • March 8, 2010 9:35 AM

@GregW should have been @David
Stupid error of me

travs90 • March 8, 2010 9:56 AM

@GregW

HIPAA. Not HIPPA. Sorry – doesn’t really matter. But, either way, hopefully no one can identify who I am.

Clive Robinson • March 8, 2010 10:13 AM

Once upon a time if you wanted a private conversation you stepped out back and walked a ways with the person.

If you wanted a little time to think or reflect you again stepped out back and walked a ways.

None of this is possible in London; virtually everywhere you are allowed to walk without suspicion has CCTV on it.

You cannot use a phone without details of where you are, who you are calling, and where they are going into a database.

Worse still in the UK they are just about to roll out another “computer health care system” where your medical records are made available to just about anybody who wants to look.

I doubt the UK Gov are going to allow the fact that an individual has no privacy stand in the way of their grandiose ideas.

So it is unlikely they will stop others either.

Sadly it’s a case of “nothing new to see, move on, hurry up or we’ll arrest you”.

HJohn • March 8, 2010 10:35 AM

@John: “Oddly enough, if you put “Randal”, “cryptography”, and “privacy” into Google, a large portion of the hits point to a “Randall Atkinson” who seems to be a top-level researcher in the field (who evidently had a lot to do with IPv6). I’m not suggesting that you are actually the same person since you obviously could have chosen the handle “Randal” at random. However, it is weird how just a few terms pointed to a single individual.”

Good observation.

Mathematically, what seems like a lot gets trimmed down fairly easily. Sort of like how a binary search can find 1 record in a million by looking at about 20, since every look eliminates half the population. Similarly, you can start in a country with 300 million, and as you learn more about the person you eliminate 50% to 99% of the remaining population with each inquiry. How many men named Robert are: 1) 45 years old, 2) married to a 42 year old named Jane, 3) have a 13 year old daughter named Kim, 4) live in Ocala, FL, 5) work for a municipal government. Etc. Narrows it down quickly.

I wrote an article about child safety online several years ago that explained this to youngsters. They may think they are hard for people they talk to online to find, but small bits of information can do the trick. My example was “How many 10 year old kids named Billy wear number 23 and play right field for the Cubs in a town’s little league?” Knowing the town wouldn’t make this kid hard to find. (P.S. I wrote it under a pen name to follow my own advice.)
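The binary-search comparison above can be checked with a few lines of arithmetic. This is only a sketch: the helper names are made up for illustration, and the assumption that every fact removes exactly half of the remaining population is a simplification.

```python
import math


def lookups_for_binary_search(n):
    """Number of halvings needed to isolate one record among n."""
    return math.ceil(math.log2(n))


def remaining_after(population, keep_fractions):
    """Expected population left after applying each filter in turn.

    keep_fractions holds, for each fact learned, the share of people who
    still match (0.5 means the fact eliminates half of them).
    """
    remaining = float(population)
    for fraction in keep_fractions:
        remaining *= fraction
    return remaining


print(lookups_for_binary_search(1_000_000))    # halvings for a million records
print(lookups_for_binary_search(300_000_000))  # halvings for 300 million people
print(remaining_after(300_000_000, [0.5] * 29))
```

Facts that eliminate 99% of the remaining population instead of 50% get there much faster, since each one is worth several halvings.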

Summer • March 8, 2010 10:39 AM

I think there’s a weakness in the approach the authors use to identify network participants. In particular: they are matching up nodes with similar numbers of edges, looking for congruent graphs in both data sets. It seems pretty easy to defeat such an algorithm by compartmentalizing your social network (friends on Facebook, business colleagues on LinkedIn), or by maintaining multiple accounts on various social networks.

So, I’m not that worried. I think this particular paper isn’t as worrisome as other more basic de-anonymizing practices.

Clive Robinson • March 8, 2010 10:42 AM

@ GregW,

“Walgreens knows you bought/took drug X which often indicates the diagnosis or condition involved”

Oh that it was just that…

The medical profession has a habit of changing the use of a drug.

For instance there are people taking v-i-agra for what it was originally intended (not what it is mainly used for).

Some once quite serious sounding antidepressants are now routinely prescribed for other afflictions such as shingles.

Some people have drugs for pain management of old injuries that are basically opiates under fancy names. Which can cause them problems with some employers (who insist on drugs checks).

On having to go into hospital myself very recently, I was very surprised to see what appeared on my drugs chart and had to tell the doctors I was most definitely not taking those particular drugs.

After a conversation it turns out the pharmacy computer had added them when printing out my computerised drugs list because they were on my records.

I had to tell them that something was most definitely wrong as I had not ever taken some of them, and at least one of them I had a known allergy to.

Turns out it had come from a “computer records update”…

Which begs not just the question of who else will get drugs they have never had before…

But who else has my wrong drugs information…

To say it is scary is an understatement.

Charles • March 8, 2010 10:53 AM

Dear god. If you try to read this PDF you’ll go blind. It’s an interesting proof-of-concept though, and it’s all the more reason to avoid joining groups like “1 million strong for “. To take it a step further, you can basically do something like this…

Get Facebook user to join a group or data mine groups that already exist for that user (if they’re hidden you can find out if they exist)
Compare intersection of those group results to figure out a more interesting personality and perhaps friend profile of the user.
Attempt to get said user to click on malicious URL links that can tell you a lot more about their physical location, where they are coming from, etc.
Friend their friends, use a “hot girl attack”, etc to be accepted into their friends list, etc to stalk them further.

pseud-anonymous dummy • March 8, 2010 12:28 PM

@Charles 10:53: I don’t really understand what you’re saying. Could you spell it out for a dummy?
In connection with the issue of FB etc. and privacy, I recently had the following problem. I had received some spam email from a company with an actual name (i.e. it exists) WITH WHICH I HAD NEVER DEALT EVER. I clicked the unsubscribe link. Then I got an invitation, generated by a staff member of the company, to join Facebook. That FB invitation listed three of my other email correspondents as members of FB who are known to me. After some investigation I found out that what appears to have happened is that those three email correspondents had agreed to let FB scan their email accounts, and that FB has archived ALL their email correspondent data. Then, when the fourth person generated the invitation, FB scanned its database and came up with the three people on FB with whom I had corresponded. Voilà: three friends waiting for me to join them on FB. The amount of social networking data that these sites must have is scary!

Craig • March 8, 2010 1:37 PM

This is a great article because social networking will be very prominent in the future, and information from these sites will be invaluable.

Models of users’ preferences, habits, etc. will be extracted, legally or illegally, at the touch of a button.

Google is analysing millions of users’ habits with Google Analytics. One part of me is afraid of what this could lead to, but the other part has an overwhelming urge to know, because it is so interesting that I can’t help but look through the information it provides.

Social networking works and is here to stay, whatever the security issues may be.

Craig Lawton • March 8, 2010 6:53 PM

The cat’s out of the bag – regulation to follow down the track somewhere. Be interesting how it gets handled at an international level with all this data in the US.

Peter E Retep • March 8, 2010 7:37 PM

Once technology can do it,
regulations only forbid volunteers from doing it,
or the government from openly doing it to provide legal evidence.
Every other government and NGO interested will do it.
If European citizens imagine their laws and regulations prevent
anyone else from vacuuming and correlating their correspondence,
or decrypting it for gain or advantage,
they’re more deluded than we imagine.

Clive Robinson • March 8, 2010 11:20 PM

@ Peter E Retep,

“If European citizens imagine their laws and regulations prevent anyone else from vacuuming and correlating their correspondence, or decrypting it for gain or advantage, they’re more deluded than we imagine.”

Deluded no, lied to yes.

The best law principle to deal with all “personal data” not just PII is the “data subject owns the data”.

It was how it was supposed to be in the UK and EU but in the US it’s always been the “data holder owns the data”.

In the UK for instance most private health care providers have two small clauses at the end of the contract you sign stating that,

1, You give them access to “any and all data” they “think” they need to verify use of the service.

2, You allow them to share the data with other organisations as part of the verifying process.

What they don’t make clear is that one such organisation is an information clearing house in the US…

Thus on signing the form you have consented to give up “any and all your personal information” to the US based clearing house…

So much for “safe harbour” laws. The worst part of it is in the UK certain Government Ministers (Patricia Hewitt and the other Blair “puke babes”) actively sought to undermine the Data Protection Legislation for what is effectively personal gain.

And of course we had certain US three letter agencies come around to scare the rest of the EU into “terrorists under the bed”.

And of course there is “call center out sourcing” where banks amongst others have call centers in countries around the other side of the globe who might have data protection laws but they are irrelevant, because one set of “Data Subject PII” is worth about the same as a day’s wages at the data center and about 10 times that of a low end manual labour worker…

Data protection of personal information is a Pandora’s box where the lid has not just been lifted a little, it has been irreparably blown off by those who chose to base their business model on trading all personal information not just PII. And they care not if it is legal or illegal, as the industry “launders” and cross trades it all. And you as an individual have no choice, you have to hand it over in order to be a part of modern society…

And for those living in the US just remember the FOI is a double edged weapon, there are those using it to get at Government data on you that legally you have to supply to the US Government…

Someone • March 9, 2010 12:01 AM

I have an idea: why not just stop using social networking sites? Remember a time when we had friends that we actually met in person and talked to in person without an e-mail program or a web browser in between us?

Clive Robinson • March 9, 2010 3:53 AM

@ Someone,

“Remember a time when we had friends that we actually met in person and talked to in person without an e-mail program or a web browser in between us?”

Guess what we were actually looking forward to our “connected world” back then…

We even had songs,

We’ll always be together,
No matter how far it seems,
We’ll always be together,
Living in electric dreams.

Phil Oakey “Electric Dreams”.

Randal • March 9, 2010 3:58 AM

@John

Weird. This is only the second time I have ever used the name Randal that I can remember (it’s bad enough that it took me a moment to remember that I wrote that first comment). I didn’t intentionally make it look like that. Even when I post here, I generally pick a common first name at random. Now that you mention it, though, I do see one minor flaw in my selections so far, but it’s easily correctable.

For those saying that “don’t use Facebook” is the answer: the “post nothing on FB that you don’t want others to see” approach fails when you have friends who do it for you.

Even I have a FB group named after me (or rather, a unique pseudonym), of which I am not a member. Which is really weird considering that I’m about as much of a nobody as you can be.

Sometimes, part of me wonders if I attract more attention by trying not to attract attention than I would ever warrant in the first place…

Vipul S. Chawathe • March 9, 2010 10:56 AM

Why do individuals, who want to be anonymous, socialize on networks?

Eve • March 9, 2010 5:00 PM

@Vipul

Because they’re human.

martinr • March 9, 2010 6:14 PM

One does not need a lot of information to distinguish, and therefore recognize, individuals.

The number of characteristics of a thumbprint is low (13 to 15 I believe) and the number of characteristics of a DNA fingerprint is low (9 to 13 I believe).

I’m pretty sure that you can do fingerprinting for many weekly shopping baskets, if you regularly buy more than 10 items and at least 7 of them are fairly constant. The supermarket does not need any of the customer affiliation programs (or “payback” here in Germany) in order to recognize you. And if you pay one of these with a bank or credit card, they can even attach a name to that shopping profile.

To be recognizable, you only need to keep visiting a certain amount of web-sites regularly (~10-15) with the same browser. Being a member of specific social networking sites may facilitate attaching a real name to your characteristics (in the exact same fashion for a large group of users), but that is definitely not a prerequisite, it will work with any single site that requires membership, as well as with any site where you post/publish/upload information and supply personally identifying information voluntarily.
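As a rough illustration of recognizing a returning visitor from the set of sites they regularly visit, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `profiles` data, the `.example` site names, the 0.7 threshold, and the choice of Jaccard similarity as the matching rule.

```python
def jaccard(a, b):
    """Share of sites common to both sets, out of all sites seen in either."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match(profiles, observed, threshold=0.7):
    """Return the stored profile most similar to the observed set, or None."""
    best_id, best_score = None, 0.0
    for profile_id, sites in profiles.items():
        score = jaccard(sites, observed)
        if score > best_score:
            best_id, best_score = profile_id, score
    return best_id if best_score >= threshold else None


# Hypothetical stored profiles: visitor label -> sites seen on earlier visits.
profiles = {
    "visitor-1": {"news.example", "mail.example", "forum.example", "shop.example"},
    "visitor-2": {"news.example", "video.example", "games.example", "music.example"},
}

# A later visit: four of the five observed sites overlap with visitor-1.
observed = {"news.example", "mail.example", "forum.example", "shop.example", "weather.example"}
print(best_match(profiles, observed))
```

No name is needed for the match itself; a real identity only gets attached once one of the matched sessions involves a login or a payment.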

Peter E Retep • March 9, 2010 8:41 PM

@ Clive

I was simply alluding to the difference
between permission and potential praxis,
rather than rendering a diagnosis on European psyches, individually.
There is room, I think, for collective delusions in the self-conversations of nations.
Thanks for fleshing out a legal loophole that aligns with the technical reality.

James • March 12, 2010 11:45 AM

@Randal,

You mention using SafeHistory. As far as I can tell this isn’t compatible with any version of Firefox from 3.0 onwards. Are you using it with 3.x without issues? Or are you using an out-of-date version of Firefox to do your browsing – something that could introduce security holes and make your browser characteristics more distinctive?

Randal • March 16, 2010 3:17 PM

“Moral: anonymity is really, really hard — but we knew that already.”

I think it’s easier than this overwrought sensationalism. Any of these identifying vectors like the ‘social networking/browser’ ID concept stunt can be forged or randomized, which will give a would-be attacker even worse info than no result: a false positive ID. Anonymity is hard in real life, but trivial for someone that knows what they’re doing online.

Schneier on Security
