Schneier on Security
A blog covering security and security technology.
April 6, 2009
Identifying People using Anonymous Social Networking Data
Computer scientists Arvind Narayanan and Dr Vitaly Shmatikov, from the University of Texas at Austin, developed the algorithm which turned the anonymous data back into names and addresses.
The data sets are usually stripped of personally identifiable information, such as names, before they are sold to marketing companies or researchers keen to plumb them for useful information.
Before now, it was thought sufficient to remove this data to make sure that the true identities of subjects could not be reconstructed.
The algorithm developed by the pair looks at relationships between all the members of a social network -- not just the immediate friends that members of these sites connect to.
Social graphs from Twitter, Flickr and Live Journal were used in the research.
The pair found that one third of those who are on both Flickr and Twitter can be identified from the completely anonymous Twitter graph. This is despite the fact that the overlap of members between the two services is thought to be about 15%.
The researchers suggest that as social network sites become more heavily used, then people will find it increasingly difficult to maintain a veil of anonymity.
In "De-anonymizing social networks," Narayanan and Shmatikov take an anonymous graph of the social relationships established through Twitter and find that they can actually identify many Twitter accounts based on an entirely different data source—in this case, Flickr.
One-third of users with accounts on both services could be identified on Twitter based on their Flickr connections, even when the Twitter social graph being used was completely anonymous. The point, say the authors, is that "anonymity is not sufficient for privacy when dealing with social networks," since their scheme relies only on a social network's topology to make the identification.
The issue is of more than academic interest, as social networks now routinely release such anonymous social graphs to advertisers and third-party apps, and government and academic researchers ask for such data to conduct research. But the data isn't nearly as "anonymous" as those releasing it appear to think it is, and it can easily be cross-referenced to other data sets to expose user identities.
It's not just about Twitter, either. Twitter was a proof of concept, but the idea extends to any sort of social network: phone call records, healthcare records, academic sociological datasets, etc.
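The topology-only identification described above can be illustrated with a toy example. What follows is a minimal sketch, not the authors' implementation: it assumes a handful of accounts have already been matched by other means (the "seeds"), and then extends the match using nothing but who is connected to whom. The graphs, the names, the `propagate_mapping` function, and the `min_margin` parameter are all hypothetical, and real social graphs are noisy and only partly overlapping, so the real problem is much harder than this.

```python
def propagate_mapping(anon, aux, seeds, min_margin=1):
    """Extend a seed mapping from anonymous nodes to named nodes using only edges.

    anon, aux: dicts mapping each node to the set of its neighbours.
    seeds: dict of already-known anonymous-node -> named-node pairs.
    """
    mapping = dict(seeds)
    changed = True
    while changed:
        changed = False
        for u in sorted(anon):
            if u in mapping:
                continue
            used = set(mapping.values())
            # Named counterparts of u's neighbours that are already identified.
            known = {mapping[n] for n in anon[u] if n in mapping}
            # Score each unclaimed named node by how many of those it is linked to.
            scores = {v: len(known & aux[v]) for v in aux if v not in used}
            ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
            if not ranked or ranked[0][1] == 0:
                continue
            best, best_score = ranked[0]
            runner_up = ranked[1][1] if len(ranked) > 1 else 0
            # Accept only a clear winner, to avoid guessing between look-alikes.
            if best_score - runner_up >= min_margin:
                mapping[u] = best
                changed = True
    return mapping


# Hypothetical "named" graph (think: a public photo-sharing site).
aux_graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol", "dave"},
    "carol": {"alice", "bob"},
    "dave": {"bob", "erin"},
    "erin": {"dave"},
}
# Hypothetical "anonymous" graph: same people, names replaced by opaque ids.
anon_graph = {
    "t1": {"t2", "t3"},
    "t2": {"t1", "t3", "t4"},
    "t3": {"t1", "t2"},
    "t4": {"t2", "t5"},
    "t5": {"t4"},
}
# Two accounts assumed to be identified already by some other route.
seeds = {"t1": "alice", "t2": "bob"}

recovered = propagate_mapping(anon_graph, aux_graph, seeds)
print(recovered)
```

In this toy case two seeds are enough to re-identify the other three accounts, because each one ends up with a neighbourhood that only one named node can match.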
Here's the paper.
Posted on April 6, 2009 at 6:51 AM
• 41 Comments
Anonymity and privacy are myths perpetrated by an industry of fanatics or philosophical purists. We never had either, we will never have either. My suggestion is that we should live a life that we are not ashamed of or else bear the consequences of our actions. We should also remember that we are generally not as fascinating to others as we may think we are.
It is exactly this issue that has worried me for over ten years about RFIDs, Contactless Payment cards and Mobile Phones.
The simple fact is that when RFIDs get put in what you wear and what you carry by the manufacturer, you effectively become tagged.
With enough of these tags (around ten on each person) you become uniquely identifiable wherever you go.
This means that cheap door frame scanners (to prevent stock loss) can also be used to build up a vast amount of information on individuals' habits (shopping etc).
Contrary to what you think,
"We should also remember that we are generally not as fascinating to others as we may think we are."
The increasing use of "directed/personalised" junk mail etc. says otherwise.
Further, the ideas behind DNA and other DBs some Governments want to set up clearly indicate a change in a citizen's legal status.
Once you were "innocent until proven guilty" and "secure in your homes and papers from unwarranted search and seizure". Arguably this is most definitely not the case in the UK and likewise the US.
It does not take a great leap of the imagination to realise that such DBs containing "social networks" (in the general sense) and time based location information will become steadily of more use to LEAs and their associates.
In London we have already seen the Police use "Oyster Travel Cards", issued to 11 year olds and up for free transport, as a method of identification. Also some cases indicate that some Police have treated the fact that an adolescent is not carrying one as a sign of suspicion...
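The RFID point above (that a person's set of tags works as an identifier even though no single tag names them) can be sketched in a few lines. This is only an illustration under assumed conditions: the tag ids, the stored profiles, the `best_match` helper, and the 0.5 threshold are all made up, and a real scanner deployment would look nothing like this simple.

```python
def jaccard(a, b):
    """Overlap between two tag sets: 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(scan, profiles, threshold=0.5):
    """Return the stored profile whose tag set best overlaps the scan, if any."""
    best_id, best_score = None, 0.0
    for profile_id, tags in profiles.items():
        score = jaccard(scan, tags)
        if score > best_score:
            best_id, best_score = profile_id, score
    return best_id if best_score >= threshold else None


# Hypothetical profiles built from earlier door-frame scans: ten tags each.
profiles = {
    "shopper-A": {f"tag{i:02d}" for i in range(1, 11)},
    "shopper-B": {f"tag{i:02d}" for i in range(11, 21)},
}

# A later scan: eight of shopper-A's tags plus one newly bought item.
later_scan = {f"tag{i:02d}" for i in range(1, 9)} | {"tag99"}
# A scan of someone the shop has never seen.
stranger_scan = {"tag50", "tag51"}

print(best_match(later_scan, profiles))
print(best_match(stranger_scan, profiles))
```

The match survives the person changing a couple of items, which is why swapping one tagged garment would not be enough to break the link.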
I would've loved to have been on that IRB committee. They probably had some rather interesting conversations.
"My suggestion is that we should live a life that we are not ashamed of or else bear the consequences of our actions." This is just another way of saying "if you have nothing to hide...". Unfortunately (or rather: fortunately) even the most innocent men and women have something to hide. It's called the private sphere, and it is the reason we wear clothes even when it is warm. Only the peeping toms want to tell us it's our fault when we feel embarrassed because someone is spying on our bedroom. Our private life is called private life rather than public life for a reason.
Can it help anonymity if the value of personal information is driven to zero by mixing it with disinformation on a regular basis?
@Vladimir: "Can it help anonymity if the value of personal information is driven to zero by mixing it with disinformation on a regular basis?"
Perhaps, but I'd be careful about the disinformation. It's one thing to use fake names and junk email addresses for certain things, but I'd hesitate to associate my actual name with disinformation. It may be tough to disprove disinformation when it appears with one's signature or online acknowledgement.
Once, my name was misspelled on paperwork, and I started receiving junk mail for the misspelled name. I started changing how I presented my name at times (i.e., John Q Doe, J Quincy Doe, Jon Doe, Dough, etc.) just to see who sold my information to others. It was quite interesting to say the least. My dogs also receive junk mail.
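The name-variant trick just described amounts to a lookup table: give each organisation a distinct spelling, then see which spelling turns up on unsolicited mail. A minimal sketch follows; the organisation names and the `trace_source` helper are hypothetical, and the name variants are the ones from the comment.

```python
# Which spelling was given to which organisation (organisation names are made up).
issued = {
    "Example Catalogue Co.": "John Q Doe",
    "Example Magazine": "J Quincy Doe",
    "Example Loyalty Card": "Jon Doe",
    "Example Warranty Form": "John Dough",
}


def trace_source(name_on_mail, issued_variants):
    """Return the organisations that were given the spelling seen on junk mail."""
    return [org for org, variant in issued_variants.items() if variant == name_on_mail]


print(trace_source("Jon Doe", issued))
print(trace_source("Johnny Doe", issued))
```

This only works if every variant is handed out exactly once; a spelling shared between two organisations narrows the source down but no longer pins it to one.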
It is not so much that the data about you is out there, and that it doesn't represent anything you have to hide... now. It is when someone later comes along and changes what existing (old) data means, so that you can now be intimidated, controlled, or jailed, that the realization comes home to roost. If there were no secret data, that is, if the data of the officials (et al) were as public as our data, then we might stand half a chance.
"Anonymity and privacy are myths perpetrated by an industry of fanatics or philosophical purists. We never had either, we will never have either."
Philosophical purists? Hmmmm...
"Only a Sith Lord deals in absolutes." Obi Wan said, absolutely.
A good reason to not use Facebook or Myspace for any reason whatsoever. Only a handful of people need to know every single detail of your personal life, and putting all this information up on a big virtual bulletin board is just asking for pain.
For internet privacy:
AN.ON / JonDo offers mix cascades, a type of proxy chaining with layered encryption. The free ones exit on ports 80 and 443. If you pay, you can use the faster and more flexible ones. JonDos certifies the mixes - i.e. they sign a contract not to log data unless legally compelled to do so, and JonDos verifies their identities and tries to enforce the contract. Many mixes are reputable universities.
Tor offers software to build circuits out of volunteer Tor nodes from around the world. This is also a form of proxy chaining with layered encryption. Volunteers are not certified. Can exit on any port which the nodes permit exiting on. Also offers the possibility to anonymously host .onion sites.
I2p is in some ways similar to Tor, although many design decisions were different. Also, there is only one out-proxy: i2p's main function is to communicate with other i2p users and anonymously hosted .i2p sites.
All have their weaknesses of course, and .onion and .i2p sites in particular are vulnerable to intersection attacks.
For real life, there are wigs, temporary hair dyes, coloured contacts, hats, sunglasses, and trench coats. Of course these only help in situations when you are not likely to be asked to present your ID cards, such as simple shopping. And you should pay in cash, of course. Vulnerabilities include fingerprints, but these protections are sufficiently weak that it probably does not matter that much.
"Only a handful of people need to know every single detail of your personal life"
You're a very trusting person...
I view it as,
Apart from myself nobody, and I do mean nobody, needs to know anything more about me than I choose to tell them. And I'm a very choosy person at the best of times.
Look at it this way, if all your friends know each other and you make a silly social blunder they will all know about it and you have nowhere to hide until the dust settles.
On a more extreme basis if you get divorced who gets custody of your "mutual friends" (if they even want to know either of you afterwards for fear of being seen to be taking sides...)
Humans are messy at the best of times, their lives no less so; therefore you need private space and some friends kept private from each other.
So, if I interpret some of the comments left here correctly, one should never register or attend any sort of social networking event (whether online or in person) or associate with any group of people because data might be collected about you that could be used to make your life less anonymous? And yet ... those comments were left on a blog posting whose main purpose is to foster discussion.
What I find even more amusing is that people here are rabidly discussing ways that you can and should protect yourself when in reality this is pretty much the case of closing the barn door after the horse has left.
It is certainly the case that social networking sites can be used to learn about people and gather information, but so can information we have far less control over: shopping habits, energy consumption, travel patterns, internet traffic, etc.
The bulk of people aren't concerned because deep down, they realize they have no control, and rather than being the proud oak broken in the wind, they are the willow, bending and fluttering with the wind.
When the cultural winds shift, the broken wailing paranoids will still be broken and wailing, but the fluttering, willowy masses will simply bend in a new direction.
The only thing is that I don't know whether to be happy or sad about this :(
It is like closing the barn door after the horse has left only if a particular piece of private information is already public knowledge. You can still protect private information which is not yet public knowledge.
Internet usage and shopping habits are things you can keep private. (Well, not the amount of internet usage, but at least what you are doing on the internet.) See the proxy networks I posted links to, and pay for the things you shop for in cash. Sure, your security will not be absolute, but it never is. And yes, the store clerk will see what you look like (wigs, coloured contacts, et al. will only mitigate that), but if you pay in cash, it will be very difficult for someone to build a long term profile about your shopping habits, and you run less risk of going into debt to the credit card company.
There are of course times when it is impossible to pay in cash, but doing so when possible will help.
And yes, energy consumption and travel patterns are difficult to keep private, unless you have your own generator and only walk / bike / use the bus, which is infeasible for most people.
"we should live a life that we are not ashamed of or else bear the consequences of our actions."
I cannot agree. Privacy can be safety, or at least peace of mind. What criminal (person you've disagreed with, unhappy ex, etc) is stalking your family using the phone book, social networks, and google maps?
What's different with this, compared to Government DNA databases etc, is that social networking sites are voluntary and, by default, people are (or should be?) only putting up information which they are happy sharing in the public domain.
The internet provides the common person with a stage on which to perform. Everyone wants to be famous... but then complains when they're stalked.
People also complain about being stalked even when they are merely criticised in an entirely non-defamatory way.
I've even heard people complain about being stalked when the criticism was by private e-mail to them, rather than published on a blog or similar place.
Overall, the term stalking gets thrown around way too much, to the detriment of protesters and people who actually are stalked.
@Rodent: "A good reason to not use Facebook or Myspace for any reason whatsoever. "
I use Facebook; it has been fun to get in touch with some old classmates. But some people are quite dumb on it. I have experience with a real life story in just the past month.
The wife of one of my best friends came home one day and told him she wanted a divorce and forced him out of the house. She immediately removed him and all his family and friends from her Facebook friends list. But she has the setting where "friends of friends" can view her profile, and not all her friends removed her husband from theirs (putting him in the "friends of friends" category).
So, thinking she would try to steamroll him in divorce court, he started printing her Facebook page every couple days. Among the goodies he has: her page one day with a picture of her and him and a status of "married," and her page the next day with a picture of her and another man and a status of "in a relationship." A date a week later when the other man moved in with her. A "one month anniversary" of their first time--two weeks after she told her husband she wanted a divorce (the "one month ago" was a date when she was supposed to be visiting her dying father). A few other tidbits, but you get the idea.
She cited "mental abuse" and wanted alimony, but he had proof of adultery and other comments saying "he was good to her and didn't deserve for her to cheat." Turns out, the only reason she cited mental abuse is that she wouldn't have to wait: in their state, the waiting period is 6 months for adultery.
In any case, point being that she put everything she didn't want him to know in writing, and configured the account (probably unintentionally) so he could see it. Good for him, but the point is how things can come back and bite someone.
Nice that they wrote a paper and did some research but is this really a surprise? Investigators have always leveraged a clue from one network into another network, and deduced identities from the cross-references.
@ Davi Ottenheimer,
"Investigators have always leveraged a clue from one network into another network, and deduced identities from the cross-references."
If only that was true...
It is an exceptional investigator who does this, and only when the need is great enough and all other avenues of enquiry appear to be closed.
The reality is that most investigators know from simple sources (people's own lips and easily traceable activity) the "who" and the "what".
Likewise very few people know how to, or can be bothered to hide things in a way an investigator has not seen before (simply because they assume they are not going to get caught).
Further, unless the need is high, the average investigator will have a case load that stops them following anything but the simplest enquiries before moving on to other cases.
Fantastic post Bruce, so much informative stuff.
I manage a new social networking site called Beaver.com ( www.beaver.com ).
The entire premise of social networking is people using their real name. That's how people find you.
Hasn't everybody heard of the phone book?
I wouldn't say it's a surprise, but it's definitely significant that it's been automated.
Now any goofball could potentially be an investigator.
Another decent proxy network for your consideration, pals.
Apparently, there are enough people concerned for at least two paid proxy networks and two volunteer networks to operate on the market.
However, how much can one trust a paid proxy network? ;)
P.S.: Xerobank ppl has a dinner with FBI
Interesting. It appears they provide a free browser based on Firefox, which you can use either with the Tor network or their paid service. Their paid service also appears to use multiple nodes/mixes.
One thing: when using the paid service, are the different nodes/mixes in the circuits/cascades maintained by different companies? If not, then it is almost like a single-hop proxy, in that you must put all your trust in that one company.
Essentially, when there is only one company, operating either one proxy or a chain of proxies, you can trust that they do not log, but you do not actually know. If there are multiple operators (companies or individuals) in a chain of proxies, then even if one logs, he does not have enough information to de-anonymise you. However, if he co-operates with other proxies in the chain to de-anonymise you, there is nothing you can do.
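The trust argument above can be made concrete with a deliberately simplified model. The sketch below assumes each proxy keeps a log mapping an incoming connection id to an outgoing one, and that the only way to follow a connection is to read those logs in order. It ignores timing and traffic-correlation attacks entirely, which in practice can link the two ends without the middle logs. The operator names, connection ids, and the `trace_connection` helper are all hypothetical.

```python
def trace_connection(entry_conn, chain, cooperating):
    """Follow one connection through a proxy chain using cooperating operators' logs.

    chain: ordered list of (operator, log) pairs, where log maps an incoming
    connection id to the outgoing connection id.
    Returns the exit connection id, or None if the trail goes cold.
    """
    conn = entry_conn
    for operator, log in chain:
        if operator not in cooperating:
            return None  # this hop's log is unavailable, so the link is broken
        conn = log[conn]
    return conn


# Three hops run by three different (made-up) operators.
chain = [
    ("operator_a", {"in-17": "ab-03"}),
    ("operator_b", {"ab-03": "bc-42"}),
    ("operator_c", {"bc-42": "out-08"}),
]
# The same three hops, all run by one company.
single_company_chain = [("one_company", log) for _, log in chain]

one_logger = trace_connection("in-17", chain, {"operator_a"})
two_loggers = trace_connection("in-17", chain, {"operator_a", "operator_c"})
all_collude = trace_connection("in-17", chain, {"operator_a", "operator_b", "operator_c"})
one_company = trace_connection("in-17", single_company_chain, {"one_company"})
print(one_logger, two_loggers, all_collude, one_company)
```

Under this model a chain run by one company is no better than a single-hop proxy, while independent operators have to cooperate at every hop before the entry and exit can be tied together.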
Several benefits of xerobank, based on looking at their website:
* They do seem sincere.
* They have beta VOIP. Getting VOIP to work with Tor or JonDo without leaking data by mistake is near impossible.
Maybe I'm out of my depth here but is there not a danger that an evildoer can post defamatory data about a person, which then becomes part of the permanent public data?
DHS, through the PATRIOT Act, has been doing this prior to the "development" of this algorithm. Using tools like AeroText (Lockheed Martin), IdentiFinder (BBN/Verizon), Intelligent Miner for Text (IBM), NetOwl (SRA), Search Software America (SSA), and Thing Finder (Inxight), DHS accomplishes all this and more. Increasing their sampling size through many different sites, as well as collaboration components (Presence, Messaging, Discussion, Meeting, Sharing, and Virtual), allows them to perform the following tasks relatively easily:
Collaboration in Real Time
Intelligence Systems for Detecting Terrorist Crimes
Data Mining - Embedded and Distributed
And Vladimir, don't think that an attempt at "Deniable Encryption" is any sort of protection. The style in which you write is as identifiable as a latent fingerprint. The more you put out there, the easier it is to identify you. Start putting different names with your signature and you are sure to arouse interest. The only way to avoid scrutiny by the US Government is not to play: Live a Hermit's lifestyle and exist completely off the grid.
Now I know the reason I haven't been able to convince you to join LinkedIn, Bruce. ;-)
What's so new about this? We were taught about these techniques when I was a computer science undergrad back in the early 1980s.
@ZG: ``My suggestion is that we should live a life that we are not ashamed of or else bear the consequences of our actions.''
Some of us aren't so sure that our government will always be right, and it may get testy when called on it. Only anonymity ensures free criticism of said government. _Voila_, a life of which we are not ashamed, but for which bearing the (unconstitutional) consequences is not a socially-valuable option.
Even if an individual forum poster is anonymous, it is very difficult for a website owner to be anonymous. There are rare exceptions (see Tor .onion and i2p .i2p pseudo-domains), but these are not well-known and hence are probably not influential enough to damage a person's reputation.
So it is not a problem as long as the website owner is held accountable. Of course, it is unreasonable to hold them accountable if they are not informed of the problem. However, once they are informed, they should fix the problem. Then it will not remain permanently.
Unfortunately, website owners are often not held accountable. See Wikipedia, for example.
However, that is not a reason to prohibit anonymous web-surfing. It is only a reason to hold website owners accountable after they have received notice of problems.
We are living in an era of identity theft on a large scale. And many countries have not yet developed instruments against this type of crime. On the contrary. So it is not right to look on this type of research with naive eyes.
For ZG, here's a neat illustration of the power of "Total Information Awareness":
Do we still have nothing to fear?
"..we should live a life that we are not ashamed of or else bear the consequences of our actions."
For reasons others have given, I strongly disagree with this too. This argument almost hits me like a Jason Fortuny way of thinking.
The only way to really live a life without worry of scrutiny is to conform to mainstream cultural sensibilities and morals. For many people, including myself, living this way would deny who we are. I am not necessarily talking about closeted GBLT, but many ways of how people live life that may not be socially acceptable. Yet we still need to work and live among our employers and professional circles.
I am not ashamed of things I want to keep private. I am careful not to say anything that could irreparably ruin my life, but it could have a negative impact. Perhaps call it discretion more than shame, but still important to remain private. Others aren't so lucky.
Back to GBLT issues, which more people might have sympathy for. The stakes can sometimes be very high. For example, there are still some parts of the US where people can lose their careers or even their children for being outed.
"..we should live a life that we are not ashamed of or else bear the consequences of our actions."
It is the fairly large group of people who believe this and think this way that are the real problem? When this idea becomes institutionalized...aka Patriot act and DHS... you have a "police state" or theocracy where morality is policed? A problem being that this group can easily conduct wars and in-country "purges" that are pretty much immoral as far as common human values are concerned. As we see. As we saw in WW2 Germany, etc.
My suggestion is that we should live a life that we are not ashamed of or else bear the consequences of our actions.
Do you shower with your clothes on?
I want to warn everyone against xerobank.com!
They are the same scam outfit that is behind metropipe.net. No need to believe me, just google xerobank and metropipe. Lost my money and got the service.
One more comment regarding "We should live a life that we are not ashamed of or else bear the consequences of our actions."
I'm not ashamed of my political views, but I'm certain that I wouldn't want to publicly attach my real name to them, given the current political climate here in the U.S. For example, last week, a directive went out to the "grass roots" of the Democratic party to report "fishy" comments about the proposed healthcare legislation to an official whitehouse.gov e-mail address. Also, a citizen was beaten by union goons for opposing the one true health plan. I think I'll keep my opposition anonymous, thanks.
If I lived in China or Iran or Venezuela, I'd be even less eager to attach my real name to my on-line political discourse.
Schneier.com is a personal website. Opinions expressed are not necessarily those of BT.