Schneier on Security
A blog covering security and security technology.
September 29, 2009
Predicting Characteristics of People by the Company they Keep
Turns out "gaydar" can be automated:
Using data from the social network Facebook, they made a striking discovery: just by looking at a person's online friends, they could predict whether the person was gay. They did this with a software program that looked at the gender and sexuality of a person's friends and, using statistical analysis, made a prediction. The two students had no way of checking all of their predictions, but based on their own knowledge outside the Facebook world, their computer program appeared quite accurate for men, they said. People may be effectively "outing" themselves just by the virtual company they keep.
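To make the idea concrete, here is a minimal sketch of a friends-based predictor. It is not the students' program: the function name `friend_fraction_score`, the smoothing parameters, and the sample data are all assumptions made for illustration. It simply takes the share of a person's friends who publicly report some trait and pulls that share toward a prior when few friends report anything.

```python
from typing import Dict, List, Optional


def friend_fraction_score(
    person: str,
    friends: Dict[str, List[str]],
    labels: Dict[str, Optional[bool]],
    prior: float = 0.5,
    prior_weight: float = 2.0,
) -> float:
    """Rough probability that `person` has a trait, using only friends' labels.

    Hypothetical illustration: friends with no public label (None or missing)
    are ignored, and the result is smoothed toward `prior`.
    """
    known = [labels[f] for f in friends.get(person, []) if labels.get(f) is not None]
    positives = sum(1 for value in known if value)
    return (positives + prior * prior_weight) / (len(known) + prior_weight)


# Made-up data: alice hides the trait, but two of her three friends report it.
friends = {"alice": ["bob", "carol", "dave"], "erin": []}
labels = {"bob": True, "carol": True, "dave": None}

print(friend_fraction_score("alice", friends, labels))  # 0.75
print(friend_fraction_score("erin", friends, labels))   # 0.5 (no evidence, so the prior)
```

The output is a score, not a fact about the person. With no labelled friends the function just returns the prior, which is one reason a sparse or padded friends list makes the guess close to meaningless.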
This sort of thing can be generalized:
The work has not been published in a scientific journal, but it provides a provocative warning note about privacy. Discussions of privacy often focus on how to best keep things secret, whether it is making sure online financial transactions are secure from intruders, or telling people to think twice before opening their lives too widely on blogs or online profiles. But this work shows that people may reveal information about themselves in another way, and without knowing they are making it public. Who we are can be revealed by, and even defined by, who our friends are: if all your friends are over 45, you're probably not a teenager; if they all belong to a particular religion, it's a decent bet that you do, too. The ability to connect with other people who have something in common is part of the power of social networks, but also a possible pitfall. If our friends reveal who we are, that challenges a conception of privacy built on the notion that there are things we tell, and things we don't.
EDITED TO ADD (9/29): Better information from the MIT Newspaper.
Posted on September 29, 2009 at 7:13 AM
• 34 Comments
As interesting as this is, I fear that such statistical analysis can ultimately be used against people, be it for some sort of witch hunt or character assassination. Couple this with the fact that people on Facebook typically add almost anyone they come across as a friend: casual acquaintances, people they had a class with, random people they met in bars, etc.
@mat - While people do typically add pretty much anyone they even remotely know, the 'statistics' portion of this is that (a) it is the sum of all those people, not just the one or two who stick out, that they are measuring, and (b) it predicts a *probability* that you are (whatever they are looking for).
I would say this isn't much different than someone observing your actions or travel, or asking your neighbors and coworkers about you. You're probably going to get the same information.
@mat - Perhaps the number of friends you have in common or within two degrees or such can be used to weight the connection?
Just a thought.
You might want to look a little deeper into this "research." It barely qualifies as such (and the principal researchers know it), and your two pull-quotes pretty much tell us why.
The Boston Globe was extremely irresponsible for reporting this as "science."
...The two students had no way of checking all of their predictions, but based on their own knowledge outside the Facebook world, their computer program appeared quite accurate for men, they said...
So basically they worked out an algorithm to figure out who they thought was gay, and then compared it with who they thought was gay. And it matched! Garbage in garbage out.
This is the definition of stereotyping.
Where can I run this on someone? I've been wondering whether I'm gay or not; this might help me find the answer.
Anyone reading Dan Brown's "The Lost Symbol"? Anyone following the R&D into lawful access tools? Anyone see the Intelligence benefits of a tool like this and others?
And if all my online associations are with paranoid, security-obsessed techno-geeks with no obvious social ties and a general libertarian bent - what does that say about me?
Oh my. ;)
Yes, but does it work against Communists?
That's why I always sign my posts here at www.schneier.com/blog as "anonymous."
I don't want The Man to think that I associate with the rest of you trouble-makers.
I've often thought that there is a fundamental problem with trying to preserve "good" privacy on the internet: there is just too much redundancy in the data.
The most obvious example is the release of the "anonymized" AOL search records a few years back. While each user's ID had been replaced by a randomized integer, for the vast majority it was trivial to tie that ID back to a real person. Of course, that was an extreme example, but when Netflix released a subset of customer ratings for testing, it was also shown to be trivial to tie them back to individuals.
These are extreme examples, but remember when the real identity of the author of "Primary Colors" was outed because of Markov-based language similarity metrics? That was based on a large corpus of published work, but with better data scraping it might be applicable to anonymous blog posts as well. How do you strip out the identifying information without stripping out the meaning as well? Run it through Babelfish, translating it to French and back to English?
I fear that privacy on the internet will always be like small town privacy. It will be almost impossible without the cooperation and polite discretion of your neighbors, though you'll have a few million neighbors.
While you may deduce some things from who a friend is (I have 501 close personal friends not because I'm likable but because I like to win at pirates), don't you learn more about a person's quality from who their enemies are? We need a social networking site for people to loathe each other on.
It appears that at MIT, you can now program a computer to draw conclusions from correlations, and call it science. I can't wait until they come out with a program that predicts "IQ" based on race. Oh how the mighty have slipped.
The MIT newspaper has a much more insightful article on this: http://tech.mit.edu/V129/N39/mherdeg.html
This isn't news, and going after the researchers based on a half-baked Globe piece is not quite correct.
"I would say this isn't much different than someone observing your actions or travel, or asking your neighbors and coworkers about you. You're probably going to get the same information."
The difference is degree. Unlike asking neighbors and coworkers, this can be automated. Computers can do it against the entire population at once.
I call this kind of thing wholesale surveillance:
It's the difference between "follow that car" and "follow every car."
The problem with this kind of social network analysis is that many decision-makers in the intelligence community believe it works better than it does. Since they usually have no grounding in probability or statistics, they don't know about Bayes' theorem and have unrealistic expectations at best. At worst, innocent people are tarred with guilt by association, or worse, in a purely Kafkaesque way. Maher Arar was rendered to Syria and tortured there on our behalf simply because he spoke to the brother of a suspect who asked him where to have printer cartridges refilled.
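The Bayes' theorem point can be put in numbers. The following is only a sketch with invented figures: the base rate, hit rate, and false-alarm rate are assumptions chosen for illustration, not measurements of any real system. It computes the chance that a flagged person really is a target.

```python
def posterior_given_flag(base_rate: float, hit_rate: float, false_alarm_rate: float) -> float:
    """Bayes' theorem: P(target | flagged).

    base_rate        = P(target) in the whole population
    hit_rate         = P(flagged | target)
    false_alarm_rate = P(flagged | not target)
    """
    true_flags = hit_rate * base_rate
    false_flags = false_alarm_rate * (1.0 - base_rate)
    return true_flags / (true_flags + false_flags)


# Hypothetical numbers: 1 real target per 100,000 people, and a classifier
# that is "99% accurate" in both directions.
p = posterior_given_flag(base_rate=1e-5, hit_rate=0.99, false_alarm_rate=0.01)
print(f"{p:.4%}")  # roughly 0.1%
```

Under these assumed figures, about 999 of every 1,000 people flagged are innocent, even though the classifier sounds nearly perfect. The rarer the thing being hunted, the worse this gets.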
"The MIT newspaper has a much more insightful article on this: http://tech.mit.edu/V129/N39/mherdeg.html. This isn't news, and going after the researchers based on a half-baked Globe piece is not quite correct."
Interesting. Thank you.
As I remember it "birds of a feather stick together". Seems as though that is still true.
Most of my Facebook friends own a computer, and many of them are geeks. What does this say about me?
*boggles* I can write a computer program that predicts which coin tosses are heads and which are tails and just because I get a percent that matches up with "reality" doesn't mean that I've accurately identified which of the flips were heads and which were tails.
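A toy version of the coin-toss objection, with made-up sequences: a predictor can match the overall percentage of heads exactly while getting every individual flip wrong. The helper names `heads_rate` and `per_flip_accuracy` are invented for this sketch.

```python
def heads_rate(flips: str) -> float:
    """Fraction of flips in the sequence that are heads ('H')."""
    return flips.count("H") / len(flips)


def per_flip_accuracy(predicted: str, actual: str) -> float:
    """Fraction of individual flips where the prediction matches the outcome."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


actual = "HT" * 50      # 100 flips, half of them heads
predicted = "TH" * 50   # also half heads, but always the opposite face

print(heads_rate(actual), heads_rate(predicted))  # 0.5 0.5
print(per_flip_accuracy(predicted, actual))       # 0.0
```

Agreement on the aggregate says nothing by itself about whether any single prediction is right, which is why checking predictions against individually verified cases matters.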
@Eric Hamilton: "Birds of a feather stick together"
Yes, but "opposites attract".
Perhaps "a man is known by the company he keeps"?
You think anybody's paying attention to which Schneier links I choose to click?
What do they do about folks like me? I'm up to four Twitter accounts so as to subdivide my favorite datastreams. Okay, maybe that's bad, but it really does help.
Don't the discussions about informational privacy and suchlike here have an implicit assumption that the people in the sample care who knows their sexual orientation?
You can buy 1000 friends from me! Totally random people -- some may even be goats.
I can promise I will sell you "new" people not recycled as they are on twitter.
sooth sayer : You can buy 1000 friends from me! Totally random people -- some may even be goats.
Hmm, these thousand friends: do they come with a prearranged birthday card list, etc.?
If not, tell me more about the goats: are they the good Christian goat or the bad satanic goat type? And which is cheaper?
Ummm... Isn't this pretty much the definition of social networks? People are looking for birds of a feather to socialize with. The goal of the network is to help people find like-minded people. People willingly, and legitimately, trade anonymity or privacy for society.
It seems to me that the better a social network is at identifying your particular parameters the more successful it is. It should surprise no-one that Facebook can be used to identify sub-classes of people, that's what it's for.
What would the contact profiles of psychiatrists, various police types, or on-the-scene reporters be?
It offers fresh ground to let the dimmer lights be deceived.
Imagine this system in use in the hands of minimum-wage employees, sociopaths, or pathological thinkers.
This also makes a false assumption that people tell the truth in self reporting. Many if not most women I know changed their reported sexuality when it was cool to be a lesbian. Nearly everyone I know lies about their age and birthday. Many put what they think are ironic or sarcastic items into other fields. Real friends recognize the humor.
Of course it works against communists! Gayness is a communist plot, after all. I would figure that a fine patriot, like yourself, would know that. Why do you think we call them "pinkos"? :)
There may be something to this provided, of course, that one's online "friends" accurately reflect real world actual friends and associates. I abandoned MySpace because I had too many "friends." I am not kidding when I say that most of them could walk past me on the street and I would have no idea who they were. I vowed to keep Facebook more streamlined. But sure enough, I've been added by acquaintances from work (I deal with the public on a daily basis), and don't really want to offend a customer by turning down a friend request. Result? Same old situation. One cannot infer anything very accurate about me from my Facebook friends list. Yes, there are some close connections among my online friends, but they are minor compared to the overwhelming majority of contacts about whom I know very little and care even less. In the words of Maria Helm's comment, it is actually the sum of my Facebook friends that paints an inaccurate picture, not the one or two that stick out. Those few that stick out do so because they are real connections, not tenuous ones like all of the others! The virtual world isn't real, and allocating real world resources based on virtual world profiling hardly seems a worthy endeavour to me. I'm also enough of a realist, however, to know that this kind of nonsense is only going to increase as those who never knew a world before Teh Interwebs become the majority.
A fraudster wanting to do some research on the next job may well need to join groups and create associations to get closer to the people they want to defraud. The fact remains that as long as the fraudster doesn't get caught, the groups and associations will in time serve as legitimate "birds of a feather" relationships. Most contacts in the fraudster's list will be ordinary members of the public. There are unlikely to be known fraudsters among the online contacts unless they are working together or, just as plausibly, neither knows that the other is a fraudster.
You may think that you know why you are connecting to some people, but despite what they tell you, you don't necessarily know why they want to connect to you. It may not always be as simple as a "birds of a feather" association.
Where does all this leave this monumental research?