Geotagging Twitter Users by Mining Their Social Graphs

New research: "Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization," by Ryan Compton, David Jurgens, and David Allen.

Abstract: Geographically annotated social media is extremely valuable for modern information retrieval. However, when researchers can only access publicly-visible data, one quickly finds that social media users rarely publish location information. In this work, we provide a method which can geolocate the overwhelming majority of active Twitter users, independent of their location sharing preferences, using only publicly-visible Twitter data.

Our method infers an unknown user's location by examining their friends' locations. We frame the geotagging problem as an optimization over a social network with a total variation-based objective and provide a scalable and distributed algorithm for its solution. Furthermore, we show how a robust estimate of the geographic dispersion of each user's ego network can be used as a per-user accuracy measure which is effective at removing outlying errors.
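To make the idea in the abstract concrete, here is a rough sketch of one way such an inference could work. It is not the authors' implementation: the abstract describes a total variation objective solved with a distributed algorithm, while this toy version simply assigns each unlocated user the friend location that minimizes total distance to the other friends, repeats for a few rounds, and drops estimates whose friends are too spread out. The function names, the `rounds` parameter, and the 100 km dispersion cutoff are all assumptions made for illustration.

```python
import math
from statistics import median

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def spatial_median(points):
    """Pick the input point with the smallest total distance to all the others."""
    return min(points, key=lambda p: sum(haversine_km(p, q) for q in points))

def dispersion_km(center, points):
    """Median distance from center to the points: a robust measure of spread."""
    return median(haversine_km(center, p) for p in points)

def infer_locations(friends, known, rounds=3, max_dispersion_km=100.0):
    """Propagate locations through a friendship graph (illustrative only).

    friends: user -> list of connected users (hypothetical input format)
    known:   user -> (lat, lon) for users with a publicly visible location
    """
    estimates = dict(known)
    for _ in range(rounds):
        updated = dict(estimates)
        for user, nbrs in friends.items():
            if user in known:
                continue  # never overwrite a self-reported location
            pts = [estimates[n] for n in nbrs if n in estimates]
            if not pts:
                continue  # no located friends yet; try again next round
            center = spatial_median(pts)
            if dispersion_km(center, pts) <= max_dispersion_km:
                updated[user] = center
            else:
                updated.pop(user, None)  # friends too scattered to trust
        estimates = updated
    return estimates
```

Note that this kind of estimate is a single "home" location per user, derived from who they are connected to, not a position for each individual tweet.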

Leave-many-out evaluation shows that our method is able to infer location for 101,846,236 Twitter users at a median error of 6.38 km, allowing us to geotag over 80% of public tweets.
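For readers unfamiliar with the term, a leave-many-out evaluation hides the known locations of a batch of users, predicts them from the remaining data, and reports how far off the predictions were. A minimal sketch of that bookkeeping follows. The `predict` callback, the 10% holdout fraction, and the function names are assumptions for illustration; the paper's actual evaluation setup may differ.

```python
import math
import random
from statistics import median

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def leave_many_out_median_error(known, predict, holdout_fraction=0.1, seed=0):
    """Hide a random subset of known locations and score predictions for them.

    known:   user -> (lat, lon)
    predict: callable(training_dict, user) -> (lat, lon) or None (hypothetical)
    Returns the median error in km, or None if nothing could be predicted.
    """
    rng = random.Random(seed)
    users = sorted(known)
    k = max(1, int(len(users) * holdout_fraction))
    held_out = set(rng.sample(users, k))
    # The predictor only ever sees users that were not held out.
    training = {u: loc for u, loc in known.items() if u not in held_out}
    errors = []
    for user in held_out:
        guess = predict(training, user)
        if guess is not None:
            errors.append(haversine_km(guess, known[user]))
    return median(errors) if errors else None
```

A median is only the midpoint of the error distribution: half of the held-out users are located more accurately than the reported figure and half less accurately.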

Posted on March 10, 2015 at 6:50 AM • 6 Comments

Comments

Anura • March 10, 2015 12:52 PM

This is a perfect example of why "metadata" is actually data - just knowing who is connected to whom reveals a significant amount of information about people. Even if the name of a particular person is not known, correlating with other information like the phone records of known individuals can provide you with enough information to identify them.

Daniel • March 10, 2015 2:09 PM

Not impressed. 6.38 km means one thing in Montana. It means something entirely different in New York City. I can geotag 7 billion people--they are all on the planet Earth. The issue is whether the data can be refined enough for law enforcement or advertising purposes. In most cases even a single kilometer is too coarse.

Alan Kaminsky • March 10, 2015 3:04 PM

@Daniel: "Not impressed."

I'm more impressed than @Daniel. 6.38 kilometers was the median error. If you look at the entire error distribution (Figure 6 in the paper), a large number of users were located with an error of 1 kilometer or less.

@Daniel: "The issue is whether the data can be refined enough for law enforcement or advertising purposes."

Let's not forget other purposes, like divorce settlements. Even if you don't geotag your tweets, if your spouse's lawyer can locate you within 1 kilometer of "the other person" at various times based on others' geotagged tweets, that could influence the divorce decree. (We've already seen this sort of thing based on EZ-Pass toll data.)

Basically, there is no privacy on Twitter, Facebook, Google, etc., even if you turn on their "privacy controls."

albert • March 10, 2015 3:11 PM

We're talking about Twitter? The Etch-A-Sketch of the Web?

Twitter users don't give a r*** a** about geo-tagging. And they won't, even after they get a visit from the FBI regarding that nice boy/girl from Iran they met at Starbucks, and dated a few times. Even after they're arrested, they won't care, and if they did, it will have been too late.

It's the TLA 'tools' vs. the Twitter 'tools'. There are no winners in this race.

Anura • March 10, 2015 3:36 PM

@Alan Kaminsky:

"Let's not forget other purposes, like divorce settlements. Even if you don't geotag your tweets, if your spouse's lawyer can locate you within 1 kilometer of 'the other person' at various times based on others' geotagged tweets, that could influence the divorce decree. (We've already seen this sort of thing based on EZ-Pass toll data.)"

This doesn't provide that level of detail. It provides the home of the user within a reasonable margin of error based on other users they are connected to; it does not provide information about where individual messages were broadcast from.

4 • March 11, 2015 9:53 AM

What is this about privacy? The purpose of Twitter is framing fake terrorists. Sure, it's depressing to pay attention to the USA's rigged neo-Soviet judicial system, but buried in all the totalitarian pus there are some priceless gems of pure Three Stooges slapstick as the government frantically makes shit up to salvage its fabricated case. CIA better hope they put enough slack-jawed morons on that jury, because that's all that can save them now.

Dicking around with a fake twitter feed after the patsy is locked up? That's so sad. At least with 9/11 CIA made their terrorists conspicuous ahead of time, as when one of their Attas blocked a runway at Miami international and walked away all inconspicuous and casual-like. Now that's entertainment.



Schneier on Security is a personal website. Opinions expressed are not necessarily those of IBM Resilient.