Geotagging Twitter Users by Mining Their Social Graphs
New research: “Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization,” by Ryan Compton, David Jurgens, and David Allen.
Abstract: Geographically annotated social media is extremely valuable for modern information retrieval. However, when researchers can only access publicly-visible data, one quickly finds that social media users rarely publish location information. In this work, we provide a method which can geolocate the overwhelming majority of active Twitter users, independent of their location sharing preferences, using only publicly-visible Twitter data.
Our method infers an unknown user’s location by examining their friends’ locations. We frame the geotagging problem as an optimization over a social network with a total variation-based objective and provide a scalable and distributed algorithm for its solution. Furthermore, we show how a robust estimate of the geographic dispersion of each user’s ego network can be used as a per-user accuracy measure which is effective at removing outlying errors.
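A rough illustration of the idea, assuming the objective is the sum over friendship edges of the weighted great-circle distance between the two users' locations, with self-reported locations held fixed. Minimizing that sum one user at a time means moving each unknown user to the weighted median of their friends' current locations. This is not the authors' code: it restricts each update to one of the friends' own points (a weighted medoid) instead of computing a true geometric median, and every name here (`infer_locations`, `dispersion_km`, the toy graph) is hypothetical. The per-user accuracy measure is approximated as the median distance from a user's estimate to their located friends.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def weighted_medoid(points, weights):
    """Candidate point minimizing the weighted sum of distances to all points.

    Simplification (assumption): the exact update is a weighted geometric
    median, which may lie between friends; here it is restricted to their points.
    """
    best, best_cost = None, math.inf
    for p in points:
        cost = sum(w * haversine_km(p, q) for q, w in zip(points, weights))
        if cost < best_cost:
            best, best_cost = p, cost
    return best


def infer_locations(edges, known, iterations=5):
    """Coordinate-wise minimization of sum over edges of w_ij * d(f_i, f_j).

    edges: user -> {friend: weight}, assumed symmetric.
    known: user -> (lat, lon); these users are held fixed as constraints.
    """
    est = dict(known)
    for _ in range(iterations):
        updates = {}
        for user, friends in edges.items():
            if user in known:
                continue  # self-reported locations are never changed
            located = [(est[f], w) for f, w in friends.items() if f in est]
            if located:
                pts, ws = zip(*located)
                updates[user] = weighted_medoid(list(pts), list(ws))
        # Synchronous step: every update above read the previous iteration's
        # estimates, so all users could be updated in parallel.
        est.update(updates)
    return est


def dispersion_km(user, edges, est):
    """Median distance from a user's estimate to their located friends."""
    dists = [haversine_km(est[user], est[f]) for f in edges[user] if f in est]
    return median(dists) if dists else math.inf


# Hypothetical toy graph: u has two New York friends, one in Los Angeles,
# and one friend v whose location is also unknown.
known = {"a": (40.71, -74.01), "b": (40.73, -73.99), "c": (34.05, -118.24)}
edges = {
    "a": {"u": 1.0}, "b": {"u": 1.0}, "c": {"u": 1.0},
    "u": {"a": 1.0, "b": 1.0, "c": 1.0, "v": 1.0},
    "v": {"u": 1.0},
}
est = infer_locations(edges, known)
print(est["u"], est["v"], dispersion_km("u", edges, est))
```

The single Los Angeles friend does not drag `u` toward the middle of the country, which is the robustness a median-style (total variation) objective buys over averaging. Users whose dispersion exceeds some threshold would then be discarded as unreliable; the threshold is a tuning choice not given in the abstract.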
Leave-many-out evaluation shows that our method is able to infer location for 101,846,236 Twitter users at a median error of 6.38 km, allowing us to geotag over 80% of public tweets.
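The evaluation above can be sketched in the same spirit: hide the locations of a random subset of users who do share them, run the geotagger on the rest, and report the median great-circle distance between inferred and true locations. The holdout fraction, seed, and function names are hypothetical; the abstract does not spell out the exact protocol.

```python
import math
import random
from statistics import median

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def leave_many_out_split(known, holdout_fraction, seed=0):
    """Hide a random fraction of users with known locations.

    Returns (visible, held_out): visible locations are given to the
    geotagger; held_out users are later scored against their true locations.
    """
    users = sorted(known)
    random.Random(seed).shuffle(users)
    held_out = set(users[: int(len(users) * holdout_fraction)])
    visible = {u: loc for u, loc in known.items() if u not in held_out}
    return visible, held_out


def median_error_km(predicted, truth, held_out):
    """Median great-circle error over held-out users that received a prediction."""
    errors = [haversine_km(predicted[u], truth[u]) for u in held_out if u in predicted]
    return median(errors) if errors else math.nan


# Hypothetical example: three held-out users, predictions off by 0, 1 and 2
# degrees of longitude at the equator.
truth = {"x": (0.0, 0.0), "y": (0.0, 0.0), "z": (0.0, 0.0), "w": (10.0, 10.0)}
predicted = {"x": (0.0, 0.0), "y": (0.0, 1.0), "z": (0.0, 2.0)}
visible, held_out = leave_many_out_split(truth, 0.5)
print(median_error_km(predicted, truth, {"x", "y", "z"}))
```

Using the median rather than the mean keeps a handful of wildly wrong guesses from dominating the reported error.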
Anura • March 10, 2015 12:52 PM
This is a perfect example of why “metadata” is actually data – just knowing who is connected to whom reveals a significant amount of information about people. Even if the name of a particular person is not known, correlating with other information like the phone records of known individuals can provide you with enough information to identify them.