Schneier on Security
A blog covering security and security technology.
April 8, 2011
Pinpointing a Computer to Within 690 Meters
This is impressive, and scary:
Every computer connected to the web has an internet protocol (IP) address, but there is no simple way to map this to a physical location. The current best system can be out by as much as 35 kilometres.
Now, Yong Wang, a computer scientist at the University of Electronic Science and Technology of China in Chengdu, and colleagues at Northwestern University in Evanston, Illinois, have used businesses and universities as landmarks to achieve much higher accuracy.
These organisations often host their websites on servers kept on their premises, meaning the servers' IP addresses are tied to their physical location. Wang's team used Google Maps to find both the web and physical addresses of such organisations, providing them with around 76,000 landmarks. By comparison, most other geolocation methods only use a few hundred landmarks specifically set up for the purpose.
The new method zooms in through three stages to locate a target computer. The first stage measures the time it takes to send a data packet to the target and converts it into a distance -- a common geolocation technique that narrows the target's possible location to a radius of around 200 kilometres.
Wang and colleagues then send data packets to the known Google Maps landmark servers in this large area to find which routers they pass through. When a landmark machine and the target computer have shared a router, the researchers can compare how long a packet takes to reach each machine from the router; converted into an estimate of distance, this time difference narrows the search down further. "We shrink the size of the area where the target potentially is," explains Wang.
Finally, they repeat the landmark search at this more fine-grained level: comparing delay times once more, they establish which landmark server is closest to the target. The result can never be entirely accurate, but it's much better than trying to determine a location by converting the initial delay into a distance or the next best IP-based method. On average their method gets to within 690 metres of the target and can be as close as 100 metres -- good enough to identify the target computer's location to within a few streets.
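The three stages quoted above can be sketched in a few lines of Python. This is a simplified illustration, not the researchers' actual algorithm, and it rests on several assumptions. It uses a fixed propagation speed of about 200 km per millisecond, which is roughly two thirds of the speed of light and typical of optical fibre. It treats all round-trip times (RTTs) as measured from one vantage point. The helper names (`rtt_to_radius_km`, `via_router_delay_ms`, `closest_landmark`) and the way the delays are combined are hypothetical.

```python
from dataclasses import dataclass

# Assumed propagation speed in optical fibre: roughly 2/3 of c,
# i.e. about 200 km per millisecond. Real paths are slower and longer.
FIBRE_KM_PER_MS = 200.0


def rtt_to_radius_km(rtt_ms: float, km_per_ms: float = FIBRE_KM_PER_MS) -> float:
    """Stage 1: upper bound on distance implied by a round-trip time."""
    one_way_ms = rtt_ms / 2.0  # the packet travels there and back
    return one_way_ms * km_per_ms


def via_router_delay_ms(rtt_router: float, rtt_landmark: float,
                        rtt_target: float) -> float:
    """Stage 2 (hypothetical simplification): one-way delay from a shared
    router to the landmark plus one-way delay from that router to the target.

    All RTTs are measured from the same vantage point, so subtracting the
    RTT to the router isolates the segment beyond it. Negative differences
    (measurement noise) are clamped to zero.
    """
    to_landmark = max(0.0, (rtt_landmark - rtt_router) / 2.0)
    to_target = max(0.0, (rtt_target - rtt_router) / 2.0)
    return to_landmark + to_target


@dataclass
class Landmark:
    name: str
    lat: float
    lon: float
    via_router_ms: float  # result of via_router_delay_ms for this landmark


def closest_landmark(landmarks: list[Landmark]) -> Landmark:
    """Stage 3: pick the landmark with the smallest router-relative delay."""
    return min(landmarks, key=lambda lm: lm.via_router_ms)
```

For example, under these assumptions a 2 ms RTT bounds the target to within 200 km. Real routes are rarely straight lines, so bounds like this are loose, which is why the later stages lean on nearby landmarks instead.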
Posted on April 8, 2011 at 6:22 AM
• 48 Comments
Except when you have a satellite connection, and the IP address reports as where the downlink station is, not where the uplink is. I hope they counted stations like that in their 'on average'.
E.g., Zillow.com, using what I believe is IP geolocation, is firmly convinced I live in south Denver, roughly a thousand miles from here.
Of course, the ping time can be a dead giveaway that you're dealing with a satellite (I've seen 25,000 ms pings that still somehow managed to find their way home), but that could also just be a very badly clogged network.
Anyhow, there's always exceptions...
(I suspect they count satellite downlinks as a 'proxy server'. Hmph.)
Edit: After some experimenting, online geolocation services now seem to think I'm not in Colorado, but they still put me 200 km away. And that's without even trying to hide... J.
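The point about satellite ping times can be made concrete. A geostationary satellite orbits about 35,786 km above the equator, so a request and its reply each go up and down once. That means at least four traversals of the orbital altitude. The sketch below computes that floor and flags RTTs above it. The helper names are hypothetical, and the calculation ignores processing delay and the extra slant distance for stations away from the sub-satellite point. As the comment notes, a badly congested terrestrial path can also exceed the floor, so a high RTT is a hint, not proof.

```python
GEO_ALTITUDE_KM = 35_786.0       # geostationary orbit altitude above the equator
LIGHT_KM_PER_S = 299_792.458     # speed of light in vacuum


def min_geo_rtt_ms() -> float:
    """Lower bound on a round trip over one geostationary hop.

    Request: user -> satellite -> ground station (2 x altitude).
    Reply:   ground station -> satellite -> user (2 x altitude).
    """
    path_km = 4 * GEO_ALTITUDE_KM
    return path_km / LIGHT_KM_PER_S * 1000.0


def may_be_geo_satellite(rtt_ms: float) -> bool:
    """True if the RTT is at least the geostationary floor (suggestive only)."""
    return rtt_ms >= min_geo_rtt_ms()
```

The floor works out to a little under half a second (about 477 ms). Any terrestrial link that regularly beats that cannot be going through a geostationary hop.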
I'm using a VPN and am located far away from the office and its IP range.
Guess where I am now
In general, it's never been safe to assume that an adversary who knows your IP address can't find your physical location. (Unless you're doing things like Jon or Louis, but this research doesn't address them.)
The technique presented here is novel, but I don't understand what's scary about it.
Since I'm under the impression that I'm legally required to submit valid information for domain registration, and I don't wish to spend extra money on a PO Box just for this purpose, my registered mailing address is my home address. That's a lot closer than 690 meters, with a lot lower tech - "whois".
@phred14: Whois will give you my home or office address -- but the servers are somewhere else in a computer centre.
I think one difficulty with this is being able to consistently find other servers in the area that you know are hosted on premise.
The article mentions that they use businesses and universities as benchmarks for the location but today even those organizations host stuff off site.
If there were some way to programmatically determine other servers near a given location, this would be a lot more powerful.
Leased lines and VPNs can make a monkey out of this and many other methods, but, just like satellite systems, some mobile phone networks will quite easily hide your location (and be a lot cheaper to operate ;)
A seldom-mentioned fact is that mobile operators are desperately short of IP addresses, so much so that as many as 300 actual phones can share the same IP address. They do it with a mixture of NAT and PAT (which is why, as a website owner etc., you should log not just the IP addresses but the port numbers as well).
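Here is why the port number matters. Consider a carrier-grade NAT that hands out blocks of public ports and must later trace a connection back to a subscriber. The sketch assumes a hypothetical mapping table of public IP, port range, time window and subscriber. The `NatMapping` type and the `find_subscriber` and `candidates_by_ip_only` helpers are illustrative, not any operator's real schema. With only the IP address and time, every subscriber sharing that address matches. The port narrows it to one.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class NatMapping:
    public_ip: str
    port_low: int       # first public port in the block (inclusive)
    port_high: int      # last public port in the block (inclusive)
    subscriber: str     # hypothetical internal subscriber identifier
    start: datetime     # mapping valid from (inclusive)
    end: datetime       # mapping valid until (exclusive)


def find_subscriber(mappings: list[NatMapping], ip: str, port: int,
                    when: datetime) -> Optional[str]:
    """Return the subscriber behind (ip, port) at time `when`, if any."""
    for m in mappings:
        if (m.public_ip == ip
                and m.port_low <= port <= m.port_high
                and m.start <= when < m.end):
            return m.subscriber
    return None


def candidates_by_ip_only(mappings: list[NatMapping], ip: str,
                          when: datetime) -> list[str]:
    """Without the port, every subscriber sharing the IP at that time matches."""
    return [m.subscriber for m in mappings
            if m.public_ip == ip and m.start <= when < m.end]
```

For example, suppose two phones share 203.0.113.5 (a documentation address) with port blocks 1024-2047 and 2048-3071. A log entry of just the IP returns both. A log entry that also records port 2500 identifies the second phone.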
"The first stage measures the time it takes to send a data packet to the target and converts it into a distance"
Bouncing your traffic through the Tor network would help defend against this; the only thing an attacker could figure out is what planet you are on, and that's assuming it's a good day for the network!
"That's a lot closer than 690 meters, with a lot lower tech - "whois"."
Sure, but that's only an option if you want to find out the location of someone who has registered a domain and has also chosen to be public.
Wouldn't work for ADSL in the UK; all the packets are carried over ATM until it gets to the ISP and then they're routed over IP from there.
So you'd get to the ISP, and then you'd face the time circle around the ISP; the user could be anywhere on that 'circle'.
@ Ian Woollard,
"Wouldn't work for ADSL in the UK; all the packets are carried over ATM until it gets to the ISP and then they're routed over IP from there."
I've never tried it on ADSL...
But I've just tried three different IP address locators on this smartphone and they put me near Manchester in the North West of England, but... It being a nice sunny day, I'm actually looking out over the English Channel in all directions whilst bobbing up and down...
So I'm about 360 km south-east of where they think I am; that's not even close enough for a nuke ;)
Right, as long as the ISPs network is a black box for the landmarks.
But it seems to me that you could make it work with landmarks that are inside the ISP network too. Home servers with known locations? Open wifi networks perhaps?
"good enough to identify the target computer's location to within a few streets."
At that point a nuke from orbit is an easy way to be sure.
Or in the words of NATO today: "I can assure you that we do our utmost to avoid civilian casualties." "...fighting... made the situation extremely confusing and hard to track."
If you use a tunnel or VPN to connect to the Internet, I think that this could be foiled. A software tunnel could pretty trivially introduce a delay, ideally random (it could vary slowly with time, as in GPS selective availability).
However, I think that even a constant delay would still work.
Suppose I am using a tunnel, and the tunnel delay is 20 msec, and the tunnel end point is in Boston. Now, with a constant delay they can certainly find out that the lowest latency is for a probe from Boston, and so Boston is "closer" to me than LA or Seattle or Washington. But, they cannot be sure that this means that
- I am in Boston and injecting 20 msec of constant delay or
- I am somewhere else (say in upper Vermont) and my network routing goes through Boston (with or without injecting constant delay), and it just happens that there is no "landmark" router near me in a network topology sense.
- I am 20 msec from Boston and using a tunnel.
Given that 20 msec (one way) spans the continent, and that a 40 msec round trip delay is not perceptible in VOIP, it seems that this could hide you pretty effectively.
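The slowly varying random delay suggested above, in the spirit of GPS selective availability, could be generated as a bounded random walk. This is a minimal sketch under assumptions. The function name and parameters are hypothetical, and the code only produces the delay values. Actually holding packets for that long would be up to the tunnel software, or to something like the Linux `netem` queueing discipline.

```python
import random
from typing import Optional


def slowly_varying_delays(n: int, low_ms: float, high_ms: float,
                          step_ms: float, seed: Optional[int] = None) -> list[float]:
    """Generate n per-packet delays as a bounded random walk.

    Each value differs from the previous one by at most step_ms, so the
    delay drifts slowly instead of jumping, and it never leaves
    [low_ms, high_ms].
    """
    rng = random.Random(seed)
    current = rng.uniform(low_ms, high_ms)
    delays = []
    for _ in range(n):
        current += rng.uniform(-step_ms, step_ms)
        current = min(high_ms, max(low_ms, current))  # clamp to the band
        delays.append(current)
    return delays
```

Keeping the step small relative to the band means probes sent close together see nearly the same offset. The intent is that averaging them converges on that random offset rather than on the true path delay. As the comment argues, though, even a constant added delay may be enough to create the ambiguity.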
On the network topology note above: it is not enough that there is a landmark router physically near you; it also has to be near you in a network topology sense. It doesn't help them much if the museum next door is a landmark router when the peering point between your networks is 1000 km away.
Why is this necessarily even scary?
I remember reading a story some years ago about a student working alone in a library in Finland when she had a medical emergency. Her only means of communication at the time was Internet chat, but her chat buddies were in North America.
So what do you, in the USA, do? Her buddy called 911 and explained the situation. It was not easy to get all of the various authorities lined up but eventually the connection was traced and the ambulance made it to the library in time. Very lucky.
If there was a way to trace an IP address to 690 metres back then it could have ended a lot quicker, and with a lot more certainty in the outcome.
I predict that this will not be good to anything like 1 km accuracy away from major cities.
If you are out in the country, then there is unlikely to be a landmark router nearby, and if there is one, it is quite possibly on a different network, with a peering point many miles away. For example, many university extension campuses in the USA connect back to the main university NREN, and all Internet traffic to and from them then goes through one or two "GigaPOPs" in the state. So, even if there is a university extension nearby, it is unlikely to help much with geolocation.
This will be generally true as it is not economically efficient to install peering points out in the hinterlands. Even if the traffic is going next door, it's cheaper to backhaul it to a nearby city, exchange it there, and have the other network bring it back. So, on average, I don't think that this is going to work very well out in the country.
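The backhaul effect described above can be put into numbers. Purely for illustration, assume two neighbouring subscribers whose networks only exchange traffic at a city 80 km away. Using the same rough fibre speed of about 200 km per millisecond, the packet travels about 160 km even though the endpoints are next door. The helper names and distances are hypothetical.

```python
FIBRE_KM_PER_MS = 200.0  # assumed propagation speed, about 2/3 of c


def backhaul_one_way_ms(a_to_exchange_km: float, exchange_to_b_km: float) -> float:
    """One-way propagation delay when traffic must detour via an exchange point."""
    return (a_to_exchange_km + exchange_to_b_km) / FIBRE_KM_PER_MS


def apparent_distance_km(one_way_ms: float) -> float:
    """Distance a latency-based method would infer from that delay."""
    return one_way_ms * FIBRE_KM_PER_MS
```

With 80 km each way, `backhaul_one_way_ms(80, 80)` is 0.8 ms. A latency-based method would read that as roughly 160 km of separation between two houses on the same street.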
Assume that my ISP is on that landmark list. Assume that somebody is looking for my IP. Assume that they find that (barring actual problems) my ping time to my ISP is about 40ms. Now what?
Light travels at about 300,000 km/s, so a one-way time of 20ms puts me within 6,000 km of my ISP at most. I don't think that's a very useful result.
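Here is that arithmetic written out. The second line gives the same bound under the common assumption that light in optical fibre travels at roughly two thirds of c:

```latex
\begin{aligned}
d_{\max} &= c\,t_{\text{one-way}} \approx \left(3\times10^{5}\ \tfrac{\text{km}}{\text{s}}\right)\left(0.020\ \text{s}\right) = 6000\ \text{km}\\
d_{\max}^{\text{fibre}} &\approx \left(2\times10^{5}\ \tfrac{\text{km}}{\text{s}}\right)\left(0.020\ \text{s}\right) = 4000\ \text{km}
\end{aligned}
```

Either way, a 40 ms ping only bounds the distance to thousands of kilometres.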
Suppose they also know that I connect with DSL, and therefore the ping time can be divided into my ISP to the particular switch I use, and the time from there to my home wiring. Suppose they can therefore tell the distance to the switch plus the distance from my switch to my home. That means they can put me in a certain radius of my ISP, and if that's all they can do they aren't going to get within 700m, since my ISP is over two kilometers from my house.
@magetoo: seems that the gateways for open wifi networks would be another good source for landmarks, as there is quite an infrastructure for utilizing those (my eye-fi card geolocates my photos without GPS).
This is no surprise. They trace IP addresses all the time on TV cop shows, with pinpoint accuracy. I've even seen them trace IP addresses like this: "3126.96.36.199" ;-)
@David Thornley: It gets even better. I'm on a DSL link with a microwave hop in the middle. According to the phone company records, I'm 56,000 feet (17 km) from the CO!
It seems like people are dwelling on the exceptions too much. Previous forms of IP address geolocation have the same well-known problems with exceptions, yet the exceptions are few enough that people find the systems useful. The same will apply with newer techniques.
@Richard Schwartz: Everybody knows that numbers bigger than 255 are for tracing IPs from the future. Duh. *grin*
So this is instant geolocation, but correct me if I'm wrong (I've never worked for an ISP): can't they tell which IP is leased to which customer at any given point in time and get an address from that?
Is the big thing with this just the ability to cut out the ISP "middleman"?
Just what the Chinese needed, a better way to track people online. (and with US researchers helping no less)
OK, so now you know how far away I am; now which direction am I in?
As an old wire head, I find this method preposterous. There are many potential variable factors in measuring TTR.
TTR is affected by bandwidth, media (fiber vs. electrical wire), interface delays, variable NAP traffic route factors with BGP MEDs and local preference values caused by traffic congestion, and traffic prioritization factors. Network address translation and port address translation plainly mask the host computer. VPN/IP tunneling and the use of proxies make this a nebulous activity at best.
690 meters my butt! This fella couldn't defeat a fifteen year old with an aged copy of WinProxy and the use of home made 802.11b signal gain antenna.
Those who choose to be anonymous can choose to remain anonymous.
Do you think that 4600 botnets could remain in operation if somebody could reliably geolocate?
Think about Google's potential capability to mine Google Maps/Google Earth searches together with the IP addresses from which the searches originated. For instance, how many people haven't used Google Earth to search for their own home address from their home? Obviously, if Google could accurately correlate these searches with the originating IP addresses, their capability would be far more accurate than the research presented. I understand this is a static approach and the information would become stale after some time, compared to the dynamic approach in the research mentioned in this post; the point I am trying to make is that this would require little effort on Google's part, and they potentially have all the information they would need for analysis and correlation. Furthermore, consider the ISPs who willingly sell user account information correlated with IP address assignments, and then consider the buyers of this information who use Google Earth as a nice geolocation front end; obviously, if this information is being uploaded to Google Earth, Google could mine that data if they chose to. No, I am not picking on Google; the same perspective applies to all the big players (Microsoft, Yahoo, AOL, etc.).
Tracr.net is using CIDR block registration data from ARIN. Despite the quality of the data, a megaton hydrogen blast over the position on the Google map would have missed me completely. This is due to the difference between the billing address and the service location.
The biggest thing I see wrong with this system is that it assumes that the ip address is valid and untampered with. If the ip address is spoofed all bets are off regardless of the accuracy.
Geolocation systems always think that I am in Scottsdale Arizona USA, while in reality I am in the Middle East on the other side of the earth. If the geo services would be any more wrong, they would be more accurate...
I would very much like to see it 'in operation'.
Does this mean that now otherwise-unacceptable latency and jitter from ISPs will become a feature because they're protecting your privacy?
Global advertisers and search engines should not assume users want by default a geolocated experience. An easy solution is to enable users to choose by making geolocation opt in.
This isn't really as accurate as it sounds. Copper carries signals at only a third of the speed of fibre, and active devices (which may be invisible) worsen the results still further.
Ultimately, you're relying on using lots of long fibres, and few active devices or other copper pathways, using DWDM vs multiplexing would seriously affect results - and that's just for the core.
If you compare HFC vs DSL access technologies, HFC can be 5 miles of fibre, and 50 yards of copper. The same circuit for DSL is entirely copper, but would appear like 15 miles of fibre!
Even differing encoding technologies can add odd latency effects, even when using the same media.
Though this kind of 'triangulation' would appear sound in principle, in practice the error margins would surely be indeterminable and unpredictable.
From experience, even if you know and share the *whole* layer-3 path between target and control endpoints, the end CPE equipment will vary in response time depending on vendor, model, segment load and device load.
Getting a couple of hundred meters is about as good as you can get even with a special environment.
I think an easy solution exists at the client PC itself: introducing random or consistently inaccurate jitter into the response time. That could throw them off quite a bit. Change it up consistently and they might give up on this kind of technique altogether. However, most people wanting real anonymity send their traffic through proxies in configurations and protocols that reduce traceability. Dedicated netbook + livecd/liveusb + tor over a wifi hotspot is a common example.
tracr.net puts me next door to the houses of parliament.
Kind of scary that that computer is referred to as the "target" computer.
@ Nick P.
"However, most people wanting real anonymity send their traffic through proxies in configurations and protocols that reduce traceability. Dedicated netbook + livecd/liveusb + tor over a wifi hotspot is a common example"
Right on the money.
"These organisations often host their websites on servers kept on their premises, meaning the servers' IP addresses are tied to their physical location."
I don't recall the last time I ever worked at a company where the data servers generally (and the web servers especially) were in the same suburb as me. In many cases they're in a different state (and on at least one occasion in a different country!) from the "Head Office" address it all might be registered to. I really can't see that component of this scheme being of much practical use.
"I don't recall the last time I ever worked at a company where the data servers generally (and the web servers especially) were in the same suburb as me."
Whilst true of very large and very small companies, it is not true for quite a few medium-sized companies and quite a few universities and colleges. Yong Wang and colleagues do say, at the top of the second page of their paper, that they got this mean of 690 meters "in an academic" environment.
Part of the novel aspect of their system is that it automatically harvests the geographic information from the web server.
Now I suspect that many webservers in medium sized businesses are geo-located with their advertised postal address, and I can think of several ways to augment their work using other information.
For instance, suppose (and it's a big if these days) you could get the final-hop time for a range of IP addresses. In a business, these would very likely form definite clusters around one or a few mean times, simply because businesses tend to put people in office buildings rather than spread them out over a suburb. Thus three distinct clusters would tend to suggest three sites, etc., and a look at their "recruitment page" may well tell you where they are. This is likewise true for university campuses. Over time, measuring the final-hop times will "lift the signal" from the noise in ways that provide additional information. If this is then combined with other measurements such as "TCP timestamp drift", you can elicit further information, such as which PCs are "mobile" from site to site or coming in over a VPN from home, suggesting who might be a more senior person, etc.
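The clustering idea can be sketched with a very simple one-dimensional method. Sort the final-hop times, then start a new cluster wherever the gap between neighbouring values exceeds a threshold. The comment does not name a method, so this gap-based split, the function name and the threshold are assumptions chosen for illustration. Real measurements would need many repeated samples per address to average out noise first.

```python
def cluster_latencies(samples_ms: list[float], gap_ms: float) -> list[list[float]]:
    """Group latency samples into clusters separated by gaps larger than gap_ms.

    Samples are sorted first; a new cluster starts whenever the next value
    is more than gap_ms above the last value of the current cluster.
    """
    if not samples_ms:
        return []
    ordered = sorted(samples_ms)
    clusters = [[ordered[0]]]
    for value in ordered[1:]:
        if value - clusters[-1][-1] > gap_ms:
            clusters.append([value])      # big jump: likely a different site
        else:
            clusters[-1].append(value)    # close to previous: same site
    return clusters
```

For example, final-hop times of 1.0, 1.1, 1.2, 5.0, 5.1 and 9.9 ms with a 1 ms gap split into three clusters. Under the comment's reasoning, that would hint at three sites.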
OK, so I visit two web pages a minute. I guess each page has about 10 adverts doing this "active searching". Each active search initiates an interrogation of each landmark server in the region; let's say that's 20. Then it sends another set at the finer-grained level, for 40 in total.
That's 800 extra transoceanic datasets each minute (assuming it's from China). Not much for just me, but when you multiply that by the number of users on the internet, we will need an extra complete internet just for adverlocation if this catches on.
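The estimate above is simple multiplication. It assumes "for 40" means 40 probes per advert in total (20 coarse plus 20 fine), which is the reading that reproduces the figure of 800. The helper and its `users` parameter are hypothetical.

```python
def probes_per_minute(pages_per_minute: int, ads_per_page: int,
                      coarse_probes: int, fine_probes: int,
                      users: int = 1) -> int:
    """Extra landmark probes per minute under the comment's assumptions."""
    per_ad = coarse_probes + fine_probes  # 20 coarse + 20 fine = 40
    return pages_per_minute * ads_per_page * per_ad * users
```

`probes_per_minute(2, 10, 20, 20)` returns 800. Setting `users` scales that linearly to however many users one assumes.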
Clifford Stoll used the same mechanism to locate the hackers in 'The Cuckoo's Egg'.
Though, granted, the connection was via dial-up: it was, after all, the late '80s ... 8)
Makes me wonder if IPv6 could provide for a more accurate geolocation method?
35km out? I checked my Gmail activity details and there was access from Morocco....
I took the IP address and did a lookup and found it to belong to RIM in the UK.....
LOL, panic over...!
IPv6 provides worse geolocating capability because every device can function as a router. There is limitless connectivity:
bluetooth, wireless, wired etc.
TTR in IPv6 will be even more meaningless
Schneier.com is a personal website. Opinions expressed are not necessarily those of Co3 Systems, Inc.