Entries Tagged "de-anonymization"

Page 3 of 5

Surveillance and the Internet of Things

The Internet has turned into a massive surveillance tool. We’re constantly monitored on the Internet by hundreds of companies — both familiar and unfamiliar. Everything we do there is recorded, collected, and collated — sometimes by corporations wanting to sell us stuff and sometimes by governments wanting to keep an eye on us.

Ephemeral conversation is over. Wholesale surveillance is the norm. Maintaining privacy from these powerful entities is basically impossible, and any illusion of privacy we maintain is based either on ignorance or on our unwillingness to accept what’s really going on.

It’s about to get worse, though. Companies such as Google may know more about your personal interests than your spouse, but so far it’s been limited by the fact that these companies only see computer data. And even though your computer habits are increasingly being linked to your offline behavior, it’s still only behavior that involves computers.

The Internet of Things refers to a world where much more than our computers and cell phones is Internet-enabled. Soon there will be Internet-connected modules on our cars and home appliances. Internet-enabled medical devices will collect real-time health data about us. There’ll be Internet-connected tags on our clothing. In its extreme, everything can be connected to the Internet. It’s really just a matter of time, as these self-powered wireless-enabled computers become smaller and cheaper.

Lots has been written about theInternet of Things” and how it will change society for the better. It’s true that it will make a lot of wonderful things possible, but the “Internet of Things” will also allow for an even greater amount of surveillance than there is today. The Internet of Things gives the governments and corporations that follow our every move something they don’t yet have: eyes and ears.

Soon everything we do, both online and offline, will be recorded and stored forever. The only question remaining is who will have access to all of this information, and under what rules.

We’re seeing an initial glimmer of this from how location sensors on your mobile phone are being used to track you. Of course your cell provider needs to know where you are; it can’t route your phone calls to your phone otherwise. But most of us broadcast our location information to many other companies whose apps we’ve installed on our phone. Google Maps certainly, but also a surprising number of app vendors who collect that information. It can be used to determine where you live, where you work, and who you spend time with.

Another early adopter was Nike, whose Nike+ shoes communicate with your iPod or iPhone and track your exercising. More generally, medical devices are starting to be Internet-enabled, collecting and reporting a variety of health data. Wiring appliances to the Internet is one of the pillars of the smart electric grid. Yes, there are huge potential savings associated with the smart grid, but it will also allow power companies – and anyone they decide to sell the data to — to monitor how people move about their house and how they spend their time.

Drones are another “thing” moving onto the Internet. As their price continues to drop and their capabilities increase, they will become a very powerful surveillance tool. Their cameras are powerful enough to see faces clearly, and there are enough tagged photographs on the Internet to identify many of us. We’re not yet up to a real-time Google Earth equivalent, but it’s not more than a few years away. And drones are just a specific application of CCTV cameras, which have been monitoring us for years, and will increasingly be networked.

Google’s Internet-enabled glasses — Google Glass — are another major step down this path of surveillance. Their ability to record both audio and video will bring ubiquitous surveillance to the next level. Once they’re common, you might never know when you’re being recorded in both audio and video. You might as well assume that everything you do and say will be recorded and saved forever.

In the near term, at least, the sheer volume of data will limit the sorts of conclusions that can be drawn. The invasiveness of these technologies depends on asking the right questions. For example, if a private investigator is watching you in the physical world, she or he might observe odd behavior and investigate further based on that. Such serendipitous observations are harder to achieve when you’re filtering databases based on pre-programmed queries. In other words, it’s easier to ask questions about what you purchased and where you were than to ask what you did with your purchases and why you went where you did. These analytical limitations also mean that companies like Google and Facebook will benefit more from the Internet of Things than individuals — not only because they have access to more data, but also because they have more sophisticated query technology. And as technology continues to improve, the ability to automatically analyze this massive data stream will improve.

In the longer term, the Internet of Things means ubiquitous surveillance. If an object “knows” you have purchased it, and communicates via either Wi-Fi or the mobile network, then whoever or whatever it is communicating with will know where you are. Your car will know who is in it, who is driving, and what traffic laws that driver is following or ignoring. No need to show ID; your identity will already be known. Store clerks could know your name, address, and income level as soon as you walk through the door. Billboards will tailor ads to you, and record how you respond to them. Fast food restaurants will know what you usually order, and exactly how to entice you to order more. Lots of companies will know whom you spend your days — and nights — with. Facebook will know about any new relationship status before you bother to change it on your profile. And all of this information will all be saved, correlated, and studied. Even now, it feels a lot like science fiction.

Will you know any of this? Will your friends? It depends. Lots of these devices have, and will have, privacy settings. But these settings are remarkable not in how much privacy they afford, but in how much they deny. Access will likely be similar to your browsing habits, your files stored on Dropbox, your searches on Google, and your text messages from your phone. All of your data is saved by those companies — and many others — correlated, and then bought and sold without your knowledge or consent. You’d think that your privacy settings would keep random strangers from learning everything about you, but it only keeps random strangers who don’t pay for the privilege — or don’t work for the government and have the ability to demand the data. Power is what matters here: you’ll be able to keep the powerless from invading your privacy, but you’ll have no ability to prevent the powerful from doing it again and again.

This essay originally appeared on the Guardian.

EDITED TO ADD (6/14): Another article on the subject.

Posted on May 21, 2013 at 6:15 AMView Comments

Reidentifying Anonymous Data

Latanya Sweeney has demonstrated how easy it can be to identify people from their birth date, gender, and zip code. The anonymous data she reidentified happened to be DNA data, but that’s not relevant to her methods or results.

Of the 1,130 volunteers Sweeney and her team reviewed, about 579 provided zip code, date of birth and gender, the three key pieces of information she needs to identify anonymous people combined with information from voter rolls or other public records. Of these, Sweeney succeeded in naming 241, or 42% of the total. The Personal Genome Project confirmed that 97% of the names matched those in its database if nicknames and first name variations were included.

Her results are described here.

Posted on May 8, 2013 at 1:54 PMView Comments

Identifying People from their DNA

Interesting:

The genetic data posted online seemed perfectly anonymous ­- strings of billions of DNA letters from more than 1,000 people. But all it took was some clever sleuthing on the Web for a genetics researcher to identify five people he randomly selected from the study group. Not only that, he found their entire families, even though the relatives had no part in the study ­– identifying nearly 50 people.

[…]

Other reports have identified people whose genetic data was online, but none had done so using such limited information: the long strings of DNA letters, an age and, because the study focused on only American subjects, a state.

Posted on January 24, 2013 at 6:48 AMView Comments

Identifying Speakers in Encrypted Voice Communication

I’ve already written how it is possible to detect words and phrases in encrypted VoIP calls. Turns out it’s possible to detect speakers as well:

Abstract: Most of the voice over IP (VoIP) traffic is encrypted prior to its transmission over the Internet. This makes the identity tracing of perpetrators during forensic investigations a challenging task since conventional speaker recognition techniques are limited to unencrypted speech communications. In this paper, we propose techniques for speaker identification and verification from encrypted VoIP conversations. Our experimental results show that the proposed techniques can correctly identify the actual speaker for 70-75% of the time among a group of 10 potential suspects. We also achieve more than 10 fold improvement over random guessing in identifying a perpetrator in a group of 20 potential suspects. An equal error rate of 17% in case of speaker verification on the CSLU speaker recognition corpus is achieved.

Posted on September 16, 2011 at 12:31 PMView Comments

Identifying People by their Writing Style

The article is in the context of the big Facebook lawsuit, but the part about identifying people by their writing style is interesting:

Recently, a team of computer scientists at Concordia University in Montreal took advantage of an unusual set of data to test another method of determining e-mail authorship. In 2003, the Federal Energy Regulatory Commission, as part of its investigation into Enron, released into the public domain hundreds of thousands of employee e-mails, which have become an important resource for forensic research. (Unlike novels, newspapers or blogs, e-mails are a private form of communication and aren’t usually available as a sizable corpus for analysis.)

Using this data, Benjamin C. M. Fung, who specializes in data mining, and Mourad Debbabi, a cyber-forensics expert, collaborated on a program that can look at an anonymous e-mail message and predict who wrote it out of a pool of known authors, with an accuracy of 80 to 90 percent. (Ms. Chaski claims 95 percent accuracy with her syntactic method.) The team identifies bundles of linguistic features, hundreds in all. They catalog everything from the position of greetings and farewells in e-mails to the preference of a writer for using symbols (say, “$” or “%”) or words (“dollars” or “percent”). Combining all of those features, they contend, allows them to determine what they call a person’s “write-print.”

It seems reasonable that we have a linguistic fingerprint, although 1) there are far fewer of them than finger fingerprints, 2) they’re easier to fake. It’s probably not much of a stretch to take that software that “identifies bundles of linguistic features, hundreds in all” and use the data to automatically modify my writing to look like someone else’s.

EDITED TO ADD (8/3): A good criticism of the science behind author recognition, and a paper on how to evade these systems.

Posted on August 3, 2011 at 6:08 AMView Comments

Pinpointing a Computer to Within 690 Meters

This is impressive, and scary:

Every computer connected to the web has an internet protocol (IP) address, but there is no simple way to map this to a physical location. The current best system can be out by as much as 35 kilometres.

Now, Yong Wang, a computer scientist at the University of Electronic Science and Technology of China in Chengdu, and colleagues at Northwestern University in Evanston, Illinois, have used businesses and universities as landmarks to achieve much higher accuracy.

These organisations often host their websites on servers kept on their premises, meaning the servers’ IP addresses are tied to their physical location. Wang’s team used Google Maps to find both the web and physical addresses of such organisations, providing them with around 76,000 landmarks. By comparison, most other geolocation methods only use a few hundred landmarks specifically set up for the purpose.

The new method zooms in through three stages to locate a target computer. The first stage measures the time it takes to send a data packet to the target and converts it into a distance — a common geolocation technique that narrows the target’s possible location to a radius of around 200 kilometres.

Wang and colleagues then send data packets to the known Google Maps landmark servers in this large area to find which routers they pass through. When a landmark machine and the target computer have shared a router, the researchers can compare how long a packet takes to reach each machine from the router; converted into an estimate of distance, this time difference narrows the search down further. “We shrink the size of the area where the target potentially is,” explains Wang.

Finally, they repeat the landmark search at this more fine-grained level: comparing delay times once more, they establish which landmark server is closest to the target. The result can never be entirely accurate, but it’s much better than trying to determine a location by converting the initial delay into a distance or the next best IP-based method. On average their method gets to within 690 metres of the target and can be as close as 100 metres — good enough to identify the target computer’s location to within a few streets.

Posted on April 8, 2011 at 6:22 AMView Comments

Identifying Tor Users Through Insecure Applications

Interesting research: “One Bad Apple Spoils the Bunch: Exploiting P2P Applications to Trace and Profile Tor Users“:

Abstract: Tor is a popular low-latency anonymity network. However, Tor does not protect against the exploitation of an insecure application to reveal the IP address of, or trace, a TCP stream. In addition, because of the linkability of Tor streams sent together over a single circuit, tracing one stream sent over a circuit traces them all. Surprisingly, it is unknown whether this linkability allows in practice to trace a significant number of streams originating from secure (i.e., proxied) applications. In this paper, we show that linkability allows us to trace 193% of additional streams, including 27% of HTTP streams possibly originating from “secure” browsers. In particular, we traced 9% of Tor streams carried by our instrumented exit nodes. Using BitTorrent as the insecure application, we design two attacks tracing BitTorrent users on Tor. We run these attacks in the wild for 23 days and reveal 10,000 IP addresses of Tor users. Using these IP addresses, we then profile not only the BitTorrent downloads but also the websites visited per country of origin of Tor users. We show that BitTorrent users on Tor are over-represented in some countries as compared to BitTorrent users outside of Tor. By analyzing the type of content downloaded, we then explain the observed behaviors by the higher concentration of pornographic content downloaded at the scale of a country. Finally, we present results suggesting the existence of an underground BitTorrent ecosystem on Tor.

Posted on March 25, 2011 at 6:38 AMView Comments

Sidebar photo of Bruce Schneier by Joe MacInnis.