P2P Privacy

Interesting research:

The team of researchers, which includes graduate students David Choffnes (electrical engineering and computer science) and Dean Malmgren (chemical and biological engineering), and postdoctoral fellow Jordi Duch (chemical and biological engineering), studied connection patterns in the BitTorrent file-sharing network—one of the largest and most popular P2P systems today. They found that over the course of weeks, groups of users formed communities where each member consistently connected with other community members more than with users outside the community.

“This was particularly surprising because BitTorrent is designed to establish connections at random, so there is no a priori reason for such strong communities to exist,” Bustamante says. After identifying this community behavior, the researchers showed that an eavesdropper could classify users into specific communities using a relatively small number of observation points. Indeed, a savvy attacker can correctly extract communities more than 85 percent of the time by observing only 0.01 percent of the total users. Worse yet, this information could be used to launch a “guilt-by-association” attack, where an attacker need only determine the downloading behavior of one user in the community to convincingly argue that all users in the communities are doing the same.
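To make the community claim concrete, here is a minimal sketch of one way to measure it. It is not the researchers' classifier. The function name `internal_fraction`, the peer labels, and the observation log are all hypothetical, and the sketch assumes an eavesdropper has already logged a handful of peer-to-peer connections.

```python
def internal_fraction(connections, community):
    """Return the share of a community's observed connections that stay inside it.

    connections: iterable of (peer_a, peer_b) pairs seen by the observer.
    community:   set of peers hypothesized to form one community.
    Connections that touch no community member are ignored.
    """
    members = set(community)
    inside = outside = 0
    for a, b in connections:
        a_in, b_in = a in members, b in members
        if a_in and b_in:
            inside += 1      # both endpoints are in the community
        elif a_in or b_in:
            outside += 1     # exactly one endpoint is in the community
    total = inside + outside
    return inside / total if total else 0.0


# Hypothetical observation log (illustrative data only, not from the study).
observed = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
            ("d", "e"), ("e", "f"), ("d", "f")]
print(internal_fraction(observed, {"a", "b", "c"}))
```

A value close to 1 means the group mostly talks to itself, which is the pattern the excerpt describes; a value near the share expected from purely random peer selection would mean no community is visible.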

Given the impact of this threat, the researchers developed a technique that prevents accurate classification by intelligently hiding user-intended downloading behavior in a cloud of random downloading. They showed that this approach causes an eavesdropper’s classification to be wrong the majority of the time, providing users with grounds to claim “plausible deniability” if accused.
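The idea of hiding real downloads among random ones can be sketched roughly as follows. This is a naive illustration under stated assumptions, not the researchers' technique: the function `build_download_plan`, the `decoys_per_real` ratio, and the torrent names are hypothetical, and the decoy pool is assumed to be a list of torrents the user has chosen in advance.

```python
import random


def build_download_plan(intended, decoy_pool, decoys_per_real=2, rng=None):
    """Mix the torrents a user actually wants with randomly chosen decoys.

    intended:        torrents the user really wants (hypothetical names).
    decoy_pool:      user-supplied torrents that are acceptable as cover traffic.
    decoys_per_real: assumed ratio of decoys to real downloads.
    rng:             optional random.Random instance, for reproducible runs.
    """
    rng = rng or random.Random()
    # Never count an intended torrent as its own decoy.
    candidates = [t for t in decoy_pool if t not in intended]
    k = min(len(candidates), decoys_per_real * len(intended))
    plan = list(intended) + rng.sample(candidates, k)
    rng.shuffle(plan)  # avoid revealing which entries are real by their order
    return plan


# Illustrative use with made-up torrent names.
plan = build_download_plan(
    ["ubuntu.iso"],
    ["debian.iso", "fedora.iso", "arch.iso"],
    rng=random.Random(0),
)
print(plan)
```

The sketch only chooses what to download. Whether such a mix actually defeats classification depends on how the decoys change the observed connection patterns, which is the part the researchers evaluated.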

Tags: decoys, deniability, privacy, torrents, traffic analysis

Posted on April 9, 2009 at 7:07 AM • 17 Comments

Comments

Yarp • April 9, 2009 7:41 AM

Hmm … “at random” sounds so … random. Without reading the article this sounds like a logical result of different interest groups. People who are downloading the same tv-shows and use the same tracker are most likely to connect to each other.

“Hiding in the crowd” sounds like something that could make the download rather slow.

tcliu • April 9, 2009 7:44 AM

BT clients tend to prioritize peers with fast connections, so the selection isn’t random.

Clive Robinson • April 9, 2009 9:03 AM

Once again it shows that traffic analysis is more important than getting at the content of individual communications paths…

The reality is that all peer to peer networks will “group out” unless very specific steps are taken to stop it. It is the consequence of trying to be “efficient”.

As I have said on previous occasions, efficiency and security appear to be the opposite ends of the “see-saw”: the more you have of one, the less you have of the other.

Wolf • April 9, 2009 9:07 AM

“Hiding in the crowd” by randomly and artificially generated traffic is a good pattern if the content is legal but you still don’t want people tracking what you are really looking at.

But with a high percentage of torrent files having some copyright issues, you would get into additional trouble if you artificially downloaded (and shared?) those copyrighted torrents.

coderpunk • April 9, 2009 10:23 AM

I’m not sure how they think they can examine the ‘BitTorrent file-sharing network’ – there is no such thing. BitTorrent has no centralized network, it is a disconnected collection of ‘information islands’ with each torrent server sharing the torrent info for files they are interested in. So generally you are going to find that the users of a torrent server are downloading similar types of files since that’s the whole point of the torrent server in the first place.

Brandioch Conner • April 9, 2009 10:29 AM

Hmmmmm …. looks like pretty obvious “research” to me.

#1. Of course the users in a “community” will connect more to each other than to systems “outside” the “community”. Because the apps have criteria for determining which machines to connect to. The machine characteristics probably won’t change much from session to session.

#2. Hiding in a crowd? By downloading potentially illegal content? Genius!

#3. Identifying the “community” by analyzing a tiny percentage of it? Of course. Unless they happen upon someone downloading from multiple communities.

Example, when I download a new ISO image of Ubuntu, I’m most likely to connect to the same machines as last time. Because those are probably the machines that had the fastest connection to me last time and seeded long enough for me to connect to them.

paul • April 9, 2009 10:37 AM

If the “communities” are indeed the result of preferring faster connections, then the reasoning behind any guilt-by-association claims would be completely specious. Not that that would stop them from being made.

Hygienic bedding • April 9, 2009 10:46 AM

@tcliu

Totally correct. My client actually displays the scores it keeps on its peers. Not random at all, so the assumptions of the study are flawed and the solution is equally flawed. Evade the authorities by downloading more stuff from different torrents?

When conficker just dropped a payload, I fail to see how this is in any way security news. This seems more like another Kazaa style “incriminate yourself more easily so we can find you more easily” scam.

Miguel Farah • April 9, 2009 10:57 AM

“This was particularly surprising because BitTorrent is designed to establish connections at random, so there is no a priori reason for such strong communities to exist”

Perhaps there IS a reason: the content itself. For example, fans of Battlestar Galactica will all download the newest episode each week, and end up connecting to each other every time because they want the same content… meanwhile, fans of Damages won’t have that much connection with the former group, or with fans of silent movies, or with fans of manga, et cetera.

Rich Wilson • April 9, 2009 12:08 PM

This data is probably of most interest to marketing firms. The reason they have store loyalty cards is so they can track relationships between product purchases. In the same way- do people who download Battlestar Galactica also download the leaked Wolverine? Do people who prefer HD versions of files always wait for the HD version, or do they download a lower quality version first, and the HD later? Do people prefer lossless formats like FLAC, or smaller formats like MP3? Does their preference apply to only some music? And finally, how can we use this to get some money from them?

Anonymous • April 9, 2009 2:29 PM

I just don’t believe claims of “plausible deniability” by techies anymore. See Paul Ohm’s post at Freedom to Tinker: http://www.freedom-to-tinker.com/blog/paul/being-acquitted-versus-being-searched-yanal

PackagedBlue • April 9, 2009 4:09 PM

P2P privacy and BitTorrent: well, read the Wikipedia entry.

BitTorrent is not Tor. And BitTorrent and Tor are not the FreeNET darknet.

Communications systems are complex things. Thank god for Wikipedia.

Davi Ottenheimer • April 9, 2009 7:05 PM

“designed to establish connections at random”

designed to be random. there’s the first clue.

Fabian • April 10, 2009 5:23 AM

Hello,
I’m one of the guys named in the article and, after reading it, I figured I should clarify some points here – I swear over my laptop that:

- Nobody in the group is supported by RIAA (or associated organization).

- There are a number of legal scenarios you may want to use this for, not just downloading copyrighted material (think of China, Venezuela, Cuba, ...).

- Funding comes from only one 3-letter organization of the US government - the National Science Foundation.

- The fact that it is easy to identify a community is surprising since (i) connections are done randomly and kept based on performance, (ii) while you are connecting to people who want the same content they need to also be up at the same time, (iii) you don't keep a history of them, so next time you could be looking at a completely different set, (iv) people download more than, say, Battlestar Galactica, ...

- The plugin does not download crazy random stuff; if you actually try it you'll see that it picks a random torrent from a set of links you provide (e.g. free software).

- Finally, the *last* thing we want to do is to scare people off, we are just trying to help (and we have a track record showing it).

Hope this helps a bit; do please read the details :)

cheers

marco gioanola • April 10, 2009 7:07 AM

I’m not surprised that BT users form “communities”. A lot of local web forums exist, pointing to local BT trackers: these ARE communities. It is well known -and understandable- that p2p traffic is often “local”, because of evident reasons of reachability and efficiency. These are again “communities”.
Afaik, trackerless BT “networks” are not so widespread, so I’m missing what studying the “connection patterns in the BitTorrent file-sharing network” actually means. It would be interesting to get more details.

proxy user • April 12, 2009 12:10 AM

@ PackagedBlue

Do not use Tor for large downloads. Tor is not designed for that kind of traffic. Use i2p instead.

http://www.i2p2.de/

Of course you should only download things which are legal in your country.

Delaunay • April 17, 2009 3:28 AM

The research results from David Choffnes, Dean Malmgren and Jordi Duch will not change anything for probe analysers such as Qosmos (a French product). I have used it many times to monitor networks. It is amazing what we can find, whatever encapsulation is used. The only solution to fool this probe is to encrypt all the traffic (except if the Internet provider uses a reverse proxy for every connection protocol).
