Side-Channel Attacks on Encrypted Web Traffic
Nice paper: “Side-Channel Leaks in Web Applications: a Reality Today, a Challenge Tomorrow,” by Shuo Chen, Rui Wang, XiaoFeng Wang, and Kehuan Zhang.
Abstract. With software-as-a-service becoming mainstream, more and more applications are delivered to the client through the Web. Unlike a desktop application, a web application is split into browser-side and server-side components. A subset of the application’s internal information flows are inevitably exposed on the network. We show that despite encryption, such a side-channel information leak is a realistic and serious threat to user privacy. Specifically, we found that surprisingly detailed sensitive information is being leaked out from a number of high-profile, top-of-the-line web applications in healthcare, taxation, investment and web search: an eavesdropper can infer the illnesses/medications/surgeries of the user, her family income and investment secrets, despite HTTPS protection; a stranger on the street can glean enterprise employees’ web search queries, despite WPA/WPA2 Wi-Fi encryption. More importantly, the root causes of the problem are some fundamental characteristics of web applications: stateful communication, low entropy input for better interaction, and significant traffic distinctions. As a result, the scope of the problem seems industry-wide. We further present a concrete analysis to demonstrate the challenges of mitigating such a threat, which points to the necessity of a disciplined engineering practice for side-channel mitigations in future web application developments.
We already know that eavesdropping on an SSL-encrypted web session can leak a lot of information about the person’s browsing habits. Since the size of both the page requests and the page downloads are different, an eavesdropper can sometimes infer which links the person clicked on and what pages he’s viewing.
This paper extends that work. Ed Felten explains:
The new paper shows that this inference-from-size problem gets much, much worse when pages are using the now-standard AJAX programming methods, in which a web “page” is really a computer program that makes frequent requests to the server for information. With more requests to the server, there are many more opportunities for an eavesdropper to make inferences about what you’re doing—to the point that common applications leak a great deal of private information.
Consider a search engine that autocompletes search queries: when you start to type a query, the search engine gives you a list of suggested queries that start with whatever characters you have typed so far. When you type the first letter of your search query, the search engine page will send that character to the server, and the server will send back a list of suggested completions. Unfortunately, the size of that suggested completion list will depend on which character you typed, so an eavesdropper can use the size of the encrypted response to deduce which letter you typed. When you type the second letter of your query, another request will go to the server, and another encrypted reply will come back, which will again have a distinctive size, allowing the eavesdropper (who already knows the first character you typed) to deduce the second character; and so on. In the end the eavesdropper will know exactly which search query you typed. This attack worked against the Google, Yahoo, and Microsoft Bing search engines.
Many web apps that handle sensitive information seem to be susceptible to similar attacks. The researchers studied a major online tax preparation site (which they don’t name) and found that it leaks a fairly accurate estimate of your Adjusted Gross Income (AGI). This happens because the exact set of questions you have to answer, and the exact data tables used in tax preparation, will vary based on your AGI. To give one example, there is a particular interaction relating to a possible student loan interest calculation, that only happens if your AGI is between $115,000 and $145,000—so that the presence or absence of the distinctively-sized message exchange relating to that calculation tells an eavesdropper whether your AGI is between $115,000 and $145,000. By assembling a set of clues like this, an eavesdropper can get a good fix on your AGI, plus information about your family status, and so on.
For similar reasons, a major online health site leaks information about which medications you are taking, and a major investment site leaks information about your investments.
The paper goes on to talk about mitigation—padding page requests and downloads to a constant size is the obvious one—but they’re difficult and potentially expensive.
igloo • March 26, 2010 6:36 AM
“plus ça change, plus c’est la même chose!” With direction finding replaced with IP address, traffic size and volume has been a significant side channel since WW1. In that case, traffic size and patterns were used to discover troop movements and future attack points. “Those who ignore history are condemned to repeat it” or something like that.