Detecting Browser History

Interesting research.

Main results:

[…]

We analyzed the results from over a quarter of a million people who ran our tests in the last few months, and found that we can detect browsing histories for over 76% of them. All major browsers allow their users’ history to be detected, but it seems that users of the more modern browsers such as Safari and Chrome are more affected; we detected visited sites for 82% of Safari users and 94% of Chrome users.
[…]

While our tests were quite limited, for our test of 5000 most popular websites, we detected an average of 63 visited locations (13 sites and 50 subpages on those sites); the medians were 8 and 17 respectively.
Almost 10% of our visitors had over 30 visited sites and 120 subpages detected—heavy Internet users who don’t protect themselves are more affected than others.
[…]

The ability to detect visitors’ browsing history requires just a few lines of code. Armed with a list of websites to check for, a malicious webmaster can scan over 25 thousand links per second (1.5 million links per minute) in almost every recent browser.
Most websites and pages you view in your browser can be detected as long as they are kept in your history. Almost every address that was in your browser’s address bar can be detected (this includes most pages, including those retrieved using https and some forms with potentially private information such as your zipcode or search query). Pages won’t be detected when they expire from your history (usually after a month or two), or if you manually clear it.
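To put the quoted scan rate in perspective, here is a small back-of-the-envelope sketch. The rate of 25 thousand links per second comes from the excerpt above; the 5,000-link list size is only an illustrative assumption (the researchers' test of 5000 sites also covered subpages, so the real list was longer).

```python
# Scan rate quoted in the excerpt above.
LINKS_PER_SECOND = 25_000
# Illustrative list size (an assumption, not the researchers' exact link count).
CANDIDATE_LINKS = 5_000


def links_per_minute(rate_per_second: int) -> int:
    """Convert a per-second scan rate to a per-minute rate."""
    return rate_per_second * 60


def seconds_to_scan(link_count: int, rate_per_second: int) -> float:
    """Time needed to check link_count links at the given rate."""
    return link_count / rate_per_second


print(links_per_minute(LINKS_PER_SECOND))                  # 1500000
print(seconds_to_scan(CANDIDATE_LINKS, LINKS_PER_SECOND))  # 0.2
```

At that rate a list of a few thousand links would be checked in well under a second, which is why the scan can go unnoticed by the visitor.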

For now, the only way to fix the issue is to constantly clear browsing history or use private browsing modes. The first browser to prevent this trick in a default installation (Firefox 4.0) is supposed to come out in October.

Here’s a link to the paper.

Tags: browsers, Chrome, Firefox, privacy, Safari, web, web privacy

Posted on May 20, 2010 at 1:28 PM • 27 Comments

Comments

Tim • May 20, 2010 2:02 PM

Yeah but you have to already know which pages they might have visited.

A very limited attack. Might be useful for cookie-less tracking, or targeted advertising though.
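The limitation Tim describes can be modelled in a few lines. This is only a toy sketch: `looks_visited` is a hypothetical stand-in for whatever browser-side check the attacker uses (for example the link color trick mentioned elsewhere in this thread), and the URLs are made up.

```python
from typing import Callable, Iterable, List


def detect_visited(candidates: Iterable[str],
                   looks_visited: Callable[[str], bool]) -> List[str]:
    """Return the candidate URLs that the probe reports as visited."""
    return [url for url in candidates if looks_visited(url)]


# Hypothetical victim history; the attacker never sees this set directly.
history = {"https://example.com/", "https://example.org/inbox"}

# The attacker's list of URLs to test (an assumption for illustration).
candidates = ["https://example.com/", "https://example.net/"]

found = detect_visited(candidates, lambda url: url in history)
print(found)  # ['https://example.com/']
```

The visited page `https://example.org/inbox` is never reported because it was not on the candidate list, which is exactly the constraint being pointed out.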

Mike A. • May 20, 2010 2:21 PM

Fix the issue in Firefox 3.5 with the following preference in about:config.

layout.css.visited_links_enabled;false
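For anyone who would rather keep this setting in a profile's `user.js` file than toggle it by hand, a minimal sketch follows. The preference name is the one given above; the helper names and the profile path argument are hypothetical, and whether a particular Firefox version honors the preference is not something this sketch checks.

```python
from pathlib import Path


def user_pref_line(name: str, value: bool) -> str:
    """Format a boolean preference in user.js syntax."""
    return f'user_pref("{name}", {"true" if value else "false"});'


def append_pref(profile_dir: str, name: str, value: bool) -> None:
    """Append the preference to user.js in the given profile directory.

    profile_dir is an assumption: pass the path of your own Firefox profile.
    """
    with open(Path(profile_dir) / "user.js", "a", encoding="utf-8") as handle:
        handle.write(user_pref_line(name, value) + "\n")


print(user_pref_line("layout.css.visited_links_enabled", False))
```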

Visgean Skeloru • May 20, 2010 2:29 PM

Hah, this is really old.

Barry Kelly • May 20, 2010 2:37 PM

Or turn off browsing history.

Chasmosaur • May 20, 2010 2:48 PM

Or just use your CCleaner once a week.

balls • May 20, 2010 2:50 PM

The good ol CSS/link color trick.

Roger D. • May 20, 2010 3:33 PM

Here is another project about web history reconstruction that does not require any prior knowledge:
http://planete.inrialpes.fr/projects/private-information-disclosure-from-web-searches/

mcb • May 20, 2010 3:49 PM

Doesn’t everyone use a browser cleaner?

Pascal Forget • May 20, 2010 3:50 PM

My two cents: do they take into account that people using Chrome (a non-default browser) might be a bit more web-savvy, therefore surfing the net more (and visiting more web sites) than, say, the average Explorer users?

Davi Ottenheimer • May 20, 2010 5:14 PM

Firefox 3.7a5-pre (built yesterday) also prevents this

Davi Ottenheimer • May 20, 2010 5:20 PM

@ Pascal

“people using Chrome (a non-default browser) might be a bit more web-savvy”

Thus far a Chrome user probably should be far less web (security)-savvy…

http://www.intelligentdesign.com.au/blog/2009/04/05/why-you-shouldnt-use-google-chrome/

http://ha.ckers.org/blog/20100414/chrome-phishing/

name • May 20, 2010 8:03 PM

Even if you disable Firefox history the recently closed tabs option in recent versions of Firefox still keeps a record of sites visited in the same session. You can disable this separately, but if Firefox is tracking it no doubt someone will be able to exploit it.

tom • May 20, 2010 10:12 PM

A far more impressive way to track users on the internet is browser fingerprinting.

Link here: https://panopticlick.eff.org/

MrHadron • May 21, 2010 1:45 AM

Wow, this is really, really outstanding research. I am impressed. This issue is (was?) significant, yet underestimated. No real data on the severity and no real techniques to exploit it could be found. Everybody was just ignoring this, but some people were asking “who will do it first?” Well, these researchers are that “who”.

I am actually involved in privacy research, and this research kind of crippled my own… But still, congratulations.

Clive Robinson • May 21, 2010 2:03 AM

Although this research might be exercising an old bug, there is a message it is screaming out that we should not ignore.

Firstly – the attack is an early example in a class of enumerating / fingerprinting attack on browser history. As others have noted there are better attacks, so we know it’s following Bruce’s Maxim.

Secondly, the fact that it’s old and of limited capability but still quite successful should be setting off alarm bells in security-minded heads about what improved methods in this class of attack are capable of.

Thirdly, this class of attack can be used to enumerate private networks that have no direct connection to the outside world. Thus it shows a method by which information can cross air gap security.

Fourthly, it gets around certain privacy-enhancing technology such as TOR in ways users may not realise.

Fifthly, it can be used in MITM-type upstream attacks in an almost transparent manner (think of page redirection and some of the Phorm techniques).

Then there is the user issue. The history function is there because it is useful to users in many ways, which means giving it up on the UI side is going to be a struggle, as is using “cleaning” software. The loss of functionality will not be acceptable to many power users.

This means that the issue has to be fixed on the non-UI side. And this is a low-level and, in some cases, protocol-affecting issue.

Now I don’t know what some of you think, but it has “fire alarm” level warnings ringing in my head. The reason being that just about every time we try to fix a protocol issue we break lots of things, and they in turn open up lots of security holes…

jake • May 21, 2010 2:13 AM

I’ve known about the CSS detection trick for a long time, but it’s good to have real data on how many users it can affect. If this doesn’t convince browser developers to finally fix the problem I don’t know what will.

Olaf • May 21, 2010 5:12 AM

It found nothing on my FF setup. I don’t record history and have cookies dumped when the browser closes.

Partly paranoia about browsers holding passwords etc and partly because I never used the history and just got rid of it.

snxue-v • May 21, 2010 5:39 AM

so, use TOR is better.

‘Congratulations, we did not find anything in this category in your browser history.
Feel free to try our other browser history tests.’

Clive Robinson • May 21, 2010 7:07 AM

@ snxue-v,

“so, use TOR is better”

Possibly not for a couple of reasons.

Firstly TOR can be subverted between you and the TOR entry node and between the TOR exit node and the destination site.

Unless both you and the site you are browsing use appropriate measures to ensure the communication channel cannot be attributed (which it appears it cannot, for various reasons such as choke point issues), a national observer can deduce that you have been to the site, though not necessarily what content you viewed.

They can then position themselves as a Man In The Middle at some point upstream of you (say at your ISP), and when you next browse without TOR or a protected channel they can do a simple page intercept and use this old attack to interrogate your browser and enumerate / fingerprint its history for the pages on the site you visited through TOR…

You will note I say “National Observer”; this does not imply a government per se, but any organisation that has sufficient coverage to have the TOR entry and exit points within its view of the Internet.

Thus a national ISP could well have access to the entry and exit nodes of a TOR network even if the traffic in the TOR network is routed internationally.

Traffic flow analysis aided by “timing jitter injection” will fairly easily identify the two TOR end points (and yes, there is research work to show this that has been presented at PET and other conferences).

HJohn • May 21, 2010 7:39 AM

On Chrome, change the shortcut to this:
"[Drive]:[path]\chrome.exe" --incognito

It will start in incognito mode limiting history.

I also use CCleaner on every startup.
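The same shortcut idea can be scripted. A minimal sketch, assuming you supply the path to your own Chrome executable; the function names are hypothetical, and only the `--incognito` switch comes from the comment above.

```python
import subprocess
from typing import List


def incognito_command(chrome_path: str, url: str = "") -> List[str]:
    """Build the argument list for starting Chrome in incognito mode."""
    command = [chrome_path, "--incognito"]
    if url:
        command.append(url)
    return command


def launch_incognito(chrome_path: str, url: str = "") -> subprocess.Popen:
    """Start Chrome with the incognito switch; chrome_path must exist locally."""
    return subprocess.Popen(incognito_command(chrome_path, url))


print(incognito_command("chrome.exe", "https://example.com/"))
```

Passing the arguments as a list avoids the quoting problems that come with paths containing spaces.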

Craig • May 23, 2010 3:05 AM

Erasing browser history as a security measure is always a catch-22: on numerous occasions I have visited an extremely useful site and not saved the link.
Unfortunately, for the life of me I can’t remember the address, and I am probably never going to find the site again.
Only because of my browser history can I pull it up.

Unconcerned • May 24, 2010 9:03 AM

“Based upon the sites you visit there is a 50% chance that you are female.”

To be honest, it doesn’t bother me that they can predict my gender with this accuracy. I must be doing something right.

What I don’t understand is why the ‘browsed site’ detection is done server side rather than client side. Seems a browser could just get a set of links and compare those links to its history without having to tell the server anything.

Hoo • May 26, 2010 8:09 AM

But if there is an eye on your gateway or router to watch your browsed site, what shall we do? I guess doing something on client side won’t work.

Clive Robinson • May 27, 2010 2:01 AM

@ Hoo,

“But if there is an eye on your gateway or router…”

If they are on the gateway they will always be able to see you are actively using the network. The further the eye is upstream of you, the less chance they have of seeing everything or of digging it out of the network noise.

What else they can see (i.e. packet size, timing, headers, content) is dependent on what software you use on your computer.

However, there are limits to how much you can hide things like packet timing; some are human, some technical.

For instance, there is the technical problem of the “time out window” that is built in at various levels and decides when a response is “lost” and thus retried or dropped.

Likewise there is the human issue of not liking slow or unresponsive behavior on the network, even if it is for the user’s protection…

Thus in theory you will nearly always leak sufficient information in an “interactive” session for a suitably attentive watcher “to nail your hide to the tree”.

There are ways this leaking issue can be reduced and also reduce latency.

For instance, a minor change or two to “HTTP traffic” would reduce the amount of traffic from a site to your browser. That is, if part of the protocol was to send a tagged list of objects with time stamps, the browser would only need to download what was not in its own cache.

So for instance with this blog page, if your browser only requested the data after the time of your last download it would just get this and one or two other comments, not the whole page…

However, to work effectively it needs proper enforcement of objects and their tagging at the server end, which is something that is currently not done, and this prevents it.
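The tagged-list idea described above can be sketched roughly as follows. This is a toy model only: the manifest format, the tag names, and the integer time stamps are all assumptions made for illustration, not part of any existing protocol.

```python
from typing import Dict, List


def objects_to_fetch(manifest: Dict[str, int], cache: Dict[str, int]) -> List[str]:
    """Return the tags the browser still needs to download.

    manifest maps each object tag to the server's time stamp for it;
    cache maps tags to the time stamp of the locally cached copy.
    An object is fetched if it is missing locally or older than the server's copy.
    """
    return sorted(
        tag for tag, stamp in manifest.items()
        if tag not in cache or cache[tag] < stamp
    )


# Hypothetical blog page: the post and first comment are already cached.
manifest = {"post": 100, "comment-1": 110, "comment-2": 250}
cache = {"post": 100, "comment-1": 110}

print(objects_to_fetch(manifest, cache))  # ['comment-2']
```

Only the new comment is requested, which matches the example of fetching just the latest comments rather than the whole page.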

Arthur • June 17, 2010 11:08 AM

I went to the site, and all I got, after a considerable wait, was a black box which changes size as I scroll around. Could it be confused since I disable not only active scripting but the use of CSS? Does this mean I’m immune to this attack?

trapspam.honeypot • December 3, 2010 10:44 AM

I am currently using Firefox 4.08bPre Minefield and Firefox 3.6.14 Pre Namoroka, both in testing mode.

Also Firefox plugins AskForSanitize 2.1, BetterPrivacy 1.48.2, Beef Taco 1.3.2, Ghostery 2.4.2, HTTPS-Everywhere 0.9.2, and post browsing software:

PurgeFox Pro, PurgeIE Pro, CCleaner, EasyCleaner, and Evidence Eliminator (since Beta ver. 1.0 licensed from original source).

grishma • October 9, 2012 1:39 AM

what attacks happened on browsers?