Websites that Collect Your Data as You Type

A surprising number of websites include JavaScript keyloggers that collect everything you type as you type it, not just when you submit a form.

Researchers from KU Leuven, Radboud University, and University of Lausanne crawled and analyzed the top 100,000 websites, looking at scenarios in which a user is visiting a site while in the European Union and visiting a site from the United States. They found that 1,844 websites gathered an EU user’s email address without their consent, and a staggering 2,950 logged a US user’s email in some form. Many of the sites seemingly do not intend to conduct the data-logging but incorporate third-party marketing and analytics services that cause the behavior.

After specifically crawling sites for password leaks in May 2021, the researchers also found 52 websites in which third parties, including the Russian tech giant Yandex, were incidentally collecting password data before submission. The group disclosed their findings to these sites, and all 52 instances have since been resolved.

“If there’s a Submit button on a form, the reasonable expectation is that it does something—that it will submit your data when you click it,” says Güneş Acar, a professor and researcher in Radboud University’s digital security group and one of the leaders of the study. “We were super surprised by these results. We thought maybe we were going to find a few hundred websites where your email is collected before you submit, but this exceeded our expectations by far.”

Research paper.

Posted on May 19, 2022 at 6:23 AM • 29 Comments

Comments

Andrew May 19, 2022 7:59 AM

Every time I have a browser that autofills an email address when I want to use a different email address for that site, I think how it’s easier for those two identities to be joined together.

Jan May 19, 2022 8:35 AM

An adblocker would probably block most of the scripts responsible for this.

I’m conflicted. I used to turn off ad blocking on sites I trusted and wanted to support, but you open yourself up to so much crap from third parties. Just blocking everything seems the safe choice.

Ted May 19, 2022 8:46 AM

Furthermore, we find incidental password collection on 52 websites by third-party session replay scripts.

The incidental password collection seems rather scary to me. Does anyone know what a session replay script is?

An overwhelming majority (50/52) of these leaks were due to Yandex Metrica’s session recording feature.

Also, has anyone tried LeakInspector, the research group’s browser add-on? I’d almost like to see what details it gives about sniff and leak attempts.

I’m surprised at how many legitimate responses the group received from both the first and third parties they contacted. Interesting that the leakiest sites were categorized as “Fashion/Beauty.”
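
To make the idea of a "leak" concrete: the post says emails were logged "in some form", so a checker cannot search only for the plaintext address. The following is a minimal sketch of that kind of check, not LeakInspector's actual code. The function names, the choice of encodings, and the sample payloads are all assumptions made for illustration.

```python
import base64
import hashlib
from urllib.parse import quote


def candidate_encodings(value: str) -> dict[str, str]:
    """Return a few common encodings and hashes of `value` to search for.

    The set of encodings here is illustrative only; a real tool would
    likely check many more (and nested combinations).
    """
    raw = value.encode("utf-8")
    return {
        "plaintext": value,
        "url-encoded": quote(value, safe=""),
        "base64": base64.b64encode(raw).decode("ascii"),
        "md5": hashlib.md5(raw).hexdigest(),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_leaks(typed_value: str, request_payload: str) -> list[str]:
    """List the encodings of `typed_value` that occur in `request_payload`."""
    return [
        name
        for name, encoded in candidate_encodings(typed_value).items()
        if encoded in request_payload
    ]


# Hypothetical outgoing request body captured before the form was submitted.
email = "alice@example.com"
payload = "uid=42&e=" + base64.b64encode(email.encode("utf-8")).decode("ascii")
print(find_leaks(email, payload))  # ['base64']
```

A plain substring search like this only catches encodings it already knows about, which is one reason such counts are probably best read as lower bounds.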

BeeKay May 19, 2022 8:48 AM

The link to the paper isn’t working for me. My browser claims it cannot find the server.

Peter A. May 19, 2022 9:23 AM

What incentives do website owners receive to include all that crap on their sites? I know, sometimes it’s (petty) money from advertisements – which would be fair enough, if the advertising companies didn’t betray everyone’s trust by scraping, stealing, logging, and mining everything they can get, including keystrokes clearly not intended for them but for the website owner. What about all the other third party JS – “analytics”, counters and all that bullshit?

Some pages contain more than ten outside “analytics” scripts. What are the benefits (if any) for the website owner from including all these links to external code?

Gideon May 19, 2022 10:01 AM

This paper raises more questions than it answers…

NB: The researchers went to an enquiry form and typed in their email address – nothing was collected that wasn’t typed.

Which begs the question – how many people fill in their email ‘accidentally’ and then decide not to send the enquiry.

To anybody who cares about such things I think we can all agree that typing private information onto other peoples’ websites is not a good plan!

The subsequent question – has anybody actually used an email acquired this way – I suspect not – it’s simply not worth the time and effort.

A few more important points:

1) It’s a 2 minute task to opt out of such things – https://yandex.com/support/metrica/general/opt-out.html

2) The developer of the website has clear, simple instruction on how not to collect this kind of data – https://yandex.com/support/metrica/webvisor-v2/settings.html

3) Lastly, the researchers appear to be completely blind to the fact that Microsoft Clarity has much the same ability – no mention of the US company’s product in their paper.

temy May 19, 2022 10:10 AM

JavaScript has long been known as a fundamental privacy/security vulnerability, though it has many legitimate uses.

most websites do not require javascript.

activate it only when actually needed.

Clive Robinson May 19, 2022 11:05 AM

@ Bruce, All,

A surprising number of websites include JavaScript keyloggers that collect everything you type as you type it, not just when you submit a form.

You should not be at all surprised about this.

I’ve talked about Google identifying users by their typing cadence in the search box with its helpful auto-hints, for several years now on this blog…

In fact it’s one of the reasons I’ve advised in many places from the late 1990’s, not just on this blog, that Javascript and certain other features of HTML, etc that Google and Co pushed into specifications should be removed from the specifications (but that of course would cause a loss of fiscal and other beneficial input to these standards bodies…).

So treat Google’s behaviour much like you would the NSA’s and that darn Dual Elliptic Curve DRBG that caused NIST the humiliating climb down and withdrawal of the specification.

Yet again academia is a decade or more behind this blog, and again it’s a Usenix paper where it gets published (a hint for junior researchers, consider Usenix and IEEE if your paper is off of standard orthodoxy).

But…

Consider everything you type in a browser leaks information, not just what you “see with keypresses”, but hidden characters with cut-n-paste, typing cadence, phrasing, syntax and even spelling mistakes.

They are all “identifiers” some are strong enough to uniquely identify you such as usernames, some weakly such as a service.

An email address is generally a unique combination of a strong identifier (UserName) in a weaker identifying (ServName) domain.

Just the domain (ServName) is generally sufficient to “nail your hide” by typing cadence, spelling mistakes, phraseology, etc.

The problem of course is users love features like Auto-Complete, Auto-Suggest, Spell-check and much else… They will squawk if you turn them off…

Thus those creating ad blockers and the like are going to fail at stopping “Identifier-leakage” that is effectively unique to individuals…

It’s why I keep saying,

1, Turn off JavaScript.
2, Where possible disable much of HTML 5.

And quite a few other things.

One other user level way to limit your exposure is to use a text editor that is fully local to your device. One that has spell checking, and other correction in it, and allows you to Cut-n-Paste into the browser or other communications software.

It’s by no means a perfect solution but it does help limit your side channel leakage exposure.

lurker May 19, 2022 11:27 AM

@temy
Turning js on only when you and the site both need it, then turning it off before leaving the site, seems to be the only practical solution.

Building a whitelist is fraught with peril. They’re usually of the form,
Allow js from these sites:
XXX
YYY
ZZZ
but who hosts their own js these days? It usually comes from a cdn that hosts js for everybody including crooks.
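
A minimal sketch of why a host-based list runs into this problem, assuming the allowlist matches only on the hostname of the script URL. The hostnames and paths below are made up for illustration, and real blockers usually offer finer-grained rules than this.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the site itself plus the shared CDN it loads from.
ALLOWED_HOSTS = {"blog.example.test", "cdn.example.test"}


def script_allowed(script_url: str, allowed_hosts: set[str]) -> bool:
    """Allow a script if, and only if, its hostname is on the allowlist."""
    host = urlparse(script_url).hostname  # lowercased by urlparse, or None
    return host is not None and host in allowed_hosts


# The library the site actually needs...
wanted = "https://cdn.example.test/jquery/3.5.1/jquery.min.js"
# ...and an unrelated script that happens to be served from the same CDN host.
unwanted = "https://cdn.example.test/some-tracker/replay.js"

print(script_allowed(wanted, ALLOWED_HOSTS))    # True
print(script_allowed(unwanted, ALLOWED_HOSTS))  # True: the host alone cannot tell them apart
```

Matching on full paths or on content hashes would distinguish the two scripts, at the cost of a list that needs updating whenever a site changes its dependencies.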

Pasquale May 19, 2022 11:54 AM

Re: turn off Javascript, and turn it back on when needed:

The problem with that, even ignoring third-party stuff, is that many sites require JS for no good reason; hell, even this blog no longer has a working preview button for comments. For the last year, nytimes.com pages usually show nothing but “Please enable JS and disable any ad blocker” (sometimes the onion service works)—just to show a news story, i.e., the very model of what a web page could do in 1993.

Were I to enable it: If I try to search for something, but mistype the CTRL+F, the site might get the keystrokes. Maybe they can even intercept CTRL+F. They might be watching where I scroll, where the mouse points, how long I spend anywhere, which outgoing links I follow. If breaking their non-JS experience means everyone will just enable it, and the site will get more of that sweet private data, why wouldn’t they break it? This is not something I want to encourage.

One should always think carefully before reducing one’s security and privacy settings in order to view some site. This goes for things like Tor-blocking too. In practice, I have no real relationship with most sites I visit. They’re just the result of links I followed from elsewhere, and often by the time I get to that tab I hardly remember what it was supposed to be; all I see is some unknown page asking me to fill a CAPTCHA, enable scripting, log in, give them money, whatever; and I close those tabs. There’s not yet a shortage of competition.

Ted May 19, 2022 12:33 PM

@Gideon

has anybody actually used an email acquired this way

That is a good question. Especially when enough users actually submit their email. Plus, using leaked data seems like a good way to get into a fight with the EU (GDPR).

Just read that people could use an email relay to further hide their real email from online services.

Recently, Mozilla [20], Apple [18], and DuckDuckGo [19] started to offer private email relay services that give users the ability to generate and use pseudonymous (alias) email addresses.

Interesting premise.
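
The general idea behind per-site aliases can be sketched as follows. This is not how Mozilla's, Apple's, or DuckDuckGo's services are implemented; the class name, the relay domain, and the alias format are all assumptions made for illustration. The point is only that each site sees a different random address, so a leaked address cannot be matched against the one given to another site.

```python
import secrets


class AliasBook:
    """Hand out one random alias per site and remember it (in memory only)."""

    def __init__(self, relay_domain: str) -> None:
        self.relay_domain = relay_domain      # hypothetical relay domain
        self._aliases: dict[str, str] = {}    # site -> alias

    def alias_for(self, site: str) -> str:
        """Return the alias for `site`, creating a random one on first use."""
        if site not in self._aliases:
            local_part = secrets.token_hex(6)  # 12 random hex characters
            self._aliases[site] = f"{local_part}@{self.relay_domain}"
        return self._aliases[site]


book = AliasBook("relay.example.test")
shop_alias = book.alias_for("shop.example.test")
news_alias = book.alias_for("news.example.test")

print(shop_alias != news_alias)                          # different sites, different aliases
print(shop_alias == book.alias_for("shop.example.test")) # same site, same alias
```

An alias only hides the address itself; other identifiers that a script can observe on the page are unaffected.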

Quantry May 19, 2022 1:14 PM

since you likely already downloaded it anyway
(from the source of the page you are now reading),
someone tell me at a glance if this 90Kb script has a keylogger,
and whether it doesn’t at any other time, or for “certain users”:

…/jquery/3.5.1/jquery.min.js?ver=3.5.1

TAG: Dependency Hell

fallon May 19, 2022 3:32 PM

…several available free websites will quickly test any given URL for keyloggers, trackers, etc … using custom forensic browsers.

(“schneier.com” tests very clean)

Ian Mason May 19, 2022 4:16 PM

The first thought I had on first reading about this was: “Now, if they harvest emails as soon as they are entered, does this open up an exploit whereby one could stuff their databases full of whatever garbage one wished to?”.

ResearcherZero May 20, 2022 12:12 AM

@fallon, @SpaceLifeForm

If you try and draw ASCII Art, all the tiny animal penises are always formatted wrong here anyway, no matter how many hours spent trying to craft them.

Gert-Jan May 20, 2022 6:15 AM

@Gideon

NB: The researchers went to an enquiry form and typed in their email address – nothing was collected that wasn’t typed.

It is not accidental that the researchers looked at EU users entering but not submitting data on non-EU websites.

The GDPR legislation states that no personally identifiable information may be collected without the person’s consent. And that it can only be used for the purpose for which it was provided.

Those 2950 websites are violating this, at least potentially. Without evidence to the contrary, one has to assume that the collected data is actually stored, used, sold, etc.

Kim May 21, 2022 4:49 AM

@Gideon

has anybody actually used an email acquired this way

Yes. Researchers received many marketing emails at the addresses they typed into forms but did not submit.

Which begs the question – how many people fill in their email ‘accidentally’ and then decide not to send the enquiry.

That misses the point: when you fill in a login form, do you expect your email address to be collected for tracking purposes–let alone before you submit the form?

JonKnowsNothing May 21, 2022 2:26 PM

@All

a) Folks can use open free form input fields to spell check text and never intend to send it or post it from that program. The text is then cut and pasted into a secondary system that does not have spell check capability. It can be from a text processor pasted into email or into a form where no post-editing is allowed to fix typos.

b) Input fields are certainly kept and parsed for validations and parameter checking, normally they are discarded if the submit doesn’t happen but they can also be held over on a failed submit. This would normally show up as an error on one of the input fields that is not checked on-the-fly but only on submit testing. The page repopulates with the same data.

c) There are recall fields in some applications, like a history text roll, that are stored in some applications or profiles. If you can scroll up or scroll down on a text box and pull in the last 99 entries, that’s stored somewhere (local or server side). The Roll Forward and Roll Back on editing, often stored in the document metadata, was flagged early on as a way for LEAs to track document editing changes, particularly for group edited documents, and as a security concern.

d) Audit logs, particularly on the receiver’s side would capture tons of data. It seems that this is what’s the current concern.

There is data flying everywhere. If it wasn’t, there wouldn’t be a connection.

SpaceLifeForm May 21, 2022 4:50 PM

@ Quantry, ALL

Your payload size may vary.

hxtps://web.dev/gov-uk-drops-jquery/

umberto May 23, 2022 3:18 PM

Yet again academia is a decade or more behind this blog,

That may be a good thing.

Whatever problems it may have, ‘academia’ has not yet decayed to the same sorry state this blog is in, with most of it being written by a few (always the same) contributors, each of them driven by their very personal motives ranging from dubious obsessions to a never subsiding craving for attention and admiration.

- May 23, 2022 5:06 PM

@umberto:

“… written by a few (always the same) contributors, each of them driven by their very personal motives ranging from dubious obsessions to a never subsiding craving for attention…”

An apt description of yourself by yourself.

As you have never yet written anything even remotely helpful, constructive, or on topic, you just witter on with your own pointless quest, trying to appear as something most can guess you are not.

It’s kind of sad really, but then most can probably also guess your motivation, as you’ve been at it under different aliases for so long.

Sumadelet May 25, 2022 5:40 AM

@Gert-Jan

The GDPR legislation states that no personally identifiable information may be collected without the person’s consent. And that it can only be used for the purpose for which it was provided.

That’s not completely correct. There are 5 or 6 (depending on how you interpret the regulation) grounds for processing personally identifiable data. ONE of those grounds is consent.

One of the (deliberate) irritations of the GDPR is that you can’t do the usual trick of relying on multiple grounds – as someone responsible for the collected data, you must choose ONE ground, and be prepared to document and defend it.

The grounds (loosely) are (from: h++ps://ec.europa.eu/info/law/law-topic/data-protection/reform/rules-business-and-organisations/legal-grounds-processing-data/grounds-processing/when-can-personal-data-be-processed_en) :

1) Consent
2) Contractual Obligation
3) Legal Obligation
4) Public Interest
5) Vital Interest
6) Legitimate Interest

(Further details and examples in the link)

As many have noticed, online advertisers are attempting to use the Legitimate Interest ground instead of the Consent ground. I haven’t seen that substantively tested yet, but the regulators are keeping a watchful, if slow-moving, eye.

The UK Information Commissioner’s Office (ICO) explanation of Legitimate Interests is here:

h++ps://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/lawful-basis-for-processing/legitimate-interests/

Not all use of the Legitimate Interests ground for marketing is disallowed, in my view unfortunately, as the ICO says:

h++ps://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/legitimate-interests/when-can-we-rely-on-legitimate-interests/#marketing_activities

Recital 47 of the UK GDPR says:

“…The processing of personal data for direct marketing purposes may be regarded as carried out for a legitimate interest.”

This means that direct marketing may be a legitimate interest. However the UK GDPR does not say that direct marketing always constitutes a legitimate interest, and whether your processing is lawful on the basis of legitimate interests depends on the particular circumstances.

It goes on to say:

Given that individuals have the absolute right to object to direct marketing under Article 21(2), it is more difficult to pass the balancing test if you do not give individuals a clear option to opt out of direct marketing when you initially collect their details (or in your first communication, if the data was not collected directly from the individual). The lack of any proactive opportunity to opt out in advance would arguably contribute to a loss of control over their data and act as an unnecessary barrier to exercising their data protection rights.

Which is why the pop-ups allow you to Object to Legitimate Interests processing – but this is done on a one-by-one basis, and you might need to go through several dozen organisations. So while ‘consent’ should be as easy to withhold as allow, objecting to Legitimate Interests appears to have no such condition imposed, making the pop-ups very, very irritating.

Winter May 25, 2022 8:09 AM

@Sumadelet

So while ‘consent’ should be as easy to withhold as allow, objecting to Legitimate Interests appears to have no such condition imposed, making the pop-ups very, very irritating.

Actually, rejection should be at least as easy as accepting. And it should be opt-in only, no pre-ticked boxes
ht-tps://www.privacypolicies.com/blog/cookie-consent-requirements-germany/

But the pop-ups are in breach of the GDPR in many ways.
ht-tps://www.computerweekly.com/news/252512832/Mechanism-underlying-cookie-popups-found-in-breach-of-GDPR

Sumadelet May 25, 2022 10:34 AM

@Winter

Be careful not to confuse the ePrivacy regulations (‘Cookie Law’) and GDPR. In Germany there are also additional local regulations.

Article 7 section 3 of the GDPR ( h++ps://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679&from=EN#d1e2001-1-1 ) does indeed say:

The data subject shall have the right to withdraw his or her consent at any time. The withdrawal of consent shall not affect the lawfulness of processing based on consent before its withdrawal. Prior to giving consent, the data subject shall be informed thereof. It shall be as easy to withdraw as to give consent.

however, that is dealing with the Consent ground for legal processing of personal data. The Legitimate Interests ground does not have the same conditions. Objection to processing under the Legitimate Interests ground is covered in Article 21 ‘Right to object’

h++ps://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679&from=EN#d1e2793-1-1

1. The data subject shall have the right to object, on grounds relating to his or her particular situation, at any time to processing of personal data concerning him or her which is based on point (e) or (f) of Article 6(1)[Article 6(1)f is Legitimate Interests], including profiling based on those provisions. The controller shall no longer process the personal data unless the controller demonstrates compelling legitimate grounds for the processing which override the interests, rights and freedoms of the data subject or for the establishment, exercise or defence of legal claims.

2. Where personal data are processed for direct marketing purposes, the data subject shall have the right to object at any time to processing of personal data concerning him or her for such marketing, which includes profiling to the extent that it is related to such direct marketing.

3. Where the data subject objects to processing for direct marketing purposes, the personal data shall no longer be processed for such purposes.

4. At the latest at the time of the first communication with the data subject, the right referred to in paragraphs 1 and 2 shall be explicitly brought to the attention of the data subject and shall be presented clearly and separately from any other information.

5. In the context of the use of information society services, and notwithstanding Directive 2002/58/EC, the data subject may exercise his or her right to object by automated means using technical specifications.

6. Where personal data are processed for scientific or historical research purposes or statistical purposes pursuant to Article 89(1), the data subject, on grounds relating to his or her particular situation, shall have the right to object to processing of personal data concerning him or her, unless the processing is necessary for the performance of a task carried out for reasons of public interest.

Here, there is no condition saying that objecting to Legitimate Interests processing should be as easy as acquiescing. As a result, having to go individually through each organisation’s section of the pop-up is allowed by the GDPR.

The end result is that Internet advertisers make the claim that their processing is covered by the Legitimate Interests ground for lawful processing. In certain cases it might be true, which makes things unfortunately complicated.

Winter May 25, 2022 10:59 AM

@Sumadelet

5. In the context of the use of information society services, and notwithstanding Directive 2002/58/EC, the data subject may exercise his or her right to object by automated means using technical specifications.

This means that the industry must honor the “do not track” preference. Which it chooses not to do.

Furthermore, if the subject is given the option to “Accept” cookies, the option to “Reject” cookies must be given with the same ease of use and “Accept” cannot be preselected. Consent, which the cookie banner asks, must be given by a meaningful, unambiguous, action.

If there really is a legitimate interest, consent does not have to be asked. But as the industry does ask for consent, by the first communication rule, they must give a clear and simple option to Reject it.

JonKnowsNothing May 25, 2022 11:44 AM

@ Winter, @Sumadelet, @All

re: This means that the industry must honor the “do not track” preference.

Wherein lies the problem.

As noted by Winter, Industry does not honor “do not track” any more than they honor a plethora of other rules and requirements.

All mitigations that depend on “Industry Must Do” is a rat hole of exceeding depth. You need 1 exception and the entire mitigation scheme fails. Our current situation is chock-a-block of such exceptions to the rule.

Any rule that depends on someone else to DO SOMETHING is going to be subject to failure.

As Clive and Others have stated many times, if the End User is not 100% in control of the rule, then the rule is only there to be broken.

It can be a full time employment scheme though, and lots of careers are founded on finding The Exception To The Rule.

Sumadelet May 25, 2022 12:04 PM

@Winter

The pop-ups I see vary, but becoming more frequent are ones that have a section that lists ‘Partners’, where the ground for processing is given as Legitimate Interests, with the line-by-line possibility to object to the Legitimate Interests processing. This is entirely separate from the previously ubiquitous ‘consent’-based approach. As you correctly say, using the Consent ground requires withdrawal of consent to be as easy as acceptance.

The sorry history of ‘Do Not Track’ is a separate issue, which I won’t go into here.

The forthcoming ePrivacy regulations ( h++ps://ecommerce-europe.eu/news-item/eprivacy-regulation-update-on-developments-in-the-council-of-the-eu/ ) might improve matters, or simply make things more complicated. We’ll see.

