E-Mail Tracking

Interesting survey paper on the privacy implications of e-mail tracking:

Abstract: We show that the simple act of viewing emails contains privacy pitfalls for the unwary. We assembled a corpus of commercial mailing-list emails, and find a network of hundreds of third parties that track email recipients via methods such as embedded pixels. About 30% of emails leak the recipient’s email address to one or more of these third parties when they are viewed. In the majority of cases, these leaks are intentional on the part of email senders, and further leaks occur if the recipient clicks links in emails. Mail servers and clients may employ a variety of defenses, but we analyze 16 servers and clients and find that they are far from comprehensive. We propose, prototype, and evaluate a new defense, namely stripping tracking tags from emails based on enhanced versions of existing web tracking protection lists.

Blog post on the research.

Tags: academic papers, data collection, email, privacy, surveillance, tracking

Posted on October 3, 2017 at 6:45 AM • 30 Comments

Comments

Me_not_you • October 3, 2017 7:48 AM

No wonder I don’t use gmail anymore….smh.

David • October 3, 2017 8:42 AM

@Me_not_you Actually, it will happen regardless of the email client you use, and even if your email provider fetches the images using its own proxy to mask your IP address for privacy.

If the original URL for the image contained a unique ID or email address, it will still leak.

The only thing you can do is to avoid loading images in emails or clicking on links at all, unless you know they are safe. Another option is for the big email providers to recognize these types of leaks by analysing the URL parameters and refusing to load them.
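The URL-parameter check suggested above could be sketched roughly as follows. This is a minimal illustration, not how any provider actually does it: the function name `leaking_params` and the example tracker URL are made up, and it only catches an address passed in the clear (or percent-encoded), not an opaque unique ID.

```python
from urllib.parse import urlsplit, parse_qsl


def leaking_params(url: str, recipient: str) -> list:
    """Return the names of query parameters whose value contains the
    recipient's address. parse_qsl percent-decodes values, so an
    address written as alice%40example.com is still recognised."""
    needle = recipient.lower()
    query = urlsplit(url).query
    return [
        name
        for name, value in parse_qsl(query, keep_blank_values=True)
        if needle in value.lower()
    ]


if __name__ == "__main__":
    # Hypothetical image URL of the kind a mail proxy might be asked to fetch.
    url = "https://tracker.example/p.gif?uid=abc&e=alice%40example.com"
    print(leaking_params(url, "alice@example.com"))
```

A provider that proxies images could refuse to fetch any URL for which this list is non-empty. A random per-recipient ID would still get through, which is the harder case.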

Oldskool • October 3, 2017 9:02 AM

Use a proper mailer (e.g. Thunderbird) and set it to display message bodies as plain text. If it looks kosher, you can then display the message as HTML, with or without images – if you want. I read mail as text/plain by default; that nearly always conveys the full content of the message, and I hardly ever have to switch to HTML.

Or if you have to use webmail, let it be Squirrelmail – it does plain text by default (not sure if it does HTML at all).

HTML email has always been a nightmare. It’s devilishly hard to construct an HTML email that will display correctly in even a majority of readers; and the resulting markup will be full of table tags, inline styling and the rest of those bad markup habits we grew out of 17 years ago (MS Outlook is a particularly challenging target).

Tatütata • October 3, 2017 9:42 AM

E-mail clients that don’t display embedded HTML contents have been around for 20+ years, and it doesn’t take knowledge of rocket science to avoid directly clicking on random links, especially if they are included in spam.

So what’s new?

I have a problem with this statement on p. 2:

Email providers (e.g., Gmail, employers) and email clients (e.g., Apple Mail, Thunderbird) may both employ measures to mitigate email tracking, such as proxying of images or suppressing cookies.

Gmail is by itself one giant tracker, and if they actually proxy embedded images then they help spammers track and validate e-mail addresses, albeit without revealing the client’s IP address. But there are other ways.

I notice that posting a comment on this blog often results in opening a fresh TLS handshake, so all an eavesdropper needs to do is to correlate the publication time of a comment with this event, or with the uploading of a volume of a few kB of data. A web interface to a mail server should also be vulnerable, especially if the target uploads an attachment of a conspicuous length for transmission.

Who? • October 3, 2017 10:23 AM

My mail user agent has been nmh(7) for at least ten years; before that, it was RAND MH. I have been very happy with it, at least when communicating with people who know the difference between an email and a web page.

Impossibly Stupid • October 3, 2017 10:26 AM

It really should come as no surprise that when you make the web part of everything, everything becomes part of the web. And it brings along with it all the insecurities that are part of the kitchen sink that is a web browser. When UI frameworks first started offering a WebView for easy embedding it seemed like such a great idea. Now I just want to get back to using simple text for most things. Usenet shall be reborn! 🙂

MAX • October 3, 2017 11:12 AM

@Me_not_you Actually, it will happen regardless of the email client you use, and even if your email provider fetches the images using its own proxy to mask your IP address for privacy.

As I recall, even in the early days of HTML-mail it was common for clients not to fetch external resources, or have an option to disable it. I think Outlook Express had an “only show attached images” option 2 decades ago, to guard against exactly this. Modern Outlook still sucks but even it includes referenced images as attachments when sending messages.

I’m surprised “modern” clients are so willing to fetch stuff. Maybe you can put it into some sort of “offline” mode or configure an invalid HTTP proxy? Like, set all proxy addresses to localhost:1 and “no proxy for” your mail server address(es). That can even work for webmail if you run it from its own browser profile (something like RequestPolicy might be easier).

The paper is a bit sloppy when it says “but embedded images and stylesheets are allowed”: it’s actually making no distinction between referenced external images and embedded (attached) images. This is important and should have been included in table 12.

confused • October 3, 2017 11:24 AM

About 30% of emails leak the recipient’s email address to one or more of these third parties when they are viewed. In the majority of cases, these leaks are intentional on the part of email senders

The sender already knows the email address. If they want to leak it, why not just leak it? If they want to link the address to a cookie, the sender can tag the email with a random unique identifier and give the (unique id, email) pair to the third party. So leaking email addresses is just a subset of tracking. This has implications for any possible defense.

I find it idiotic to treat addresses as secret information. To be useful they must be known by others. Except in the rare case where every party is given a different address, protection of addresses cannot rely on secrecy nor on any technical measures that protect secrecy. Something else is needed.

Doug Coulter • October 3, 2017 1:06 PM

@confused
Of course they know where they sent mail. Now they know which addresses are being read and when. That’s valuable to many marketing outfits.
Other more nefarious things are possible but that’s probably the main easy one.

Peter A. • October 3, 2017 2:27 PM

Am I the only one that still uses text-based email client? I feel soooo old.

In fact the main reason is not security, it is just a habit – I am so used to it and can use it so effectively that I see no reason to switch to a GUI client. Most of HTML-only email is spam, if I can’t see the plain text I just hit delete.

I get a small amount of HTML-only email from legitimate senders, but even then I can see the contents in a text-based browser with no problem at all. It is very rare that I actually want to see the pictures linked from an email as opposed to the ones attached.

Clive Robinson • October 3, 2017 3:27 PM

@ Peter A.,

Am I the only one that still uses text-based email client? I feel soooo old.

I used to before I realised that I was getting so much I blocked, that nothing was making it through. So I simply stopped using personal email…

I worked a while at a University quite a while ago, and back then the majority of people were still using 7bit clean text. The mail admin kind of got into trouble when he would not allow 8bit text… But when he opened up as ordered the place got hit by all sorts of nasties. The mail admin’s assessment was correct, the hierarchy above him however was more interested in kiss-a55ing senior academics and their little wants. Ultimately it was the poor bods who had to clean up the mess that suffered as their workload “Went North of Kathmandu” by the hard road.

The lesson learned is a weak sycophantic hierarchy is one to get out from underneath as fast as you can. Because those brown noses of theirs will not head where commonsense goes…

@ ALL,

With web pages now over three megs on average, how long before a couple of lines of text in an email becomes likewise larger than three paperbacks?

You cannot secure it with that level of redundancy, as there are way, way too many places to hide…

confused • October 3, 2017 3:53 PM

@Doug Coulter

Of course, but that’s not what the paper is about.

For example the paper says that hashing email addresses is easily reversed, since email addresses have low entropy. But that’s irrelevant. Even if the email address was cryptographically strong the second party has the address and hence the hash, and they are cooperating with the third party.
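To make the low-entropy point concrete, here is a rough sketch of reversing a hashed address by guessing. SHA-256, the lowercase normalisation, and the tiny candidate lists are all assumptions for illustration; a real attempt would use large lists of names and mail domains.

```python
import hashlib
from itertools import product
from typing import Iterable, Iterator, Optional


def hash_address(address: str) -> str:
    """Hash a normalised address. SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(address.strip().lower().encode("utf-8")).hexdigest()


def guess_addresses(local_parts: Iterable[str], domains: Iterable[str]) -> Iterator[str]:
    """Yield every local@domain combination from two small guess lists."""
    for local, domain in product(local_parts, list(domains)):
        yield f"{local}@{domain}"


def reverse_hash(target: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate whose hash equals target, else None."""
    for address in candidates:
        if hash_address(address) == target:
            return address
    return None


if __name__ == "__main__":
    leaked = hash_address("alice@example.com")
    guesses = guess_addresses(["bob", "alice"], ["example.org", "example.com"])
    print(reverse_hash(leaked, guesses))
```

As noted above, none of this guessing is even needed when the sender cooperates with the third party, since the sender can compute the same hash from the address it already holds.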

There seems to be some concept of the second party being evil (by embedding links for a third party), but drawing the line at being just a little bit more evil (and giving the third party the actual address). This is what has me confused.

jc • October 3, 2017 4:57 PM

embedded pixels

Either go text-only or do not load external images. There is an option for that in almost all e-mail clients.

About 30% of emails leak the recipient’s email address to one or more of these third parties when they are viewed. In the majority of cases, these leaks are intentional on the part of email senders, and further leaks occur if the recipient clicks links in emails.

1. Do not load external images.
2. Do not click on links in e-mails.
3. Get a good spam filter or change your e-mail provider.
4. Select the offending e-mail message and move it to the spam folder to help “teach” the aforementioned spam filter.
5. Don’t sign up for “special deals,” click on said “deals” which appear in your inbox following your signup, shop at the online store that offered said “deals,” and then complain when they try to sell you stuff.
6. Don’t visit those seedy “grey market” websites that advertise or sell male enhancement pills from some Canadian pharmacy.
7. Be very clear in setting your limits, especially your communication preferences when you sign up for various accounts online.
8. Don’t go online drunk or under the influence.

Mark S Hewitt • October 3, 2017 5:26 PM

You need to treat email as open communications, even when encrypted. Our firm uses applications like Signal to transmit documents etc. when using our internal network is not possible.

r • October 3, 2017 7:19 PM

“when you make the web part of everything, everything becomes part of the web.”

Should be added to proverbs.

wag • October 3, 2017 7:30 PM

@r:

“when you make the web part of everything, everything becomes part of the web.”

Should be added to proverbs.

Where? I suggest inserting that in Proverbs 15, right after “The eyes of the Lord are in every place, beholding the evil and the good.” 😉

The scriptures are like the Nostradamus prophecies, you can easily twist them to suit any context.

Lou Katz • October 3, 2017 7:37 PM

This is why I only read mail with a text-based reader (mutt) that is incapable of triggering any web-based links.

Mike • October 3, 2017 7:55 PM

From the article: “Blocking images by default provides complete protection from tracking when emails are viewed, but can often result in unreadable emails.”

This is why pressure needs to be brought to bear via litigation on these companies, who are actually required by the Americans with Disabilities Act to provide all of the information in these emails in a form other than pictures.

sitaram • October 4, 2017 12:29 AM

@Peter. A.

I switched from Thunderbird to mutt a couple of years ago (switched back, I should say, since I used mutt for several years in the 90s and early 2000s).

I do get a lot of HTML mail — corporate mail being what it is. I can see the plain text version of course, but often, I do need to see it in its proper formatting for best effect.

So I have mutt set to copy the HTML to /tmp, and open it in “dillo” – an extremely fast browser that doesn’t even have Javascript. Dillo’s config is set to use a non-existent proxy server, so there’s no chance of it succeeding in talking to anyone outside.

Works a treat. I get just the formatting advantages of HTML without all the other junk.

gilby • October 4, 2017 1:05 AM

I use Adguard which operates as a local transparent proxy. This finds things to block in nearly every email when I tell Apple Mail to load remote content (I set Apple Mail, as a default, not to show me any remote content). Mostly related to social media. I suspect Adguard may well be over zealous, but I never notice anything missing from the emails.

ShavedMyWhiskers • October 4, 2017 2:27 AM

The image content and text content often do not match.
The domains of the images are commonly not the domain or account of the sender. View images is fetch and display all images.
In an evil world a message from the FBI could contain illegal images rendered as one invisible pixel leaving discoverable image data signatures. Often the CSS boiler plate is a sold link that contains more sold links that the sender cannot audit or inspect. It might be dynamic.

Browsers often prefetch images (optionally), so they may be fetched even if not rendered.

The moves by browsers to require https seem to exclude this side door.

The servers can see a request and log it. The request could contain a hash of your zip code and more, so then the server can fetch targeted political content based on the hash of a religion code or a correlation tag to fill in database omissions.

Time to inspect the messages from my elected officials .gov sites.

Clive Robinson • October 4, 2017 2:33 AM

The thing nobody has mentioned is that the addition of html javascript to do this tracking opens a “back channel” to the attacker.

Not only should email clients not open such “back channels”, they should log them to build a “black list” that blocks all future messages with anything that is on the list, with a big warning message and optional statistics. I know there will be users who “click through” but that likewise can be made a little harder to raise the pain threshold.

Whilst I know the serious attackers will get more tricksy with time it will slow the daft down. Opening up a gap between the two has advantages, especially as it might disincentivize the daft.

The point is that you will not stop serious attackers getting things past people because as that saying has it “People are people with all the human failings”. But one human failing is that people do not like their “flow disrupted” and with care it can be used as a pain barrier to modify behaviour.

No Cigar • October 4, 2017 8:46 AM

I use Thunderbird so tried to find an extension that would perform “stripping tracking tags from emails”.

Apparently, there is no such animal. (Why not?) I did install Adblock which might help some, but then there’s their policy about allowing some ads (etc.) to be displayed.

Frankly, my view is email isn’t used much anymore for personal communications. (Neither is using the telephone, either for that matter.) What’s IN now is messaging.

It seems to me that’s the one that needs tweaking for privacy more than anything. Right now.

David • October 4, 2017 2:08 PM

What are these “images” and “pixels” of which you speak, and where do I find them in Alpine?

matteo • October 5, 2017 4:56 AM

thunderbird by default block remote content (like images).
the point of leaking email is to find if someone clicked at a link, for example:
i send spam mail:
i want to see if destination address exists (i’m not sure). i embed a remote image and if it is loaded the mail exists.
i send a newsletter:
i’m quite sure destination exists but i want to see if people click on links contained in it.
instead of writing:
“look this product today cost less buy it! buy.buy/niceProduct”
i write:
“look this product today cost less buy it! buy.buy/niceProduct?from=myMail@mail.mail&date=1/2/3”
in this way (more or less like cookies) i know not only that someone clicked that link but i know that you clicked it and from which marketing email.
if you visited the website before there are also cookies on it so i can correlate your email with the cookie.
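Seen from the receiving web server, the tagged link above gives up its information with a few lines of parsing. The sketch below reuses the made-up `buy.buy` link from this comment; the `https://` scheme and the function name `click_details` are added for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def click_details(url: str) -> dict:
    """Return the first value of each query parameter in a clicked link,
    i.e. what the site operator can read straight from the request."""
    params = parse_qs(urlsplit(url).query)
    return {name: values[0] for name, values in params.items()}


if __name__ == "__main__":
    clicked = "https://buy.buy/niceProduct?from=myMail@mail.mail&date=1/2/3"
    # Prints {'from': 'myMail@mail.mail', 'date': '1/2/3'}
    print(click_details(clicked))
```

Any cookie sent along with the same request can then be stored next to the `from` value, which is the correlation described above.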

this is also a security problem: you can embed images that are not images but normal html pages/forms and do CSRF (cross site request forgery), for example to change router settings,
as i have tested with my old router.
thunderbird says “blocked remote content for privacy” but actually it is also a security problem

matteo • October 5, 2017 5:10 AM

@David:
images in an email can be added in three different ways:
-as attachment (you will see the typical icon and a list of them)
-embedded as part of the (HTML) mail more or less like the text.
-referenced as part of the (HTML) <–this one is problematic.

in the last case the image is not in the mail; inside it there is something like “here there should be an image and you can find it at this link”, or in html code, an img tag whose src attribute points to that link.
this is the typical way a web browser loads an image.
the problem is that if it is automatically loaded, that web site sees that you are connecting to it to download the image, and if each message includes a different image, whoever sent the mail (and also controls the website) can know that you opened it and when.

Drone • October 5, 2017 8:31 AM

@jc said: “8. Don’t go online drunk or under the influence.”

I said: “Hic!”

Jan • October 5, 2017 11:04 AM

@matteo,

images in an email can be added in three different ways:
-as attachment (you will see the typical icon and a list of them)
-embedded as part of the (HTML) mail more or less like the text.
-referenced as part of the (HTML)
in the last case the image is not in the mail

“referenced as part of the (HTML)” can include “as attachment”. E.g., Outlook puts img src="cid:abc" and attaches an image “abc”. That’s different from, e.g., src="data:", which embeds the image data in the HTML as base64.
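The distinction between these cases can be checked mechanically by looking at the scheme of each img src. The following is a small sketch using only Python's standard-library HTML parser; the category labels ("attached", "inline", "remote", "other") are names chosen here, not terms from any mail client.

```python
from html.parser import HTMLParser


class ImageSourceClassifier(HTMLParser):
    """Collect (kind, src) for every img tag in an HTML mail body."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = (dict(attrs).get("src") or "").strip()
        lowered = src.lower()
        if lowered.startswith("cid:"):
            kind = "attached"   # refers to a MIME part inside the message
        elif lowered.startswith("data:"):
            kind = "inline"     # image bytes are embedded in the HTML itself
        elif lowered.startswith(("http://", "https://", "//")):
            kind = "remote"     # fetching this contacts an outside server
        else:
            kind = "other"
        self.found.append((kind, src))


def classify_images(html: str) -> list:
    parser = ImageSourceClassifier()
    parser.feed(html)
    parser.close()
    return parser.found


if __name__ == "__main__":
    body = (
        '<p><img src="cid:abc">'
        '<img src="data:image/png;base64,AAAA">'
        '<img src="https://tracker.example/p.gif?id=1"></p>'
    )
    for kind, src in classify_images(body):
        print(kind, src)
```

Only the "remote" entries can tell a server that the message was opened. The other two kinds are rendered from data that arrived with the message.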

random • October 9, 2017 3:23 AM

Why use email if you can use end to end encrypted messaging?
Most emails I receive these days are advertising, subscriptions, bills and notifications. Most of these I never have to read except for the subject line maybe. I encourage friends to use encrypted messaging services for personal messages instead. Email appears to be on the way out and so are the problems that come with it.

andrews • October 22, 2017 3:46 PM

For me, e-mail is important. Things vital to the business are, by law, required to be sent this way. You can talk to the regulators if you think there is a better way, but at the moment it mostly works pretty well for me. The old method was to send paper or a fax, and e-mail is far more satisfactory.

However, e-mail also includes tons of spam, some of which does not get strained out by the filter. For those things, I send them to spamcop, but I sometimes also look at them.

A very common form of the tracking hack seems to be embedding an image with the e-mail address base64-encoded in there. That is, you have an IMG tag with SRC= their tracking pixel, with an argument normally expressed as “?sucker=[base64 name@example.com]” so that when it is fetched it identifies the mark. The less sly will just use the equivalent of “?sucker=name@example.com” and I have seen them double-base-64 encode it as well, sometimes wrapped in some leading digits so that you cannot immediately recognize your own e-mail address under the base64.
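Peeling such a parameter back to the address takes only a few lines. The sketch below is a guess at a workable approach rather than a description of any particular tracker: it assumes standard (not URL-safe) base64, treats "leading digits" as a prefix that can simply be stripped, and uses a deliberately loose e-mail pattern.

```python
import base64
import binascii
import re

# Loose pattern for spotting an address; not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def _try_b64(text: str):
    """Decode text as base64 to a UTF-8 string, or return None on failure."""
    padded = text + "=" * (-len(text) % 4)  # restore stripped padding
    try:
        return base64.b64decode(padded, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None


def find_encoded_email(value: str, max_layers: int = 3):
    """Search value for an address, peeling off up to max_layers of base64.
    At each layer the text is tried as-is and with leading digits removed."""
    pending = [value]
    for _ in range(max_layers + 1):
        decoded_next = []
        for text in pending:
            match = EMAIL_RE.search(text)
            if match:
                return match.group(0)
            for variant in {text, text.lstrip("0123456789")}:
                decoded = _try_b64(variant)
                if decoded:
                    decoded_next.append(decoded)
        pending = decoded_next
    return None


if __name__ == "__main__":
    once = base64.b64encode(b"name@example.com").decode("ascii")
    twice = base64.b64encode(once.encode("ascii")).decode("ascii")
    # A value as it might appear after "?sucker=" in a pixel URL.
    print(find_encoded_email("123" + twice))
```

The same function can be pointed at every query value of every image URL in a message to see which ones carry the recipient's address.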

For fun, you can make modified requests. For instance, point your [text] browser at the same, with ?sucker=other_spammer@gmail.com or the like. For those just collecting live marks, as opposed to checking off your name on a list, they have now gotten valuable data. Sure, it is not your e-mail address, but it is someone who is known to appreciate spam.

It is possible that I am not a nice person.
