Tracking Anonymous Web Users

This research shows how to track e-commerce users better across multiple sessions, even when they do not provide unique identifiers such as user IDs or cookies.

Abstract: Targeting individual consumers has become a hallmark of direct and digital marketing, particularly as it has become easier to identify customers as they interact repeatedly with a company. However, across a wide variety of contexts and tracking technologies, companies find that customers cannot be consistently identified, which leads to a substantial fraction of anonymous visits in any CRM database. We develop a Bayesian imputation approach that allows us to probabilistically assign anonymous sessions to users, while accounting for a customer’s demographic information, frequency of interaction with the firm, and activities the customer engages in. Our approach simultaneously estimates a hierarchical model of customer behavior while probabilistically imputing which customers made the anonymous visits. We present both synthetic and real data studies that demonstrate our approach makes more accurate inference about individual customers’ preferences and responsiveness to marketing, relative to common approaches to anonymous visits: nearest-neighbor matching or ignoring the anonymous visits. We show how companies who use the proposed method will be better able to target individual customers, as well as infer how many of the anonymous visits are made by new customers.
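To make the core idea of the abstract concrete, here is a deliberately simplified sketch of probabilistically assigning one anonymous session to candidate customers with Bayes' rule. It is not the authors' method, which estimates a hierarchical multivariate probit model and imputes assignments jointly. This sketch assumes instead that each candidate's prior weight is its visit rate and that each activity is an independent Bernoulli draw with a candidate-specific propensity. The function name, the customers, and all rates are hypothetical, as is the "new" pseudo-customer standing in for not-yet-known visitors.

```python
import math
from typing import Dict


def assign_session(
    session: Dict[str, bool],
    visit_rate: Dict[str, float],
    propensity: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Posterior P(candidate | session) for one anonymous session.

    Simplifying assumptions (not the paper's model): the prior for each
    candidate is proportional to its visit rate, and each activity is an
    independent Bernoulli draw with a candidate-specific propensity.
    """
    log_post = {}
    for cand, rate in visit_rate.items():
        lp = math.log(rate)  # prior: frequent visitors are a priori more likely
        for activity, done in session.items():
            p = propensity[cand][activity]
            lp += math.log(p if done else 1.0 - p)  # Bernoulli likelihood
        log_post[cand] = lp
    # Normalize with the log-sum-exp trick to avoid numerical underflow.
    top = max(log_post.values())
    z = sum(math.exp(v - top) for v in log_post.values())
    return {cand: math.exp(v - top) / z for cand, v in log_post.items()}


# Hypothetical data: two known customers plus a "new" pseudo-customer
# with population-average propensities.
visit_rate = {"alice": 4.0, "bob": 1.0, "new": 2.0}
propensity = {
    "alice": {"shoes": 0.6, "tops": 0.1},
    "bob": {"shoes": 0.1, "tops": 0.5},
    "new": {"shoes": 0.3, "tops": 0.3},
}
posterior = assign_session({"shoes": True, "tops": False}, visit_rate, propensity)
```

Here the session comes out about 82% likely to be Alice's and about 16% likely to come from a new customer. Summing the "new" probabilities over all anonymous sessions would give a rough estimate of how many visits new customers made, in the spirit of the abstract's last sentence.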

Posted on February 5, 2016 at 6:56 AM • 18 Comments

Comments

Ron W. February 5, 2016 7:47 AM

I like how you can mad-lib “companies” and “customers” with “government” and “citizens”.

SoWhatDidYouExpect February 5, 2016 8:30 AM

From the post:

“We show how companies who use the proposed method will be better able to target individual customers…”

I read this as just another way of trying to shove a “product” down someone’s throat. This is how one sells products that nobody wants (as opposed to products that people need).

By “product”, I mean something real or perhaps a mindset (point of view), whether political, religious, or otherwise.

Primarily, the collected and associated data is likely to be used when necessary to influence, intimidate, or control various segments of the population.

By the way, “target” has more meanings than those implied for marketing.

K.S. February 5, 2016 9:24 AM

While I am skeptical that they will be successful in tracking competent anonymous users, the fact that anonymity is so rare these days could be used as an identifier. That is, the very last perfectly anonymous user will be easily trackable.

I think a much better approach to maintaining anonymity is throwaway identities. Protect only key information and allow some degree of tracking for a while; this way you do not stand out as much.

Petter February 5, 2016 9:32 AM

Or you could start treating customers with respect and begin to build a relationship based on mutual understanding and shared values, rather than following them around, spying on them and trying to force-feed them whatever you’d like to sell them.

It can be a lifelong commitment between the person behind the keyboard and you and your brand.
But you need to think before you force yourself into their lives.

For the last 20 years a lot of people have believed that tracking customers is CRM.
It’s not. There’s no “relationship” in it at all.

It’s data-driven CM at its worst.

Shazbot February 5, 2016 9:39 AM

I’m getting to where anytime I want to get online to do almost anything I just spin up a VM, do what I need to do, then blow it away after. I run a VPN on the host machine and a different VPN in the VM and change the exit points each time. I also run scripts that change my real and virtual NIC MAC addresses before every connect and also run a fresh default browser each time. I suppose I need to use persistent idents for things like Amazon but at least they can’t peg my location and machine config.

I do it all more out of spite than any real requirement for anonymity. I hate being advertised to. Let them snag the hoi polloi in their marketing and leave me alone.

SoWhatDidYouExpect February 5, 2016 10:17 AM

@Shazbot:

When you perform your operations, do you still run through the same physical router (or modem) that handles your NAT activity? If so, all of your “virtual” configurations can be tracked back to the router/modem MAC address and its IP address to your ISP. Your doorway to the network probably always looks the same.

ianf February 5, 2016 10:39 AM

@ Shazbot,
                   this sounds like something that could be scripted/ automated down to a single CLI command “goweb” (say) to launch the session, and “endweb” (say) to clean it up after (I presume that you already run hardened/ non-leaky hardware).

327543 February 5, 2016 10:44 AM

@Shazbot: That sounds like a lot of effort to set up and it has no guarantees of anonymity. Why not just use Tor Browser with security slider set to High and forget the VMs?

k15 February 5, 2016 12:24 PM

How many of these sorts of analyses are robust against deliberate attempts to subvert them? Or could such subversion, in and of itself, be detected?

antigibbone February 5, 2016 12:53 PM

The article’s details show that this approach is easily defeated by anyone using basic common sense: do not follow marketing links especially from email, pollute the data a little, and use an anonymizing proxy like https://github.com/essandess/osxfortress.

Aside from time of arrival statistics, the consumer-specific parameters used are very basic. One or more of the three basic precautions mentioned will break this:

The customer-specific parameters in our multivariate probit model consist of intercepts, $\nu_{U_j,m}$, which characterize customer $U_j$’s overall propensity to engage in each activity $m$, and coefficients, $\beta_{U_j,m}$, which characterize customer $U_j$’s response to visit-specific marketing actions for each activity $m$. In our retail example, $\nu_{U_j,\text{shoes}}$ would be the underlying propensity for customer $U_j$ to purchase shoes without any marketing action. If the store sends this customer an advertisement, their underlying utility for purchasing shoes would increase by $\beta_{U_j,\text{shoes},\text{ad}}$. We also have a population-level correlation structure, $\Sigma$, among all the activities, as was done in Manchanda et al. (1999), to accommodate the possibility that some activities tend to occur together during the same visit, e.g., purchasing women’s tops and women’s skirts.
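For readers less familiar with the model in the quoted passage, a standard multivariate probit specification consistent with that description can be written as below. The visit index $v$, the vector $x_v$ of visit-specific marketing actions, the latent utility $u_{v,m}$, the observed outcome $y_{v,m}$, and the number of activities $M$ are notation introduced here for illustration; the paper's own equations may differ in detail.

```latex
u_{v,m} = \nu_{U_j,m} + \beta_{U_j,m}^{\top} x_v + \varepsilon_{v,m},
\qquad
\varepsilon_v = (\varepsilon_{v,1}, \dots, \varepsilon_{v,M}) \sim \mathcal{N}(0, \Sigma),
\qquad
y_{v,m} = \mathbf{1}\left[\, u_{v,m} > 0 \,\right]
```

If $x_v$ contains an indicator for "an ad was sent", that ad shifts the latent utility for shoes by $\beta_{U_j,\text{shoes},\text{ad}}$, while positive off-diagonal entries of $\Sigma$ make purchases of tops and skirts more likely to appear together on the same visit.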

David Leppik February 5, 2016 3:27 PM

@k15

As described, this is fairly easy to defeat. The crux of what they are saying is that if two anonymous users have similar purchasing patterns, they are probably the same person.

Bruce’s introduction is misleading. The paper focuses not on online behavior, but on retail purchases, though it claims the two can be tied together. (But I’m skeptical; see below.)

It’s a lot like spam filtering, except to detect users rather than spam. Classify meaningful patterns (mostly automated), look for those patterns, and flag things that match the patterns. In this case, look for patterns that non-anonymous users have and apply them to the pool of anonymous transactions.

As such, the result is really noisy, especially if it isn’t fine-tuned.
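The pattern matching described here is close to the nearest-neighbor baseline named in the paper's abstract. A minimal sketch, assuming each visit is reduced to a set of purchased item categories and using Jaccard similarity as the metric (an assumption; the paper does not say which metric its baseline uses). Customer IDs and item names are hypothetical. Ties are reported rather than broken arbitrarily, because a tie is exactly the case where matching cannot single anyone out.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_customers(
    basket: Set[str], histories: Dict[str, Set[str]]
) -> Tuple[List[str], float]:
    """Return every known customer tied for the highest similarity, plus that score."""
    scores = {cust: jaccard(basket, items) for cust, items in histories.items()}
    best = max(scores.values())
    return sorted(c for c, s in scores.items() if s == best), best


# Hypothetical purchase histories of identified customers.
histories = {
    "c1": {"meat", "veggie_nuggets", "milk"},
    "c2": {"burger", "fries", "soda"},
    "c3": {"burger", "fries", "soda"},
}
distinctive = nearest_customers({"meat", "veggie_nuggets"}, histories)
generic = nearest_customers({"burger", "fries", "soda"}, histories)
```

The distinctive basket matches only `c1`, while the generic basket matches `c2` and `c3` equally well, so it cannot be attributed to either of them.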

If you buy a standard McDonald’s meal or Starbucks drink, your purchase is no different from those of a dozen other customers the same day, so it can’t be de-anonymized.

On the other hand, this could be really effective at detecting me at a grocery store, since my purchases are predictable and distinctive (especially meat + kid-friendly fake meat) thereby defeating my use of anonymous payment.

However, most of my online purchases start with a non-anonymous credit card and end with home delivery, so I’m already not anonymous. There’s no overlap between what I buy at Target stores and at Target.com, so Target couldn’t use this to link my home address to an in-store purchase.

Long story short, it’s noisy, but marketers don’t need to de-anonymize with 100% success, or even 50% success. They just want to do better than the baseline, 0%. If they can estimate the number of customers a little better, that’s a win. If they have a 10% better chance of sending you a well-targeted coupon, that’s also a win.

Nobody February 5, 2016 4:51 PM

@Shazbot:
Do you have customized VMs that have slightly different characteristics? Remember the horror you felt when you ran EFF’s Panopticlick browser finger-printer and discovered how unique your machine configuration was? You might want to have a few VMs with different fonts installed, or different versions of Firefox installed just to throw off that identification method.
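Panopticlick expresses how identifying a browser attribute is in “bits of identifying information”: a value shared by a fraction $p$ of browsers carries $-\log_2 p$ bits. A minimal sketch with made-up fractions; adding the bits assumes the attributes are independent, which real ones are not, so the total overstates uniqueness.

```python
import math


def surprisal_bits(fraction: float) -> float:
    """Bits of identifying information for a value shared by `fraction` of browsers."""
    return -math.log2(fraction)


# Hypothetical fractions of browsers that share this machine's value.
observed = {"user_agent": 1 / 512, "fonts": 1 / 4096, "timezone": 1 / 16}
bits = {name: surprisal_bits(p) for name, p in observed.items()}
# Summing is only valid under the (unrealistic) independence assumption.
total_bits = sum(bits.values())
one_in = 2 ** total_bits  # roughly "one in this many browsers" share all three values
```

With these numbers the three attributes give 9, 12 and 4 bits, or 25 bits in total: about one browser in 33.5 million. Installing different fonts or browser versions in each VM changes these values from session to session. It does not necessarily make any single session less unique.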

@David Leppik
Yes, if you visit Amazon, and first look at camping gear, then browse Beatles’ CDs every time you visit, that could leave a “profile” behind.

Or if you look at cosmetics, then browse Justin Bieber CDs, that could leave a certain demographic profile as well.

And while shopping at the grocery store, do you always shop at the same time of day, such as on the way home from work? Yes, spending cash is useful, and shopping at stores where cash is more common won’t stick out as much.

Anon Y. Mouse February 5, 2016 5:36 PM

And what about ‘browser fingerprinting’? If companies, the government, or anybody else who operates a remote host is willing to do a little work on the server side, it seems they can obviate the need for client-side storage (i.e., cookies) or user IDs. It’s a lot more difficult for a user to obscure all the details leaking from their browser or change them from session to session.
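The server-side work described here can be sketched as hashing whatever attributes the browser reveals into a stable identifier, with no cookie involved. The attribute names and values below are hypothetical, and real fingerprinting scripts collect many more signals than this.

```python
import hashlib
import json
from typing import Dict


def fingerprint(attrs: Dict[str, str]) -> str:
    """Hash browser attributes into a stable ID; key order does not matter."""
    canonical = json.dumps(attrs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attributes observed on three visits.
visit_1 = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:44.0) Gecko/20100101 Firefox/44.0",
    "timezone": "UTC-6",
    "screen": "1920x1080",
}
visit_2 = dict(reversed(list(visit_1.items())))  # same values, different order
visit_3 = {**visit_1, "screen": "1366x768"}      # one attribute changed
```

`visit_1` and `visit_2` map to the same ID, while changing a single attribute in `visit_3` yields an unrelated hash. That is why per-session changes defeat an exact-match scheme like this one, although a tracker could instead compare attributes individually.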

John Doe February 5, 2016 5:53 PM

Is this Bruce’s way of telling us to use our real names?

On another tack, Slashdot, Feb 17, 2012: http://science.slashdot.org/story/12/02/17/1927229/how-companies-learn-your-secrets

To quote: …About a year after Pole created his pregnancy-prediction model, a man walked into a Target outside Minneapolis and demanded to see the manager. He was clutching coupons that had been sent to his daughter, and he was angry. ‘My daughter got this in the mail!’ he said. ‘She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?’ The manager apologized and then called a few days later to apologize again but the father was somewhat abashed. ‘It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.’

Y’all get the feeling this isn’t anything new under the sun…

ianf February 6, 2016 11:48 AM

@ John Doe,
                      while I don’t doubt the existence of such a statistical pregnancy-prediction model in use by the marketers at Target, the accompanying Slashdot story illustrating one outcome of its application sounds made up, apocryphal at best.

As the store detects pregnant customers by their combined purchases of flagged pre-birth baby products there, that 15-year-old soon-to-be-mom must’ve waved around a credit or debit card in her name quite a bit. Apart from “pee-on” pregnancy tests, what could she have been buying that made her qualify? Vitamin and nutritional supplements? The Baby Names Catalog? Nothing bulky that would give her condition away to her nearest and dearest, I presume. The card must’ve been connected to her guardians’ checking account. Would not then the adult paying the itemized monthly CC bills have put 2 and 2 together (equals 4)? Do 15-year-olds in the USA carry around credit cards? Are national retail chains like Target in the habit of up-selling (or even just sending diaper coupons) to named minors, esp. in such a contentious area as a definitely private, and potentially undisclosed, pregnancy?
                      That first aggravated-then-apologetic father figure contacting Target at the end is just too cookie-cutter-cutesy to be swallowed whole. Next attempt to pull wool over my eyes, please.

theodore February 6, 2016 8:09 PM

re: David Leppik

“There’s no overlap between what I buy at Target stores and at Target.com,…”

Yes, they can. The simple fact that there’s no overlap is, in itself, a datapoint.

Docotor Senseless February 13, 2016 2:30 AM

http://torguard.net/store/Netgear/NETGEAR-R7000-DDWRT-VPN-Router?limit=100\

Pre-configured routers already flashed with advanced DD-WRT security software and set up with VPN settings are a good start for new users. To remain anonymous, get a Tails or JonDo Live-CD and put it in your laptop. Set a fake MAC address for your wireless adapter and disable the other adapters, then use wireless internet in town with your JonDo or Tails disc.

If you know how, you can connect back to your VPN network and then onto Tor, so the shopping centre where you use free internet never gets a fix on you. You can also search using the Tor Browser Bundle without exposing your identity.
