Entries Tagged "data poisoning"

Page 1 of 1

Waze Data Poisoning

People who don’t want Waze routing cars through their neighborhoods are feeding it false data.

It was here that Connor learned that some Waze warriors had launched concerted campaigns to fool the app. Neighbors filed false reports of blockages, sometimes with multiple users reporting the same issue to boost their credibility. But Waze was way ahead of them.

It’s not possible to fool the system for long, according to Waze officials. For one thing, the system knows if you’re not actually in motion. More important, it constantly self-corrects, based on data from other drivers.

“The nature of crowdsourcing is that if you put in a fake accident, the next 10 people are going to report that it’s not there,” said Julie Mossler, Waze’s head of communications. The company will suspend users they suspect of “tampering with the map,” she said.

Posted on June 9, 2016 at 6:17 AMView Comments

Apple Patents Data-Poisoning

It’s not a new idea, but Apple Computer has received a patent on “Techniques to pollute electronic profiling”:

Abstract: Techniques to pollute electronic profiling are provided. A cloned identity is created for a principal. Areas of interest are assigned to the cloned identity, where a number of the areas of interest are divergent from true interests of the principal. One or more actions are automatically processed in response to the assigned areas of interest. The actions appear to network eavesdroppers to be associated with the principal and not with the cloned identity.

Claim 1:

A device-implemented method, comprising: cloning, by a device, an identity for a principal to form a cloned identity; configuring, by the device, areas of interest to be associated with the cloned identity, the areas of interest are divergent from true areas of interest for a true identity for the principal; and automatically processing actions associated with the areas of interest for the cloned identity over a network to pollute information gathered by eavesdroppers performing dataveillance on the principal and refraining from processing the actions when the principal is detected as being logged onto the network and also refraining from processing the actions when the principal is unlikely to be logged onto the network.

EDITED TO ADD (7/12): Similar technology and concept has already been developed by Breadcrumbs Solutions, and will be out as a free beta software in a few months.

Posted on June 21, 2012 at 5:51 AMView Comments


In the wake of AOL’s publication of search data, and the New York Times article demonstrating how easy it is to figure out who did the searching, we have TrackMeNot:

TrackMeNot runs in Firefox as a low-priority background process that periodically issues randomized search-queries to popular search engines, e.g., AOL, Yahoo!, Google, and MSN. It hides users’ actual search trails in a cloud of indistinguishable ‘ghost’ queries, making it difficult, if not impossible, to aggregate such data into accurate or identifying user profiles. TrackMeNot integrates into the Firefox ‘Tools’ menu and includes a variety of user-configurable options.

Let’s count the ways this doesn’t work.

One, it doesn’t hide your searches. If the government wants to know who’s been searching on “al Qaeda recruitment centers,” it won’t matter that you’ve made ten thousand other searches as well — you’ll be targeted.

Two, it’s too easy to spot. There are only 1,673 search terms in the program’s dictionary. Here, as a random example, are the program’s “G” words:

gag, gagged, gagging, gags, gas, gaseous, gases, gassed, gasses, gassing, gen, generate, generated, generates, generating, gens, gig, gigs, gillion, gillions, glass, glasses, glitch, glitched, glitches, glitching, glob, globed, globing, globs, glue, glues, gnarlier, gnarliest, gnarly, gobble, gobbled, gobbles, gobbling, golden, goldener, goldenest, gonk, gonked, gonking, gonks, gonzo, gopher, gophers, gorp, gorps, gotcha, gotchas, gribble, gribbles, grind, grinding, grinds, grok, grokked, grokking, groks, ground, grovel, groveled, groveling, grovelled, grovelling, grovels, grue, grues, grunge, grunges, gun, gunned, gunning, guns, guru, gurus

The program’s authors claim that this list is temporary, and that there will eventually be a TrackMeNot server with an ever-changing word list. Of course, that list can be monitored by any analysis program — as could any queries to that server.

In any case, every twelve seconds — exactly — the program picks a random pair of words and sends it to either AOL, Yahoo, MSN, or Google. My guess is that your searches contain more than two words, you don’t send them out in precise twelve-second intervals, and you favor one search engine over the others.

Three, some of the program’s searches are worse than yours. The dictionary includes:

HIV, atomic, bomb, bible, bibles, bombing, bombs, boxes, choke, choked, chokes, choking, chain, crackers, empire, evil, erotics, erotices, fingers, knobs, kicking, harier, hamster, hairs, legal, letterbomb, letterbombs, mailbomb, mailbombing, mailbombs, rapes, raping, rape, raper, rapist, virgin, warez, warezes, whack, whacked, whacker, whacking, whackers, whacks, pistols

Does anyone reall think that searches on “erotic rape,” “mailbombing bibles,” and “choking virgins” will make their legitimate searches less noteworthy?

And four, it wastes a whole lot of bandwidth. A query every twelve seconds translates into 2,400 queries a day, assuming an eight-hour workday. A typical Google response is about 25K, so we’re talking 60 megabytes of additional traffic daily. Imagine if everyone in the company used it.

I suppose this kind of thing would stop someone who has a paper printout of your searches and is looking through them manually, but it’s not going to hamper computer analysis very much. Or anyone who isn’t lazy. But it wouldn’t be hard for a computer profiling program to ignore these searches.

As one commentator put it:

Imagine a cop pulls you over for speeding. As he approaches, you realize you left your wallet at home. Without your driver’s license, you could be in a lot of trouble. When he approaches, you roll down your window and shout. “Hello Officer! I don’t have insurance on this vehicle! This car is stolen! I have weed in my glovebox! I don’t have my driver’s license! I just hit an old lady minutes ago! I’ve been running stop lights all morning! I have a dead body in my trunk! This car doesn’t pass the emissions tests! I’m not allowed to drive because I am under house arrest! My gas tank runs on the blood of children!” You stop to catch a breath, confident you have supplied so much information to the cop that you can’t possibly be caught for not having your license now.

Yes, data mining is a signal-to-noise problem. But artificial noise like this isn’t going to help much. If I were going to improve on this idea, I would make the plugin watch the user’s search patterns. I would make it send queries only to the search engines the user does, only when he is actually online doing things. I would randomize the timing. (There’s a comment to that effect in the code, so presumably this will be fixed in a later version of the program.) And I would make it monitor the web pages the user looks at, and send queries based on keywords it finds on those pages. And I would make it send queries in the form the user tends to use, whether it be single words, pairs of words, or whatever.

But honestly, I don’t know that I would use it even then. The way serious people protect their web-searching privacy is through anonymization. Use Tor for serious web anonymization. Or Black Box Search for simple anonymous searching (here’s a Greasemonkey extension that does that automatically.) And set your browser to delete search engine cookies regularly.

Posted on August 23, 2006 at 6:53 AMView Comments

Sidebar photo of Bruce Schneier by Joe MacInnis.