Surveillance by Algorithm
Increasingly, we are watched not by people but by algorithms. Amazon and Netflix track the books we buy and the movies we stream, and suggest other books and movies based on our habits. Google and Facebook watch what we do and what we say, and show us advertisements based on our behavior. Google even modifies our web search results based on our previous behavior. Smartphone navigation apps watch us as we drive, and update suggested route information based on traffic congestion. And the National Security Agency, of course, monitors our phone calls, emails and locations, then uses that information to try to identify terrorists.
Documents provided by Edward Snowden and revealed by the Guardian today show that the UK spy agency GCHQ, with help from the NSA, has been collecting millions of webcam images from innocent Yahoo users. And that speaks to a key distinction in the age of algorithmic surveillance: is it really okay for a computer to monitor you online, and for that data collection and analysis only to count as a potential privacy invasion when a person sees it? I say it’s not, and the latest Snowden leaks only make clearer how important this distinction is.
The robots-vs-spies divide is especially important as we decide what to do about NSA and GCHQ surveillance. The spy community and the Justice Department have reported back early on President Obama’s request for changing how the NSA “collects” your data, but the potential reforms—FBI monitoring, holding on to your phone records and more—still largely depend on what the meaning of “collects” is.
Indeed, ever since Snowden provided reporters with a trove of top secret documents, we’ve been subjected to all sorts of NSA word games. And the word “collect” has a very special definition, according to the Department of Defense (DoD). A 1982 procedures manual (pdf; page 15) says: “information shall be considered as ‘collected’ only when it has been received for use by an employee of a DoD intelligence component in the course of his official duties.” And “data acquired by electronic means is ‘collected’ only when it has been processed into intelligible form.”
Director of National Intelligence James Clapper likened the NSA’s accumulation of data to a library. All those books are stored on the shelves, but very few are actually read. “So the task for us in the interest of preserving security and preserving civil liberties and privacy,” says Clapper, “is to be as precise as we possibly can be when we go in that library and look for the books that we need to open up and actually read.” Only when an individual book is read does it count as “collection,” in government parlance.
So, think of that friend of yours who has thousands of books in his house. According to the NSA, he’s not actually “collecting” books. He’s doing something else with them, and the only books he can claim to have “collected” are the ones he’s actually read.
This is why Clapper claims—to this day—that he didn’t lie in a Senate hearing when he replied “no” to this question: “Does the NSA collect any type of data at all on millions or hundreds of millions of Americans?”
If the NSA collects—I’m using the everyday definition of the word here—all of the contents of everyone’s e-mail, it doesn’t count it as being collected in NSA terms until someone reads it. And if it collects—I’m sorry, but that’s really the correct word—everyone’s phone records or location information and stores it in an enormous database, that doesn’t count as being collected—NSA definition—until someone looks at it. If the agency uses computers to search those emails for keywords, or correlates that location information for relationships between people, it doesn’t count as collection, either. Only when those computers spit out a particular person has the data—in NSA terms—actually been collected.
If the modern spy dictionary has you confused, maybe dogs can help us understand why this legal workaround, by big tech companies and the government alike, is still a serious invasion of privacy.
Back when Gmail was introduced, this was Google’s defense, too, about its context-sensitive advertising. Google’s computers examine each individual email and insert an advertisement nearby, related to the contents of your email. But no person at Google reads any Gmail messages; only a computer does. In the words of one Google executive: “Worrying about a computer reading your email is like worrying about your dog seeing you naked.”
But now that we have an example of a spy agency seeing people naked—there are a surprising number of sexually explicit images in the newly revealed Yahoo image collection—we can more viscerally understand the difference.
To wit: when you’re watched by a dog, you know that what you’re doing will go no further than the dog. The dog can’t remember the details of what you’ve done. The dog can’t tell anyone else. When you’re watched by a computer, that’s not true. You might be told that the computer isn’t saving a copy of the video, but you have no assurance that that’s true. You might be told that the computer won’t alert a person if it perceives something of interest, but you can’t know if that’s true. You do know that the computer is making decisions based on what it receives, and you have no way of confirming that no human being will access that decision.
When a computer stores your data, there’s always a risk of exposure. There’s the risk of accidental exposure, when some hacker or criminal breaks in and steals the data. There’s the risk of purposeful exposure, when the organization that has your data uses it in some manner. And there’s the risk that another organization will demand access to the data. The FBI can serve a National Security Letter on Google, demanding details on your email and browsing habits. There isn’t a court order in the world that can get that information out of your dog.
Of course, any time we’re judged by algorithms, there’s the potential for false positives. You are already familiar with this; just think of all the irrelevant advertisements you’ve been shown on the Internet, based on some algorithm misinterpreting your interests. In advertising, that’s okay. It’s annoying, but there’s little actual harm, and you were busy reading your email anyway, right? But that harm increases as the accompanying judgments become more important: our credit ratings depend on algorithms; how we’re treated at airport security does, too. And most alarming of all, drone targeting is partly based on algorithmic surveillance.
The primary difference between a computer and a dog is that the computer interacts with other people in the real world, and the dog does not. If someone could isolate the computer in the same way a dog is isolated, we wouldn’t have any reason to worry about algorithms crawling around in our data. But we can’t. Computer algorithms are intimately tied to people. And when we think of computer algorithms surveilling us or analyzing our personal data, we need to think about the people behind those algorithms. Whether or not anyone actually looks at our data, the very fact that they even could is what makes it surveillance.
This is why Yahoo called GCHQ’s webcam-image collection “a whole new level of violation of our users’ privacy.” This is why we’re not mollified by attempts from the UK equivalent of the NSA to apply facial recognition algorithms to the data, or to limit how many people viewed the sexually explicit images. This is why Google’s eavesdropping is different than a dog’s eavesdropping, and why the NSA’s definition of “collect” makes no sense whatsoever.
This essay previously appeared on theguardian.com.