Entries Tagged "algorithms"
Increasingly, we are watched not by people but by algorithms. Amazon and Netflix track the books we buy and the movies we stream, and suggest other books and movies based on our habits. Google and Facebook watch what we do and what we say, and show us advertisements based on our behavior. Google even modifies our web search results based on our previous behavior. Smartphone navigation apps watch us as we drive, and update suggested route information based on traffic congestion. And the National Security Agency, of course, monitors our phone calls, emails and locations, then uses that information to try to identify terrorists.
Documents provided by Edward Snowden and revealed by the Guardian today show that the UK spy agency GCHQ, with help from the NSA, has been collecting millions of webcam images from innocent Yahoo users. And that speaks to a key distinction in the age of algorithmic surveillance: is it really okay for a computer to monitor you online, and for that data collection and analysis only to count as a potential privacy invasion when a person sees it? I say it’s not, and the latest Snowden leaks only make more clear how important this distinction is.
The robots-vs-spies divide is especially important as we decide what to do about NSA and GCHQ surveillance. The spy community and the Justice Department have reported back early on President Obama’s request for changing how the NSA “collects” your data, but the potential reforms — FBI monitoring, holding on to your phone records and more — still largely depend on what the meaning of “collects” is.
Indeed, ever since Snowden provided reporters with a trove of top secret documents, we’ve been subjected to all sorts of NSA word games. And the word “collect” has a very special definition, according to the Department of Defense (DoD). A 1982 procedures manual (page 15) says: “information shall be considered as ‘collected’ only when it has been received for use by an employee of a DoD intelligence component in the course of his official duties.” And “data acquired by electronic means is ‘collected’ only when it has been processed into intelligible form.”
Director of National Intelligence James Clapper likened the NSA’s accumulation of data to a library. All those books are stored on the shelves, but very few are actually read. “So the task for us in the interest of preserving security and preserving civil liberties and privacy,” says Clapper, “is to be as precise as we possibly can be when we go in that library and look for the books that we need to open up and actually read.” Only when an individual book is read does it count as “collection,” in government parlance.
So, think of that friend of yours who has thousands of books in his house. According to the NSA, he’s not actually “collecting” books. He’s doing something else with them, and the only books he can claim to have “collected” are the ones he’s actually read.
This is why Clapper claims — to this day — that he didn’t lie in a Senate hearing when he replied “no” to this question: “Does the NSA collect any type of data at all on millions or hundreds of millions of Americans?”
If the NSA collects — I’m using the everyday definition of the word here — all of the contents of everyone’s e-mail, it doesn’t count it as being collected in NSA terms until someone reads it. And if it collects — I’m sorry, but that’s really the correct word — everyone’s phone records or location information and stores it in an enormous database, that doesn’t count as being collected — NSA definition — until someone looks at it. If the agency uses computers to search those emails for keywords, or correlates that location information for relationships between people, it doesn’t count as collection, either. Only when those computers spit out a particular person has the data — in NSA terms — actually been collected.
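A toy sketch of the pipeline this paragraph describes, with hypothetical messages, keywords, and function names: the program examines every stored message, but only the ones it flags would ever be shown to an analyst, which under the DoD definition is the first moment anything counts as “collected.”

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    body: str


def keyword_scan(messages, keywords):
    """Examine every message; return how many were examined and which matched.

    `keywords` is assumed to be a set of lowercase words.
    """
    examined = 0
    flagged = []
    for msg in messages:
        examined += 1  # every message is processed, matched or not
        words = set(re.findall(r"[a-z]+", msg.body.lower()))
        if words & keywords:
            flagged.append(msg)  # only these would reach a human analyst
    return examined, flagged


# Hypothetical stored mail.
inbox = [
    Message("alice", "Lunch tomorrow?"),
    Message("bob", "The package arrives Friday."),
    Message("carol", "Happy birthday!"),
]
examined, flagged = keyword_scan(inbox, {"package"})
print(examined, [m.sender for m in flagged])
```

Every message went through the computer; in the government’s vocabulary, only Bob’s was “collected.”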
If the modern spy dictionary has you confused, maybe dogs can help us understand why this legal workaround, by big tech companies and the government alike, is still a serious invasion of privacy.
Back when Gmail was introduced, this was Google’s defense, too, about its context-sensitive advertising. Google’s computers examine each individual email and insert an advertisement nearby, related to the contents of your email. But no person at Google reads any Gmail messages; only a computer does. In the words of one Google executive: “Worrying about a computer reading your email is like worrying about your dog seeing you naked.”
But now that we have an example of a spy agency seeing people naked — there are a surprising number of sexually explicit images in the newly revealed Yahoo image collection — we can more viscerally understand the difference.
To wit: when you’re watched by a dog, you know that what you’re doing will go no further than the dog. The dog can’t remember the details of what you’ve done. The dog can’t tell anyone else. When you’re watched by a computer, that’s not true. You might be told that the computer isn’t saving a copy of the video, but you have no assurance that that’s true. You might be told that the computer won’t alert a person if it perceives something of interest, but you can’t know if that’s true. You do know that the computer is making decisions based on what it receives, and you have no way of confirming that no human being will access that decision.
When a computer stores your data, there’s always a risk of exposure. There’s the risk of accidental exposure, when some hacker or criminal breaks in and steals the data. There’s the risk of purposeful exposure, when the organization that has your data uses it in some manner. And there’s the risk that another organization will demand access to the data. The FBI can serve a National Security Letter on Google, demanding details on your email and browsing habits. There isn’t a court order in the world that can get that information out of your dog.
Of course, any time we’re judged by algorithms, there’s the potential for false positives. You are already familiar with this; just think of all the irrelevant advertisements you’ve been shown on the Internet, based on some algorithm misinterpreting your interests. In advertising, that’s okay. It’s annoying, but there’s little actual harm, and you were busy reading your email anyway, right? But that harm increases as the accompanying judgments become more important: our credit ratings depend on algorithms; how we’re treated at airport security does, too. And most alarming of all, drone targeting is partly based on algorithmic surveillance.
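A short worked example of why false positives matter more as the stakes rise. The figures are hypothetical, chosen only for illustration: a population of 300 million, 1,000 genuine targets, and an algorithm that flags 99% of targets while wrongly flagging 1% of everyone else.

```python
from fractions import Fraction


def screening_outcome(population, targets, hit_rate, false_alarm_rate):
    """Expected true and false positives when a screen is applied to everyone."""
    innocents = population - targets
    true_hits = targets * hit_rate               # targets correctly flagged
    false_hits = innocents * false_alarm_rate    # innocent people wrongly flagged
    precision = true_hits / (true_hits + false_hits)  # chance a flagged person is a target
    return true_hits, false_hits, precision


# Hypothetical figures, for illustration only.
true_hits, false_hits, precision = screening_outcome(
    300_000_000, 1_000, Fraction(99, 100), Fraction(1, 100)
)
print(true_hits, false_hits, float(precision))
```

Even with an algorithm that sounds very accurate, the 990 real targets are buried among 2,999,990 innocent people who were flagged, so only about one flagged person in 3,000 is a real target. For an advertisement that hardly matters; for a watch list it does.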
The primary difference between a computer and a dog is that the computer interacts with other people in the real world, and the dog does not. If someone could isolate the computer in the same way a dog is isolated, we wouldn’t have any reason to worry about algorithms crawling around in our data. But we can’t. Computer algorithms are intimately tied to people. And when we think of computer algorithms surveilling us or analyzing our personal data, we need to think about the people behind those algorithms. Whether or not anyone actually looks at our data, the very fact that they even could is what makes it surveillance.
This is why Yahoo called GCHQ’s webcam-image collection “a whole new level of violation of our users’ privacy.” This is why we’re not mollified by attempts from the UK equivalent of the NSA to apply facial recognition algorithms to the data, or to limit how many people viewed the sexually explicit images. This is why Google’s eavesdropping is different from a dog’s eavesdropping, and why the NSA’s definition of “collect” makes no sense whatsoever.
This essay previously appeared on theguardian.com.
I think this is a good move on Microsoft’s part:
Microsoft is recommending that customers and CAs stop using SHA-1 for cryptographic applications, including use in SSL/TLS and code signing. Microsoft Security Advisory 2880823 has been released along with the policy announcement that Microsoft will stop recognizing the validity of SHA-1 based certificates after 2016.
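A minimal sketch, using only Python’s standard library, of what a client-side rule like the announced one might look like. The function names and the `SHA1_CUTOFF` date are assumptions for illustration, not Microsoft’s implementation. It also computes the generic birthday-bound collision cost from a hash’s digest size, which is why a 160-bit hash such as SHA-1 offers at most about 2^80 collision resistance even before any cryptanalysis.

```python
import hashlib
from datetime import date

# Hypothetical cutoff modelled on the announcement: SHA-1-signed
# certificates are no longer accepted after the end of 2016.
SHA1_CUTOFF = date(2016, 12, 31)


def generic_collision_bits(hash_name):
    """Birthday bound: finding any collision in an n-bit hash takes about 2**(n/2) work."""
    digest_bits = hashlib.new(hash_name).digest_size * 8
    return digest_bits // 2


def accept_signature_hash(hash_name, on_date):
    """Toy policy check: refuse SHA-1 signatures once the cutoff has passed."""
    if hash_name.lower() == "sha1" and on_date > SHA1_CUTOFF:
        return False
    return True


for name in ("sha1", "sha256"):
    print(name, generic_collision_bits(name), accept_signature_hash(name, date(2017, 1, 1)))
```

A real client would read the signature algorithm out of the parsed certificate chain rather than take a name string; this sketch only shows the shape of the decision.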
New paper: “Physical-Layer Cryptography Through Massive MIMO.”
Abstract: We propose the new technique of physical-layer cryptography based on using a massive MIMO channel as a key between the sender and desired receiver, which need not be secret. The goal is for low-complexity encoding and decoding by the desired transmitter-receiver pair, whereas decoding by an eavesdropper is hard in terms of prohibitive complexity. The decoding complexity is analyzed by mapping the massive MIMO system to a lattice. We show that the eavesdropper’s decoder for the MIMO system with M-PAM modulation is equivalent to solving standard lattice problems that are conjectured to be of exponential complexity for both classical and quantum computers. Hence, under the widely-held conjecture that standard lattice problems are hard to solve in the worst-case, the proposed encryption scheme has a more robust notion of security than that of the most common encryption methods used today such as RSA and Diffie-Hellman. Additionally, we show that this scheme could be used to securely communicate without a pre-shared secret and little computational overhead. Thus, the massive MIMO system provides for low-complexity encryption commensurate with the most sophisticated forms of application-layer encryption by exploiting the physical layer properties of the radio channel.
MIMO stands for “multiple-input multiple-output.” I had to look that up.
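To see why the eavesdropper’s job becomes a lattice problem, here is a toy model, not the paper’s scheme: a real-valued channel y = Hx + n carrying M-PAM symbols x. Maximum-likelihood decoding means finding the symbol vector whose image Hx lies closest to y, a closest-point search over lattice points generated by the columns of H. The channel matrix, noise values, and function names below are hypothetical; the brute-force decoder simply tries all M^n candidates.

```python
import itertools


def pam_alphabet(m):
    """M-PAM symbol set: the odd integers -(M-1), ..., -1, 1, ..., M-1."""
    return list(range(-(m - 1), m, 2))


def apply_channel(h, x):
    """Noise-free channel output Hx for a real matrix h given as a list of rows."""
    return [sum(hij * xj for hij, xj in zip(row, x)) for row in h]


def brute_force_decode(h, y, m):
    """Closest-point (ML) decoding of y = Hx + n by exhaustive search.

    Returns the best symbol vector and the number of candidates tried,
    which is always M**n for n transmit antennas.
    """
    n = len(h[0])
    best, best_dist, tried = None, float("inf"), 0
    for cand in itertools.product(pam_alphabet(m), repeat=n):
        tried += 1
        dist = sum((yi - hi) ** 2 for yi, hi in zip(y, apply_channel(h, cand)))
        if dist < best_dist:
            best, best_dist = list(cand), dist
    return best, tried


# Hypothetical 2x2 channel and 4-PAM symbols, with a little noise added.
h = [[1.0, 0.4], [0.2, 1.3]]
x = [3, -1]
y = [v + e for v, e in zip(apply_channel(h, x), [0.05, -0.08])]
decoded, tried = brute_force_decode(h, y, 4)
print(decoded, tried)
```

At two antennas this is trivial. Practical decoders such as sphere decoding prune the search, and the abstract’s hardness claim rests on the worst case of standard lattice problems in the large dimensions that a massive MIMO array provides.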
In general, I’m not optimistic about the security of these sorts of systems. Whenever non-cryptographers come up with cryptographic algorithms based on some novel problem that’s hard in their area of research, invariably there are pretty easy cryptographic attacks.
So consider this a good research exercise for all budding cryptanalysts out there.
The NSA has published some new symmetric algorithms:
Abstract: In this paper we propose two families of block ciphers, SIMON and SPECK, each of which comes in a variety of widths and key sizes. While many lightweight block ciphers exist, most were designed to perform well on a single platform and were not meant to provide high performance across a range of devices. The aim of SIMON and SPECK is to fill the need for secure, flexible, and analyzable lightweight block ciphers. Each offers excellent performance on hardware and software platforms, is flexible enough to admit a variety of implementations on a given platform, and is amenable to analysis using existing techniques. Both perform exceptionally well across the full spectrum of lightweight applications, but SIMON is tuned for optimal performance in hardware, and SPECK for optimal performance in software.
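SPECK, the software-oriented design, is compact enough to sketch. The code below implements the smallest variant, Speck32/64 (16-bit words, 64-bit key, 22 rounds, rotation amounts 7 and 2), following the published description: an add-rotate-xor round function in the same family as Threefish’s, and a key schedule that reuses the round function with the round counter in place of a key. It is an unoptimized teaching sketch; the function names are my own, and it is not constant-time or vetted for real use.

```python
MASK = 0xFFFF        # Speck32/64 works on 16-bit words
ALPHA, BETA = 7, 2   # rotation amounts for the 16-bit-word variant
ROUNDS = 22


def ror(v, r):
    return ((v >> r) | (v << (16 - r))) & MASK


def rol(v, r):
    return ((v << r) | (v >> (16 - r))) & MASK


def round_fn(x, y, k):
    """One Speck round: rotate x, add y, xor the key; then rotate y and xor in x."""
    x = ((ror(x, ALPHA) + y) & MASK) ^ k
    y = rol(y, BETA) ^ x
    return x, y


def expand_key(key_words):
    """Key schedule; key_words are (l2, l1, l0, k0), the order the paper prints them."""
    *l_high_first, k = key_words
    l = list(reversed(l_high_first))  # now l[0], l[1], l[2]
    round_keys = [k]
    for i in range(ROUNDS - 1):
        # The schedule reuses the round function, with the counter i as the "key".
        new_l, k = round_fn(l[i], k, i)
        l.append(new_l)
        round_keys.append(k)
    return round_keys


def encrypt(x, y, round_keys):
    for k in round_keys:
        x, y = round_fn(x, y, k)
    return x, y


def decrypt(x, y, round_keys):
    for k in reversed(round_keys):
        y = ror(x ^ y, BETA)                  # undo y = rol(y) ^ x
        x = rol(((x ^ k) - y) & MASK, ALPHA)  # undo x = (ror(x) + y) ^ k
    return x, y


keys = expand_key((0x1918, 0x1110, 0x0908, 0x0100))
ct = encrypt(0x6574, 0x694C, keys)
print([hex(w) for w in ct])
```

With these inputs the output should match the Speck32/64 test vector given in the paper (ciphertext a868 42f2), and `decrypt` inverts `encrypt`.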
It’s always fascinating to study NSA-designed ciphers. I was particularly interested in the algorithms’ similarity to Threefish, and how they improved on what we did. I was most impressed with their key schedule. I am always impressed with how the NSA does key schedules. And I enjoyed the discussion of requirements. Missing, of course, is any cryptanalytic analysis.
I don’t know anything about the context of this paper. Why was the work done, and why is it being made public? I’m curious.