Algorithmic Bias
Good Q&A with Cynthia Dwork on algorithmic bias.
This week, we learned about an attack called “FREAK”—”Factoring Attack on RSA-EXPORT Keys”—that can break the encryption of many websites. Basically, some sites’ implementations of secure sockets layer technology, or SSL, contain both strong encryption algorithms and weak encryption algorithms. Connections are supposed to use the strong algorithms, but in many cases an attacker can force the website to use the weaker encryption algorithms and then decrypt the traffic. From Ars Technica:
In recent days, a scan of more than 14 million websites that support the secure sockets layer or transport layer security protocols found that more than 36 percent of them were vulnerable to the decryption attacks. The exploit takes about seven hours to carry out and costs as little as $100 per site.
This is a general class of attack I call “security rollback” attacks. Basically, the attacker forces the system users to revert to a less secure version of their protocol. Think about the last time you used your credit card. The verification procedure involved the retailer’s computer connecting with the credit card company. What if you snuck around to the back of the building and severed the retailer’s phone lines? Most likely, the retailer would have still accepted your card, but defaulted to making a manual impression of it and maybe looking at your signature. The result: you’ll have a much easier time using a stolen card.
In this case, the security flaw was designed in deliberately. Matthew Green writes:
Back in the early 1990s when SSL was first invented at Netscape Corporation, the United States maintained a rigorous regime of export controls for encryption systems. In order to distribute crypto outside of the U.S., companies were required to deliberately “weaken” the strength of encryption keys. For RSA encryption, this implied a maximum allowed key length of 512 bits.
The 512-bit export grade encryption was a compromise between dumb and dumber. In theory it was designed to ensure that the NSA would have the ability to “access” communications, while allegedly providing crypto that was still “good enough” for commercial use. Or if you prefer modern terms, think of it as the original “golden master key.”
The need to support export-grade ciphers led to some technical challenges. Since U.S. servers needed to support both strong and weak crypto, the SSL designers used a “cipher suite” negotiation mechanism to identify the best cipher both parties could support. In theory this would allow “strong” clients to negotiate “strong” ciphersuites with servers that supported them, while still providing compatibility to the broken foreign clients.
And that’s the problem. The weak algorithms are still there, and can be exploited by attackers.
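To make the failure mode concrete, here is a minimal, purely illustrative Python sketch of cipher-suite negotiation in which a man-in-the-middle strips everything but the export-grade suites from the client’s offer. The suite names and negotiation logic are simplified stand-ins, not the real TLS handshake.

```python
# Toy model of cipher-suite negotiation, illustrating a downgrade attack.
# The suite names and the negotiation logic are illustrative, not real TLS.

CLIENT_OFFER = ["RSA_AES256", "RSA_AES128", "RSA_EXPORT_512"]      # strongest first
SERVER_SUPPORTS = {"RSA_AES256", "RSA_AES128", "RSA_EXPORT_512"}   # export suite never removed

def negotiate(offered, supported):
    """Server picks the first mutually supported suite in the client's preference order."""
    for suite in offered:
        if suite in supported:
            return suite
    raise ValueError("no common cipher suite")

def mitm_strip(offered):
    """Attacker on the wire deletes every suite except the export-grade ones."""
    return [s for s in offered if "EXPORT" in s]

# Honest connection: strong crypto is chosen.
print(negotiate(CLIENT_OFFER, SERVER_SUPPORTS))              # RSA_AES256

# Attacked connection: both endpoints still "agree" -- but on 512-bit export RSA,
# which an attacker can factor offline for about $100 of compute time.
print(negotiate(mitm_strip(CLIENT_OFFER), SERVER_SUPPORTS))  # RSA_EXPORT_512
```

As long as the weak suites remain in the supported set, the negotiation mechanism itself becomes the attack surface.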
Fixes are coming. Companies like Apple are quickly rolling out patches. But the vulnerability has been around for over a decade, and has almost certainly been used by national intelligence agencies and criminals alike.
This is the generic problem with government-mandated backdoors, key escrow, “golden keys,” or whatever you want to call them. We don’t know how to design a third-party access system that checks for morality; once we build in such access, we then have to ensure that only the good guys can do it. And we can’t. Or, to quote the Economist: “…mathematics applies to just and unjust alike; a flaw that can be exploited by Western governments is vulnerable to anyone who finds it.”
This essay previously appeared on the Lawfare blog.
EDITED TO ADD: Microsoft Windows is vulnerable.
Here’s an IDEA-variant with a 128-bit block length. While I think it’s a great idea to bring IDEA up to a modern block length, the paper has none of the cryptanalysis behind it that IDEA had. If nothing else, I would have expected more than eight rounds. If anyone wants to practice differential and linear cryptanalysis, here’s a new target for you.
This talk (and paper) describe a lattice-based public-key algorithm called Soliloquy developed by GCHQ, and a quantum-computer attack on it.
News article.
Nice article on some of the security assumptions we rely on in cryptographic algorithms.
Interesting paper: M. Bellare, K. Paterson, and P. Rogaway, “Security of Symmetric Encryption against Mass Surveillance.”
Abstract: Motivated by revelations concerning population-wide surveillance of encrypted communications, we formalize and investigate the resistance of symmetric encryption schemes to mass surveillance. The focus is on algorithm-substitution attacks (ASAs), where a subverted encryption algorithm replaces the real one. We assume that the goal of “big-brother” is undetectable subversion, meaning that ciphertexts produced by the subverted encryption algorithm should reveal plaintexts to big-brother yet be indistinguishable to users from those produced by the real encryption scheme. We formalize security notions to capture this goal and then offer both attacks and defenses. In the first category we show that successful (from the point of view of big brother) ASAs may be mounted on a large class of common symmetric encryption schemes. In the second category we show how to design symmetric encryption schemes that avoid such attacks and meet our notion of security. The lesson that emerges is the danger of choice: randomized, stateless schemes are subject to attack while deterministic, stateful ones are not.
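As a rough illustration of why randomized schemes are the vulnerable ones, here is a toy Python sketch of an IV-channel subversion: the subverted encryptor resamples its “random” IV until a keyed bit of the IV matches the next bit of the secret it wants to leak. This is only a simplified version of the general idea, not one of the constructions analyzed in the paper; the key names are placeholders.

```python
# Toy algorithm-substitution attack: leak secret bits through the "random" IV.
# Big-brother shares a subversion key with the implanted code; to anyone else,
# the IVs on the wire still look uniformly random.
import os, hmac, hashlib

SUBVERSION_KEY = b"big-brother-key"      # known only to the subverter (illustrative)

def keyed_bit(iv):
    """Pseudorandom bit of the IV, computable only with the subversion key."""
    return hmac.new(SUBVERSION_KEY, iv, hashlib.sha256).digest()[0] & 1

def subverted_iv(leak_bit):
    """Rejection-sample IVs until the keyed bit equals the bit to be leaked."""
    while True:
        iv = os.urandom(16)
        if keyed_bit(iv) == leak_bit:
            return iv            # still uniform to anyone without SUBVERSION_KEY

# The subverted encryptor leaks one bit of the user's secret key per message.
user_key_bits = [1, 0, 1, 1, 0, 0, 1, 0]
ivs = [subverted_iv(b) for b in user_key_bits]

# Big-brother, observing only the IVs on the wire, recovers the bits.
recovered = [keyed_bit(iv) for iv in ivs]
assert recovered == user_key_bits
```

A deterministic, stateful scheme gives the subverted implementation no such covert channel to hide in, which is the paper’s point about the danger of choice.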
Handycipher is a new pencil-and-paper symmetric encryption algorithm. I’d bet a gazillion dollars that it’s not secure, although I haven’t done the cryptanalysis myself.
Increasingly, we are watched not by people but by algorithms. Amazon and Netflix track the books we buy and the movies we stream, and suggest other books and movies based on our habits. Google and Facebook watch what we do and what we say, and show us advertisements based on our behavior. Google even modifies our web search results based on our previous behavior. Smartphone navigation apps watch us as we drive, and update suggested route information based on traffic congestion. And the National Security Agency, of course, monitors our phone calls, emails and locations, then uses that information to try to identify terrorists.
Documents provided by Edward Snowden and revealed by the Guardian today show that the UK spy agency GCHQ, with help from the NSA, has been collecting millions of webcam images from innocent Yahoo users. And that speaks to a key distinction in the age of algorithmic surveillance: is it really okay for a computer to monitor you online, and for that data collection and analysis only to count as a potential privacy invasion when a person sees it? I say it’s not, and the latest Snowden leaks only make more clear how important this distinction is.
The robots-vs-spies divide is especially important as we decide what to do about NSA and GCHQ surveillance. The spy community and the Justice Department have reported back early on President Obama’s request for changing how the NSA “collects” your data, but the potential reforms—FBI monitoring, holding on to your phone records and more—still largely depend on what the meaning of “collects” is.
Indeed, ever since Snowden provided reporters with a trove of top secret documents, we’ve been subjected to all sorts of NSA word games. And the word “collect” has a very special definition, according to the Department of Defense (DoD). A 1982 procedures manual (pdf; page 15) says: “information shall be considered as ‘collected’ only when it has been received for use by an employee of a DoD intelligence component in the course of his official duties.” And “data acquired by electronic means is ‘collected’ only when it has been processed into intelligible form.”
Director of National Intelligence James Clapper likened the NSA’s accumulation of data to a library. All those books are stored on the shelves, but very few are actually read. “So the task for us in the interest of preserving security and preserving civil liberties and privacy,” says Clapper, “is to be as precise as we possibly can be when we go in that library and look for the books that we need to open up and actually read.” Only when an individual book is read does it count as “collection,” in government parlance.
So, think of that friend of yours who has thousands of books in his house. According to the NSA, he’s not actually “collecting” books. He’s doing something else with them, and the only books he can claim to have “collected” are the ones he’s actually read.
This is why Clapper claims—to this day—that he didn’t lie in a Senate hearing when he replied “no” to this question: “Does the NSA collect any type of data at all on millions or hundreds of millions of Americans?”
If the NSA collects—I’m using the everyday definition of the word here—all of the contents of everyone’s e-mail, it doesn’t count it as being collected in NSA terms until someone reads it. And if it collects—I’m sorry, but that’s really the correct word—everyone’s phone records or location information and stores it in an enormous database, that doesn’t count as being collected—NSA definition—until someone looks at it. If the agency uses computers to search those emails for keywords, or correlates that location information for relationships between people, it doesn’t count as collection, either. Only when those computers spit out a particular person has the data—in NSA terms—actually been collected.
If the modern spy dictionary has you confused, maybe dogs can help us understand why this legal workaround, by big tech companies and the government alike, is still a serious invasion of privacy.
Back when Gmail was introduced, this was Google’s defense, too, about its context-sensitive advertising. Google’s computers examine each individual email and insert an advertisement nearby, related to the contents of your email. But no person at Google reads any Gmail messages; only a computer does. In the words of one Google executive: “Worrying about a computer reading your email is like worrying about your dog seeing you naked.”
But now that we have an example of a spy agency seeing people naked—there are a surprising number of sexually explicit images in the newly revealed Yahoo image collection—we can more viscerally understand the difference.
To wit: when you’re watched by a dog, you know that what you’re doing will go no further than the dog. The dog can’t remember the details of what you’ve done. The dog can’t tell anyone else. When you’re watched by a computer, that’s not true. You might be told that the computer isn’t saving a copy of the video, but you have no assurance that that’s true. You might be told that the computer won’t alert a person if it perceives something of interest, but you can’t know if that’s true. You do know that the computer is making decisions based on what it receives, and you have no way of confirming that no human being will access that decision.
When a computer stores your data, there’s always a risk of exposure. There’s the risk of accidental exposure, when some hacker or criminal breaks in and steals the data. There’s the risk of purposeful exposure, when the organization that has your data uses it in some manner. And there’s the risk that another organization will demand access to the data. The FBI can serve a National Security Letter on Google, demanding details on your email and browsing habits. There isn’t a court order in the world that can get that information out of your dog.
Of course, any time we’re judged by algorithms, there’s the potential for false positives. You are already familiar with this; just think of all the irrelevant advertisements you’ve been shown on the Internet, based on some algorithm misinterpreting your interests. In advertising, that’s okay. It’s annoying, but there’s little actual harm, and you were busy reading your email anyway, right? But that harm increases as the accompanying judgments become more important: our credit ratings depend on algorithms; how we’re treated at airport security does, too. And most alarming of all, drone targeting is partly based on algorithmic surveillance.
The primary difference between a computer and a dog is that the computer interacts with other people in the real world, and the dog does not. If someone could isolate the computer in the same way a dog is isolated, we wouldn’t have any reason to worry about algorithms crawling around in our data. But we can’t. Computer algorithms are intimately tied to people. And when we think of computer algorithms surveilling us or analyzing our personal data, we need to think about the people behind those algorithms. Whether or not anyone actually looks at our data, the very fact that they even could is what makes it surveillance.
This is why Yahoo called GCHQ’s webcam-image collection “a whole new level of violation of our users’ privacy.” This is why we’re not mollified by attempts from the UK equivalent of the NSA to apply facial recognition algorithms to the data, or to limit how many people viewed the sexually explicit images. This is why Google’s eavesdropping is different than a dog’s eavesdropping, and why the NSA’s definition of “collect” makes no sense whatsoever.
This essay previously appeared on theguardian.com.
I think this is a good move on Microsoft’s part:
Microsoft is recommending that customers and CA’s stop using SHA-1 for cryptographic applications, including use in SSL/TLS and code signing. Microsoft Security Advisory 2880823 has been released along with the policy announcement that Microsoft will stop recognizing the validity of SHA-1 based certificates after 2016.
SHA-1 isn’t broken yet in a practical sense, but the algorithm is barely hanging on and attacks will only get worse. Migrating away from SHA-1 is the smart thing to do.
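If you want to check whether a server you rely on is still presenting a SHA-1-signed certificate, something like the following Python sketch works. It uses the standard-library ssl module plus the third-party cryptography package (a recent version), and the host name is just a placeholder.

```python
# Check which hash algorithm was used to sign a server's TLS certificate.
# Requires the third-party "cryptography" package; the host is a placeholder.
import ssl
from cryptography import x509

def cert_signature_hash(host, port=443):
    pem = ssl.get_server_certificate((host, port))
    cert = x509.load_pem_x509_certificate(pem.encode("ascii"))
    return cert.signature_hash_algorithm.name   # e.g. "sha1" or "sha256"

if __name__ == "__main__":
    algo = cert_signature_hash("www.example.com")
    print(algo)
    if algo == "sha1":
        print("Certificate is SHA-1 signed; migrate before it stops being recognized.")
```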
New paper: “Physical-Layer Cryptography Through Massive MIMO.”
Abstract: We propose the new technique of physical-layer cryptography based on using a massive MIMO channel as a key between the sender and desired receiver, which need not be secret. The goal is for low-complexity encoding and decoding by the desired transmitter-receiver pair, whereas decoding by an eavesdropper is hard in terms of prohibitive complexity. The decoding complexity is analyzed by mapping the massive MIMO system to a lattice. We show that the eavesdropper’s decoder for the MIMO system with M-PAM modulation is equivalent to solving standard lattice problems that are conjectured to be of exponential complexity for both classical and quantum computers. Hence, under the widely-held conjecture that standard lattice problems are hard to solve in the worst-case, the proposed encryption scheme has a more robust notion of security than that of the most common encryption methods used today such as RSA and Diffie-Hellman. Additionally, we show that this scheme could be used to securely communicate without a pre-shared secret and little computational overhead. Thus, the massive MIMO system provides for low-complexity encryption commensurate with the most sophisticated forms of application-layer encryption by exploiting the physical layer properties of the radio channel.
MIMO stands for “multiple-input multiple-output.” I had to look that up.
In general, I’m not optimistic about the security of these sorts of systems. Whenever non-cryptographers come up with cryptographic algorithms based on some novel problem that’s hard in their area of research, invariably there are pretty easy cryptographic attacks.
So consider this a good research exercise for all budding cryptanalysts out there.