### More MD5 Collisions

Two researchers from the Institute for Cryptology and IT-Security have generated PostScript files with identical MD5-sums but entirely different (but meaningful!) content. (Other MD5 attacks are summarized here.)

Nice timing attack against AES.

For those of you who don’t know, timing attacks are an example of side-channel cryptanalysis: cryptanalysis using additional information about the inner workings of the cryptographic algorithm. I wrote about them here.

What’s the big idea here?

There are two ways to look at a cryptographic primitive: block cipher, digital signature function, whatever. The first is as a chunk of math. The second is a physical (or software) implementation of that math.

Traditionally, cryptanalysis has been directed solely against the math. Differential and linear cryptanalysis are good examples of this: high-powered mathematical tools that can be used to break different block ciphers.

On the other hand, timing attacks, power analysis, and fault analysis all make assumptions about the implementation, and use additional information garnered from attacking those implementations. Failure analysis assumes one bit of feedback from the implementation (was the message successfully decrypted?) in order to break the underlying cryptographic primitive. Timing attacks assume that an attacker knows how long a particular encryption operation takes.
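To make the idea concrete, here is a minimal Python sketch of how running time can leak key material. It is not the AES attack mentioned above; it uses textbook square-and-multiply modular exponentiation as a stand-in, and it counts arithmetic operations instead of measuring wall-clock time so the result is repeatable. The function name `modexp_counting` and the sample numbers are made up for illustration.

```python
def modexp_counting(base: int, exponent: int, modulus: int):
    """Left-to-right square-and-multiply.

    Returns (result, operation_count). The operation count stands in
    for the running time an attacker would measure.
    """
    result = 1
    ops = 0
    for bit in bin(exponent)[2:]:
        # Always square.
        result = (result * result) % modulus
        ops += 1
        # Multiply only when the secret bit is 1: this is the leak.
        if bit == "1":
            result = (result * base) % modulus
            ops += 1
    return result, ops


# Two hypothetical 4-bit secret exponents with different numbers of 1 bits.
sparse_result, sparse_ops = modexp_counting(5, 0b1000, 97)
dense_result, dense_ops = modexp_counting(5, 0b1111, 97)
print("exponent 0b1000 operations:", sparse_ops)
print("exponent 0b1111 operations:", dense_ops)
```

Both exponents are four bits long, but the one with more 1 bits costs more operations: the count is the bit length plus the number of 1 bits. An attacker who can observe that cost learns something about the secret exponent without ever attacking the math.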

Here’s a pair of valid X.509 certificates that have identical signatures. The hash function used is MD5.

And here’s a paper demonstrating a technique for finding MD5 collisions quickly: eight hours on a 1.6 GHz computer.

On Tuesday, I blogged about a new cryptanalytic result—the first attack faster than brute-force against SHA-1. I wrote about SHA, and the need to replace it, last September. Aside from the details of the new attack, everything I said then still stands. I’ll quote from that article, adding new material where appropriate.

One-way hash functions are a cryptographic construct used in many applications. They are used in conjunction with public-key algorithms for both encryption and digital signatures. They are used in integrity checking. They are used in authentication. They have all sorts of applications in a great many different protocols. Much more than encryption algorithms, one-way hash functions are the workhorses of modern cryptography.

In 1990, Ron Rivest invented the hash function MD4. In 1992, he improved on MD4 and developed another hash function: MD5. In 1993, the National Security Agency published a hash function very similar to MD5, called SHA (Secure Hash Algorithm). Then, in 1995, citing a newly discovered weakness that it refused to elaborate on, the NSA made a change to SHA. The new algorithm was called SHA-1. Today, the most popular hash function is SHA-1, with MD5 still being used in older applications.

One-way hash functions are supposed to have two properties. One, they’re one way. This means that it is easy to take a message and compute the hash value, but it’s impossible to take a hash value and recreate the original message. (By “impossible” I mean “can’t be done in any reasonable amount of time.”) Two, they’re collision free. This means that it is impossible to find two messages that hash to the same hash value. The cryptographic reasoning behind these two properties is subtle, and I invite curious readers to learn more in my book Applied Cryptography.

Breaking a hash function means showing that either, or both, of those properties are not true.

Earlier this week, three Chinese cryptographers showed that SHA-1 is not collision-free. That is, they developed an algorithm for finding collisions faster than brute force.

SHA-1 produces a 160-bit hash. That is, every message hashes down to a 160-bit number. Given that there are an infinite number of messages that hash to each possible value, there are an infinite number of possible collisions. But because the number of possible hashes is so large, the odds of finding one by chance are negligibly small (one in 2^{80}, to be exact). If you hashed 2^{80} random messages, you’d expect to find one pair that hashed to the same value. That’s the “brute force” way of finding collisions, and it depends solely on the length of the hash value. “Breaking” the hash function means being able to find collisions faster than that. And that’s what the Chinese did.
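The brute-force search is easy to try at toy scale. The sketch below is an illustration under stated assumptions, not the researchers' attack: it truncates SHA-256 to 24 bits (a hypothetical stand-in for a short hash) and hashes a simple counter until two messages collide. With a 24-bit output the search should end after a few thousand hashes, on the order of 2^{12}, for the same reason a 160-bit hash needs about 2^{80}. The names `truncated_hash` and `find_collision` are mine.

```python
import hashlib
from itertools import count


def truncated_hash(message: bytes, n_bytes: int = 3) -> bytes:
    """Toy hash: the first n_bytes of SHA-256 (24 bits by default)."""
    return hashlib.sha256(message).digest()[:n_bytes]


def find_collision(n_bytes: int = 3):
    """Hash distinct messages until two share a truncated digest.

    Returns (first_message, second_message, number_of_hashes_computed).
    """
    seen = {}
    for attempts in count(1):
        message = str(attempts).encode()
        digest = truncated_hash(message, n_bytes)
        if digest in seen:
            return seen[digest], message, attempts
        seen[digest] = message


m1, m2, attempts = find_collision()
print(m1, m2, "collide after", attempts, "hashes")
```

Each extra byte of output multiplies the expected work by about 16 (the square root of 256), which is why the length of the hash value is the only thing that matters for this kind of search.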

They can find collisions in SHA-1 in 2^{69} calculations, about 2,000 times faster than brute force. Right now, that is just on the far edge of feasibility with current technology. Two comparable massive computations illustrate that point.

In 1999, a group of cryptographers built a DES cracker. It was able to perform 2^{56} DES operations in 56 hours. The machine cost $250K to build, although duplicates could be made in the $50K-$75K range. Extrapolating that machine using Moore’s Law, a similar machine built today could perform 2^{60} calculations in 56 hours, and 2^{69} calculations in three and a quarter years. Or, a machine that cost $25M-$38M could do 2^{69} calculations in the same 56 hours.

On the software side, the main comparable is a 2^{64} keysearch done by distributed.net that finished in 2002. One article put it this way: “Over the course of the competition, some 331,252 users participated by allowing their unused processor cycles to be used for key discovery. After 1,757 days (4.81 years), a participant in Japan discovered the winning key.” Moore’s Law means that today the calculation would have taken one quarter the time—or have required one quarter the number of computers—so today a 2^{69} computation would take eight times as long, or require eight times the computers.
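The arithmetic behind those two comparisons can be checked in a few lines. This is only a worked calculation using the figures quoted above (the 2^{60}-in-56-hours extrapolation, the $50K-$75K duplicate cost, and the factor-of-four Moore's Law speedup); the variable names are mine.

```python
HOURS_PER_YEAR = 24 * 365.25

# The attack versus brute force: 2^80 / 2^69.
speedup_vs_brute_force = 2**80 // 2**69

# Hardware: a machine doing 2^60 operations per 56-hour run.
runs_needed = 2**69 // 2**60
years_on_one_machine = runs_needed * 56 / HOURS_PER_YEAR

# Or buy enough duplicate machines to finish in a single 56-hour run.
cost_low = runs_needed * 50_000
cost_high = runs_needed * 75_000

# Software: 2^69 versus the 2^64 keysearch, with a 4x Moore's Law speedup.
relative_effort = (2**69 // 2**64) // 4

print("speedup over brute force:", speedup_vs_brute_force)
print("years on one machine:", round(years_on_one_machine, 2))
print("cost for 56 hours: $%d to $%d" % (cost_low, cost_high))
print("effort relative to the 2^64 keysearch:", relative_effort)
```

The factor of 2,048 is where "about 2,000 times faster" comes from, and 512 runs of 56 hours each is the three and a quarter years.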

The magnitude of these results depends on who you are. If you’re a cryptographer, this is a huge deal. While not revolutionary, these results are substantial advances in the field. The techniques described by the researchers are likely to have other applications, and we’ll be better able to design secure systems as a result. This is how the science of cryptography advances: we learn how to design new algorithms by breaking other algorithms. Additionally, algorithms from the NSA are considered a sort of alien technology: they come from a superior race with no explanations. Any successful cryptanalysis against an NSA algorithm is an interesting data point in the eternal question of how good they really are in there.

For the average Internet user, this news is not a cause for panic. No one is going to be breaking digital signatures or reading encrypted messages anytime soon. The electronic world is no less secure after these announcements than it was before.

But there’s an old saying inside the NSA: “Attacks always get better; they never get worse.” Just as this week’s attack builds on other papers describing attacks against simplified versions of SHA-1, SHA-0, MD4, and MD5, other researchers will build on this result. The attack against SHA-1 will continue to improve, as others read about it and develop faster tricks, optimizations, etc. And Moore’s Law will continue to march forward, making even the existing attack faster and more affordable.

Jon Callas, PGP’s CTO, put it best: “It’s time to walk, but not run, to the fire exits. You don’t see smoke, but the fire alarms have gone off.” That’s basically what I said last August.

It’s time for us all to migrate away from SHA-1.

Luckily, there are alternatives. The National Institute of Standards and Technology already has standards for longer—and harder to break—hash functions: SHA-224, SHA-256, SHA-384, and SHA-512. They’re already government standards, and can already be used. This is a good stopgap, but I’d like to see more.
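In most software, switching is a matter of asking for a different algorithm name. Here is a small sketch using Python's standard `hashlib` module; the message is a made-up example, and whether a one-line change is enough in a real system depends on the protocols and file formats that embed the hash.

```python
import hashlib

message = b"an example message"  # hypothetical input

# Output length in bits for SHA-1 and each of the longer NIST functions.
digest_bits = {}
for name in ("sha1", "sha224", "sha256", "sha384", "sha512"):
    h = hashlib.new(name, message)
    digest_bits[name] = h.digest_size * 8
    print(name, digest_bits[name], h.hexdigest())
```

A brute-force collision search costs roughly two to the power of half the output length, so SHA-256 raises that bar from 2^{80} to 2^{128}.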

I’d like to see NIST orchestrate a worldwide competition for a new hash function, like they did for the new encryption algorithm, AES, to replace DES. NIST should issue a call for algorithms, and conduct a series of analysis rounds, where the community analyzes the various proposals with the intent of establishing a new standard.

Most of the hash functions we have, and all the ones in widespread use, are based on the general principles of MD4. Clearly we’ve learned a lot about hash functions in the past decade, and I think we can start applying that knowledge to create something even more secure.

Hash functions are the least-well-understood cryptographic primitive, and hashing techniques are much less developed than encryption techniques. Regularly there are surprising cryptographic results in hashing. I have a paper, written with John Kelsey, that describes an algorithm to find second preimages with SHA-1 (a technique that generalizes to almost all other hash functions) in 2^{106} calculations: much less than the 2^{160} calculations for brute force. This attack is completely theoretical and not even remotely practical, but it demonstrates that we still have a lot to learn about hashing.

It is clear from rereading what I wrote last September that I expected this to happen, but not nearly this quickly and not nearly this impressively. The Chinese cryptographers deserve a lot of credit for their work, and we need to get to work replacing SHA.

SHA-1 has been broken. Not a reduced-round version. Not a simplified version. The real thing.

The research team of Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu (mostly from Shandong University in China) have been quietly circulating a paper describing their results:

- collisions in the full SHA-1 in 2**69 hash operations, much less than the brute-force attack of 2**80 operations based on the hash length.
- collisions in SHA-0 in 2**39 operations.
- collisions in 58-round SHA-1 in 2**33 operations.

This attack builds on previous attacks on SHA-0 and SHA-1, and is a major, major cryptanalytic result. It pretty much puts a bullet into SHA-1 as a hash function for digital signatures (although it doesn’t affect applications such as HMAC where collisions aren’t important).

The paper isn’t generally available yet. At this point I can’t tell if the attack is real, but the paper looks good and this is a reputable research team.

More details when I have them.

Update: See here

One of the most important rules of stream ciphers is to never use the same keystream to encrypt two different documents. If someone does, you can break the encryption by XORing the two ciphertext streams together. The keystream drops out, and you end up with plaintext XORed with plaintext—and you can easily recover the two plaintexts using letter frequency analysis and other basic techniques.

It’s an amateur crypto mistake. The easy way to prevent this attack is to use a unique initialization vector (IV) in addition to the key whenever you encrypt a document.
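The keystream cancellation is easy to see in code. The sketch below implements textbook RC4 and encrypts two made-up messages with the same key and no IV. It does not model how Word or Excel actually derive their keys or store their IVs; the key, the messages, and the function names are all assumptions for illustration.

```python
def rc4_keystream(key: bytes, length: int) -> bytes:
    """Textbook RC4: key scheduling, then `length` bytes of keystream."""
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    i = j = 0
    out = bytearray()
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(state[(state[i] + state[j]) % 256])
    return bytes(out)


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


key = b"same key, no IV"            # hypothetical key, reused for both documents
p1 = b"Pay Alice $100 on Monday"    # version 1 of the document
p2 = b"Pay Bob $9999 on Friday!"    # version 2, same length

keystream = rc4_keystream(key, len(p1))
c1 = xor(p1, keystream)
c2 = xor(p2, keystream)

# The attacker sees only c1 and c2. XORing them removes the keystream.
leak = xor(c1, c2)
print(leak == xor(p1, p2))

# If one plaintext is known or guessed, the other falls out immediately.
recovered_p2 = xor(leak, p1)
print(recovered_p2)
```

The attacker never needs the key. With a fresh IV mixed into the key setup for every save, the two keystreams would differ and the XOR of the ciphertexts would look random.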

Microsoft uses the RC4 stream cipher in both Word and Excel. And they make this mistake. Hongjun Wu has details (link is a PDF).

In this report, we point out a serious security flaw in Microsoft Word and Excel. The stream cipher RC4 [9] with key length up to 128 bits is used in Microsoft Word and Excel to protect the documents. But when an encrypted document gets modified and saved, the initialization vector remains the same and thus the same keystream generated from RC4 is applied to encrypt the different versions of that document. The consequence is disastrous since a lot of information of the document could be recovered easily.

This isn’t new. Microsoft made the same mistake in 1999 with RC4 in WinNT Syskey. Five years later, Microsoft has the same flaw in other products.
