Schneier on Security
A blog covering security and security technology.
June 24, 2005
SHA Cryptanalysis Paper Online
In February, I wrote about a group of Chinese researchers who broke the SHA-1 hash function. That posting was based on a brief announcement from the researchers. Since then, many people have written to me asking about the research and the actual paper, and some have questioned the validity of the research because of the lack of documentation.
The paper did exist; I saw a copy. They will present it at the Crypto conference in August. I believe they didn't post it because Crypto requires that submitted papers not be previously published, and they misunderstood that to mean that it couldn't be widely distributed in any way.
Now there's a copy of the paper on the web. You can read "Finding Collisions in the Full SHA-1," by Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu, here.
Posted on June 24, 2005 at 12:46 PM
What are the implications of this attack?
I poked at the paper, and I really don't understand the math behind it. But from what I gathered in the introduction, it seems like creating a collision for a SHA-1 hash still requires a large amount of computational power.
Is this something we ("we" meaning normal sysadmin-like people ;) ) should be concerned about, practically, if we use SHA1 for password verification or something similar?
It has been demonstrated that this type of hash collision can be used to construct (for example) two postscript files with the same hash. When displayed, they appear to be completely different documents. It is, right now, trivial to construct additional pairs of documents like this using MD5, but maybe not yet with SHA-1.
There is now an extra question mark in my head when someone says a document is "protected by digital signatures." I must understand the new threat, and consider the structure of the document that is being signed.
Josh: I'm not an expert but I don't think that your particular example is DIRECTLY affected by this attack. The attack they discuss in the paper is a collision attack - finding two pieces of data with the same hash value. In order to attack password verification schemes a pre-image attack would be needed. But of course, it might be discovered (or perhaps obvious, though not for me) that some of the weaknesses found in SHA-1 may also be used for pre-image attacks...
My previous message was to Josh Berry, not Josh Rubin :)
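The cost gap between those two attack types can be seen even at toy scale. The sketch below is my own illustration (not from the paper): it truncates SHA-1 to 20 bits (the `N_BITS` parameter is an arbitrary choice) so that both searches finish in seconds. A brute-force preimage search costs on the order of 2^20 hash evaluations, while a birthday-style collision search costs only on the order of 2^10.

```python
import hashlib

N_BITS = 20  # toy truncation so both searches finish in seconds

def h(data: bytes) -> int:
    # SHA-1 truncated to N_BITS bits, standing in for a full-size hash
    return int.from_bytes(hashlib.sha1(data).digest(), "big") >> (160 - N_BITS)

# Preimage attack: given a target digest, find ANY input hashing to it.
# Expected cost: about 2**N_BITS evaluations.
target = h(b"some password")
preimage_tries = 0
while True:
    preimage_tries += 1
    if h(preimage_tries.to_bytes(8, "big")) == target:
        break

# Collision attack: find any two distinct inputs with the same digest.
# Expected cost: about 2**(N_BITS / 2) evaluations (birthday bound).
seen = {}
collision_tries = 0
while True:
    collision_tries += 1
    m = b"msg" + collision_tries.to_bytes(8, "big")
    d = h(m)
    if d in seen:
        break
    seen[d] = m

# the collision search finishes orders of magnitude sooner
print(f"preimage: {preimage_tries} tries, collision: {collision_tries} tries")
```

This is why a collision attack on SHA-1 does not, by itself, threaten password verification: cracking a stored password hash requires the (much harder) preimage search.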
One thing to think about when pondering the impact this will have on the sysadmins of the world: while it's hard (but not impossible) to cause collisions in MD5, and in SHA-1, creating a document that collides under both at once is not exactly trivial. I.e., all technologies making use of hashes should be verifying both the MD5 sum and the SHA-1 sum.
Additionally, what we should be surprised about is the ease with which a collision can be found, not the fact that collisions exist at all. To put the point simply: if every size and type of data could be represented in 128 bits, there would be no need for a 129th. Since that is not true, and hash functions map arbitrarily long inputs to a fixed number of bits, we can guarantee by the pigeonhole principle that there will always be collisions in hash algorithms.
The mentioned "postscript" attack uses the fact that PS is a full procedural language. It exploits this to make it possible to have two documents with the same MD5 hash that, when "printed" (actually, when "executed"), display different texts. The colliding blocks that are needed can be two arbitrary random binary strings; they need NOT be meaningful messages. Again, this works because the attack requires the document format to be a procedural language; it would NOT work for a plain text file.
Any comments on the "Postscript attack" that was made public about a week ago?
"its hard but not impossible to cause collisions in md5, and in sha-1, creating a document that collides with both is not exactly trivial. I.e. all technologies making use of hashes should be verifying both the md5 sum and the sha-1 sum."
There is a 2^69 attack on SHA-1, and a 2^39 (or thereabouts) attack on MD5. What do you suppose the complexity of an attack on MD5+SHA-1 (a 288-bit hash) is? It seems like quite a few people are recommending this hash function these days.
Anonymous: Point those people to the article by Antoine Joux "Multicollisions in Iterated Hash Functions. Application to Cascaded Constructions".
I think this is the correct article but I'm not sure. If it is, it proves that using multiple hash functions does not give the security that would be expected naively. So the people who are recommending md5+sha1 probably aren't experts, or if they are, they should read this paper...
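Joux's observation is easiest to see at toy scale. The sketch below is my own construction with made-up 16-bit "hashes" (it is not code from Joux's paper, and the parameters are arbitrary): t successive single-block collisions in an iterated hash H1 yield 2^t distinct messages that all share one H1 value, and a birthday search among them then finds an H2 collision. The result is a collision for the concatenation H1||H2 at far below the naive birthday cost for the combined output size.

```python
import hashlib

BLOCK = 4  # toy block size in bytes

def comp(tag: bytes, state: int, block: bytes) -> int:
    # toy 16-bit compression function; `tag` gives two independent ones
    d = hashlib.sha256(tag + state.to_bytes(2, "big") + block).digest()
    return int.from_bytes(d[:2], "big")

def md_hash(tag: bytes, msg: bytes) -> int:
    # toy Merkle-Damgard iteration (messages must be whole blocks)
    state = 0
    for i in range(0, len(msg), BLOCK):
        state = comp(tag, state, msg[i:i + BLOCK])
    return state

def block_collision(tag: bytes, state: int):
    # birthday search for two blocks colliding from a given chaining state
    seen, c = {}, 0
    while True:
        b = c.to_bytes(BLOCK, "big")
        c += 1
        v = comp(tag, state, b)
        if v in seen and seen[v] != b:
            return seen[v], b
        seen[v] = b

# Joux's trick: t successive single-block collisions in H1 give 2**t
# messages (one block choice per position) that ALL share one H1 value.
t = 17  # 2**17 > 2**16 candidates, so an H2 collision is guaranteed
pairs, state = [], 0
for _ in range(t):
    a, b = block_collision(b"H1", state)
    pairs.append((a, b))
    state = comp(b"H1", state, a)  # both choices lead to the same state

# Among those 2**t messages, birthday-search for an H2 collision; the
# pair found then collides under the concatenation H1 || H2, for much
# less work than a generic birthday attack on the 32-bit combined hash.
seen, found = {}, None
for mask in range(1 << t):
    msg = b"".join(p[(mask >> i) & 1] for i, p in enumerate(pairs))
    d = md_hash(b"H2", msg)
    if d in seen:
        found = (seen[d], msg)
        break
    seen[d] = msg

m1, m2 = found
assert m1 != m2
assert md_hash(b"H1", m1) == md_hash(b"H1", m2)
assert md_hash(b"H2", m1) == md_hash(b"H2", m2)
```

With real parameters the same argument shows MD5+SHA-1 offers roughly the strength of the stronger component alone, not the strength its 288-bit output suggests.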
> There is now an extra question mark in my head when someone says a document is "protected by digital signatures." I must understand the new threat, and consider the structure of the document that is being signed.
Talking about digital signatures, you should definitely consider two different issues:
- the cryptographic soundness of the signing algorithm
- the trust in the signing/viewing device
Using a programming language like PostScript or PDF to carry sensitive information may be problematic (as shown in the PS attack): the display is the result of a rather complex computation process, so theoretically, even without breaking any crypto, someone could write a malicious viewer/signing tool that breaks the what-you-see-is-what-you-sign paradigm.
It gets even worse when the format of a document is *proprietary*, and potentially stores non-displayable information (like the history of the document ...).
Fundamentalists would probably argue that you should sign only bitmaps and pure ASCII/Unicode text. XML, with a detailed schema, is also a good candidate as a document format.
I have been thinking about a way to reinforce hash functions. The problem as I see it lies in the fact that hash functions map values from an infinite set into a finite set, so when the messages get sufficiently large the probability of finding a collision gets higher and higher.
To solve this I suggest a variable-length hash function. I detail the idea on the home page above (my slashdot journal), so if anyone has any thoughts I would like to hear them.
"Any comments on the 'Postscript attack' that was made public about a week ago?"
I'm writing something up now.
So what hash function should we be moving to?
"So what hash function should we be moving to?"
I'm getting a bit off-topic here but:
Having looked at the postscript attack, it seems to rely on the block structure that MD5 works with - i.e., having found two blocks with the same checksum, the authors were able to embed them into a larger document so that the larger document had the same checksum whichever block it contained. Do other cryptographic checksums have this weakness? Naively, it looks like it should be easy to avoid.
It would be more accurate to say that the "blind passenger attack"--the postscript demonstration of one practical application of collision finding--relies on the fact that having found a pair of colliding blocks, that can then be extended indefinitely by adding more material to the blocks (so long as the additional material is the same for both documents). Formally,
if H(A) = H(B) for A ≠ B, then
H(A | S) = H(B | S) for any S.
Because of MD strengthening, it is also necessary that length(A) = length(B), but otherwise it wouldn't matter if the hash was block oriented, variable length block oriented, or stream oriented.
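This extension property can be demonstrated with a toy Merkle–Damgård hash (my own sketch; the 16-bit compression function and block size are arbitrary toy choices): find one pair of colliding blocks, and every shared suffix preserves the collision.

```python
import hashlib

BLOCK = 4  # toy block size in bytes

def compress(state: int, block: bytes) -> int:
    # toy 16-bit compression function built from truncated SHA-256
    d = hashlib.sha256(state.to_bytes(2, "big") + block).digest()
    return int.from_bytes(d[:2], "big")

def toy_hash(msg: bytes) -> int:
    # toy Merkle-Damgard hash (zero-padded to whole blocks, no strengthening)
    msg += b"\x00" * (-len(msg) % BLOCK)
    state = 0
    for i in range(0, len(msg), BLOCK):
        state = compress(state, msg[i:i + BLOCK])
    return state

# birthday-search a single-block collision A != B from the initial state
seen, c = {}, 0
while True:
    b = c.to_bytes(BLOCK, "big")
    c += 1
    v = compress(0, b)
    if v in seen and seen[v] != b:
        A, B = seen[v], b
        break
    seen[v] = b

# once the chaining states agree, every shared suffix keeps them equal:
# H(A) == H(B)  implies  H(A | S) == H(B | S) for any S
S = b"any shared suffix, as long as you like"
assert A != B
assert toy_hash(A) == toy_hash(B)
assert toy_hash(A + S) == toy_hash(B + S)
```

The attacker does all the hard work once, on a single pair of blocks, and then pads both documents out with whatever shared material he likes.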
Is this feature easy to avoid? Not really. It's certainly possible, but any given construction that does so needs to be proven to actually solve the problem, and to work efficiently.
To take an example, suppose that as we hash a message we accumulate a blockwise-XOR sum, and then require that this block be appended to the end of the message, before the MD strengthening. That would completely screw up at least this version of the "blind passenger attack", because after the main hashing phase we effectively add in ~every~ block again, thus we now require our colliding blocks to collide ~again~ after an essentially arbitrary modification. Right?
Well, maybe not. If the attacker can also hide junk in his last block (perhaps by making the last "sensible" part of S mean something like "end of real data, stop parsing now"), then the attacker only has to find a second collision, in the last block of S, which instead of being a collision for equality, needs some other given XOR difference.
I haven't studied the paper enough yet to be sure, but since this is basically a differential attack I for one wouldn't be willing to bet that finding a collision with a given XOR diff (other than zero) is any harder than one with a 0 XOR diff. So this construction, which avoids the property you mentioned above, may only be twice as expensive to attack as the basic hash was.
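For concreteness, here is a toy sketch of the XOR-checksum construction under discussion (my own code on a made-up 16-bit hash; this is a thought experiment, not a vetted design): the plain iteration lets a single-block collision survive any shared suffix, while feeding the running XOR of all blocks back in at the end breaks that trick for a collision pair found against the plain hash. Per the caveat above, this says nothing about an attacker who targets the checksummed variant directly.

```python
import hashlib

BLOCK = 4  # toy block size in bytes

def compress(state: int, block: bytes) -> int:
    # toy 16-bit compression function built from truncated SHA-256
    d = hashlib.sha256(state.to_bytes(2, "big") + block).digest()
    return int.from_bytes(d[:2], "big")

def plain_hash(msg: bytes) -> int:
    # ordinary toy Merkle-Damgard iteration
    msg += b"\x00" * (-len(msg) % BLOCK)
    state = 0
    for i in range(0, len(msg), BLOCK):
        state = compress(state, msg[i:i + BLOCK])
    return state

def xorsum_hash(msg: bytes) -> int:
    # same iteration, but a running XOR of all blocks is fed back in last
    msg += b"\x00" * (-len(msg) % BLOCK)
    state, checksum = 0, bytes(BLOCK)
    for i in range(0, len(msg), BLOCK):
        blk = msg[i:i + BLOCK]
        state = compress(state, blk)
        checksum = bytes(x ^ y for x, y in zip(checksum, blk))
    return compress(state, checksum)

# find a single-block collision A != B for the plain iteration
seen, c = {}, 0
while True:
    b = c.to_bytes(BLOCK, "big")
    c += 1
    v = compress(0, b)
    if v in seen and seen[v] != b:
        A, B = seen[v], b
        break
    seen[v] = b

suffixes = [b"shared suffix %d!" % i for i in range(4)]
# the plain hash collides for every shared suffix...
assert all(plain_hash(A + S) == plain_hash(B + S) for S in suffixes)
# ...but with the checksum, the final blocks differ by A xor B != 0, so
# the collision (almost certainly) does not carry over
assert any(xorsum_hash(A + S) != xorsum_hash(B + S) for S in suffixes)
```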
The proper answer here is not to prop up rotten timber but to migrate to a better hash. In the case of MD5, stop using it as quickly as you can manage, and be aware that you are already at risk. In the case of SHA-1, the attack is still too expensive except for very powerful opponents, so if you have low value data and significant barriers to migration, I wouldn't panic just yet--but definitely start planning on migration soon, and don't implement any new systems with SHA-1. As the saying goes, "attacks only get better, they don't get worse".
Thanks for the lengthy reply. The block-wise XOR is much along the lines I was thinking of.
In terms of "migrating to a better hash" - what I was thinking was that one that does not have the "H(A | S) = H(B | S) for any S" property would be a "better hash". For example, you could do something like a Fast Fourier Transform (an O(n log n) operation) to mix up your data on all scales before applying the hash. It is important that this transformation be reversible (i.e., a bijection) so that collisions within the transformation itself are impossible.
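One caveat on the pre-transform idea, sketched below with byte reversal standing in for the FFT (my own simplification; reversal is a trivially invertible mixing transform): because the transform T is a bijection, it cannot reduce the total number of collisions at all. Every colliding pair of H'(m) = H(T(m)) maps under T to a colliding pair of H. What it does change is where the differing blocks land in the hashed input, which is what disrupts the suffix-extension structure:

```python
import hashlib

def mixed_hash(msg: bytes) -> str:
    # H'(m) = SHA-1(T(m)) with T = byte reversal, a trivially invertible
    # stand-in for a fancier mixing transform like an FFT
    return hashlib.sha1(msg[::-1]).hexdigest()

# Appending a shared suffix no longer leaves the differing prefixes in
# place: T(A + S) = T(S) + T(A), so the difference moves to the END of
# the hashed input and a prefix collision does not simply carry over.
A, B, S = b"prefix-A", b"prefix-B", b"shared suffix"
assert (A + S)[::-1] == S[::-1] + A[::-1]
assert mixed_hash(A + S) != mixed_hash(B + S)
```

So a reversible pre-mix changes the structure an attacker must work with, but the collision resistance of the composition is still exactly that of the underlying hash.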
Schneier.com is a personal website. Opinions expressed are not necessarily those of BT.