Bruce Schneier | |||||||||
Schneier on SecurityA blog covering security and security technology. « Security Risks of Dungeons and Dragons | Main | Melbourne Water-Supply Security Risk » March 10, 2005More Hash Function AttacksHere's a pair of valid X.509 certificates that have identical signatures. The hash function used is MD5. And here's a paper demonstrating a technique for finding MD5 collisions quickly: eight hours on 1.6 GHz computer. Posted on March 10, 2005 at 01:19 PM • 37 Comments • View Blog Reactions To receive these entries once a month by e-mail, sign up for the Crypto-Gram Newsletter. regarding MD5... mdcrack (search google) has been a very useful tool for quickly finding positive results... but today rainbow tables fair best. see http://passcracking.com/ for online fun. Israel Torres
Posted by: Israel Torres at March 10, 2005 01:51 PM MD5, SHA-1 and the like are design by twidle. Here's how design by twidle works: 1. Design fast transform. This is why I advocate moving to designs based on the wide-trail philosophy. Because the design procedure is so much clearer. Proving large branch is like the little piggies choosing bricks to build their house with. You stand a much better chance of resisting penetration. This is why I advocate hashes such Whirpool. Simon. Posted by: Simon Johnson at March 10, 2005 02:16 PM So does this mean that md5sum for signing of packages is completely broken? Posted by: Eric at March 10, 2005 03:14 PM So, what does this mean for not very computer savvy users? Posted by: Shannah Mettaq at March 10, 2005 03:59 PM "MD5, SHA-1 and the like are design by twidle." Honestly, Whirlpool is no less twiddle -- although I prefer muddle -- than MD5 or SHA. And the fact that SHA is NSA muddle should count for something. Honestly, we in the cryptography community know very little about hash functions. Posted by: Bruce Schneier at March 10, 2005 05:20 PM As far as both the papers show (the linked paper and the original chinese paper referenced in the linked paper), this seems to be usable to find existing collisions that break the security of the hash algorithm (two seemingly arbitrary plaintexts that generate the same hash), but aren't really practically useful to say subvert package security, or the security of password hashes. The current paper just shows that its doable a lot faster than the chinese researchers' paper showed. Seems to be the same with the x509 certificates...that is, showing that it is possible to generate such a colliding pair of certificates. Perhaps thats why this article is just a brief one... Posted by: bluethought at March 10, 2005 05:22 PM "Seems to be the same with the x509 certificates...that is, showing that it is possible to generate such a colliding pair of certificates." This attack actually could be practical. - Generate a pair of certificates with identical hash, one for Innocent.org, one for Victim.com. Is there a way to get both the hashes (a combination of MD5 and SHA1 is common for certificates) signed and checked at once? Is it done by default when validating a certificate? Posted by: Shad at March 10, 2005 06:04 PM "but aren't really practically useful to say subvert package security" It is practical enough to subvert package security. Let's say you are package builder. You can able to create two packages, lets say good.tgz and evil.tgz, with same md5. Let's say good.tgz goes through some evaluation, and is ulpoaded to some official server with its md5sum, everythink ok. Now you intercept the download of specific vitim - let's say with cracked specific ftp server. And replace evil.tgz. Vitim verifies md5sums, ok,and believes he has the same package as evaluated and happily used by other users. Posted by: jk at March 11, 2005 03:18 AM Responding to Shad/jk, Just from the papers, the way they have constructed the collisions means they really have not too much control over the matching plaintexts (or matching x509 certificates) that collide. It so far doesn't seem to be possible to construct a collision for a given plaintext/x509 cert (and even if possible, the colliding plaintext would need to have very strict restrictions, given the method). Thats my interpretation of the papers and how they generated collisions of course - have you taken a look at them, and is there perhaps something in there that I've missed? Posted by: bluethought at March 11, 2005 04:29 AM Responding to Shad Just realized how that could be a possible vehicle for fraud...if the person generating the certificates is say the security admin of innocent.org, she could hand one cert over to her company for their use, and then use the other for a phishing / pharming attack as you mentioned. innocent.org would not by itself profit from such an attack, of course. Posted by: bluethought at March 11, 2005 04:45 AM Wow, this is pretty worrying -- it's as if the foundations are starting to break down. I remember reading in your book, Brian, about MD5 hashes and the fairly remote possibility of collisions (which had never occurred at the time of writing), and the importance of this. What impact do you think this revelation will have for the future of cryptography? Is it just a matter of educating the masses and developing better hash algorithms? And what about quantum computing? My understanding is, when that comes about the house of cards will really fall. But for all we know the cards might already be in a pile -- I heard that mobile phones were developed and in use by the military decades before the general public even knew about the technology.
Posted by: Daniel at March 11, 2005 05:28 AM "So, what does this mean for not very computer savvy users?" For those who wanted a decent explanation: http://unixwiz.net/techtips/iguide-crypto-hashes.html Posted by: bluethought at March 11, 2005 05:30 AM Followup: The comment preview is not very useful: 2. It suggests that the supplied email address won't be published with the comment. (Could you please remove my address immediately from this page before it is indexed by the first spambot that comes along? Thanks.) Posted by: Daniel at March 11, 2005 05:32 AM AFAIK, MD5 were initially designed to check whether the content of the given, say, file were broken during transmission. I.e. it could help to detect that some bytes of larger source were changed. At the same time, MD5 hashes for two completely different sources could easily be the same. And that's not a point to worry, because if it could be possible to generate some way a completely unique "signature" for every imaginable source, it would be the best compression method ever -- think, it could be possible to replace the arbitrary bunch of bytes with much shorter UNIQUE string. The fact that MD5 hashes for two completely different sources could be the same was heard by me years ago. Posted by: Maverick at March 11, 2005 08:28 AM Maverick - if two files have a different MD5, they're necessarily different. Posted by: Ricardo Barreira at March 11, 2005 08:43 AM Since you can't create an arbitrary evil.tgz with the same signature as good.tgz, this really isn't a threat, IMHO. Being able to create a pair of X.509 certificates with the same MD5 signature is interesting, but again, since you can't create arbitrary data and "make" it have the same MD5 signature, as long as the CN or hostname is part of the data signed, the likelyhood that another certificate having the same bits where the CN/hostname are stored AND being a valid X.509 certificate in file format AND having the same MD5 signature as a legitimate X.509 certificate -- what are the odds? Incredibly low, I'd bet. Not a practical attack. The only real "attack" here is to be able to "quietly" corrupt data by replacing it with data that yields the same MD5 signature, making MD5 less useful for ensuring integrity. But it doesn't allow an attacker to use hand-crafted arbitrary data and just "make" it have the same MD5 signature as some other data. Posted by: Dossy at March 11, 2005 09:08 AM Ricardo Barreira: No, they're not necessarily different. They're PROBABLY different. Posted by: Gustavo at March 11, 2005 09:11 AM No, the other way around. If two files have different MD5s, they are *necessarily* different by at least one bit. If two files have identical MD5s, they are *probably* identical files. Posted by: Ego at March 11, 2005 09:20 AM
Posted by: AC at March 11, 2005 09:31 AM Gustavo - so you are saying that md5's output is not completely determined by it's input? Equal files will always have the same MD5, so different MD5's implies different files. It's simple logic. Posted by: Ricardo Barreira at March 11, 2005 10:14 AM Ricardo, you have fallen into a logical trap that says if MD5(A) = MD5(A), and MD5(X) = MD5(Y), then X=Y. This does not follow. It is true that the same file will always sum to the same MD5. It's just that many other files will also sum to the same MD5. The output of MD5 is 128 bits. Which gives you 2^128 permutations. This is a very large number, but not nearly as large as all the permutations of all possible files of all possible file lengths - so there have to be collisions, by simple maths. You can't generate a unique id for all possible files in 128 bits. The thing is, until now it's been very hard to find specific collisions for specific files. We always knew they existed - we just didn't know how to find them.
Posted by: Matt Palmer at March 11, 2005 10:37 AM Ricardo - sorry, read your comment wrong. I thought you were saying that you couldn't have collisions at all, when in fact you were saying that if the MD5 is different, the files have to be different. It *is* simple logic. Posted by: matt palmer at March 11, 2005 10:46 AM >>And replace evil.tgz. Vitim verifies md5sums, ok,and believes he has the same package as evaluated and happily used by other users. << That seems like quite a stretch that evil.tgz could be sufficiently viable to be both functional not obviously distinguishable. The real world likelihood looks vanishingly slim unless I am misunderstanding something. Posted by: jayh at March 11, 2005 11:31 AM Ricardo: Sorry, I read it the other way around. I guess I understood it the same way Matt Palmer did. Posted by: Gustavo at March 11, 2005 11:57 AM "That seems like quite a stretch that evil.tgz could be sufficiently viable to be both functional not obviously distinguishable." A zip file has a file table at the end of the file that describes the zip's content. This lets you add an arbitrary amount of arbitrary bytes at the start of the file. I.e. just prepend the bytes needed to your evil.tgz so that its md5 hash matches the signature of good.tgz. If you manage to do that without changing the file size too much, nobody will ever notice. Posted by: Anonymous at March 11, 2005 12:13 PM "The comment preview is not very useful" It also erased my name from the form textfield. The above anonymous post was from me. Posted by: Pólya at March 11, 2005 12:15 PM "It is practical enough to subvert package security. Let's say you are package builder. You can able to create two packages, lets say good.tgz and evil.tgz, with same md5. Let's say good.tgz goes through some evaluation, and is ulpoaded to some official server with its md5sum, everythink ok. Now you intercept the download of specific vitim - let's say with cracked specific ftp server. And replace evil.tgz. Vitim verifies md5sums, ok,and believes he has the same package as evaluated and happily used by other users." If the attack can intercept the download, then he probably could intercept the page showing the MD5 hash as well. A more realistic scenario is downloading files from mirror sites. Posted by: Chung Leong at March 11, 2005 04:39 PM "No, the other way around. If two files have different MD5s, they are *necessarily* different by at least one bit. If two files have identical MD5s, they are *probably* identical files." Where "probably" means "almost certainly," at least if the hash function is secure. Posted by: Bruce Schneier at March 12, 2005 11:31 AM Those suggesting that two different files can't have the same md5 hash and be reasonably useful should read this: http://eprint.iacr.org/2004/357 Which discusses, and demonstrates, a system to do just that. Posted by: Michael Silk at March 12, 2005 06:26 PM May I ask why it is that people seem to trust the SHA series when they are based on MD4? No doubt plain MD4 is completely useless at this point given that MD5 which is broken is simply an extension of MD4. Wouldn't it be a better idea to base a new design on MD5 instead? This leads into my next question.. Is it possible to generate two _independant_ hashes for a particular dataset by inserting byte sequences into the data to be hashed? From my reading of the MD5 reference code, for any equally sized pair of collisions, adding any particular byte sequence to the end of both sets will result in another pair of collisions, but adding to the beginning will make them independant. I can't really test this theory since I have neither a pair of collisions nor the computing power to generate them. Perhaps someone else can figure this out. Posted by: eddy at March 13, 2005 01:38 AM I saw a nice thing done with two small blocks of data which produce the same hashvalue. michael (taken from a journal called hackin9 ) Posted by: michael at March 15, 2005 02:40 AM btw. if MD5 and SHA1 are considered broken, Posted by: michael at March 15, 2005 02:50 AM "if MD5 and SHA1 are considered broken, it means we're in trouble, at least 10 years from now. yikes Posted by: mei at March 15, 2005 05:18 AM Ребят а може на русском написать и дать ссылку где можно найти Алгорит позволяющий делать тоже самое Posted by: Дима at March 16, 2005 01:02 AM does the same attack can be done on modular arithmetic secure hash(MASH) Posted by: harry at May 31, 2007 06:34 PM Hi buddys, the fact that you can have two or more md5 or sha1 or whatever algorithm you prefer with the same hash is merely a mathematical issue. It's impossible to have a hashing algorithm with no collissions. For those concerned with altered files, i'd recommend using two or more hashing algorithms at the same time, so, since there are two keys to match, it's a lot harder for the atacker to find out a proper combination. For passwords, you could easily re-hash the result n times, using different hashing algorithms, according to the initial letter of the password. Well, they 're just ideas of my mind. Don't trust me!! hehehe Posted by: Victor at August 1, 2007 05:58 PM Post a comment
Powered by Movable Type 3.2. Photo at top by Steve Woit.
Schneier.com is a personal website. Opinions expressed are not necessarily those of BT Counterpane. |
|
Comments