More Hash Function Attacks

Here's a pair of valid X.509 certificates that have identical signatures. The hash function used is MD5.

And here's a paper demonstrating a technique for finding MD5 collisions quickly: eight hours on 1.6 GHz computer.

Posted on March 10, 2005 at 1:19 PM • 37 Comments

Comments

Israel TorresMarch 10, 2005 1:51 PM

regarding MD5...

mdcrack (search google) has been a very useful tool for quickly finding positive results... but today rainbow tables fair best.

see http://passcracking.com/ for online fun.

Israel Torres


Simon JohnsonMarch 10, 2005 2:16 PM

MD5, SHA-1 and the like are design by twidle.

Here's how design by twidle works:

1. Design fast transform.
2. Add twidles until you can't think of a way to break it.
3. Deploy cipher.

This is why I advocate moving to designs based on the wide-trail philosophy. Because the design procedure is so much clearer. Proving large branch is like the little piggies choosing bricks to build their house with. You stand a much better chance of resisting penetration.

This is why I advocate hashes such Whirpool.

Simon.

Bruce SchneierMarch 10, 2005 5:20 PM

"MD5, SHA-1 and the like are design by twidle."

Honestly, Whirlpool is no less twiddle -- although I prefer muddle -- than MD5 or SHA. And the fact that SHA is NSA muddle should count for something.

Honestly, we in the cryptography community know very little about hash functions.

bluethoughtMarch 10, 2005 5:22 PM

As far as both the papers show (the linked paper and the original chinese paper referenced in the linked paper), this seems to be usable to find existing collisions that break the security of the hash algorithm (two seemingly arbitrary plaintexts that generate the same hash), but aren't really practically useful to say subvert package security, or the security of password hashes. The current paper just shows that its doable a lot faster than the chinese researchers' paper showed.

Seems to be the same with the x509 certificates...that is, showing that it is possible to generate such a colliding pair of certificates.

Perhaps thats why this article is just a brief one...

ShadMarch 10, 2005 6:04 PM

"Seems to be the same with the x509 certificates...that is, showing that it is possible to generate such a colliding pair of certificates."

This attack actually could be practical.

- Generate a pair of certificates with identical hash, one for Innocent.org, one for Victim.com.
- Get the Innocent.org certificate signed by a well-known CA.
- Move the signature to the Victim.com certificate.
- Set up a fake secure server. Attack the DNS resolver of a major ISP, poison DNS caches, or hijack the domain name by any other way.
- Watch the money rolling in.

Is there a way to get both the hashes (a combination of MD5 and SHA1 is common for certificates) signed and checked at once? Is it done by default when validating a certificate?

jkMarch 11, 2005 3:18 AM

"but aren't really practically useful to say subvert package security"

It is practical enough to subvert package security. Let's say you are package builder. You can able to create two packages, lets say good.tgz and evil.tgz, with same md5. Let's say good.tgz goes through some evaluation, and is ulpoaded to some official server with its md5sum, everythink ok. Now you intercept the download of specific vitim - let's say with cracked specific ftp server. And replace evil.tgz. Vitim verifies md5sums, ok,and believes he has the same package as evaluated and happily used by other users.

bluethoughtMarch 11, 2005 4:29 AM

Responding to Shad/jk,

Just from the papers, the way they have constructed the collisions means they really have not too much control over the matching plaintexts (or matching x509 certificates) that collide.

It so far doesn't seem to be possible to construct a collision for a given plaintext/x509 cert (and even if possible, the colliding plaintext would need to have very strict restrictions, given the method).

Thats my interpretation of the papers and how they generated collisions of course - have you taken a look at them, and is there perhaps something in there that I've missed?

bluethoughtMarch 11, 2005 4:45 AM

Responding to Shad
" Generate a pair of certificates with identical hash, one for Innocent.org, one for Victim.com."

Just realized how that could be a possible vehicle for fraud...if the person generating the certificates is say the security admin of innocent.org, she could hand one cert over to her company for their use, and then use the other for a phishing / pharming attack as you mentioned.

innocent.org would not by itself profit from such an attack, of course.

DanielMarch 11, 2005 5:28 AM

Wow, this is pretty worrying -- it's as if the foundations are starting to break down.

I remember reading in your book, Brian, about MD5 hashes and the fairly remote possibility of collisions (which had never occurred at the time of writing), and the importance of this.

What impact do you think this revelation will have for the future of cryptography? Is it just a matter of educating the masses and developing better hash algorithms?

And what about quantum computing? My understanding is, when that comes about the house of cards will really fall.

But for all we know the cards might already be in a pile -- I heard that mobile phones were developed and in use by the military decades before the general public even knew about the technology.


(P.S. Your comments system doesn't preserve formatting, yet strips HTML -- hence the lack of paragraphs in my post.)

DanielMarch 11, 2005 5:32 AM

Followup:

The comment preview is not very useful:
1. It strips whitespace from your text, but it is preserved in the actual post (hence the confusing comment at the end of my previous post)

2. It suggests that the supplied email address won't be published with the comment. (Could you please remove my address immediately from this page before it is indexed by the first spambot that comes along? Thanks.)

MaverickMarch 11, 2005 8:28 AM

AFAIK, MD5 were initially designed to check whether the content of the given, say, file were broken during transmission. I.e. it could help to detect that some bytes of larger source were changed.

At the same time, MD5 hashes for two completely different sources could easily be the same. And that's not a point to worry, because if it could be possible to generate some way a completely unique "signature" for every imaginable source, it would be the best compression method ever -- think, it could be possible to replace the arbitrary bunch of bytes with much shorter UNIQUE string.

The fact that MD5 hashes for two completely different sources could be the same was heard by me years ago.
The only thing to remember: MD5 assures the file was not altered; it doesn't assure two files are the same if their MD5 are the same, or two files are different if their MD5 are different.

Ricardo BarreiraMarch 11, 2005 8:43 AM

Maverick - if two files have a different MD5, they're necessarily different.

DossyMarch 11, 2005 9:08 AM

Since you can't create an arbitrary evil.tgz with the same signature as good.tgz, this really isn't a threat, IMHO.

Being able to create a pair of X.509 certificates with the same MD5 signature is interesting, but again, since you can't create arbitrary data and "make" it have the same MD5 signature, as long as the CN or hostname is part of the data signed, the likelyhood that another certificate having the same bits where the CN/hostname are stored AND being a valid X.509 certificate in file format AND having the same MD5 signature as a legitimate X.509 certificate -- what are the odds? Incredibly low, I'd bet. Not a practical attack.

The only real "attack" here is to be able to "quietly" corrupt data by replacing it with data that yields the same MD5 signature, making MD5 less useful for ensuring integrity. But it doesn't allow an attacker to use hand-crafted arbitrary data and just "make" it have the same MD5 signature as some other data.

GustavoMarch 11, 2005 9:11 AM

Ricardo Barreira:

No, they're not necessarily different. They're PROBABLY different.

EgoMarch 11, 2005 9:20 AM

No, the other way around. If two files have different MD5s, they are *necessarily* different by at least one bit. If two files have identical MD5s, they are *probably* identical files.

Ricardo BarreiraMarch 11, 2005 10:14 AM

Gustavo - so you are saying that md5's output is not completely determined by it's input? Equal files will always have the same MD5, so different MD5's implies different files. It's simple logic.

Matt PalmerMarch 11, 2005 10:37 AM

Ricardo, you have fallen into a logical trap that says if MD5(A) = MD5(A), and MD5(X) = MD5(Y), then X=Y. This does not follow.

It is true that the same file will always sum to the same MD5. It's just that many other files will also sum to the same MD5.

The output of MD5 is 128 bits. Which gives you 2^128 permutations. This is a very large number, but not nearly as large as all the permutations of all possible files of all possible file lengths - so there have to be collisions, by simple maths. You can't generate a unique id for all possible files in 128 bits.

The thing is, until now it's been very hard to find specific collisions for specific files. We always knew they existed - we just didn't know how to find them.


matt palmerMarch 11, 2005 10:46 AM

Ricardo - sorry, read your comment wrong. I thought you were saying that you couldn't have collisions at all, when in fact you were saying that if the MD5 is different, the files have to be different. It *is* simple logic.

jayhMarch 11, 2005 11:31 AM

>>And replace evil.tgz. Vitim verifies md5sums, ok,and believes he has the same package as evaluated and happily used by other users.

That seems like quite a stretch that evil.tgz could be sufficiently viable to be both functional not obviously distinguishable. The real world likelihood looks vanishingly slim unless I am misunderstanding something.

GustavoMarch 11, 2005 11:57 AM

Ricardo:

Sorry, I read it the other way around. I guess I understood it the same way Matt Palmer did.

AnonymousMarch 11, 2005 12:13 PM

"That seems like quite a stretch that evil.tgz could be sufficiently viable to be both functional not obviously distinguishable."

A zip file has a file table at the end of the file that describes the zip's content. This lets you add an arbitrary amount of arbitrary bytes at the start of the file. I.e. just prepend the bytes needed to your evil.tgz so that its md5 hash matches the signature of good.tgz. If you manage to do that without changing the file size too much, nobody will ever notice.

PĆ³lyaMarch 11, 2005 12:15 PM

"The comment preview is not very useful"

It also erased my name from the form textfield.

The above anonymous post was from me.

Chung LeongMarch 11, 2005 4:39 PM

"It is practical enough to subvert package security. Let's say you are package builder. You can able to create two packages, lets say good.tgz and evil.tgz, with same md5. Let's say good.tgz goes through some evaluation, and is ulpoaded to some official server with its md5sum, everythink ok. Now you intercept the download of specific vitim - let's say with cracked specific ftp server. And replace evil.tgz. Vitim verifies md5sums, ok,and believes he has the same package as evaluated and happily used by other users."

If the attack can intercept the download, then he probably could intercept the page showing the MD5 hash as well.

A more realistic scenario is downloading files from mirror sites.

Bruce SchneierMarch 12, 2005 11:31 AM

"No, the other way around. If two files have different MD5s, they are *necessarily* different by at least one bit. If two files have identical MD5s, they are *probably* identical files."

Where "probably" means "almost certainly," at least if the hash function is secure.

eddyMarch 13, 2005 1:38 AM

May I ask why it is that people seem to trust the SHA series when they are based on MD4?

No doubt plain MD4 is completely useless at this point given that MD5 which is broken is simply an extension of MD4.

Wouldn't it be a better idea to base a new design on MD5 instead?

This leads into my next question.. Is it possible to generate two _independant_ hashes for a particular dataset by inserting byte sequences into the data to be hashed?

From my reading of the MD5 reference code, for any equally sized pair of collisions, adding any particular byte sequence to the end of both sets will result in another pair of collisions, but adding to the beginning will make them independant.

I can't really test this theory since I have neither a pair of collisions nor the computing power to generate them.

Perhaps someone else can figure this out.

michaelMarch 15, 2005 2:40 AM

I saw a nice thing done with two small blocks of data which produce the same hashvalue.
build yourself a fileextractor which uses a bit in one of these datablocks to decide which of two different executables to extract from its archive.
now use extractor, datablock1 and the archive (which contains good.executable and bad.executable) to build a selfextracting archive.
hash it and publish it.
sometime later replace it with a second selfextracting archive, which contains the second datablock, where the crucial bit is different ....
This kind of attack can be done by everybody without knowing how to create md5-collisions.
funny, isn't it ?
don't trust selfextracting archives!

michael

(taken from a journal called hackin9 )

michaelMarch 15, 2005 2:50 AM

btw.

if MD5 and SHA1 are considered broken,
what does this mean to MACs based on them ?

meiMarch 15, 2005 5:18 AM

"if MD5 and SHA1 are considered broken,
what does this mean to MACs based on them ? "

it means we're in trouble, at least 10 years from now.

yikes

ДимаMarch 16, 2005 1:02 AM

Ребят а може на русском написать и дать ссылку где можно найти Алгорит позволяющий делать тоже самое

harryMay 31, 2007 6:34 PM

does the same attack can be done on modular arithmetic secure hash(MASH)

VictorAugust 1, 2007 5:58 PM

Hi buddys, the fact that you can have two or more md5 or sha1 or whatever algorithm you prefer with the same hash is merely a mathematical issue.

It's impossible to have a hashing algorithm with no collissions. For those concerned with altered files, i'd recommend using two or more hashing algorithms at the same time, so, since there are two keys to match, it's a lot harder for the atacker to find out a proper combination.

For passwords, you could easily re-hash the result n times, using different hashing algorithms, according to the initial letter of the password.

Well, they 're just ideas of my mind. Don't trust me!! hehehe

Leave a comment

Allowed HTML: <a href="URL"> • <em> <cite> <i> • <strong> <b> • <sub> <sup> • <ul> <ol> <li> • <blockquote> <pre>

Photo of Bruce Schneier by Per Ervland.

Schneier on Security is a personal website. Opinions expressed are not necessarily those of Co3 Systems, Inc..