A New Secure Hash Standard

The U.S. National Institute of Standards and Technology is having a competition for a new cryptographic hash function.

This matters. The phrase “one-way hash function” might sound arcane and geeky, but hash functions are the workhorses of modern cryptography. They provide web security in SSL. They help with key management in e-mail and voice encryption: PGP, Skype, all the others. They help make it harder to guess passwords. They’re used in virtual private networks, help provide DNS security and ensure that your automatic software updates are legitimate. They provide all sorts of security functions in your operating system. Every time you do something with security on the internet, a hash function is involved somewhere.

Basically, a hash function is a fingerprint function. It takes a variable-length input—anywhere from a single byte to a file terabytes in length—and converts it to a fixed-length string: 20 bytes, for example.
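(To make that concrete, here is a minimal sketch in Python using the standard hashlib module; the inputs are arbitrary, but notice that the digest length never changes.)

```python
import hashlib

# Whatever the size of the input, SHA-1 always produces a 20-byte digest.
for data in (b"a", b"a much longer message " * 100_000):
    digest = hashlib.sha1(data).digest()
    print(len(data), "bytes in ->", len(digest), "bytes out:", digest.hex())
```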

One-way hash functions are supposed to have two properties. First, they’re one-way. This means that it is easy to take an input and compute the hash value, but it’s impossible to take a hash value and recreate the original input. By “impossible” I mean “can’t be done in any reasonable amount of time.”

Second, they’re collision-free. This means that even though there are an infinite number of inputs for every hash value, you’re never going to find two of them. Again, “never” is defined as above. The cryptographic reasoning behind these two properties is subtle, but any cryptographic text talks about them.

The hash function you’re most likely to use routinely is SHA-1. Invented by the National Security Agency, it’s been around since 1995. Recently, though, there have been some pretty impressive cryptanalytic attacks against the algorithm. The best attack is barely on the edge of feasibility, and not effective against all applications of SHA-1. But there’s an old saying inside the NSA: “Attacks always get better; they never get worse.” It’s past time to abandon SHA-1.

There are near-term alternatives—a related algorithm called SHA-256 is the most obvious—but they’re all based on the family of hash functions first developed in 1992. We’ve learned a lot more about the topic in the past 15 years, and can certainly do better.

Why the National Institute of Standards and Technology, or NIST, though? Because it has exactly the experience and reputation we want. We were in the same position with encryption functions in 1997. We needed to replace the Data Encryption Standard, but it wasn’t obvious what should replace it. NIST decided to orchestrate a worldwide competition for a new encryption algorithm. There were 15 submissions from 10 countries—I was part of the group that submitted Twofish—and after four years of analysis and cryptanalysis, NIST chose the algorithm Rijndael to become the Advanced Encryption Standard, or AES.

The AES competition was the most fun I’ve ever had in cryptography. Think of it as a giant cryptographic demolition derby: A bunch of us put our best work into the ring, and then we beat on each other until there was only one standing. It was really more academic and structured than that, but the process stimulated a lot of research in block-cipher design and cryptanalysis. I personally learned an enormous amount about those topics from the AES competition, and we as a community benefited immeasurably.

NIST did a great job managing the AES process, so it’s the perfect choice to do the same thing with hash functions. And it’s doing just that. Last year and the year before, NIST sponsored two workshops to discuss the requirements for a new hash function, and last month it announced a competition to choose a replacement for SHA-1. Submissions will be due in fall 2008, and a single standard is scheduled to be chosen by the end of 2011.

Yes, this is a reasonable schedule. Designing a secure hash function seems harder than designing a secure encryption algorithm, although we don’t know whether this is inherently true of the mathematics or simply a result of our imperfect knowledge. Producing a new secure hash standard is going to take a while. Luckily, we have an interim solution in SHA-256.

Now, if you’ll excuse me, the Twofish team needs to reconstitute and get to work on an Advanced Hash Standard submission.

This essay originally appeared on Wired.com.

EDITED TO ADD (2/8): Every time I write about one-way hash functions, I get responses from people claiming they can’t possibly be secure because an infinite number of texts hash to the same short (160-bit, in the case of SHA-1) hash value. Yes, of course an infinite number of texts hash to the same value; that’s the way the function works. But the odds of it happening naturally are less than the odds of all the air molecules bunching up in the corner of the room and suffocating you, and you can’t force it to happen either. Right now, several groups are trying to implement Xiaoyun Wang’s attack against SHA-1. I predict one of them will find two texts that hash to the same value this year—it will demonstrate that the hash function is broken and be really big news.

Posted on February 8, 2007 at 9:07 AM

Comments

Nicholas Weaver February 8, 2007 9:26 AM

I can’t wait. Once the Round 1 competitors are in, I’ll have to put back on my hardware implementer’s hat and do a paper hardware-evaluation of all of them.

The AES workshop was tons of fun, I’ve GOT to make sure I get to the hash workshops. 🙂

Ilya February 8, 2007 9:41 AM

This is great news, because the hash research area will finally get a boost. Hopefully we’ll see some interesting new attacks and designs during this competition.

Peter Pearson February 8, 2007 9:44 AM

Actually, the probability of all the air molecules in this room bunching up in one corner and suffocating me is much smaller than the probability of two 160-bit hashes colliding by accident. There are about 10^26 air molecules in this room, so even if we ignore their mutual repulsion, we’re looking at numbers on the order of 2^(-10^26).

Ben February 8, 2007 9:50 AM

“One-way hash functions are supposed to have two properties.”

You missed one critical property – It must be easy to compute the hash of a message.

Collision-free can be looked upon as two properties ( http://tinyurl.com/yo8tes ):
a) Given M, hard to find M’ such that H(M)=H(M’) — “second-preimage resistant”
b) Hard to find M,M’ such that H(M)=H(M’) — “collision resistant”
Note that b) implies a) (i.e. if we could solve a) we could solve b), but not conversely.
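To get a feel for why b) is the easier target for an attacker, here is a toy Python experiment (my own illustration, not from the post) using a deliberately crippled 16-bit "hash": SHA-256 truncated to two bytes. A generic collision falls out of a birthday search after roughly 2^8 attempts, while a second preimage of one fixed message takes on the order of 2^16.

```python
import hashlib
import itertools

def weak_hash(m: bytes) -> bytes:
    return hashlib.sha256(m).digest()[:2]   # 16-bit toy hash, insecure on purpose

# property b): generic collision via birthday search (~2**8 tries expected)
seen = {}
for i in itertools.count():
    m = str(i).encode()
    h = weak_hash(m)
    if h in seen and seen[h] != m:
        print("collision after", i + 1, "tries:", seen[h], m)
        break
    seen[h] = m

# property a): second preimage of one fixed message (~2**16 tries expected)
target = weak_hash(b"fixed message")
for i in itertools.count():
    m = str(i).encode()
    if m != b"fixed message" and weak_hash(m) == target:
        print("second preimage after", i + 1, "tries:", m)
        break
```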

Fred P February 8, 2007 9:56 AM

“But the odds of it happening naturally are less than the odds of all the air molecules bunching up in the corner of the room and suffocating you, and you can’t force it to happen either.”

You’d better have a really small (and/or extremely low-pressure) room for this to be true, and as a side effect you’ll suffocate from lack of (significant) air well before the air molecules bunch up (plus, if you go for size, you won’t be able to fit in it either – and few people would call something that small a room).

Bruce Schneier February 8, 2007 10:29 AM

“Will there be a blog where we can learn the interesting/unexpected things that are turned up?”

Well, I’ll blog anything that I think is interesting and/or unexpected.

Bruce Schneier February 8, 2007 10:30 AM

“Actually, the probability of all the air molecules in this room bunching up in one corner and suffocating me is much smaller than the probability of two 160-bit hashes colliding by accident.”

Yeah. I didn’t do the math when I posted the bit. I figured it wasn’t right, but it was good enough.

Michael Ash February 8, 2007 11:15 AM

“Every time I write about one-way hash functions, I get responses from people claiming they can’t possibly be secure because an infinite number of texts hash to the same short (160-bit, in the case of SHA-1) hash value.”

Do these same people write in every time you talk about crypto claiming that standard encryption can’t possibly be secure because all you have to do is guess a short key?

Some people don’t seem to understand that crypto is all about how hard it is to find the critical bit of data, not whether it’s possible at all. And for whatever reason, there seem to be more people who have this trouble with hashes than with encryption.

Dror February 8, 2007 12:08 PM

@Michael Ash
Only a few people understand what encryption really is, and only a handful understand what hash functions are.
Cryptography is hard, and this is the result of incompetence.

Bruce Schneier February 8, 2007 12:10 PM

“What about SHA-256, SHA-384 and SHA-512?”

I have recommended, and continue to recommend, SHA-256 for immediate applications, and will probably continue to recommend the algorithm until this process is completed.

SHA-384 and SHA-512 are also fine, but overkill for most applications.

I’m sure any of these that meet the submission criteria will be submitted into this process.

Yozons February 8, 2007 12:23 PM

SHA-1 hashes used in digital signatures are even less likely to be defeated by these reported exploits because the data being signed has meaning.

Thus it’s not just finding another byte stream that results in the same hash that will break a digital signature, but they’d have to find another byte stream that represents a realistic alternative text.

If my digital signature’s hash proves the text, “I agree to pay you $100 for the Model 160 computer.” is original, it won’t matter if your “other text” is something nonsensical. The barrier for a digitally signed document is much higher.

Of course, once the new hashing standard comes out, I’m sure digital signatures will make use of them, just as many have migrated to AES.

Ian Mason February 8, 2007 12:34 PM

“SHA-1 hashes used in digital signatures are even less likely to be defeated by these reported exploits because the data being signed has meaning”

This is true, but consider the number of other cryptographic protocols that go “Alice takes a random bit string B … hashes … Bob’s your uncle” where no intelligent observer is involved.

As for fishy hash references: Chowder?

Anonymous February 8, 2007 12:35 PM

@Yozons:

“Thus it’s not just finding another byte stream that results in the same hash that will break a digital signature, but they’d have to find another byte stream that represents a realistic alternative text.”

Wouldn’t you agree, when the math has progressed to the point of being able to do the first, they’re that much closer to figuring out how to do the second?

concerned patriot February 8, 2007 12:45 PM

I think we have to be very careful about making this an international competition. Allowing foreigners to inspect our code and algorithms, especially those who are serious threats to our country like the Chinese, is just asking for trouble. Of course, we’re too much better than them so they won’t be able to break our technology quickly, but why give them a headstart? Also, they might not adhere to any NDA, which could affect any company which wants to patent the new standards.

Keith February 8, 2007 12:47 PM

I observed the AES competition “from afar”. Not only did I learn a lot, it was highly entertaining. Rather like an extremely geeky reality show. I can’t wait for “Season II: The Hash”.

Sebastian February 8, 2007 12:52 PM

— quote, “concerned patriot” —
I think we have to be very careful about making this an international competition. Allowing foreigners to inspect our code and algorithms, especially those who are serious threats to our country like the Chinese, is just asking for trouble. Of course, we’re too much better than them so they won’t be able to break our technology quickly, but why give them a headstart?
— endquote —

If the security of the algorithm depends on an attacker not knowing how it works, the algorithm is weaker for it. Assume the attacker, whether it’s Chinese intelligence or John Smith up the street, knows how it works already — because even if they don’t yet, or if they don’t know for six months, they will.

If the algorithm isn’t strong enough to hold up against an informed attacker, it’s no good.

Brian February 8, 2007 1:03 PM

@ “concerned patriot”
As someone who I assume is a regular reader of this blog, I’m surprised that you aren’t more aware of the problems with security through obscurity. Whatever hash function is the result of this competition, no matter how hard you might try to hide it, will eventually be examined by every cryptographer in the world. If anyone, Chinese or otherwise, can break it, I’d much rather that we know that during the design phase, rather than after every product on earth is using it.

As for patents, they have little place in serious cryptography and the promise not to patent if you win will almost certainly be a requirement of this competition, just like it was part of AES.

Finally, and not to put too fine a point on it, but “we” are not “so much better than them”. Non-American cryptographers have contributed just as much to the field as their American peers, and some of the most interesting new ideas in modern cryptography have come from those untrustworthy foreigners we’re supposed to be watchful of. And lest we forget, AES was not designed by a couple of Americans from Utah; it was designed by two Europeans.

One of the things I’ve always admired about cryptographers (and scientists in general) is their ability to put aside such petty xenophobia and really work together as if their national backgrounds really don’t matter. As hard as it may be for some folks to understand, THAT is how we get the best cryptography. Remember, the very reason for this competition is that some Chinese cryptographers have a leg up on some of their American peers and decided to tell the world about it to help improve the overall state of the art.

Nicholas Weaver February 8, 2007 1:39 PM

Nah, the Schneier entry should be called FishStix.

And performance efficiency isn’t a cryptographic requirement, but an OPERATIONAL requirement.

Pat Cahalan February 8, 2007 1:42 PM

C’mon, it’s Bruce.

His vote as part of the ex-Twofish team has to be for Cuttlefish. It’s got “fish” in it, but it’s a cephalopod!

Neal February 8, 2007 1:46 PM

Hmm, referring to the air molecules bunching up: only the oxygen molecules need to bunch up in the corner of the room, and only enough to lower the percentage below a certain threshold. I suspect that the likelihood of that happening is much greater than the other posters considered… so maybe you ARE in a reasonable ballpark.

Gregor February 8, 2007 1:49 PM

@Yozons: In practice, you will not have single ASCII-only text lines about Alice and Bob. You will have Word files, PDF files, ZIP files, and so on … and there will always be space for you to hide some “random bytes” in these files. And due to the structure of many hash functions, once you have a “useless” collision on a few random bytes, you can construct useful ones on arbitrary pairs of documents just by changing a few bytes in these documents.

From http://www.cits.rub.de/MD5Collisions/ , who have a nice example, namely two postscript files with the same md5 hash:

“If you cannot exercise control over colliding messages, these collisions are theoretically interesting but harmless, right? In the past few weeks, we have met quite a few people who thought so.
With this page, we want to demonstrate how badly wrong this kind of reasoning is!”

dimitris February 8, 2007 2:14 PM

Bruce, did you forget to turn off the blog’s sarcasm detector countermeasure device? Look what it did to Brian above.

Anonymous February 8, 2007 3:43 PM

@Yozons
“SHA-1 hashes used in digital signatures are even less likely to be defeated by these reported exploits because the data being signed has meaning,.”

It does make it harder, but not a whole lot. People are not going to be signing ASCII files, they are going to sign MS Word Docs, PDFs, etc. There is plenty of room within the file formats to play lots of games with the bits without affecting how it looks or prints out for the human user.

Anonymous February 8, 2007 4:40 PM

@Anonymous

Unno ticed att ackS are Eas ier
with bina ry do cu ments, b ut theoret ical ly
you c oUld do the
same with ASCiI
documents
(s een Snowdr o p?)

Curious February 8, 2007 4:58 PM

Here’s something I don’t understand. Why can’t you make hash functions using an encryption cipher?

All you do is use a known constant as the key, use CBC mode, and keep the last N ciphertext bits output from “encrypting” the document. You can then hash using whatever crypto algorithm you want: DES, AES, Blowfish, whatever.

I realize the mathematics of encryption and hashing are different (or are they?), so maybe I’m missing something obvious to a person with more math background.

I can also foresee that encryption might be slower than hashing, for a given quality of output. And there’s also the “munitions” question: a cipher-based hash could be used as a cipher, but a hash-only one can’t be.
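A toy sketch of the construction described above, assuming the pycryptodome library for AES (the fixed key, zero IV, and naive padding are arbitrary choices for illustration). As Ilya explains further down, this is not actually secure: with the key public, anyone can run the cipher backwards to find preimages.

```python
from Crypto.Cipher import AES

FIXED_KEY = b"\x00" * 16          # public constant "key"; gives no secrecy at all
ZERO_IV = b"\x00" * 16

def cbc_last_block_hash(message: bytes) -> bytes:
    message += b"\x00" * ((-len(message)) % 16)        # naive zero padding
    cbc = AES.new(FIXED_KEY, AES.MODE_CBC, iv=ZERO_IV)
    return cbc.encrypt(message)[-16:]                  # keep only the final ciphertext block

print(cbc_last_block_hash(b"hello world").hex())
```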

Freiheit February 8, 2007 5:40 PM

Is there a legitimate use for a hash with a known set of collisions? Since there are collisions it isn’t useful as a hash, but is now useful as something else.

I’ve run through a few use cases but each is solved by other cryptographic means.

Disclaimer: This is probably a hypothetically-useless question from someone who is only 2 or 3 ranks above ‘n00b’. I need to find the box with my crypto books in it and review.

McGavin February 8, 2007 6:10 PM

One Fish
Two Fish
Red Fish
Blue Fish

From there to here, from here to there, funny things are everywhere.

Stuart Young February 8, 2007 6:31 PM

I’ve always wondered about the merit of taking two existing hash functions that produce the same length output but that use totally different underlying methods of creating the hash (the hard problem to reverse-solve), running them on the same data, and then simply combining (XOR?) the output of both hash mechanisms together.

The idea being that if one of the hard problems gets broken (different data, same hash), it would be exceedingly rare to find a case where the other problem actually creates the same output value (as the hash from that algorithm would be different), which therefore changes the eventual value you get out of this sort of hash.

Of course, I’m being somewhat simplistic, but it gets the basic idea across. I think that perhaps integrating them somewhat deeper in each other would be more secure, but as with anything cryptographic, you need to be very careful and very smart on how you handle this sort of thing.
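A purely illustrative Python sketch of the combination described above: two designs with different internals (here SHA-256 and BLAKE2b from hashlib, truncated to the same length), run over the same data and XORed together.

```python
import hashlib

def xor_combined_hash(data: bytes) -> bytes:
    h1 = hashlib.sha256(data).digest()
    h2 = hashlib.blake2b(data, digest_size=32).digest()   # a structurally different design
    return bytes(a ^ b for a, b in zip(h1, h2))            # combine the two digests

print(xor_combined_hash(b"example document").hex())
```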

Michael Ash February 8, 2007 7:36 PM

@ Stuart Young

By combining two hashes you’ve created a third hash function that happens to be implemented in a modular fashion. While it’s unlikely for an attacker to find simultaneous breaks in the two underlying hash functions which allow him to break your composite hash, an attacker who wishes to attack your composite hash would simply skip that step and attack your new hash function directly. The two underlying hashes don’t need to produce the same values for a break, they just need to produce values such that the XOR is the same as before. It’s not clear that this gains you anything aside from obscurity.

Spider February 8, 2007 8:16 PM

@ Stuart Young

I had a similar idea yesterday. What if you simply stored the two distinct hashes created by two very different hash functions? You would have to find a plaintext that collided with yours in both functions. I think that would be crazy difficult, but I think you’d pay for any increase in collision resistance by leaking more information about the plain text. But, is it worth it?

I need to learn more of the math behind hashes in order to answer that question. If only there were some book that a master of encryption could write that would serve as a starting point for such forays into the depths of the crypto-world.

Oh, wait…. {fires up amazon}

Billy Dorminy February 8, 2007 9:36 PM

@Curious: Yes, you can build a hash in the manner you describe; Mr. Schneier talks about it in Applied Crypto. However, it’s slower than a proper hashing algorithm, and if I remember correctly (I probably don’t) it requires careful implementation.

Paeniteo February 9, 2007 2:32 AM

@Curious: “And there’s also the “munitions” question: a cipher-based hash could be used as a cipher, but a hash-only one can’t be.”

You can use a pure hash to encrypt, by using it as a pseudo-random generator and XORing the output with the cleartext.
But AFAIR from “Applied Crypto”, hash functions might perform poorly against some of the attacks that this makes possible.
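A toy sketch of that idea (my own illustration, using SHA-256 in counter mode as the pseudo-random generator; there is no authentication, and reusing the same key for two messages would be fatal):

```python
import hashlib

def hash_stream_xor(key: bytes, data: bytes) -> bytes:
    keystream = b""
    counter = 0
    while len(keystream) < len(data):
        # keystream block i = H(key || i); XORing twice with the same keystream decrypts
        keystream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, keystream))

ciphertext = hash_stream_xor(b"shared secret", b"attack at dawn")
assert hash_stream_xor(b"shared secret", ciphertext) == b"attack at dawn"
```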

@Billy Dorminy: “and if I remember correctly (I probably don’t) it requires careful implementation.”

Even if you don’t remember correctly, it is usually safe to assume that anything requires careful implementation. 😉

Paeniteo February 9, 2007 2:42 AM

Oh, I forgot to ask: Isn’t an easy defense against the collision attacks to append/prepend some random “garbage” to the hashed files?

This would render a pre-found collision useless, in effect requiring a pre-image attack.

Assume the two documents “aaa” and “bbb” hash to the same value and Mallory wants me to sign “aaa”.
If I now sign “xaaay”, Mallory can’t use his “bbb” document anymore and he also won’t be able to simply make it “xaaay” (will he??). It might be that the random parts have to be large enough to span block-boundaries of the hashing-function, though.
Signature applications could easily do this kind of padding quite transparently (I don’t know whether, say, GnuPG or S/MIME already have such provisions).

Am I right to assume that:
H(“aaa”) = H(“bbb”)
does not imply
H(“xaaay”) = H(“xbbby”)
for a ‘secure’ hash-function H?

Lawrence D'Oliveiro February 9, 2007 3:00 AM

You use the term “impossible”, which the mathematicians (and computer scientists) would cringe at. You really mean “computationally infeasible”.

But if you really want to use the word “impossible”, how about qualifying it thus: “very impossible”. That would tip people off that you don’t really mean “impossible”. 🙂

X the Unknown February 9, 2007 9:22 AM

@Paeniteo: “Oh, I forgot to ask: Isn’t an easy defense against the collision attacks to append/prepend some random “garbage” to the hashed files?”

As I understand it, the problem is that the process has to be repeatable (so third parties can independently check the hash). Thus, any “random garbage” you add cannot be truly random, but must be algorithmically derived from the characteristics of the hashed data.

Basically, you are simply complicating the hash-algorithm somewhat, but not truly changing the fundamental process of what Hashing does.

greg February 9, 2007 9:26 AM

I think one major benefit of a competition was left out. You don’t end up with just one hash function but a set of finalists, just like most libraries implement all the finalists from the last competition. That gives some people choice. We can use Twofish if we think the underlying GF(2^8) S-box structure in AES is a concern, for example. Or choose one that has lower power requirements for embedded systems.

X the Unknown February 9, 2007 9:28 AM

@Paeniteo

If, on the other hand, you are proposing that a given file’s Hash consists of two values:

H1 = H(File)
H2 = H(<noise>File)

then what you have effectively done is double the length of the Hash itself. I am not mathematically qualified to analyze this, but it seems like it would be a fairly simple way of extending the useful life of a given hash algorithm…

X February 9, 2007 9:29 AM

Sorry, my prior post ate some of the text because it looked like HTML.

@Paeniteo

If, on the other hand, you are proposing that a given file’s Hash consists of two values:

H1 = H(File)
H2 = H(Junk + File)

then what you have effectively done is double the length of the Hash itself. I am not mathematically qualified to analyze this, but it seems like it would be a fairly simple way of extending the useful life of a given hash algorithm…


FP February 9, 2007 9:31 AM

@Yozons:

“SHA-1 hashes used in digital signatures are even less likely to be defeated by these reported exploits because the data being signed has meaning.”

Yes, that’s an important point to make. Finding a bunch of random data that happens to have the same hash as a Word document is, in many situations, not much of an issue. (Although I’m sure that you can construct a scenario where an implicating document is replaced with useless junk in storage.)

The threat is slightly increased by the fact that many document formats allow the inclusion of non-functional data, i.e., “comments”. An attacker can substitute the content of the document, and then attempt to add non-functional data (e.g., using ‘%’ comments in PDF documents) that cause the file to match the hash of the original document.

X the Unknown February 9, 2007 9:45 AM

@Freiheit: “Is there a legitimate use for a hash with a known set of collisions? Since there are collisions it isn’t useful as a hash, but is now useful as something else.”

Of course there is a legitimate use! I would argue that, in fact, this is the most common use of Hash functions, and probably predates any actual cryptographic use. It is a method to break large groups of input into smaller groups.

In computer programming, it is used to quickly populate sparse arrays of (relatively) short linked lists, so data can be found more quickly than by searching through all of the records. You take a hash of your search-key, and then only have to check the records that match that hash.

In the physical world, a very simple but common example is the practice of sorting mail by its addressee. A common secretarial tool is a desktop filing “thing”, with 26 flippable tabs, labeled alphabetically. The secretary then inserts mail into each section by first letter of the recipient – a very simple hash function. It is then relatively easy to find any given person’s mail, without sorting through the entire stack. Actually sorting all of the mail alphabetically would be a slower process, and requires some degree of re-sorting whenever new mail comes in.
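The same bucketing idea in a few lines of Python (an illustration only; the “first letter” hash is deliberately collision-prone, which is exactly what makes it useful here):

```python
from collections import defaultdict

def first_letter_hash(addressee: str) -> str:
    return addressee[0].upper()          # 26 buckets, collisions by design

mailbox = defaultdict(list)
for item in ["Alice: invoice", "Bob: postcard", "Anne: catalog"]:
    mailbox[first_letter_hash(item)].append(item)

print(mailbox["A"])   # only Alice's and Anne's mail needs to be searched
```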

Warren February 9, 2007 10:29 AM

I’ve been really interested in cryptography since middle school in the 90s. The issue I’ve seen in attempting to design hashes is that it’s [relatively] simple to create a hash that is unique, but not one that’s computationally feasible or a ‘nice’ length.

I’ve putzed around with some design in the past – and would love to get ‘real’ analysis of a simple proposal: http://warrenmyers.com/permlink.php?entry=20050425115835.

How does anyone go about actually designing/building a ‘good’ algorithm? It’s obviously not just hash length.

X the Unknown February 9, 2007 10:45 AM

@Anonymous: “People are not going to be signing ASCII files, they are going to sign MS Word Docs, PDFs, etc. There is plenty of room within the file formats to play lots of games with the bits without affecting how it looks or prints out for the human user.”

Actually, I think the most-common things to be signed will be true binary data: executables, compressed archives, and communications data-packets. These generally don’t have as much “wiggle room”, because they have to “work” within the expected format.

I guess open-ended formats (like an executable or Word document) could play games with arbitrary data-segments that just aren’t accessed by the rest of the code (or “unused” Macro sections in a Word document), however. To prevent this sort of problem, I would suggest that a byte-count of the hashed data be appended to the calculated hash, to produce the “full hash” – something like the UNIX checksum program does.

To allow for hashing arbitrarily-large datasets, a special “length” value could be reserved (I would suggest 000000000000) to indicate “undefined”. As long as the canonical length-field is long enough to handle most anticipated data-sets (at least a terabyte, I would think), this should produce a generally usable result that is particularly difficult to generate “valid” collisions with, given structured data.

Alternatively, truly-large datasets could just be required to be “broken up” into appropriate-sized blocks, with each block hashed separately. This might not be so unreasonable, since the datasets themselves would require special handling (they wouldn’t fit into RAM/disk/tape/etc. in their entirety, on many systems). You could then produce a “second-order” hash of all the block-hashes to provide the final, fixed-length working hash.

At any rate, if data-size is implemented as part of the hash-value, I would suggest that some method of “cascading hashes” be built into the ultimate Hash Standard, so we can hash ultra-large datasets in a common, standardized manner. The size of data-sets is growing geometrically, right along with Moore’s Law.
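A rough sketch of that block-wise, second-order hashing in Python (the 1 MiB block size and the 8-byte length field are arbitrary choices for illustration, not a proposed standard):

```python
import hashlib
import io

BLOCK = 1 << 20   # 1 MiB per block, chosen arbitrarily

def second_order_hash(stream) -> bytes:
    outer = hashlib.sha256()
    total = 0
    while True:
        chunk = stream.read(BLOCK)
        if not chunk:
            break
        total += len(chunk)
        outer.update(hashlib.sha256(chunk).digest())   # hash each block, then hash the hashes
    outer.update(total.to_bytes(8, "big"))             # mix in the byte count, as suggested above
    return outer.digest()

print(second_order_hash(io.BytesIO(b"x" * 3_000_000)).hex())
```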

Ilya February 9, 2007 11:02 AM

@Curious:

Because there would be straightforward preimage and collision attacks on such a hash.

c = H(m), where H() is a hash function based on a block cipher E the way you’ve described.
x = D(c), where D is decryption under the fixed, public key
x != m, yet c == H(x) == E(x)

I.e. it is easy to find x that will hash to the given c.

It is possible to construct a hash based on a block cipher, but it is not as simple as you may expect.
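For the curious, a rough sketch of one classical way to do it properly, the Davies-Meyer construction (my own illustration, assuming the pycryptodome library; real designs add proper padding and length encoding). The message block is used as the cipher key, and the chaining value is both encrypted and fed forward, so each step is no longer invertible the way plain “encryption under a known key” is.

```python
from Crypto.Cipher import AES

def davies_meyer_sketch(message: bytes, iv: bytes = b"\x00" * 16) -> bytes:
    # toy padding: append 0x80 then zeros up to a 16-byte boundary (no length field)
    message += b"\x80" + b"\x00" * ((15 - len(message)) % 16)
    h = iv
    for i in range(0, len(message), 16):
        block = message[i:i + 16]                      # message block becomes the AES key
        e = AES.new(block, AES.MODE_ECB).encrypt(h)
        h = bytes(a ^ b for a, b in zip(e, h))         # feed-forward: H_i = E_m(H_{i-1}) XOR H_{i-1}
    return h

print(davies_meyer_sketch(b"hello world").hex())
```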

Paeniteo February 9, 2007 11:06 AM

@X: “As I understand it, the problem is that the process has to be repeatable (so third parties can independently check the hash). Thus, any “random garbage” you add cannot be truly random, but must be algorithmically derived from the characteristics of the hashed data.”

No, I was suggesting something else. Let me make it clearer: The random garbage would be sent along with the document, in clear text. Maybe think of it as an “initialization vector”.
Therefore, it would be available to anyone wanting to recompute the hash.

Say, if you have document A, random garbage X and hash H(A . X), you would send all three bits of information: A, X and H(A . X).

It might look like this (cf. PGP’s ASCII-messages):

— BEGIN SIGNED DOCUMENT —
This is the clear text message.
— BEGIN HASH DATA —
IV: …random garbage…
Hash: …the hash value…
— BEGIN DIGITAL SIGNATURE —
…RSA signature of the hash data…
— END SIGNED DOCUMENT —

NB: The hash (and probably the IV) would be the digitally signed bits to prevent tampering.
A checking application would know how to append/prepend/… the IV into the clear text (assume an appropriate RFC or the like).

The point is that an attacker may well produce two messages that produce the same hash when hashed alone. He cannot guess what random data you will add, however, and therefore his found collision will be useless to him.

He would have to work again to find a collision — but this time to a given hash value, which is a wholly different story (“preimage attack” IIRC).
AFAIR SHA-1 has not yet shown weaknesses in this respect.

The idea is to add something that the attacker cannot control to defeat any pre-calculation he might have made.
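A toy Python sketch of that scheme (format and salt size are made up for illustration; the actual signature step is omitted): the signer picks fresh random garbage, hashes salt-plus-document, and transmits the document, the salt, and the signed hash together.

```python
import hashlib
import os

def salted_digest(document: bytes, salt=None):
    if salt is None:
        salt = os.urandom(16)        # fresh randomness the attacker could not have predicted
    return salt, hashlib.sha1(salt + document).digest()

doc = b"I agree to pay you $100 for the Model 160 computer."
salt, digest = salted_digest(doc)              # signer's side: this digest is what gets signed
assert digest == salted_digest(doc, salt)[1]   # verifier's side: recompute from doc + salt
```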

Gregor February 9, 2007 11:34 AM

@X: This is called “Merkle-Damgård strengthening” (see http://en.wikipedia.org/wiki/Merkle-Damg%C3%A5rd_construction), it is done within md5 AFAIK, and it does not help as soon as the attacker can make the documents have the same length, e.g. by modifying the shorter one.

And: It is quite simple to hide data in .ZIP or .gz (they have comments fields). I would be surprised if there was a file format where this is not possible for a determined attacker.

Alan February 9, 2007 1:48 PM

Yozons posted:

“SHA-1 hashes used in digital signatures are even less likely to be defeated by these reported exploits because the data being signed has meaning”

People said the same thing in the early stages of the MD5 break. Now there are script-kiddie level tools available that enable you to create, for example, two meaningful PDF’s that generate the same MD5 hash. It did not take long.

Anonymous February 9, 2007 5:13 PM

@X the Unknown

That’s why it sounded familiar! Thanks for reminding me that I need to review the stuff I learned in high school once in a while.

Anonymous February 10, 2007 6:53 PM

@Paeniteo,

“””NB: The hash (and probably the IV) would be the digitally signed bits to prevent tampering.”””

So the “more-secure-hash-function” requires a digitally signed hash/IV.

Digital signatures require a hash function.

Obviously the hash needs to be securely hashed and signed, which in turn requires yet another hash which is … security through infinite recursion?

also,
“Am I right to assume that:
H(“aaa”) = H(“bbb”)
does not imply
H(“xaaay”) = H(“xbbby”)
for a ‘secure’ hash-function H?”

AFAIK the current attacks are of the form
H(“aaAaa”) = H(“aaBaa”)
where “A” and “B” leave the internals of the hash function in the same state. This implies that
H(“xaaAaay”) = H(“xaaBaay”)

Paeniteo February 11, 2007 9:29 AM

@Anonymous: “So the “more-secure-hash-function” requires a digitally signed hash/IV.”

It was not meant to be a new hash function but rather an amendment to a common use of hash functions (digitally signing documents) which is at risk if the hash function is not collision-resistant:
An attacker may give a document to a notary/boss/… for signing and has a different document with the same hash ready. It would appear as if the notary had signed the other document.

“AFAIK the current attacks are of the form
H(“aaAaa”) = H(“aaBaa”)
where “A” and “B” leave the internals of the hash function in the same state. This implies that
H(“xaaAaay”) = H(“xaaBaay”)”

I did some testing with the example files at http://www.cits.rub.de/MD5Collisions/ and it appears that prepending random data destroys the collision:

Differing files with same MD5:
$ diff letter_of_rec.ps order.ps
Binary files letter_of_rec.ps and order.ps differ
$ cat letter_of_rec.ps | md5sum
a25f7f0b29ee0b3968c860738533a4b9 –
$ cat order.ps | md5sum
a25f7f0b29ee0b3968c860738533a4b9 –

Create “random garbage”:
$ echo “xxxxxxxxx” > random.garbage

Append:
$ cat letter_of_rec.ps random.garbage | md5sum
f3eeb7a2fd5ca51e5b464ccc3b65e805 –
$ cat order.ps random.garbage | md5sum
f3eeb7a2fd5ca51e5b464ccc3b65e805 –

Prepend:
$ cat random.garbage letter_of_rec.ps | md5sum
25e30fbe90d68c4d42677d916925e575 –
$ cat random.garbage order.ps | md5sum
7a293a889e4f9f684546a6f9a8cbb475 –

Note the different MD5 values when prepending the random string.
I assume that this “shifts” the original file contents around block-boundaries of the hash function and therefore mixes the process up, resulting in a different hash, in other words:

H(“xaaAaa”) != H(“xaaBaa”)

Someone February 12, 2007 4:07 AM

I propose that Bruce name whatever algorithm he submits “Onefish”.

“HASHFISH” sounds better

waldo February 12, 2007 7:52 AM

@ concerned partiot
The AES algorithm isn’t American either, yet I think you’ll use it every day.
This is from Wikipedia:
The cipher was developed by two Belgian cryptographers, Joan Daemen and Vincent Rijmen, both alumni of the Katholieke Universiteit Leuven, and submitted to the AES selection process under the name “Rijndael”, a combination of the names of the inventors. (Rijndael is pronounced [rɛindaːl] (IPA), which sounds almost like “Rhine doll”. [2])

It’s said that the fact that it was developed by two guys from a small and relatively neutral country made it a very suitable candidate, because of the small chance of them having built in a backdoor.

am0teur February 14, 2007 12:52 AM

Regarding the hash values: how about calculating two different hashes of an input using two different hash functions and then joining them? What is then the probability of collisions? (The motivation comes partially from the TrueCrypt solution, where Serpent+Twofish+AES are used in cascade.)

Henning Makholm February 14, 2007 8:23 PM

About odds and air:

Say that a room contains some 2^80 molecules of air (give or take some – it’s in that ballpark). Then the odds that at any given moment they will all decide to be on the same side of the room are on the order of 1:2^(2^80). In comparison with THAT, finding hash collisions by blind luck is expected to happen practically all the time. Words will not adequately express how monstrous that gap in improbability is.

Note just that the ratio of 2^(2^80) to, say, 2^1024, is 2^(2^80-1024), which for any reasonable purpose is indistinguishable from 2^(2^80) itself.

jxie June 10, 2007 10:52 AM

Will encrypting a key by the key itself always produce a distinct result?

I posted this question elsewhere, no satisfying answer yet. Hopefully, somebody here will take this question like a piece of cake, 😉

Bruce Schneier June 10, 2007 12:13 PM

First: this is a question about encryption algorithms, not hash functions. I mention this only because you’re asking it as a comment to a post about hash functions, not encryption algorithms.

The answer to your question depends, of course, on the encryption algorithm in question. But if you assume that an encryption algorithm is a random key-dependent mapping from the plaintext to the ciphertext, then no. E_k(k) is not necessarily distinct for all possible values of k, and almost certainly there are collisions where E_k1(k1) = E_k2(k2). (Note that I am assuming that the block and key size are equal.)

Anonymous June 11, 2007 4:44 AM

Thanks.
I knew this would depend on the algorithm. When I said encryption, I meant the popular ones like AES, or DES, 3DES etc.
Actually, hashing is much slower than popular encryption, and the standard way for password authentication is using hash comparison. If AES or another secure encryption algorithm has the feature I described, it could be used as an alternative way of doing authentication.

In the case that they don’t, it might still be used if the collision rate is very low, i.e., much lower than the key space, e.g., if the key is 128 bits and the collision rate is 8 bits or something.

c4a June 18, 2007 1:29 PM

Hello,

assuming you would like to use parts of the output of a hash function to feed another hash function, i.e. 32 digits of sha256 to feed ripemd128.

What’s the more secure way: Using the first… 32 digits of sha256 or using only the numbers 0-9 of all 64 digits (here perhaps with additional cutting or filling necessary, but for this example please ignore the consequences).
In the first case you use parts of the original algo. In the second case the original algo is destroyed.

Sydney Lewis-Picard December 24, 2007 1:46 PM

How about naming the new hash from Bruce, Twoavids, or Twomammals, or Twomicrobes? I think the operative description is fish (a general class of animal), not two.

Or, call it Twoscales, or Twofins or Twogills, if we want to keep with the fish theme, (all parts of the fish), or even Fishbait, or Seaweed.

Or, call it Twochips, as in fish and chips. It would encourage people to always use Twochips along with Twofish.

Just a thought on the theme.

Mario Ticci September 1, 2008 12:30 AM

What I would really like to see in a new hash standard are non-trivial Latin squares in some S-boxes.
