Notes from the Hash Function Workshop

Last month, NIST hosted the Second Hash Workshop, primarily as a vehicle for discussing a replacement strategy for SHA-1. (I liveblogged NIST’s first Cryptographic Hash Workshop here, here, here, here, and here.)

As I’ve written about before, there are some impressive cryptanalytic results against SHA-1. These attacks are still not practical, and the hash function is still operationally secure, but it makes sense for NIST to start looking at replacement strategies—before these attacks get worse.

The conference covered a wide variety of topics (see the agenda for details) on hash function design, hash function attacks, hash function features, and so on.

Perhaps the most interesting part was a panel discussion called “SHA-256 Today and Maybe Something Else in a Few Years: Effects on Research and Design.” Moderated by Paul Hoffman (VPN Consortium) and Arjen Lenstra (Ecole Polytechnique Federale de Lausanne), the panel consisted of Niels Ferguson (Microsoft), Antoine Joux (Universite de Versailles-Saint-Quentin-en-Yvelines), Bart Preneel (Katholieke Universiteit Leuven), Ron Rivest (MIT), and Adi Shamir (Weizmann Institute of Science).

Paul Hoffman has posted a composite set of notes from the panel discussion. If you’re interested in the current state of hash function research, it’s well worth reading.

My opinion is that we need a new hash function, and that a NIST-sponsored contest is a great way to stimulate research in the area. I think we need one function and one function only, because users won’t know how to choose between different functions. (It would be smart to design the function with a couple of parameters that can be easily changed to increase security—increase the number of rounds, for example—but it shouldn’t be a variable that users have to decide whether or not to change.) And I think it needs to be secure in the broadest definitions we can come up with: hash functions are the workhorse of cryptographic protocols, and they’re used in all sorts of places for all sorts of reasons in all sorts of applications. We can’t limit the use of hash functions, so we can’t put one out there that’s only secure if used in a certain way.
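
As a toy illustration of the “tunable but pinned” parameter idea above (this is not a real or proposed design; every name in the sketch is invented), the round count lives in the specification as a constant, so raising it is a spec revision rather than a per-user choice:

```python
# Toy sketch only -- NOT a real hash design. It shows where a "number of
# rounds" parameter could live: pinned in the specification, invisible to
# callers. All names here are hypothetical.
import hashlib

SPEC_ROUNDS = 64  # fixed by the (hypothetical) standard, not by users

def toy_hash(message: bytes) -> bytes:
    state = hashlib.sha256(message).digest()
    for i in range(SPEC_ROUNDS):
        # Fold in the round index so each iteration differs.
        state = hashlib.sha256(state + i.to_bytes(4, "big")).digest()
    return state

# Callers never see or choose the round count:
digest = toy_hash(b"attack at dawn")
```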

Posted on September 11, 2006 at 3:30 PM

Comments

katre September 11, 2006 4:20 PM

Bruce said: hash functions are the workhorse of cryptographic protocols, and they’re used in all sorts of places for all sorts of reasons in all sorts of applications.

Does it make sense to split up these uses, name them differently, and start to come up with different sets of functions for each use? I don’t have a solid enough grounding in cryptography to even know what the domains are, but it seems that having one function for everything is fundamentally harder to do securely than having separate functions for each domain.

Brian September 11, 2006 4:39 PM

Some of Theodore Ts’o’s comments on the design of the Linux /dev/random seem relevant.

(From http://kerneltrap.org/node/6630)

“So one of the key design criteria was that /dev/random use its cryptographic primitives in ways that were non-brittle — that is, even if the primary use of that cryptographic primitive were compromised (e.g., the recent collision attacks against SHA-1), the security of the system would not be compromised.”

Having a few hash functions around can be nice, because for certain applications they can be combined to provide security even in the case where one of them is compromised.

Steve Parker September 11, 2006 6:29 PM

Like the first two posters, I also think that diversity is a Good Thing… Otherwise, it’s just an arms race with a single target.

If 100% of hashing uses SHA, then breaking SHA affects every application.

Steve Parker September 11, 2006 6:32 PM

Break it once, break everything. That would just lead to an arms race. As with (say) the WMF flaw: who knows how long the black hats knew about it?

If we have a single protocol, and only the black hats know the flaw, we’re all down the Swanee.

Anonymous September 11, 2006 8:17 PM

Why not two hashing functions, A and B, using radically different and well studied techniques?

Then brand the cryptographic variant, C, as the xor or some other product of A and B.

A weakness in one would likely not be a weakness in the other – leaving C secure until the other is also compromised.

Is there a flaw in my logic?
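
For concreteness, here is a minimal sketch of the combiner this comment proposes, with SHA-256 and SHA3-256 standing in for the two “radically different” designs A and B (the choice of functions is illustrative, not anything endorsed in the thread):

```python
import hashlib

def combined_hash(message: bytes) -> bytes:
    """XOR combiner C(x) = A(x) XOR B(x), as the comment proposes."""
    h1 = hashlib.sha256(message).digest()
    h2 = hashlib.sha3_256(message).digest()  # same 32-byte output size
    return bytes(a ^ b for a, b in zip(h1, h2))
```

Whether such a combiner actually inherits the security of the stronger component is exactly what later comments, and the Boneh-Boyen paper cited below, take up.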

Dave September 11, 2006 8:35 PM

How can you advocate for a monoculture of hash functions??? That would make the inevitable (?) class break all the worse.

B-Con September 11, 2006 9:08 PM

I think we need one function and one function only, because users won’t know how to choose between different functions.

“Users”? I would think the only people making decisions on this would be people with a degree of education on the matter. Like the NIST said, they’re open to accepting more than one (probably specialized) hash function if appropriate.

But I can see why NIST might want to officially vouch for only one hash algorithm. If the public wants more choices, it can design more (not that such options don’t already exist). But I still would like to see variety, so long as it doesn’t degrade the overall quality.

Ilya September 11, 2006 9:27 PM

I would rather say we need one hash function as a standard, plus a couple more of significantly different design. Different design is the key point here.

Michael Richardson September 11, 2006 9:51 PM

I agree that one function with a couple of parameters is good. But I don’t like the situation with AES and its key-size numbers, and then its various modes. It confuses users a lot; they naturally assume that bigger numbers are better, and then want things like AES142 or something silly.

katre has it right: make up a new fundamental function. Then give each major use/complexity a name. I don’t care about the names — islands of Greece, lists of MVPs from minor-league baseball, pet names of NSA administrators — but let’s have some names for the different uses, so that if one use is found to be insecure, we don’t have users running around screaming for no reason.

greg September 12, 2006 1:29 AM

Well, if you have to implement a protocol and you have a PHB (Pointy-Haired Boss), then trust me, a single recommendation is a great thing. AES is the chosen cipher, but most libs have all the finalists from that competition. The point being that most people who use this stuff have no idea, and people planning this stuff don’t know either. And for the most part, the weak link for these people won’t even be close to the cryptography. (I have seen encrypted fields in a DB with the key stored in the clear in other fields!!)

But for those who do know what they are doing, or know enough to get others to do it, there are options anyway.

I’m all in favour of a single endorsed hash function. It will make average systems better. There will be other finalists.

The ideal solution is of course provable security. In the case of hashes we still have some way to go.

Paul Crowley September 12, 2006 1:32 AM

“And I think it needs to be secure in the broadest definitions we can come up with”

Does this mean you’d argue against using the Merkle-Damgård construction, given the number of attacks known on that structure (length extension, multicollisions)? I’m really hoping to see some different ideas in hash function space. Something more like Panama, where the state is much larger than the output size. The Panama hash is broken, of course, but there are some nice ideas in there.
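
One way to picture the “state larger than output” idea is a wide-pipe arrangement: compute with a big internal state and expose only part of it. A minimal sketch, using SHA-512 truncated to 256 bits purely as a stand-in (this is not Panama, just the shape of the idea):

```python
import hashlib

def wide_pipe_digest(message: bytes, out_len: int = 32) -> bytes:
    """Run a function with a wide internal result, expose only part of it."""
    full = hashlib.sha512(message).digest()  # 64-byte final state
    return full[:out_len]                    # attacker never sees the rest
```

Withholding half of the final state also blocks the straightforward length-extension trick that works against a plain Merkle-Damgård digest, since extending requires the full chaining value.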

Paeniteo September 12, 2006 2:37 AM

Bruce, did I miss you blogging about an attack on SHA-1 published at Crypto 2006?
-> http://www.heise-security.co.uk/news/77244

Anything to say about this? Being able to choose parts of the message when finding a collision sounds like we are way closer to a problem than with collisions alone.
The details are rather sketchy, however (reduced-round variant, no statement about complexity). I wonder if you have heard more.

Ben Liddicott September 12, 2006 5:45 AM

@Anonymous 08:17

This does not help where both are broken. For example, the PDF MD5 collision attack demonstrated a few months ago relied on finding a point in the document that could take two different values and yield the same hash.

If you use two broken hash functions, you only have to find two points in the document: one to make the first hash collide, the other for the second hash.

Geoff Lane September 12, 2006 6:13 AM

Now that /dev/random is widely implemented, could we get back to generating random numbers from real random events such as thermal noise, radioactive decay or those useful noisy diodes we used to use 30 years ago?

Adam September 12, 2006 7:19 AM

“it makes sense for NIST to start looking at replacement strategies — before these attacks get worse.”

You mean “before these attacks get better” don’t you?

“Attacks always get better; they never get worse.”
— NSA

Glenn September 12, 2006 8:42 AM

Maybe it’s because I’m a PHB, but a single standard algorithm sounds good to me for a lot of reasons. The objection that compromising that algorithm would compromise all hashes makes sense, but compromising one algorithm out of two, thus compromising half the hashes, doesn’t sound a lot better. I like the approach Anonymous suggests of using two very different algorithms and XORing the results. Best of both worlds.

jmr September 12, 2006 9:41 AM

@Ben Liddicott

True, but the question remains: Does a composite hash function H1(X)^H2(X) remain secure when either H1 or H2, but not both, are broken?

Clive Robinson September 12, 2006 9:45 AM

@Bruce

Rather than one function, I would prefer one well-thought-out interface.

Programmers neither know nor care what hash is to be used; after all, it’s in the spec they work to. What they do care about is ease of use / re-usability / maintainability, as do their bosses, who pick up the cost.

So I think spending time working out a good API to hang the latest hash code onto would make everybody’s life a whole lot easier. Better still, a module system where new code can be plugged in by the end user just by dropping in a new module would be a nice topping.
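
A hedged sketch of the sort of interface this comment asks for; all names are hypothetical. Callers ask for a spec-named algorithm, and adopting a new standard hash means registering one more entry rather than touching application code (Python’s own hashlib.new() works along similar lines):

```python
import hashlib

# Registry mapping spec names to constructors; a new standard is adopted
# by adding one entry here (or shipping it as a drop-in module).
_REGISTRY = {
    "sha-256": hashlib.sha256,
    "sha-1": hashlib.sha1,  # legacy entry, kept only for old protocols
}

def register(name, constructor):
    """Drop-in point for a future standard hash."""
    _REGISTRY[name] = constructor

def digest(name, message: bytes) -> bytes:
    return _REGISTRY[name](message).digest()

# Application code names the hash from its spec and nothing more:
tag = digest("sha-256", b"hello")
```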

Brian September 12, 2006 10:11 AM

@Clive

Check out the Java cryptography APIs; they meet your requirements. When designing a system, though, it doesn’t really help to push the cryptography specification onto your customer. They probably know less about the problem than you do.

@Ben Liddicott

I’m not sure I understand your point. Let’s say I want to sign a contract. If I use one hash function, somebody who wants to forge my signature needs to find a collision in that hash. If I sign using two functions, then they not only have to find a collision in both, they have to find a single document that produces the collision for both functions.

Or am I missing something?

Victor Bogado September 12, 2006 10:45 AM

@Anonymous at September 11, 2006 08:17 PM

The combination of two strong functions does not necessarily make a stronger one. It is somewhat like saying that if you fly a long way and then drive a long way, you will end up farther from the starting point; that is probably true in most cases, but sometimes you end up in a “near place”.

What I mean is that the inner workings of one function could somehow “break” the other function.

Anonymous September 12, 2006 5:34 PM

@People who responded to the original H1(x) ^ H2(x) post.

Say H1 is fully compromised and I may create any result I wish with a modified x: x’. Any change to x required by H1 will produce a different and unpredictable H2 result, so I would need to find a solution such that H1(x’) ^ H2(x’) produces the original result. However, H1(x’) ^ H2(x) would be a cinch to break; that’s a given.

This assumes that the same input, x or x’ but not a mix of both, is fed to H1 and H2.

I used XOR as a simple suggestion; I’m sure there are other mixing techniques. Possibly maintaining some bits from each and mixing others, maybe even simple concatenation. The idea remains the same.

To the person who commented about H1 and H2 both being broken: well, yeah, then you have bigger issues at stake. That’s a given for any system. The idea here is to degrade gracefully if one is fully compromised.

Even with H1 being MD5 and H2 being SHA-1, solving it for H1 breaks any results you may have had for H2. Breaking H2 then destroys your progress on H1. No doubt there are solutions providing a collision for both; the trouble is finding one for two instead of one.

Gordon Mohr September 13, 2006 11:16 PM

A recent paper (at the same conference as the latest SHA-1 result) addressed the question of the best way to combine two hash functions. I’ll let its abstract speak for itself:

On the Impossibility of Efficiently Combining Collision Resistant Hash Functions
by Dan Boneh and Xavier Boyen
http://ai.stanford.edu/~xb/crypto06b/index.html

Abstract

Let H1,H2 be two hash functions. We wish to construct a new hash function that is collision resistant if at least one of H1 or H2 is collision resistant. Concatenating the output of H1 and H2 clearly works, but at the cost of doubling the hash output size. We ask whether a more clever construction can satisfy a similar security property with a shorter output size. We answer this question in the negative — we show that there is no generic construction that securely combines arbitrary collision resistant hash functions, and whose output is shorter than simply concatenating the given functions.
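
For reference, the “clearly works” baseline from the abstract is trivial to write down: a collision for C below would be a simultaneous collision for both components, so C stays collision resistant while either one holds, at the price of doubled output. SHA-256 and SHA3-256 are illustrative stand-ins for H1 and H2:

```python
import hashlib

def concat_combiner(message: bytes) -> bytes:
    """Concatenation combiner C(x) = H1(x) || H2(x)."""
    h1 = hashlib.sha256(message).digest()
    h2 = hashlib.sha3_256(message).digest()
    return h1 + h2  # 64 bytes out instead of 32
```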

Mike Amling September 16, 2006 8:47 PM

The “one function” vs “many functions” issue will probably not be a problem. If it’s like the AES contest, the hash competition will have one winner and 3 or 4 other functions judged “good enough”. Those who don’t like hash monoculture can use one of the runners-up.

James October 28, 2008 7:50 PM

When you weigh the balance between computing power and security, is it not better to lean towards security, given the increasing rate of computing power? To that point: what would be the best hashing function (best = no collisions, reasonable speed to run the crypto, and truly irreversible) to use for the next 5 years? SHA-512 looks very good, but it has the same number of rounds as SHA-1, and you mentioned in a post that they both have similar weaknesses. At the end of the day you need to hash, so what would do it in today’s world without getting collisions?

Scanning the web and following your posts, one could easily state “it depends,” but when the rubber meets the road I feel SHA is preferable to WHIRLPOOL.

Thoughts? Guidance?

JimFive October 29, 2008 10:12 AM

@James
You can’t have a hash function without collisions.

More accurately, you can’t create a function with arbitrary-length input and fixed-length output without collisions. The reason is straightforward: the number of possible results from the hash is limited by the number of bytes the hash returns. If you have more inputs than that, there will be a collision. If you have 367 people there WILL BE 2 people with the same birthday.

Perfect hash functions are only perfect for a limited input domain.

JimFive
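
JimFive’s pigeonhole point is easy to check empirically with a deliberately tiny hash: truncate SHA-256 to one byte, giving 256 possible outputs, so a collision is guaranteed within 257 distinct inputs (and usually appears far sooner, by the birthday effect). A throwaway sketch:

```python
import hashlib

seen = {}
for i in range(257):
    msg = f"input-{i}".encode()
    tag = hashlib.sha256(msg).digest()[:1]  # 1-byte "hash": 256 buckets
    if tag in seen:
        print(f"collision: {seen[tag]!r} and {msg!r} both map to {tag.hex()}")
        break
    seen[tag] = msg
```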

Clive Robinson October 29, 2008 12:00 PM

@ JimFive,

“If you have 367 people there WILL BE 2 people with the same birthday.”

You forgot the magic words “AT LEAST” 😉

As a very rough approximation, I would expect at least 18 people to share a birthday with at least one other, possibly more.

@ James,

For big numbers in binary, halving the number of bits is as close as (usually) makes no difference, so for a 512-bit hash you would expect a collision within around 2^256 randomly selected messages.

However, that says nothing about how the collisions are spread out. For instance, if you had 100 people and the only 10 with unshared birthdays all had 29th Feb as their birthday, you would feel strongly that something was most definitely odd.

Collision resistance is ONLY ONE of many things that a hash function has to be good at (saying a hash has good collision resistance is a bit like saying a chess grandmaster should be breathing 😉).
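
Clive’s rough figures are easy to sanity-check under the usual uniform, independent-birthdays assumption; a quick sketch:

```python
# Expected number of people whose birthday is shared by someone else,
# assuming 365 equally likely birthdays and independent people.
n, d = 100, 365
expected_sharers = n * (1 - (1 - 1 / d) ** (n - 1))
print(f"{expected_sharers:.1f} of {n} people expected to share")  # ~23.8

# Birthday bound for an m-bit hash: collisions around 2**(m/2) samples.
m = 512
print(f"~2**{m // 2} messages for a {m}-bit hash")
```

The first figure comes out near 23.8, consistent with “at least 18, possibly more”; the second restates the 2^256 bound for a 512-bit hash.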
