"If you have 367 people there WILL BE 2 people with the same birthday."

You forgot the magic words "AT LEAST" ;)

As a very rough approximation, I would expect at least 18 people to share a birthday with at least one other, possibly more.

@ James,

For big numbers in binary, halving the number of bits is as close as (usually) makes no difference, so for a 512-bit hash you would expect a collision with around 2^256 randomly selected messages.
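To put rough numbers on the "halve the bits" rule, here is a minimal sketch. It assumes outputs are uniformly random, and the function names are made up for illustration. It computes the exact repeat probability for small cases such as birthdays, and the usual square-root approximation for hash sizes.

```python
import math


def collision_probability(n_items: int, n_slots: int) -> float:
    """Exact chance that n_items uniform draws from n_slots contain a repeat."""
    if n_items > n_slots:
        return 1.0  # pigeonhole: more items than slots forces a repeat
    p_all_unique = 1.0
    for i in range(n_items):
        p_all_unique *= (n_slots - i) / n_slots
    return 1.0 - p_all_unique


def birthday_bound_log2(hash_bits: int) -> float:
    """log2 of the approximate message count for a 50% collision chance.

    Uses the standard approximation n ~ sqrt(2 * ln(2) * 2**hash_bits).
    """
    return hash_bits / 2 + 0.5 * math.log2(2 * math.log(2))


if __name__ == "__main__":
    print(collision_probability(23, 365))   # just over one half
    print(collision_probability(367, 366))  # certain, by pigeonhole
    print(birthday_bound_log2(512))         # a little above 256
```

The correction term on top of n/2 bits is a fraction of a bit, which is why simply halving the bit count is close enough.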

However, that says nothing about how the collisions are spread out. For instance, if you had 100 people and the only 10 with unshared birthdays all had 29th Feb as their birthday, you would feel strongly that something was most definitely odd.

Collision resistance is ONLY ONE OF MANY things that a hash function has to be good at (saying a hash has good collision resistance is a bit like saying a chess grandmaster should be breathing ;)

]]>You can't have a hash function without collisions.

More accurately, you can't create a function with an arbitrary input and a fixed-length output without collisions. The reason is straightforward. The number of possible results from the hash is limited by the number of bytes that the hash returns. If you have more than that number of inputs there will be a collision. If you have 367 people there WILL BE 2 people with the same birthday.
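The pigeonhole argument is easy to see with a deliberately tiny hash. The sketch below is illustration only: `tiny_hash` is a made-up one-byte truncation of SHA-256, so it has just 256 possible outputs, and feeding it 257 distinct inputs must produce a collision.

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """Hypothetical 8-bit hash: the first byte of SHA-256 (256 possible values)."""
    return hashlib.sha256(data).digest()[0]


def first_collision(max_inputs: int = 257):
    """Hash the strings "0", "1", ... and return the first colliding pair."""
    seen = {}
    for i in range(max_inputs):
        msg = str(i).encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg
        seen[h] = msg
    return None  # unreachable when max_inputs exceeds 256


if __name__ == "__main__":
    a, b = first_collision()
    print(a, b, tiny_hash(a), tiny_hash(b))
```

In practice the collision turns up long before input 257, which is the birthday effect again.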

Perfect hash functions are only perfect for a limited input domain.

--

JimFive

Scanning the web and following your posts one could easily state - 'it depends', but when the rubber meets the road I feel the SHA is the preference over WHIRLPOOL.

Thoughts? Guidance?

]]>Whirlpool is doing a great job, I think.

]]>On the Impossibility of Efficiently Combining Collision Resistant Hash Functions

by Dan Boneh and Xavier Boyen

http://ai.stanford.edu/~xb/crypto06b/index.html

Abstract

Let H1,H2 be two hash functions. We wish to construct a new hash function that is collision resistant if at least one of H1 or H2 is collision resistant. Concatenating the output of H1 and H2 clearly works, but at the cost of doubling the hash output size. We ask whether a more clever construction can satisfy a similar security property with a shorter output size. We answer this question in the negative --- we show that there is no generic construction that securely combines arbitrary collision resistant hash functions, and whose output is shorter than simply concatenating the given functions.

Say H1 is fully compromised and I may create any result I wish with a modified x: x'. Any change to x required by H1 will produce a different and unpredictable H2 result, so I would need to find a solution such that H1(x') ^ H2(x') produces the original result. However, H1(x') ^ H2(x) would be a cinch to break; that's a given.

This situation assumes that x or x', not both, must be fed to H1 and H2.

I used xor as a simple suggestion, I'm sure there are other mixing techniques. Possibly maintaining some bits from each and mixing others. Maybe even simple concatenation. The idea remains the same.
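A minimal sketch of the xor idea, for anyone who wants to experiment. SHA-256 and SHA3-256 are stand-ins for H1 and H2, picked only because their outputs are the same length. They are not the functions discussed in this thread, and `xor_combine` is a made-up name.

```python
import hashlib


def xor_combine(data: bytes) -> bytes:
    """Return H1(x) ^ H2(x), with the same x fed to both functions."""
    h1 = hashlib.sha256(data).digest()    # stand-in for H1
    h2 = hashlib.sha3_256(data).digest()  # stand-in for H2
    return bytes(a ^ b for a, b in zip(h1, h2))


if __name__ == "__main__":
    print(xor_combine(b"some message").hex())
```

Note that the output is no longer than one of the inputs, so by the Boneh and Boyen abstract quoted above this cannot be a generically secure combiner. It is here to make the construction concrete, not to recommend it.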

The person who commented that if H1 and H2 are broken, well yeah - then you have bigger issues at stake. That's a given for any system. The idea here is to degrade gracefully if one is fully compromised.

Even with H1 being MD5 and H2 being SHA1, solving it for H1 breaks any results you may have had for H2. Breaking H2 then destroys your progress on H1. No doubt there _are_ solutions providing a collision for both; the trouble is finding one for two functions instead of one.

]]>The combination of two strong functions does not necessarily make a stronger one. This is somewhat like saying that if you fly a long way and then drive a long way you will end up farther from the starting point; while that is probably true in some cases, you may also end up in a "near place".

What I mean is that the inner workings of one function could somehow "break" the other function.

]]>Check out the java cryptography APIs, they meet your requirements. When designing a system, though, it doesn't really help to push the cryptography specifications on to your customer. They probably know less about the problem than you do.

@Ben Liddicott

I'm not sure I understand your point. Let's say I want to sign a contract. If I use one hash function, somebody who wants to forge my signature needs to find a collision in that hash. If I sign using two functions, then they not only have to find a collision in both, they have to find a single document that produces the collision for both functions.
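The "sign using two functions" setup can be sketched as plain concatenation of the two digests. MD5 and SHA-1 are used here only because they are the pair named earlier in the thread; the helper names are made up, and this sketch stops at the digest and does not cover the signature itself.

```python
import hashlib


def concat_digest(document: bytes) -> bytes:
    """H1(x) || H2(x): 16 bytes of MD5 followed by 20 bytes of SHA-1."""
    return hashlib.md5(document).digest() + hashlib.sha1(document).digest()


def would_share_signature(doc_a: bytes, doc_b: bytes) -> bool:
    """True only if the two documents collide under BOTH hashes at once."""
    return concat_digest(doc_a) == concat_digest(doc_b)


if __name__ == "__main__":
    print(len(concat_digest(b"contract")))                      # 36 bytes
    print(would_share_signature(b"contract", b"forged contract"))
```

A forger needs one document that matches both halves of the concatenated digest. The cost is the doubled output size that the quoted abstract mentions.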

Or am I missing something?

]]>Rather than one function, I would prefer one well thought out interface.

Programmers neither know nor care what hash is to be used; after all, it's in the spec they work to. What they do care about is ease of use, reusability and maintainability, as do their bosses who pick up the cost.

So I think spending time working out a good API to hang the latest hash code onto would make everybody's life a whole lot easier. Better still, a module system where new code can be plugged in by the end user just by dropping in a new module would be a nice topping.
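As a rough sketch of what such a plug-in interface might look like: callers ask for a digest by name, and new hash modules are registered without touching the calling code. Everything here (`register_hash`, `digest`, the registry) is a made-up design for illustration, not an existing API.

```python
import hashlib
from typing import Callable, Dict

HashFn = Callable[[bytes], bytes]

# Hypothetical registry mapping a module name to a hash implementation.
_registry: Dict[str, HashFn] = {}


def register_hash(name: str, fn: HashFn) -> None:
    """Plug in a new hash module under the given name."""
    _registry[name] = fn


def digest(data: bytes, algorithm: str = "sha256") -> bytes:
    """Hash data with whichever module is registered under `algorithm`."""
    try:
        return _registry[algorithm](data)
    except KeyError:
        raise ValueError(f"unknown hash module: {algorithm}") from None


# Modules shipped by default in this sketch.
register_hash("sha256", lambda d: hashlib.sha256(d).digest())
register_hash("sha512", lambda d: hashlib.sha512(d).digest())

if __name__ == "__main__":
    # Dropping in a newer function later is one line for the end user.
    register_hash("sha3_256", lambda d: hashlib.sha3_256(d).digest())
    print(digest(b"abc", "sha3_256").hex())
```

The point of the design is that application code only ever calls `digest`, so swapping the default is a configuration change and not a rewrite.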

]]>True, but the question remains: Does a composite hash function H1(X)^H2(X) remain secure when either H1 or H2, but not both, is broken?

You mean "before these attacks get better" don't you?

"Attacks always get better; they never get worse."

-- NSA

This does not help where both are broken. For example the PDF MD5 hash collision attack demonstrated a few months ago relied on finding a point in the document which could take two different values and yield the same hash.

If you use two broken hash functions, you have only to find two points in the document, one to make the first hash collide, the other for the second hash.

-> http://www.heise-security.co.uk/news/77244

Anything to say about this? Being able to choose parts of the message when finding a collision sounds like we are way closer to a problem than with collisions alone.

The details are rather sketchy, however (reduced round variant, no statement about complexity). I wonder if you have heard more.

Does this mean you'd argue against using the Merkle-Damgard construction, given the number of attacks known on that structure (length extension, multicollisions)? I'm really hoping to see some different ideas in hash function space. Something more like Panama, where the state is much larger than the output size. Panama hash is broken, of course, but there are some nice ideas in there.
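For readers who have not seen length extension, here is a toy sketch. It is NOT any real hash: the compression function, block size, IV and padding are all assumptions chosen for illustration. It only shows the structural point, which is that when the output is the whole internal state, anyone holding H(m) and len(m) can keep hashing.

```python
import hashlib
import struct
from typing import Tuple

BLOCK = 64      # toy block size in bytes
IV = bytes(32)  # toy all-zero initial state


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function (a stand-in built from SHA-256)."""
    return hashlib.sha256(state + block).digest()


def pad(msg_len: int) -> bytes:
    """0x80, zero fill, then the 8-byte length; rounds up to a block multiple."""
    zeros = (-(msg_len + 1 + 8)) % BLOCK
    return b"\x80" + bytes(zeros) + struct.pack(">Q", msg_len)


def absorb(state: bytes, data: bytes) -> bytes:
    """Run the compression function over block-aligned data."""
    assert len(data) % BLOCK == 0
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state


def toy_md_hash(msg: bytes) -> bytes:
    """Merkle-Damgard style hash: the output is the final internal state."""
    return absorb(IV, msg + pad(len(msg)))


def extend(known_digest: bytes, known_len: int, suffix: bytes) -> Tuple[bytes, bytes]:
    """Compute the hash of msg + glue + suffix without ever seeing msg."""
    glue = pad(known_len)
    total_len = known_len + len(glue) + len(suffix)
    forged = absorb(known_digest, suffix + pad(total_len))
    return glue, forged


if __name__ == "__main__":
    secret = b"unknown to the attacker"
    glue, forged = extend(toy_md_hash(secret), len(secret), b"&extra")
    print(forged == toy_md_hash(secret + glue + b"&extra"))
```

A design with state much larger than the output, as suggested above, does not hand the attacker the full state to resume from.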

]]>But for those that do know what they are doing or know enough to get others to do it, then there are options anyway.

I'm all in favour of a single endorsed hash function. It will make average systems better. There will be other finalists.

The ideal solution is of course provable security. In the case of hashes we have some way to go yet.

]]>katre has it right, make up a new fundamental function. Then, give each major use/complexity a name. I don't care about the names --- Islands of Greece, lists of MVPs from minor league baseball, pet-names of NSA administrators --- but let's have some names for the different uses, so that if one use is found to be insecure, we don't have users running around screaming for no reasons.

]]>"Users"? I would think the only people making decisions on this would be people with a degree of education on the matter. Like the NIST said, they're open to accepting more than one (probably specialized) hash function if appropriate.

But I can see why the NIST might want to only officially vouch for one hash algorithm. If the public wants more choices, it can design more (not that such options don't already exist). But I still would like to see variety, so long as it doesn't degrade the overall quality.

]]>Then brand the cryptographic variant, C, as the xor or some other product of A and B.

A weakness in one would likely not be a weakness in the other - leaving C secure until the other is also compromised.

Is there a flaw in my logic?

]]>If we have a single encryption protocol, and only the black hats know the flaw, we're all down the swanney.

]]>if 100% of encryption is SHA, then the breaking of SHA will affect *every* encryption.

]]>(From http://kerneltrap.org/node/6630)

"So one of the key design criteria was that /dev/random use its cryptographic primitives in ways that were non-brittle --- that is, even if the primary use of that cryptographic primitive were compromised (e.g., the recent collision attacks against SHA-1), the security of the system would not be compromised."

Having a few hash functions around can be nice, because for certain applications they can be combined to provide security even in the case where one of them is compromised.

]]>Does it make sense to split up these uses, name them differently, and start to come up with different sets of functions for each use? I don't have a solid enough grounding in cryptography to even know what the domains are, but it seems that having one function for everything is fundamentally harder to do securely than having separate functions for each domain.

]]>