“If you have 367 people there WILL BE 2 people with the same birthday.”

You forgot the magic words “AT LEAST” 😉

As a very rough approximation, I would expect at least 18 people to share a birthday with at least one other, possibly more.

@ James,

For big numbers in binary, halving the number of bits is as close as (usually) makes no difference, so for a 512-bit hash you would expect a collision with around 2^256 randomly selected messages.
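To put a rough number on the "halve the bits" rule, here is a minimal sketch using the standard birthday approximation k ≈ sqrt(2 · N · ln(1 / (1 - p))) for a space of N equally likely values. The function name and the choice to work in log2 (so that 2^512 does not overflow a float) are mine, not something from the comment above.

```python
import math

def log2_birthday_threshold(log2_space: float, p: float = 0.5) -> float:
    """Return log2 of the approximate number of random samples needed
    to see a collision with probability p in a space of 2**log2_space
    equally likely values (standard birthday approximation)."""
    # k ~= sqrt(2 * N * ln(1 / (1 - p)))  =>  log2(k) = log2(N)/2 + correction
    correction = 0.5 * math.log2(2 * math.log(1 / (1 - p)))
    return log2_space / 2 + correction

# 512-bit hash: the exponent comes out only slightly above 256.
print(log2_birthday_threshold(512))

# Classic birthday case, 365 equally likely days.
print(2 ** log2_birthday_threshold(math.log2(365)))
```

The correction term is a fraction of a bit, which is why simply halving the bit count is close enough for hash-sized numbers.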

However, that says nothing about how the collisions are spread out. For instance, if you had 100 people and the only 10 with unshared birthdays all had 29th Feb as their birthday, you would feel strongly that something was most definitely odd.

Collision resistance is ONLY ONE OF MANY things that a hash function has to be good at (saying a hash has good collision resistance is a bit like saying a chess grand master should be breathing 😉).

]]>You can’t have a hash function without collisions.

More accurately, you can’t create a function with an arbitrary input and a fixed-length output without collisions. The reason is straightforward. The number of possible results from the hash is limited by the number of bytes that the hash returns. If you have more than that number of inputs there will be a collision. If you have 367 people there WILL BE 2 people with the same birthday.
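The pigeonhole argument is easy to see with a deliberately tiny hash. The sketch below is only an illustration: `tiny_hash` and `first_collision` are made-up names, and the "hash" is just the first byte of SHA-256, so it has 256 possible outputs. Feeding it 257 distinct inputs must therefore produce at least one collision (in practice one turns up much sooner).

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """Toy 8-bit hash: the first byte of SHA-256 (256 possible outputs)."""
    return hashlib.sha256(data).digest()[0]

def first_collision(limit: int = 257):
    """Hash the distinct messages b"0", b"1", ... and return the first
    pair that shares a tiny_hash value, together with that value."""
    seen = {}  # hash value -> first message that produced it
    for i in range(limit):
        msg = str(i).encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg
    return None  # cannot happen once limit exceeds 256

a, b, h = first_collision()
print(a, b, h)
```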

JimFive

]]>Scanning the web and following your posts, one could easily state ‘it depends’, but when the rubber meets the road I feel SHA is preferable to WHIRLPOOL.

Thoughts? Guidance?

]]>Whirlpool is doing a great job, I think.

]]>On the Impossibility of Efficiently Combining Collision Resistant Hash Functions

by Dan Boneh and Xavier Boyen

http://ai.stanford.edu/~xb/crypto06b/index.html

Abstract

Let H1,H2 be two hash functions. We wish to construct a new hash function that is collision resistant if at least one of H1 or H2 is collision resistant. Concatenating the output of H1 and H2 clearly works, but at the cost of doubling the hash output size. We ask whether a more clever construction can satisfy a similar security property with a shorter output size. We answer this question in the negative — we show that there is no generic construction that securely combines arbitrary collision resistant hash functions, and whose output is shorter than simply concatenating the given functions.

]]>Say H1 is fully compromised and I can create any result I wish with a modified x: x’. Any change to x required by H1 will produce a different and unpredictable H2 result, so I would need to find a solution such that H1(x’) ^ H2(x’) produces the original result. However, H1(x’) ^ H2(x) would be a cinch to break; that’s a given.

This situation assumes that x or x’, not both, must be fed to H1 and H2.

I used xor as a simple suggestion; I’m sure there are other mixing techniques. Possibly maintaining some bits from each and mixing others. Maybe even simple concatenation. The idea remains the same.

To the person who commented that both H1 and H2 might be broken: well yeah, then you have bigger issues at stake. That’s a given for any system. The idea here is to degrade gracefully if one is fully compromised.

Even with H1 being MD5 and H2 being SHA1, solving it for H1 breaks any results you may have had for H2. Breaking H2 then destroys your progress on H1. No doubt there *are* solutions providing a collision for both; the trouble is finding one for two functions instead of one.
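For concreteness, here is a rough sketch of the two mixing approaches mentioned in this thread, feeding the same x to both functions. The names `concat_combiner` and `xor_combiner` are hypothetical, and SHA-256 and SHA3-256 are only stand-ins for H1 and H2: xor needs equal-length digests, and MD5 (16 bytes) and SHA1 (20 bytes) differ in length. Note that the xor output is half the size of the concatenation, which is exactly the "shorter output" case the abstract quoted above is about.

```python
import hashlib

def concat_combiner(data: bytes, h1: str = "sha256", h2: str = "sha3_256") -> bytes:
    """H1(x) || H2(x): output length is the sum of both digest lengths."""
    return hashlib.new(h1, data).digest() + hashlib.new(h2, data).digest()

def xor_combiner(data: bytes, h1: str = "sha256", h2: str = "sha3_256") -> bytes:
    """H1(x) ^ H2(x): same length as one digest, so both must match in size."""
    d1 = hashlib.new(h1, data).digest()
    d2 = hashlib.new(h2, data).digest()
    if len(d1) != len(d2):
        raise ValueError("xor combiner needs digests of equal length")
    return bytes(a ^ b for a, b in zip(d1, d2))

msg = b"contract text"
print(len(concat_combiner(msg)))  # 64 bytes: two 32-byte digests
print(len(xor_combiner(msg)))     # 32 bytes: one digest's worth
```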

The combination of two strong functions does not necessarily make a stronger one. This is somewhat similar to saying that if you fly a long way and then drive a long way you will end up farther from the starting point; while that is probably true, in some cases you end up in a “near place”.

What I mean is that the inner workings of one function could somehow “break” the other function.

]]>Check out the Java cryptography APIs; they meet your requirements. When designing a system, though, it doesn’t really help to push the cryptography specifications on to your customer. They probably know less about the problem than you do.

@Ben Liddicott

I’m not sure I understand your point. Let’s say I want to sign a contract. If I use one hash function, somebody who wants to forge my signature needs to find a collision in that hash. If I sign using two functions, then they not only have to find a collision in both, they have to find a single document that produces the collision for both functions.

Or am I missing something?

]]>Rather than one function, I would prefer one well thought out interface.

Programmers neither know nor care what hash is to be used; after all, it’s in the spec they work to. What they do care about is ease of use / re-usability / maintainability, as do their bosses who pick up the cost.

So I think spending time working out a good API to hang the latest hash code onto would make everybody’s life a whole lot easier. Better still, a module system where new code can be plugged in by the end user just by dropping in a new module would be a nice topping.
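As a very rough idea of what such an interface might look like, here is a minimal sketch of a registry that callers use by name, so a new hash can be dropped in without touching calling code. Everything here (`register`, `digest`, the registry itself) is hypothetical naming for illustration, not an existing API; the built-in entries just wrap Python's `hashlib`.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical registry: algorithm name -> function from bytes to digest bytes.
_REGISTRY: Dict[str, Callable[[bytes], bytes]] = {}

def register(name: str, fn: Callable[[bytes], bytes]) -> None:
    """Plug in a hash implementation under a name callers can refer to."""
    _REGISTRY[name] = fn

def digest(name: str, data: bytes) -> bytes:
    """Hash data with whichever implementation is registered under name.
    Raises KeyError if nothing is registered under that name."""
    return _REGISTRY[name](data)

# Built-in modules; a later one is added the same way.
register("sha256", lambda d: hashlib.sha256(d).digest())
register("sha512", lambda d: hashlib.sha512(d).digest())

# Calling code only names the algorithm, typically taken from the spec or config.
print(digest("sha256", b"hello").hex())
```

Swapping the hash then means registering a new module and changing one name in the configuration, rather than editing every call site.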

]]>