To answer your question: no, the attack presented here does not allow BitTorrent downloads to be compromised. The attack you describe (the one that would compromise the way BitTorrent currently works) is called a "second pre-image attack". The attack presented here is a "collision attack".

The difference between these two attacks is that a "collision attack" enables us to find *ANY two (different) blocks* of data that have the same hash value, whereas in a "second pre-image" attack we have to find *one block* that has the same hash value as another *given* block. In the BitTorrent case we're *given one of the data sets* (one of the blocks in a BitTorrent file) and we're asked to find a second set of data that hashes to the same value as the given one.
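To make the distinction concrete, here is a minimal sketch on a deliberately tiny hash. Everything in it is an assumption for illustration: `toy_hash` is SHA-256 truncated to 2 bytes (so both searches finish quickly), and `find_collision` and `find_second_preimage` are hypothetical helper names, not part of any real attack.

```python
import hashlib
from itertools import count

def toy_hash(data: bytes, nbytes: int = 2) -> bytes:
    # Assumption: SHA-256 truncated to nbytes stands in for a weak hash.
    return hashlib.sha256(data).digest()[:nbytes]

def find_collision(nbytes: int = 2):
    """Collision attack: ANY two different messages with the same hash."""
    seen = {}
    for i in count():
        msg = b"msg-%d" % i
        digest = toy_hash(msg, nbytes)
        if digest in seen:
            return seen[digest], msg, i + 1  # the pair plus number of tries
        seen[digest] = msg

def find_second_preimage(given: bytes, nbytes: int = 2):
    """Second pre-image attack: a different message matching a GIVEN one."""
    goal = toy_hash(given, nbytes)
    for i in count():
        msg = b"try-%d" % i
        if msg != given and toy_hash(msg, nbytes) == goal:
            return msg, i + 1

if __name__ == "__main__":
    a, b, tries = find_collision()
    print("collision after", tries, "tries:", a, b)
    m, tries = find_second_preimage(b"given BitTorrent block")
    print("second pre-image after", tries, "tries:", m)
```

With a 16-bit hash the collision search typically needs a few hundred tries, while the second pre-image search typically needs tens of thousands, which is the gap between roughly 2^(n/2) and 2^n work.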

Let's say I have two messages that hash to the same value: "Alex is a great guy" and "Alex owes Bruce Schneier a million dollars." I get you to sign the hash of the first, which is how digital signature algorithms work, and then take you to court because you refuse to acknowledge the second.

I am new to cryptography. I know that the birthday paradox means that the effort needed to find two random messages that hash to the same result is surprisingly low (2^(m/2)), but I don't know why that is important.

So let's say someone (Eve) finds two messages that hash to the same result. Why is that important, and how can she do malicious stuff knowing two messages that hash to the same value?

Thanks,

Alex

The cryptographer asks, "Is the attack feasible?" That is, could it be done with available or finite resources in available or finite time? The security person asks, "Is the attack efficient?" That is, is the value of success greater than the cost of the attack?

In response, the cryptographer crafts an example in which the attack might be efficient and rests his case. The security person asks whether the attack is efficient in the average or general case.

Courtney's first law tells us that "Nothing useful can be said about the security of a mechanism except in the context of a specific application and environment."

Is the 2^63 result also protected?

Second, is this a different 2^63 result, or is it the 2^69 result plus 6 more bits of weakness?

The birthday paradox applies to the day and month only; it does not apply to the complete birthday. It seems to me we're comparing the odds of an incomplete set of data (partial birthday) being duplicated versus the odds that a complete set of data (full hash value) will be duplicated.

It seems like the Birthday Paradox would be more applicable if you were concerned with the odds of 2/3rds of a hash value being duplicated...


Just being able to find random text is good enough, as some researchers demonstrated with a pair of PostScript documents (random text plus two different printed documents, depending on which random text was present). I think the hash involved was MD5, but the same "theory" would work for almost any hash function.

The argument about the applicability of the birthday paradox is unfortunately a bit of a red herring, and probably does not apply in the practical case.

The reason for this is the assumption that two "random" texts collide with slightly better than 0.5 probability after 2^(n/2) trials.

The problem with this is that from a theoretical point of view "random" texts are fine; in the real world, however, they need to be "meaningful" texts. No matter what coding method you use, meaningful texts have their own statistics, which are very far from random.

I actually have experience of the Birthday paradox failing woefully as a party trick (this is due to the skewed birth statistics due to spring ;)

Now I am not aware of anybody having looked into this in any depth, so I do not know in which way the 0.5 probability will move, but I am confident that it is going to be at least a couple of bits different.

I suspect that for the "brute force" method things will be worse (for the attacker). However, for the "crafted" approach using weaknesses in the algorithm I have a "gut feeling" it is going to be better.

As I said, it's a gut feeling (and those are not very reliable even on a good day); however, in the absence of "reasoned" evidence I will go with it, as it errs on the side of caution.

So, no, this is not humour. Not at all. In fact, one of the lessons of living in a communist country is that a lot of well-meaning people can together create a monster, given the "right" kind of ethics. The US security apparatus is a good illustration - while I do not think there are a lot of scoundrels working for the government, the collective behaviour of nice and helpful government employees (reflecting their belief in the moral permissibility of imposing their will on non-consenting people for "their own good") creates exactly the kind of moronic self-destructive policies we observe.

The solution is not to try to educate the government, and not to take over it by some political process - because it merely replaces one set of well-meaning people with another, with no change in the ethics whatsoever - but to expose the idea of coercive collectivist power for the moral outrage it is, and disband the collective monstrosity - and, importantly, - refrain from building another one in its place, no matter how tempting it seems to have "our monster" to protect us.

The greatest danger to the personal security of US citizens comes from their own government; it is as simple as that. Never forget that the present terrorist problems are the direct result of previous actions of the government (such as financing and arming "our bastards"), and that our own collectivist rhetoric is exactly what allows pissed-off Arabs to plausibly lump random civilians together with the US government into the category of legitimate targets.

While I think your comments were entered in a spirit of fun (or gallows humour), they can also be seen as an extreme example of "agenda" as mentioned in Schneier's later entry: http://www.schneier.com/blog/archives/2005/08/airline_securit_2.html

That is, it occurs to me that government security people have a somewhat different agenda than citizens around the use and dissemination of crypto.

> them to terrorists. I get it.

Terrorists are performing a useful service to the government - their actions provide a justification for the governments to expand their powers and increase confiscation of wealth of citizens.

Cryptographers, on the other hand, work on making citizens more immune from government snooping and, potentially, able to hide all or some wealth from the eye of taxmen.

So it is entirely reasonable that the government would want to let terrorists in while keeping cryptographers out.

BTW, you're misunderstanding this attack. This is not a pre-image attack, this is a collision attack (I think those are the correct names). With Google you can find information about those two classes of attacks on hash functions.

Not necessarily, since it is possible that you know some restrictions on the 20-byte identifier which would make it unique in the context you mentioned. An example of a known restriction would be that all the bytes are printable ASCII characters.

Google and Wikipedia are your friends:

http://en.wikipedia.org/wiki/Computational_complexity_theory

http://en.wikipedia.org/wiki/Hash_function

Also, in this case, we are concerned with the time complexity of an algorithm breaking the hash, not the execution performance of the actual digest function.

And what, for that matter, is a hash function?

Cheers, DF.

"any weaknesses found in one are unlikely to exist in the other"

This is a very bad assumption, as both digest algorithms are based on MD4. There may be structural weaknesses that could be exploited.

Sorry, I meant "all Y of a certain length".

And I still do not follow the reasoning of the complexity of SHA1 mixed with MD5 being the product of the single complexities.

It should be possible to filter the search space for one hash by finding (partial) matches for the other first, i.e. if there is an efficient technique to look for messages Y which (partially) match the SHA1 of X, then you certainly do not have to start all over again for MD5.

Now my question is: There is very little new under the sun and it is unlikely that Chris W. and I are the first two people to think of it. Does prior art exist in this area?

Put another way, if one wanted to find two algorithms that were sufficiently different to make the odds of the above described collision no greater than one with key length equal to the sum of each, which would you choose? Would the calculation of each of those algorithms be faster than the calculation of a similar hash function of the longer key length?

I'm sorry for the hand-waving, I'm sure someone would be able to formalise such a concept much better than I...

There are [most likely] an infinite number of Y's such that SHA1(X) = SHA1(Y), so it wouldn't be possible to find all Y's for a given X. But your point is still valid -- if you have a way to find Y's which is better than 2^80, then you should be able to find a Y such that X and Y have identical SHA1 and MD5 hashes in better than 2^144. If this new attack is 2^64, then finding a match on both SHA1 and MD5 should be 2^128.

Google for a25f7f0b29ee0b3968c860738533a4b9 OR a25f7f0b, an example of how to exploit a hash collision.

To avoid this exploit, the signer needs to make an unpredictable modification to the document prior to signing.

Alice and Bob use hash function H and signature function F to validate documents. Alice signs a message M by finding signature S such that F(S) = H(M). Bob accepts (M,S) as signed by Alice. Eve can calculate H(M) and F(S) but cannot find S given M.

Eve creates documents M1 and M2 which have the same hash, and asks Alice to sign M1. If Alice uses the expected hash, Eve can substitute M2 and Bob will accept (M2,S) as signed by Alice.

So Alice and Bob need to agree to a safer method. When Eve presents M1 for signature, Alice generates a new random key K, encrypts M1 with K and hashes, giving H(K(M1)). She then signs this by finding S such that F(S) = (K,H(K(M1))), and gives (S,K) to Eve.

Eve sends (M2,S,K) to Bob. Bob checks the signature by calculating F(S) and comparing it to (K,H(K(M2))).

Eve is foiled, because H(K(M1)) != H(K(M2)), even though H(M1) = H(M2).

Thus, immediately, the protocols for signing software distributions should be adjusted so that the signing authority Alice generates random key K and signs H(K(M)) instead of just H(M). The automatic software update agent Bob checks the signature (S,K) by calculating F(S) and (K,H(K(M))).

This will prevent substitution of a useful M2 for M1 by Eve, who is allowed to present an arbitrary M1 for testing and signature. Eve cannot predict K, and hence even if she can find M2 after the fact, such that H(K(M1)) = H(K(M2)), she is not allowed to change M1 after K is chosen. Hence M2 will not be useful.

Bob can safely install M1 automatically, confident that if M1 passes a simple syntax check (which M2 cannot possibly satisfy), it is the same program which Alice accepted for signature, even if hash collisions can be found.
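The substitution attack and the randomized fix described above can be sketched on a toy hash. This is only an illustration under loud assumptions: `weak_hash` is SHA-256 truncated to 3 bytes so that collisions are cheap to find, `find_cross_collision` and `randomized_digest` are hypothetical names, and the key K is simply prepended to the message before hashing rather than used for encryption as in the description above. The signature function F is left out; the point is only which digests Alice would end up signing.

```python
import hashlib
import os
from itertools import count

def weak_hash(data: bytes, nbytes: int = 3) -> bytes:
    # Assumption: a truncated SHA-256 plays the role of a broken H.
    return hashlib.sha256(data).digest()[:nbytes]

def find_cross_collision(benign: bytes, evil: bytes, table_size: int = 4096):
    """Eve's step: find variants M1 (benign) and M2 (evil) with H(M1) == H(M2)."""
    table = {}
    for i in range(table_size):
        m1 = benign + b" #%d" % i
        table[weak_hash(m1)] = m1
    for j in count():
        m2 = evil + b" #%d" % j
        digest = weak_hash(m2)
        if digest in table:
            return table[digest], m2

def randomized_digest(key: bytes, message: bytes) -> bytes:
    # Assumption: prefixing K stands in for "encrypt M with K, then hash".
    return weak_hash(key + message)

if __name__ == "__main__":
    m1, m2 = find_cross_collision(b"Alex is a great guy",
                                  b"Alex owes Bruce a million dollars")
    print(m1, m2, weak_hash(m1) == weak_hash(m2))   # same plain digest
    k = os.urandom(16)                               # chosen by Alice after seeing M1
    print(randomized_digest(k, m1) == randomized_digest(k, m2))
```

Because Eve has to commit to M1 and M2 before Alice picks K, the two randomized digests agree only by chance (about 1 in 2^24 for this toy hash), so a signature over the randomized digest of M1 does not carry over to M2.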

Can you identify a 2^64 work effort task that's been completed? Remember, MD5 is thrashable without the Wang result if 2^64 is doable.

What's the mechanism for making a birthday paradox attack not require as much storage as it requires computation? Creating a 2^m store, with m<n and 2^m feasible, and recomputing hashes that fall within that range?

--Dan
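One commonly cited answer to the storage question is cycle finding (often called the rho method): iterate the hash on its own output and detect the cycle with two pointers, which needs only constant memory. The sketch below is a toy, not a real attack: `toy_hash` (SHA-256 truncated to 3 bytes) and `rho_collision` are hypothetical names, and the seed is assumed to be longer than the digest so that it cannot itself lie on the cycle.

```python
import hashlib

def toy_hash(data: bytes, nbytes: int = 3) -> bytes:
    # Assumption: truncated SHA-256 stands in for the hash under attack.
    return hashlib.sha256(data).digest()[:nbytes]

def rho_collision(seed: bytes, nbytes: int = 3):
    """Find a != b with toy_hash(a) == toy_hash(b) using O(1) storage."""
    assert len(seed) > nbytes, "seed must not look like a digest"
    step = lambda x: toy_hash(x, nbytes)

    # Phase 1 (Floyd): the hare moves twice as fast until both meet in the cycle.
    tortoise, hare = step(seed), step(step(seed))
    while tortoise != hare:
        tortoise = step(tortoise)
        hare = step(step(hare))

    # Phase 2: restart the tortoise at the seed and advance both one step at a time.
    # The values just before they first agree are two different inputs
    # that hash to the same output.
    tortoise = seed
    while True:
        next_t, next_h = step(tortoise), step(hare)
        if next_t == next_h:
            return tortoise, hare
        tortoise, hare = next_t, next_h

if __name__ == "__main__":
    a, b = rho_collision(b"seed-0001")
    print(a.hex(), b.hex(), toy_hash(a).hex())
```

The amount of hashing stays on the order of 2^(n/2), but only a handful of values are held in memory at any time. Parallel variants that store a limited table of "distinguished" points trade a little storage for speed, which is close to the 2^m store suggested in the question.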

Sorry; I just got back from Crypto and I'm tired.

The birthday paradox comes from the scenario in which, in a classroom of 30 students, it is quite likely (about 70%) that some two of them will have the same birthday (excluding year).

Bruce, the way you're stating the birthday paradox is that if you choose the first student in the class, the odds that one of the remaining 29 will have the same birthday as the first are almost certain - which is false.

I don't believe you understand the birthday attack (and paradox) correctly. If one input (X) produces a hash H(X), the odds that any next input (Y) will also produce H(X) are 1 in 2^n, where n is the hash length.

JSnow and Bruno have it right: The birthday paradox/attack says that you need 2^(n/2) inputs to find a case where the two hashes are equal.

So, in the case of MD5 (n=128 bits), the odds that any two given inputs will have the same hash are 1 in 2^128. However, the odds that 2^64 inputs will contain at least one duplicate hash are substantial (roughly 40%).
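The figures in this exchange can be checked numerically. The sketch below assumes uniformly distributed birthdays and hash values; the function names are hypothetical. It separates the two questions being confused: "do any two items in the set match?" versus "does anything match one fixed item?".

```python
import math

def p_any_pair(k: int, n: int) -> float:
    """Exact chance that at least two of k uniform samples from n values coincide."""
    p_distinct = 1.0
    for i in range(k):
        p_distinct *= (n - i) / n
    return 1.0 - p_distinct

def p_any_pair_approx(k: int, n: int) -> float:
    """Standard approximation 1 - exp(-k(k-1)/(2n)), usable for huge k and n."""
    return -math.expm1(-k * (k - 1) / (2 * n))

def p_match_fixed(k: int, n: int) -> float:
    """Chance that at least one of k other samples equals one FIXED value."""
    return 1.0 - (1.0 - 1.0 / n) ** k

if __name__ == "__main__":
    print(p_any_pair(23, 365))               # about 0.507
    print(p_any_pair(30, 365))               # about 0.706
    print(p_match_fixed(29, 365))            # about 0.076
    print(p_any_pair_approx(2**64, 2**128))  # about 0.39
```

So a class of 30 has roughly a 70% chance of containing some shared birthday, but only about an 8% chance that someone shares the first student's birthday. The same gap separates a collision search (about 2^(n/2) inputs) from a search against one fixed hash value (about 2^n inputs).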

substr(XOR(mystring),32)

That would give you a 32 byte hash, right!? Isn't that secure enough!

Isn't 2^(n/2) the approximate number of texts where the odds of any two in the set having matching hashes is 1/2?

This isn't the same thing as the odds of two random texts having the same hash.

No, you don't understand the birthday paradox correctly. The odds of one text having any particular hash is 2^n. The odds of two texts having the same hash is 2^(n/2), where n is the block length.

Would it be silly of me to use either the hash127 or UMAC schemes for authenticating cookies passed back and forth between server and browser? Are these considered 'safe'?

On the other hand (if I understand the birthday paradox correctly), if you were to choose 2^64 arbitrary texts you would have on average one md5sum collision somewhere amongst those 2^64 texts, and if you chose 2^80 arbitrary texts, you would have on average one sha1 collision (which is what I think you were trying to say, but maybe I'm confused).

2^144 sounds impressive. But if the attack is enhanced to find _all_ Y for a given X such that SHA1(X)=SHA1(Y), and an equivalent attack is found for MD5, then 2^144 is no longer safe (because the effective search space is much, much smaller).

It was just a little joke. What krypto did was get the sha1sum for the letters X and Y (as opposed to two "items" X and Y...numbers, strings, etc.). Since they're different values, SHA-1(X) != SHA-1(Y).
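For anyone who wants to reproduce the joke, a rough Python equivalent might look like the following. The original commands are not shown here, so this is an assumption about what they did: it hashes the single letters without a trailing newline, which may differ from what a shell pipeline into sha1sum would hash.

```python
import hashlib

# Hash the literal one-byte strings "X" and "Y", not two arbitrary inputs.
digest_x = hashlib.sha1(b"X").hexdigest()
digest_y = hashlib.sha1(b"Y").hexdigest()

print("SHA-1(X) =", digest_x)
print("SHA-1(Y) =", digest_y)
print("equal?", digest_x == digest_y)
```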

The universe will either collapse on itself or the galaxies will drift apart into nothingness before you find such a pair of texts.

Well, I certainly don't feel less stupid today. What are you (krypto) trying to demonstrate by the commands you posted? This one is over my head...
