Skein and SHA-3 News
There are two bugs in the Skein code. They are subtle and esoteric, but they’re there. We have revised both the reference and optimized code — and provided new test vectors — on the Skein website. A revision of the paper — Version 1.1 — has new IVs, new test vectors, and also fixes a few typos in the paper.
Errata: Version 1.1 of the paper, reference, and optimized code corrects an error in which the length of the configuration string was passed in as the size of the internal block (256 bits for Skein-256, 512 for Skein-512, and 1024 for Skein-1024), instead of a constant 256 bits for all three sizes. This error has no cryptographic significance, but affected the test vectors and the initialization values. The revised code also fixes a bug in the MAC mode key processing. This bug does not affect the NIST submission in any way.
NIST has received 64 submissions. (This article interviews one of the submitters, who is fifteen.) Of those, 28 are public and six have been broken. NIST is going through the submissions right now, making sure they are complete and proper. Their goal is to publish the accepted submissions by the end of the month, in advance of the Third Cryptographic Hash Workshop to be held in Belgium right after FSE in February. They expect to quickly make a first cut of algorithms — hopefully to about a dozen — and then give the community about a year of cryptanalysis before making a second cut in 2010.
Lastly, this is a really nice article on Skein.
These submissions make some accommodation to the Core 2 processor. They operate in “little-endian” mode (a quirk of the Intel-like processors that reads some bytes in reverse order). They also allow a large file to be broken into chunks to split the work across multiple processors.
However, virtually all of the contest submissions share the performance problem mentioned above. The logic they use won’t optimally fit within the constraints of a Intel Core 2 processor. Most will perform as bad or worse than the existing SHA-1 algorithm.
One exception to this is Skein, created by several well-known cryptographers and noted pundit Bruce Schneier. It was designed specifically to exploit all three of the Core 2 execution units and to run at a full 64-bits. This gives it roughly four to 10 times the logic density of competing submissions.
This is what I meant by the Matrix quote above. They didn’t bend the spoon; they bent the crypto algorithm. They moved the logic operations around in a way that wouldn’t weaken the crypto, but would strengthen its speed on the Intel Core 2.
In their paper (PDF), the authors of Skein express surprise that a custom silicon ASIC implementation is not any faster than the software implementation. They shouldn’t be surprised. Every time you can redefine a problem to run optimally in software, you will reach the same speeds you get with optimized ASIC hardware. The reason software has a reputation of being slow is because people don’t redefine the original problem.
That’s exactly what we were trying to do.
EDITED TO ADD (11/20): I wrote an essay for Wired.com on the process.