Bruce Schneier | |||||||||||
Schneier on SecurityA blog covering security and security technology. « Schneier for TSA Administrator | Main | RIAA Lawsuits May Be Unconstitutional » November 19, 2008Skein and SHA-3 NewsThere are two bugs in the Skein code. They are subtle and esoteric, but they're there. We have revised both the reference and optimized code -- and provided new test vectors -- on the Skein website. A revision of the paper -- Version 1.1 -- has new IVs, new test vectors, and also fixes a few typos in the paper. Errata: Version 1.1 of the paper, reference, and optimized code corrects an error in which the length of the configuration string was passed in as the size of the internal block (256 bits for Skein-256, 512 for Skein-512, and 1024 for Skein-1024), instead of a constant 256 bits for all three sizes. This error has no cryptographic significance, but affected the test vectors and the initialization values. The revised code also fixes a bug in the MAC mode key processing. This bug does not affect the NIST submission in any way. NIST has received 64 submissions. (This article interviews one of the submitters, who is fifteen.) Of those, 28 are public and six have been broken. NIST is going through the submissions right now, making sure they are complete and proper. Their goal is to publish the accepted submissions by the end of the month, in advance of the Third Cryptographic Hash Workshop to be held in Belgium right after FSE in February. They expect to quickly make a first cut of algorithms -- hopefully to about a dozen -- and then give the community about a year of cryptanalysis before making a second cut in 2010. Lastly, this is a really nice article on Skein. These submissions make some accommodation to the Core 2 processor. They operate in "little-endian" mode (a quirk of the Intel-like processors that reads some bytes in reverse order). They also allow a large file to be broken into chunks to split the work across multiple processors. That's exactly what we were trying to do. EDITED TO ADD (11/20): I wrote an essay for Wired.com on the process. Posted on November 19, 2008 at 6:14 AM • 40 Comments To receive these entries once a month by e-mail, sign up for the Crypto-Gram Newsletter. moo • November 19, 2008 7:01 AM I find it bizarre that they describe "little endian" mode as "a quirk of Intel processors" that stores bytes in the "reverse order". Little or big endian is an arbitrary choice by the CPU designer--it's not as if big endian is the "proper" way of doing it, or something! x86 is the most widely-popular little endian design, but by no means the only one. I've always found it to be more natural than big endian... and also, this "quirk of Intel processors" is not a new thing in Core 2's, it has been that way since the 8080 and in the desktop world, the vast majority of machines have always been little endian. To me, the surprise is that so many crypto algorithms are specified in a way that naturally favors big endian hardware instead of little endian! I wonder what the historical reason is for this. moo • November 19, 2008 7:05 AM @noname: The submitters are able to reveal their submissions themselves, if they want to. There is little point in NIST showing submissions that they are going to reject out of hand before consideration even starts (especially if they are embarassingly amateurish in addition to being flawed. No need to embarrass anybody.) (If you were wondering why they would reject submissions out of hand, well... they would only jettison submissions that obviously don't function as advertised, or are a copy-and-paste of some other well-known hash function, or something like that. So that the crypto community doesn't waste any time looking at them, and can spend that time trying to analyse and break the rest of the submissions). moo • November 19, 2008 7:09 AM A follow-up about little endian... I guess which one seems more "natural" depends on how you number the bits in a word. Big-endian people are used to numbering from MSB=0 to LSB=n-1, while little-endian people use the opposite numbering (LSB=0 to MSB=n-1). So both camps are used to bits 0..7 of a word being the "first byte" of the word (i.e. at the lowest memory address) on the kind of machine they are used to. When you switch over to the other kind of machine, either you adjust your thinking to use the other bit numbering scheme, or you end up prejudiced against it like the author of that article. not me really • November 19, 2008 7:09 AM It's amusing that skein's designers are called well-known cryptographers, except for Bruce, who is called a noted pundit. Nicholas Weaver • November 19, 2008 7:15 AM One question, however. Does optimizing for little endian end up sacrificing performance on some embedded processors and/or big-endian CPUs? trevor • November 19, 2008 8:26 AM > Little or big endian is an arbitrary choice by the CPU designer--it's not as if big endian is the "proper" way of doing it, or something! The only reason I can think to consider big endian more "normal" or common is that network byte order is big endian so anything transfered over an IP network needs to be reordered on a little endian machine. rubbernecker • November 19, 2008 9:23 AM You can find the 15-year-old kid's submission here: http://ehash.iaik.tugraz.at/wiki/Ponic I know it's cheap to rip on a 15-year-old kid, and he's on the bright side of ordinary for his age, but I don't think NIST or other submitters are going to be wasting time "tearing apart" his submission as he hopes. I agree with moo that if this is representative of many of the 60 submissions then it's probably a good thing that NIST isn't publishing them until they've been weeded. It's also amusing, if not flattering to young Peter, that the NIST manager says that some of the submissions look like they were written by a *13*-year-old kid. David • November 19, 2008 9:24 AM Big endian is more normal, since it is closer to how we read numbers in real life. If we see "12 34 56 78" in a hex dump, it's more natural to read it as 0x12345678 than 0x87654321. Therefore, anybody who doesn't routinely work with little-endian hex dumps or fancy assembly language routines will favor big endian. It's like starting counting with 0 rather than 1, an unnatural habit that comes about through long exposure to computing quirks. bitmonger • November 19, 2008 9:24 AM
Not such a suprising a statement, actually. A little perhaps more inflammatory than it needs to be. It is only a slightly less touchy subject than say which text editor is better :), but I think that is really more a sign of an attempt to make the description accessible. Network endian is Big Endian, and generally that is the de-facto choice of almost all standards that grew up in respectable standards bodies, because its portable and the choice other people made. Every sane platform has a to-network-endian (big) and a to-native/host -endian function. So every programmer can easily make a big endian number on easily on any sane platform without using non standard APIs. Now, some protocols use a magic number to be endian agnostic. This is a decent idea for many things like filesystems. Detect the endianness and transparently do endian conversion on the fly. This allows better performance for the poor oppressed little endian masses. :) Little endian and big endian are just as good from a numerical perspective of course. However, I will not say the same about VAX endian. :) Ten • November 19, 2008 9:53 AM @David Well, at least for people who read and write from left to right. bitmonger • November 19, 2008 10:03 AM @david While what you say is true on some level. The left to right placement of bits from (0..n) _could_ be argued to be a choice as well. Arabic numerals do put the most significant place leftmost. So I suppose that makes them at least Left-to-right big-endian :-P. (Suddenly I wonder about languages that read right to left with left-to-right big-endian numbers embedded.) I could see arguing zero padding / casting aliasing of larger numbers on little endian So, like reading two bytes and four byte ... big endian So, I am _not_ arguing LE is better than BE or vis versa. I don't think it matters at all. Finite arithmetic in base 2 is not like a normal number we right down, IMO. If anything paper is mostly read like a serial protocol within a page and not really anything like endian ness in a CPU or in memory. So in short, my brain isn't little or big endian :)
Carlo Graziani • November 19, 2008 10:03 AM I would guess that the zeroth cut is on submitters who didn't satisfy all the submission requirements, including the boring boilerplate, and the exact section-by-section requirements of the algorithm specification & supporting documentation, and so on. Anyone who sent something in that doesn't conform to all the published requirements will have their proposal returned as "non-responsive" to the call for proposals. It won't even be seen by an actual cryptologer. Interestingly, I see that in the "Reference Platform and Compiler" section of the requirements, the following text appears: "Reference Platform: Wintel personal computer, with an Intel Core 2 Duo Processor, 2.4GHz clock speed, 2GB RAM, running Windows Vista Ultimate 32-bit (x86) and 64-bit (x64) Edition." "Wintel"? Someone has a sense of humor. That's not an industry term, but rather a bit of derisive jargon that tends to leave Intel engineers spluttering about "typos". kiwano • November 19, 2008 10:37 AM @bitmonger Note however that Arabic numberals, being Arabic, were originally developed in a R-L setting and are therefore historically LE. pb • November 19, 2008 10:51 AM I don't understand if that's what bitmonger said, but little-endian has the nice property that given a 64-bit pointer p, then (uint32_t)*p == *(uint32_t *)p. (Well, it's not valid C but you get the idea). That's why porting big-endian to little-endian is less likely to introduce bugs (except for program who do not use ntohl/ntohs properly) than porting little-endian to big-endian. 87943 • November 19, 2008 10:56 AM @kiwano Arabic numerals actually originate from south Asia where the script is from left to right. So historically they are what, big-endian*? * Yes, I notice the pun and I'm trying to avoid it. Gah! fgrosshans • November 19, 2008 11:06 AM @kiwano Interestingly, German is little-endian for numbers smaller than 100 (98 is said 8-und-90), a construction which has survived in English for the "teens" (nineteen=9+10, and my English is bad enough to let me sometimes confuse neineteen and ninety). Fred, who feels natural to say 4.20.10.8 to pronounce 98 Secure • November 19, 2008 11:24 AM pb, if you have to care about endianness in C then you're doing something fundamentally wrong. Most operations are working on the value level with a well defined binary representation, thus when you avoid to work on the memory storage then you get full portability. E.g. instead of sending an int over the network with: ui32 = htonl(ui32); you can do: unsigned char a[4]; which is a bit more code (but you put this in a function, anyway, instead of sprinkling portability-sensitive code everywhere, right?) and a bit less efficient (which doesn't matter for slow network communication, first prove the bottleneck by measurement before optimizing it), but it has well defined behaviour and will work on any platform with a standard-conforming compiler regardless of memory storage and padding. bitmonger • November 19, 2008 12:17 PM @pb, yes your very clear explanation is what I was getting at... :) My point was that the difference is mostly immaterial. Neither one is really more natural, IMO. Why do physicists draw right handed 3-d graphs and mathematicians draw left handed 3-d graphs? y z Because it makes no difference _except_ some people want things done their way or at least want a reason better than: "'cause we do", but sometimes: "'cause we do", is the only real argument. :P Cheers! bitmonger • November 19, 2008 12:24 PM Perhaps its is only because I don't like soft boiled eggs that I can be so indifferent. Randall • November 19, 2008 12:35 PM I feel much better about not making my submission now that I know there were a lot of amateur submissions. On the bright side, I get to feel clever for seemingly stumbling in a direction that smart folks thought was promising -- I was building on Daniel Bernstein's Salsa20, so, like Skein, I was working with a minimalist (and parallelizable) round function with only additions, XORs, and rotations, but fast diffusion, lots of rounds, and a large block size (in my case, 64 32-bit words), and I was building a combined stream/hash module from it. And, without actually submitting it, I don't have to deal with the pesky reality that, if it wasn't ripped up summarily, the best *possible* outcome would have been that I'd design the awkward little brother of Skein -- less flexible in its mode of operation (no tree mode, only one block size, etc.), maybe slower (I was banking on processing four 32-bit words at a time using SSE2 instructions, but notably, it takes four instructions to rotate in SSE2), and probably less secure (certainly with much less analysis in the submission paper) -- and I'd mostly owe whatever redeeming graces it had to Salsa20's design, not to original innovations of mine. Endo • November 19, 2008 12:40 PM Here's more background on endianness of bytes: Most architectures are bitwise little endian even if they're bytewise big endian (IBM being an obvious exception). Bitwise little endian makes sense since bit N is 2^N. Also, bit 0 is always bit 0, unlike IBM where bit 0 in 32 bit mode is bit 32 in 64-bit mode and the LSB is either bit 31 or 63 depending on mode. Bryan Feir • November 19, 2008 1:11 PM @fgrosshans: French, I take it? 98 is quatre-vignt-dix-huit (four-twenty-ten-eight) in French, and that's the only language I know of that uses that construction. As a native English speaker who took both French and German in high school, and is doing some Japanese now, the spoken forms of numbers seem to be among the least constant of constructions across languages. Back a bit more on topic, the last paragraph of the second quote reminds me of DES, where a level of bit-rearrangement was added to the spec for no reason other than to make software implementations slower than hardware implementations. It added nothing to the cryptographic strength of the algorithm, but bit rearrangement is O(0) in hardware and O(n) in software... jpa • November 19, 2008 2:21 PM @Randall: you submitted a hash function based on Salsa20? Note that BLAKE is based on Salsa20's variant ChaCha: radek • November 19, 2008 2:38 PM The Perl bindings are available on CPAN: http://search.cpan.org/dist/Digest-Skein I've included two patches; one to fix building on Win32, another for Dragonfly (I don't know if the Dragonfly one works -- CPAN testers haven't tested it at the time of writing). More information and links to the patches: http://radek.cc/2008/11/16/digest-skein/ Note: the code in NIST/CD/KAT_MCT/src has not been updated in Skein_NIST_CD_111008.zip; neither have been the *MsgKat*.txt files. Carey • November 19, 2008 3:12 PM @Bryan: “Four score and seven years ago...” It's not really modern English, though. Randall • November 19, 2008 6:00 PM @jpa: I didn't submit my design, thankfully -- most likely it would have ended up on the scrap heap of badly broken amateur submissions (I'm not a cryptographer or mathematician), and even if it didn't I'd merely have achieved a less-flexible, less-efficient, less-well-analyzed cousin of BLAKE or Skein. The design, such as it is, is here: http://twotwotwo.googlepages.com/aperture . It's really only loosely derived from Salsa20; check it out if you're curious. It's probably badly breakable. Excellent work designing and analyzing BLAKE and producing C, VHDL, *and* PIC assembler with a four-person team! No promises, but I might be able to come up with an SSE2 implementation of BLAKE-64 (and/or Skein or other functions). It seems like that could especially help with 32-bit-mode performance, because SSE can do 64-bit operations (often on two words at once) regardless of what mode the processor is in. RH • November 19, 2008 6:10 PM @Endo and the Endians: Your point is right... its the reason I entertain Big Endian as the "right" order. Big Endian is consistent, while little endian always has a discontinuity around the byte level. To demosnstrate, lets count up in hex, starting with bit 0 Not as clear, is it? On a similar note, I found that when you used mutlicharacter literals as values for integers, C/C++ seems to assume big endian: That prints 'orec' when I tried it. Muffin • November 19, 2008 6:16 PM Bruce - there's a typo on page 39 (section 8.1, under "variable internal state"): "Skein-256, the low-end version, which we consider more that adequate [...]" That should be "more than adequate". (I wish I could offer more meaningful feedback then that, of course, but while I find it quite interesting to read about cryptography, I don't actually know anything about it myself, so I can't. Alas, poor Yorick^WMuffin.) Ping-Che Chen • November 19, 2008 9:59 PM I thought little endian is more "natural" hardware-wise because lower memory address means lesser significant bytes, just like lower "bits" means lesser significant bits. The problem is, when the memory content is displayed for people to see, it's displayed in "wrong order" so people tend to think big endian is more natural. For example, a 32 bits number in little endian, if inside the range of a 16 bits number, can be read as a 16 bit number at the same address. It's not possible if it's in big endian. Clive Robinson • November 19, 2008 11:49 PM @ Ping-Che Chen, "For example, a 32 bits number in little endian, if inside the range of a 16 bits number, can be read as a 16 bit number at the same address. It's not possible if it's in big endian." You are of course assuming that you are dealing with the limited case of unsigned ints... There are three number format things to consider, 1, The location of the sign bit. That is Unsigned Ints are of the format, iiiiiiii. Signed Ints, siiiiiii. Floats, s.fffffff Fixed point (as used in DSPs), sii.fffff All of which makes BE-v-LE an open argument when number representation is taken into account. On top of this there is "byte alignment" and if negative numbers are "ones or twos" compliment making things harder. Then there is the actual data width, not all CPUs are 8/16/32/64 bits some are 12 or other bits wide and some CPUs have instructions for BCD or other data specific handeling (ie to deal with AD/DA convertors etc)... All in all, it is one of the reasons that the C spec makes no gaurenties on number conversions except in very specific cases. christopher • November 20, 2008 8:39 AM @Randall: you'll regret not submitting it, when a 15 year-old had the stones to do so. It would have cost you nothing to see if your design would have survived. @rubbernecker: the kid might be on the bright side of ordinary but he submitted something. what did you do? diane • November 20, 2008 4:38 PM It is funny. When people debate about endianness, there is always a mention of "left" and "right". Apparently, the way we write should influence the way data are organized in a computer. peri • November 20, 2008 10:32 PM @diane
http://en.wikipedia.org/wiki/Noise
http://en.wikipedia.org/wiki/data
Randall • November 21, 2008 1:05 AM @christopher: Not true that "it would have cost me nothing" to submit, because near the end of October I was deciding between doing loads of programming to support a US presidential campaign (my job) and doing loads of paper-writing and coding to produce a submission CD. (Alas, an idea and a complete submission are two different things.) I narrowly decided to do my job. Candidly, I'll give you that if I'd started putting together materials earlier, or just skimped on my job, it probably would have been great fun to join the bake-off. It'd make a much better story for comment threads, anyway. :) But I also believe what I said above: even if sound, my idea wouldn't've contributed much to finding SHA-3; other submitters covered the ground I was aiming to cover, and they did some things better. The number and content of the submissions helped convince me of this. Instead of relitigating all that, though, maybe I can do something productive now, like contribute fast implementations of some of these algorithms. And work on that day job. Pat Cahalan • November 21, 2008 9:34 AM Somebody already mentioned this, but I'm late on catching up and it bears repeating: >One exception to this is Skein, created by several Oh, did I laugh at this. Bruce, does this mean that you've graduated from being a mere cryptographer and arrived into the big world of punditry, or you've jumped the shark? Or was Graham was just giving Bruce a tweak in response to some Security Guru in joke? Is there a "bar at a crypto conference" backstory hidden in here? MCD • November 24, 2008 2:21 AM I was also wondering why Skein performs so much more better than the other SHA-3 candidates. I see mainly 2 reasons for it: I want to focus on the later one. Even though s-box lookups should be dismissed purely due to the fact that they allow timing-based attacks, there are still a plenty of uses outside of cryptography. In fact, they are so useful, that I think there should be a special processor extension that allows it to perform multiple, but rather small-tabled and simple constant-timed(like real time constant, required for the security of cryptography) lookups in parallel. Like a 4|8 x 8bit -> 16|32bit lookup. The required tables should be stored in a superfast RAM (like SRAM), comparable to onchip L1-cache memory, that unlike L1-caches must be visible and accessible. (marginal note: should all of the paralleled lookups work on the same table or should each basic lookup have its own table?) The biggest engineering challenge implementing this in current x86-like processors would be to allow complex instructions to access some kind of on-chip local storage besides regular registers, which has never been seen on the x86 interface yet. corollar: fortunately, this kind of access wouldn't require much of a change to a Cell SPE-like design, where one has direct access to the local storage per se, it would only require some new instructions to make it work a little faster by eleminating some(actually quiet a lot) householding shift-and-mask instructions. regards Thomas Mueller • December 5, 2008 2:08 AM I found an inconsistency in the test vectors of the new Skein download. The file ShortMsgKAT_512.txt still contains the old (1.0) results while the files skein_golden_kat.txt and skein_golden_kat_short.txt contain the new (1.1) results. Example: the digest for len = 0 in ShortMsgKAT_512.txt starts with 5F, while the digest in the file skein_golden_kat.txt it starts with D3 (:Skein-512: 512-bit hash, msgLen = 0 bits, data = 'zero'). Could you have a look at that please? I am working together with Maarten Bodewes on Java versions of Skein. I wrote a simple (one class) and fast implementation of Skein 512-512, while he wrote a complete and modular implementation. Thomas Mueller • December 8, 2008 4:58 PM FYI: Doug Whiting fixed the inconsistency in the test vectors. There are now two Java and one Javascript implementation of Skein available now on my web site. One Java implementation is optimized for speed, the other optimized for size (just 59 lines). Javascript was tricky, because it doesn't support 64 bit operations directly.
Post a comment
Powered by Movable Type. Photo at top by Geoffrey Stone.
Schneier.com is a personal website. Opinions expressed are not necessarily those of BT. |
|
Comments