Log10(x * 1.2 xor 0xaa / 7.8)

========================

You can tell what hash function was used by the output; the outputs are not a random distribution.

512 bits truncated to 256 on the output worked for one class of attack; I don’t think looking at entropy is the right angle. When a byte is 0x00–0xff, a 256-byte hash needs to be used.

Clive has raised a question about what happens to entropy in a hash function’s chaining process.

I’m confident, based on my understanding and my experiments, that as long as Σ *H*_{i} (where *H*_{i} is the entropy of the *i*th block) over the first *n* blocks is significantly less than the width of the hash function’s internal state variable, the retained entropy will fall short of that sum by only a negligible amount.

However, I haven’t found an argument or experiment to adequately account for the case of entropy approaching the internal state size.

I’ve offered an indirect proof that chaining loss is negligible, summarized thus:

(a) hash collisions cause entropy loss; in the absence of collisions, no input entropy is lost

(b) a fundamental security requirement for crypto hashes is that their distribution of outputs be indistinguishable from a random distribution

(c) because of (b), standardized crypto hashes are thoroughly vetted, by analysis and statistical testing, as to whether any method can be found to distinguish their distributions from random

(d) as a consequence of (c), standardized crypto hashes have distributions sufficiently close to a random distribution that their collision frequencies accurately follow the random model

(e) the requirement of approximation to random distribution is not dependent on input length: it inherently includes the effect of chaining

(f) from (d) and (e), hash collision frequencies accurately follow the random model even when the effects of block chaining are taken into account

(g) because entropy loss in hashing is due to collisions (a), and hash collisions follow the random distribution model (f), input entropy lost in hashing accurately conforms to the values of ~1 bit I have presented above

I could be mistaken on one or more of these points! I welcome *factual* corrections.

========================

As I wrote above, I don’t know how to make a more direct proof or demonstration.

For those concerned about Clive’s question — which I am taking seriously — there is a simple remedy: use a hash with an internal state significantly larger than its output.

There are two simple ways to do so.

[1] Use an SHA3 hash, which has internal state much wider than its output. (Note well that the SHA3 family of Keccak hashes uses the “sponge” construction, very different from earlier typical hash functions; and that the extra state is provided as a safeguard against preimage attacks, *not* to ensure that the output distribution is statistically random.)

[2] Use any old hash function wider than the number of bits you want, and truncate its output. This may seem counterintuitive, or even impossible! If I throw away half the bits from SHA512, aren’t I throwing away my entropy too?

Nonetheless, I’ve demonstrated that it works: I put 32 bits of entropy into a 160-bit (or even 64-bit) hash, and the *truncated* output has ~31 bits of entropy in precise accord with the math for randomized distributions.
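That demonstration is easy to reproduce at reduced scale; here is a sketch in Python (the parameters are mine, scaled down from the figures above): feed 2^16 distinct inputs into SHA-512, truncate each digest to 16 bits, and measure the Shannon entropy of the truncated outputs.

```python
import hashlib
import math
from collections import Counter

# 16 bits of input entropy (2**16 distinct inputs) into SHA-512,
# truncated to 16 bits (2 bytes) of output.
N = 16
counts = Counter(hashlib.sha512(i.to_bytes(4, "big")).digest()[:2]
                 for i in range(2 ** N))
total = 2 ** N

# Shannon entropy of the truncated-output distribution.
H = -sum(c / total * math.log2(c / total) for c in counts.values())
print(round(H, 2))  # close to N - 0.83, per the random model
```

The truncated output retains almost all of the input entropy, in accord with the math for randomized distributions.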

I have resolutely resisted analyzing the mechanics of hash functions for two excellent reasons.

[1] For my taste, the matter is dull, tedious, and boring. I’m deeply grateful to the accomplished cryptographers — including Bruce Schneier and his colleagues — who have done the prodigious labor to figure this out. They did the hard work, so I don’t have to!

[2] Such analysis is **simply unnecessary** to determine the entropy of an *N*-bit hash for an input with *N* bits of entropy:

a) If the distribution of hash function outputs is a good approximation to a random distribution, then the probability that any particular one of the 2^*N* possible hash outputs will occur exactly *j* times is accurately given by *p* = 1 / (*e* *j*!), except for very small *N*.

b) From that probability formula, it is straightforward to compute that the hash of an input with *N* bits of entropy will have *N* – 0.83 bits of Shannon entropy, *N* – 1.34 bits of median guessing entropy, and *N* – 1.07 bits of mean guessing entropy. My experiments above show these predictions to hold to within 0.01 bit of entropy for *N* > 11.

c) If the output entropy is less (or greater) than these figures, then the hash function fails to meet the random distribution standard.
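The 0.83-bit Shannon deficit in (b) can be checked numerically from the probability in (a); a minimal sketch, assuming the Poisson occupancy model:

```python
import math

# Under the random model of (a), the number of inputs landing on a
# given output is Poisson with mean 1: p_j = 1 / (e * j!).
# The Shannon entropy of the output is then N - E[j * log2(j)], so
# the deficit below N is the expectation E[j * log2(j)] over j >= 2
# (the j = 0 and j = 1 terms contribute nothing).
deficit = sum(j * math.log2(j) / (math.e * math.factorial(j))
              for j in range(2, 60))
print(round(deficit, 2))  # 0.83
```

The series converges extremely quickly, so the truncation at j = 60 is far more than enough.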

Really, it’s not more complicated than that!

I don’t worry about chaining, because it’s obviously not sufficient that the round function alone conform to a random distribution. If the chaining process did not also preserve the near-random distributional characteristic, the hash function would fail to meet the requirement for multi-block inputs.

*We don’t need to know how the boffins satisfied the random-like distribution requirement, in order to know the output entropy.*

Either the hash function is a good approximation to a random distribution, or it ain’t.

========================

I don’t know which writing of Mr Dodgson you had in mind, but my favorite exposition of meta-meta-meta is the White Knight’s song with four titles from “Through the Looking Glass.”

The song *is* “A-SITTING ON A GATE”;

the song is *called* “WAYS AND MEANS”;

the *name* of the song is “THE AGED AGED MAN”; but

the *name* of the song is *called* “HADDOCKS’ EYES.”

When I was still a squirt, and had never yet even seen a computer, I was puzzling over a manual for IBM 360 assembly language, and feeling mightily confused about semantics: when did a notation mean the *identity* of a register, and when did it refer to the register’s *contents*?

So when I later absorbed the Alice dialogue about song nomenclature, the whimsical explication of a truly deep problem held much resonance for me.

========================

While still in my teens, I read in a book of science fiction a note by a famous author (I want to say Canadian A. E. van Vogt, but don’t trust my recall) that he had participated in work on a never-completed encyclopedia, which contained the entries:

**Carroll, Lewis** — *see* Dodgson, Charles Lutwidge

**Dodgson, Charles Lutwidge** — *see* Carroll, Lewis

Having an exceedingly literal mind, I took it to mean that the SF author was humorously pointing to a mistake the (would-be) encyclopedians had made.

Much later it came to me that they had paid tribute to the noted logician, in a manner I’m sure he would have relished.

========================

On a different note,

Shannon entropy measures the distribution of variations among a set of alternatives.

Depending on your usage of “distribution”, that statement is true or false.

Shannon entropy is not about “objects” or “data” but the “relations” between objects or data. That is, the normalised ratio of occurrence, usually by data value; in other words, probabilities over a given set of values. It’s meta-meta-data, not data.

Where “data” is technically an object’s “held value” say 73.

Without meta-data such as ft, lb, m/s, the “held value” is useless.

The object has one or more “identifiers” by which the object becomes unique.

Without meta-data such as the object’s location address, say 0xFF38, the “object” is useless.

Which means the “held value” data is not accessible for use.

As programmers we tend to take this as a given, whilst most others do not even think about it.

What entropy is, is the “relationship” between two or more “objects” in a defined set of objects, expressed by the normalised ratio of occurrences of the set of “held value” data. So it’s meta-meta-data or meta-meta-meta-data[1] depending on your viewpoint…

Thus if your set of “objects” holds eleven addresses, and the set of data held by those objects is just two unique “values”, Shannon entropy is the normalised “ratio” of the objects by their contained “value”.

[1] Which is more “meta” than most people can get their head around with just one reading. It was also tucked away in a book in the Victorian era –by the logician and photographer Charles Dodgson– most have heard of the book, if not had it read to them when young or seen a film of it, etc.
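The eleven-object example can be made concrete; a minimal sketch, assuming seven objects hold one value and four hold the other (the 7/4 split is my own illustration, not from the comment):

```python
import math

# Eleven objects, two unique held values; assume a 7/4 split
# (illustrative only). Shannon entropy is computed from the
# normalised ratios of occurrence of each value.
counts = [7, 4]
total = sum(counts)
H = -sum(c / total * math.log2(c / total) for c in counts)
print(round(H, 4))  # ~0.9457 bits per object
```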

========================

Beyond that, simple math probably doesn’t offer much guidance.

Not much, but you can use semi-logic and reason from there.

We know that X and Y are seen as integers in simple math. However, you can also view them as binary arrays (vectors) of width N bits, and then examine things “bitwise” via logic rather than “wordwise” arithmetically, and make life a “little” simpler.

You can produce a table of X by Y where each entry is a bit array of the mixing function output plus a count of differing bits, etc. Likewise each X or Y index location in the table can hold not just the binary array whose bit pattern corresponds to the integer value, but also the number of “set bits” in the pattern, etc.
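Such a table is a one-liner to build; a sketch for a 4-bit XOR mixer, with each cell holding the output bit pattern and its count of set bits (the names are mine):

```python
# X-by-Y table for a 4-bit XOR mixing function: each cell holds the
# output bit pattern and the number of set bits in it.
N = 4
table = {(x, y): (x ^ y, bin(x ^ y).count("1"))
         for x in range(2 ** N) for y in range(2 ** N)}

print(table[(0b1010, 0b0110)])  # (12, 2): output 0b1100, two set bits
```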

You can then use such tables to deduce other information, such as when individual bits have changed or not and, importantly, why.

If you run such a chaining function where the map is not used (that is, each input bit becomes the corresponding output bit without change), you can see that you have the mixing function –an XOR gate for this example– with a latch acting to give “delayed feedback”.

That is, the latch’s D-input is driven from the mixing function XOR gate’s output, and the latch’s Q-output goes back to one of the mixing function XOR gate’s two inputs.

So the latch acts like a single bit of memory holding the previous mixing function output.

You can draw up the equivalent of a truth table for this circuit, thus determine what effect it has on that bit’s state over the clocking of the latch.
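The truth-table behaviour over successive clocks can be simulated directly; a one-bit sketch (the function name is my own):

```python
# One-bit chaining circuit from the text: an XOR mixer with the
# latch's Q-output fed back to one input. Each output bit is the
# running parity of all input bits seen so far.
def chain_bit(bits):
    q = 0              # latch state, assumed to start at 0
    outs = []
    for x in bits:
        q = x ^ q      # D-input = x XOR Q; latch captures on clock
        outs.append(q)
    return outs

print(chain_bit([1, 0, 1, 1]))  # [1, 1, 0, 1] -- running parity
```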

To save doing some of the work, you can look the circuit up; it’s treated as a universal function for all sorts of things and is one of the two simplest digital filters[1] (the other has the latch between the two inputs and is used as a von Neumann de-bias circuit in TRNGs; you clock it twice to get one bit of de-biased output).

But you can also view it as a Linear Feedback Shift Register (LFSR) or a single bit “cellular automata” both of which come up very frequently in the design of “stream generators” and similar crypto functions.

The next step in the analysis is to replace the map with an XOR gate where the extra input becomes a “control” input.

So the circuit you now have is an XOR gate acting as the mixer function, with one of its inputs being the X-bit input and the other the Y-bit feedback from the latch. The mixer function XOR gate is the first XOR gate, and its output now goes into the non-control input of the second –new– XOR gate which has replaced the map.

The output of the second XOR gate is the circuit output and drives the data input to the latch.

Which leaves the control input of the second XOR gate as a –new– second input to the circuit, which is driven by a complex logic circuit that will become part of the non-linear wordwise function that is the map (just think of it as one bit from a ROM that holds the mapping function for now).

At this point you can still do logical analysis on the circuit… After all, it only has two input bits and one bit of internal state, making it effectively the equivalent of a three-input gate in terms of the number of possible functions, which is 2^(2^n) = 2^(2^3) = 256; not that hard 😉
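That two-input, one-state-bit circuit can be sketched the same way (names are mine); note that with the control input held at 0 it reduces to the earlier delayed-feedback circuit:

```python
# XOR mixer plus a second XOR gate whose "control" input c stands in
# for one bit of the (wordwise, non-linear) map function.
def chain_bit_ctrl(xs, cs):
    q = 0                    # latch state
    outs = []
    for x, c in zip(xs, cs):
        q = (x ^ q) ^ c      # mixer XOR, then map-replacement XOR
        outs.append(q)
    return outs

# Control held at 0: identical to the plain delayed-feedback circuit.
print(chain_bit_ctrl([1, 0, 1, 1], [0, 0, 0, 0]))  # [1, 1, 0, 1]
```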

As for that control input map function circuit, that’s a different order of magnitude by quite some way. With an N-bit wide map you are looking at 2^(2^(N-1)) potential functions, so with a ridiculously small 32-bit width you’ll have 2^2147483648, which is so big that most calculators or computers will give up and indicate infinity or overflow or similar. And… that is for just one of the 32 output bits…

It’s why removing the map makes sense when you want to start analysing things, and another reason why I treat a hash function as having two parts,

1, The map function,

2, The chaining function.

And keep the two as separate as possible whilst analysing things.

[1] Be warned, however: “digital filter” or just “filter” should act as a warning flag that “higher maths” involving ‘e’ is approaching, which is enough to send many running for cover. Thankfully, with just single bits involved, other techniques such as Walsh transforms can be used.

========================

CORRIGENDA

Reviewing my comment above, I see language which is wrong, without additional qualification.

Where I wrote “as few collisions as is mathematically possible,” that should be “as few collisions as is mathematically possible for a function whose output should have the statistical properties of a random distribution” (such distribution being another foundational requirement for cryptographic hash functions).

Likewise, where I wrote “E < X+Y < N occurs as infrequently as possible,” that should read “as infrequently as possible for a function whose output should have the statistical properties of a random distribution.”

========================

We know from the basic math that if X + Y ≥ N then some of the given entropy must be lost. Beyond that, simple math probably doesn’t offer much guidance.

Here’s my informal reasoning:

Consider a hash chaining step in which (a) X + Y is significantly less than N, and (b) the hash-state entropy E after the step has completed is less than X + Y.

Shannon entropy measures the distribution of variations among a set of alternatives. X is the distributional variation of inputs processed prior to the chaining step (minus any entropy lost along the way), and Y is the distributional variation of inputs in the current input block.

Because (by supposition) E < X + Y, some set of alternative inputs up to and including the current block has collapsed into the new hash-state. In other words, two or more distinct input sequences (among the defined distribution of inputs up to that point) will fail to lead to distinct hash outputs.

This lost-entropy case guarantees one or more collisions.
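The forced-collision case at the boundary X + Y ≥ N is easy to exhibit at toy scale; a sketch, with SHA-256 truncated to 8 bits as an illustrative stand-in hash:

```python
import hashlib
from collections import Counter

# 12 bits of input entropy (4096 distinct inputs) into an 8-bit
# output: at most 256 distinct hash values are possible, so
# collisions -- and the corresponding entropy loss -- are forced
# by pigeonhole, however good the hash.
outs = Counter(hashlib.sha256(i.to_bytes(2, "big")).digest()[:1]
               for i in range(2 ** 12))
print(len(outs) <= 256)  # True
```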

========================

A fundamental requirement of cryptographic hash functions is that they produce as few collisions as is mathematically possible.

I presume that practically all crypto hashes standardized in the past 25 years meet this criterion (at least, to a very good approximation).

Unlike the other requirements for a cryptographic hash (like first and second preimage resistance), which must be gauged by Herculean efforts of cryptanalysis, I suppose that the minimum collision-probability property can be verified by comparatively straightforward test and analysis.

========================

If any hash function satisfies the minimum collision probability property, then the scenario considered above:

E < X+Y < N

occurs *as infrequently as possible*.

Therefore, I conclude that such chaining losses are very small. This *must* be so — **regardless of how the designers chose to implement the hash** — because otherwise the hash function would fail to satisfy the minimum collision probability requirement.

========================

You might know the Alfred Hitchcock anecdote which ends with one man saying “but there are no lions in the Scottish highlands,” and the other replying “well then, that’s no MacGuffin!”

If E < X+Y < N occurs with more than vanishingly tiny probability, then we can say “that’s no hash function!”

========================

To the extent that I understood some of the comments, Clive referred several times to the iterative chaining of the hash map function.

I think the problem is,

You see the hash as a single functional block and I do not.

I see the hash you see as being in two parts.

The first is effectively a one-to-one map N bits wide; if you could make one that big, it would be the equivalent of a ROM.

The second part is the “chaining function”: a block with two inputs N bits wide, the current input to the hash and the previous output of the one-to-one map. The chaining function combines the two inputs and produces one N-bit output that goes into the one-to-one map.

The simplest chaining function would be a latch to hold the previous map output, and N XOR gates to mix the two N-bit inputs down to one N-bit output to drive the map.
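The two-part view can be sketched with a toy 8-bit “map” (a fixed random permutation standing in for the ROM) plus XOR-and-latch chaining; everything here is illustrative, not a real hash:

```python
import random

# Toy one-to-one map: a fixed random permutation of 8-bit values,
# standing in for the N-bit-wide ROM described above.
rng = random.Random(1)
PERM = list(range(256))
rng.shuffle(PERM)

def toy_hash(blocks, state=0):
    # Chaining: XOR the latched state with the current block, then
    # feed the result through the one-to-one map.
    for b in blocks:
        state = PERM[state ^ b]
    return state

# The map is bijective, so a single block loses no entropy; it is
# the chaining over multiple blocks where collisions can appear.
print(toy_hash([1, 2, 3]) == toy_hash([1, 2, 3]))  # True (deterministic)
```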

In reality there are a whole bunch of different ways you can chain a crypto function like the one to one map. Have a look at some of the DES and AES “chaining modes” to see this.

Now consider that the map output from the previous N-bit input has X bits of entropy. The fact that it has gone through the map –which significantly changes the output, per the avalanche requirement– cannot increase the entropy, so it is still X bits. That output is latched by the chaining function for use in the next hash step. The current N-bit input has Y bits of entropy, so the chaining function has one input with X bits of entropy and a second input with Y bits of entropy.

But the chaining function only has an N-bit output, so the question arises of,

“How much entropy from X-bits and Y-bits of entropy at the input appear at the N-bit output?”

When you think about it a little while you will see it is one of those “it depends” type answers.
