Curious March 11, 2021 7:20 AM

I am not comfortable writing this below, but I just can’t help but get these whacky ideas. And not knowing much about cryptography it oddly enough makes making such ideas easy for me.

I wonder: could it be possible to create a finger printing system, using a lot of dynamic XOR gates when processing a huge string of random bits? Admittedly, it isn’t clear to me if my vague notion of XOR games even makes sense. I guess, it could mean the same thing as if an observer covertly recorded the stream of numbers in some indirect and clever way, also in any later step in which the random number is processed or maybe re-processed (maybe like when processed in a dedicated computer chip used for encryption/decryption?).

Uh, I guess there would be two ways to think of a ‘fingerprint’ in this way, one being, recognizing somebody’s encrypted signature (signatures as such not supposed to be secret?), and also maybe for sort of recording or at least learning something about the very number used as a random number.

I guess I sort of imagined that, even if you had a seemingly endless stream of random bit numbers going through a lot of XOR gates, then perhaps you could create a finger printing system by predictably shuffling the XOR gates around during the same processing of the initial burst of random stream of bits into the XOR gates? Maybe similar to how bits are processed in permutation boxes with software? Chunks of bits being shuffled and re-shuffled around in ways?

I guess I am now wondering: might there be there any big no-no’s when it comes to working with XOR gates when processing bits of data?

I am ofc speaking from ignorance here, and I suspect if I really knew all there was to know about XOR gates, I wouldn’t have any fun writing this.

Clive Robinson March 11, 2021 9:06 AM

@ Curious,

I guess I am now wondering: might there be there any big no-no’s when it comes to working with XOR gates when processing bits of data?

Yes several.

All an XOR gate does is a “one’s complement add” without carry, which is the same as generating a parity bit from the two inputs.

If the individual inputs are,

1, Truely random
2, Uncorrelated with each other
3, Not fed back from either input in any way

Then the output should correlate with neither it’s self or either of the inputs.

Which gives rise to an interesting possability which von Newmann demonstrated.

As the introduction to the article indicates getting random data that is unbiased in terms of zeros and ones is not that easy.

Actually it’s more “not that easy efficiently”.

If you have a bit stream where the adjacent bits are “independent of each other” but there is a near uniform bias, then obtaining an unbiased stream is as simple as,

1, Read in two bits.
2, Use them as XOR gate inputs
2.1 if bits are 0,0 NO output
2.2 if bits are 0,1 output 1
2.3 if bits are 1,0 output 0
2.4 if bits are 0,0 NO output
3, Discard the two bits
4, If not enough bits generated Goto 1.

I won’t go through the math of why it works you can look it up[1], that’s not important as such, what is though is that the ouput bit rate is at best half the input bit rate, thus it’s not efficient.

[1] Scroll down to Examples and the,”Von Neumann extractor” in,

Essentially if the bias is uniform and the probability for a zero is 1-z then the probability for a one is 1-o there for either 0,1 or 1,0 is (1-z + 1-o)/2 so they both have a probability of a half which is what you want.

Tatütata March 11, 2021 9:29 AM

The link above is the commentary, an actual preprint (April 2020) of the paper is available at .

I’m trying to fathom how such a gushing fire hose of random digits could/would be used in practice. A massively parallel machine would have individual processors running at a few GHz, and wouldn’t require RNGs running at anything close to the clock frequency.

Say you need a uniform distribution in the interval [0,1[ as an IEEE754 double for a simulation, then you would essentially reframe the 250TB/s stream into ~53 bit chunks, which works out to 4.7 Gwords/second. How do you efficiently distribute this amount of data to your CPU farm, in addition to the normal data transfer necessary for your job?

Another issue in my eyes is the generation of other distributions, e.g., a normal deviate, from that stream. How does algorithms such as Ziggurat scale up in speed and parallelism? (The number of random picks for each produced deviate required isn’t a constant).

Clive Robinson March 11, 2021 9:39 AM

@ All,

Broadly the article is long on waffle and short meat.


1, Wide area lasers are noisy (good)

2, Wide area lasers get stuck in modes (bad).

3, A few years ago somebody discovered a D shaped optical cavity limited 2 (good)

4, A new optical cavity design shaped like a bow tie is better than 3 (good)

5, The results are preliminary as the photon extractor is not upto continuous use (bad)

6, It is assumed 5 will be solved (good).

None of which is realy relevant to informing if you are getting the highest of quality entropy needed for Cryptographically Secure True Random Bit Generators (CS-TRBG) so judgements can not be made.

However an observation can be made which is based around the truism of “Wallpapering air bubbles”. If you push down on an air bubble under wall paper when you put it up, the air thud bubble just moves somewhere else. Thus you have to solve the air problem to solve the bubble problem otherwise you just end up with the same problem that probably gave rise to the game of “Whack-O-Mole”. So your two options are “lance the bubble” or “push the air over the edge”.

I’m not at all certain either the D or Bow Tie cavities solve the mode problem. Though it can be hard to see there is a major difference between making something “chaotic” and something “random”. The use of a cavity would I suspect tend towards “chaotic” not “random” by the measures you need for cryptography. Though chaotic is fine for most statistical uses like Monte Carlo methods where you are actually trying to build a shaped probability curve not a fully non determanistic output.

Corine March 11, 2021 11:25 AM

@ Tatütata,

I’m trying to fathom how such a gushing fire hose of random digits could/would be used in practice.

I as well. A stream cipher can generally produce several billion unpredictable bits, once keyed with 128-256. We do have to worry about Spectre et al. these days, but otherwise, what’s the problem? It seems like a few hundred bits of “true” entropy could meet one’s needs for a lifetime, if managed correctly. More realistically, it could be seeded once per boot, maybe with “new” seed bits pushed occasionally.

Why the obsession with a constant high-bandwidth physical source? Is it just because running Fortuna or similar on a non-speculating coprocessor wouldn’t be novel enough to produce citable academic papers?

Clive Robinson March 11, 2021 1:47 PM

@ Tatütata,

I’m trying to fathom how such a gushing fire hose of random digits could/would be used in practice.

Currently I’d call it “a solution looking for a problem” and I’d also say “I’ll be darned if I could ever tell you when such a problem will ever come up”.

As for,

How do you efficiently distribute this amount of data to your CPU farm

The answer is simple but it does not realy solve the problem. That is by “independent optical links” which gets both impractical and expensive very very quickly. An easier and cheaper solution is just put what will be a lowish cost chip on every motherboard and route signals to the individual cores. If the random numbers are genuinely random then the fact you throw away 99.9999% of it’s capability matters not a jot, it effectively becomes “roulette wheel random generation” by default.

But to the more interesting case of “modeling distributions” and how you go about it,

How does algorithms such as Ziggurat scale up in speed and parallelism?

If you are using Ziggurat which is kind of a “lookup in log tables” type algorithm the speed limitation is going to be the CPU unless you pull a few tricks to keep it in cash.

As for parallelism that depends on the architecture and what you are trying to achieve. Obviously it will run on as many CPU cores as you can get two or three 64bit numbers into quickly. But it then depends on if you need the remaped random numbers one at a time or if you can generate a whole batch of them and keep them in cache or not till you use them.

But how many 64bit numbers you need to pull on is variable, plus there are always quirks. For instant Ziggurat has issues with correlations in any of the least significant bits. So…

It rather depends on the RNG output and,

1, How truely independent the individual bits are.

2, How uniformaly the resulting numbers are distributed across a given number space.

In theory the Ziggurat algorithm has five cases the topmost and bottom most layers, a number within the “boxed” distribution, a number in the dx box and the rest of the numbers that get disgarded.

If you have a very large number of layers then the top and bottom layers are very rare occurances (ie less than 1/1024 each) next in infrequency is the “dx box” if the number of layers is sufficiently high then this box can be assumed to have a straight line piecewise approximation of two equal area triangles (thus giving 50:50 and a simple bit flip decider is an effective speed up for go/nogo). Most frequent are the parts of the layer that are “go” to the left of the dx box and “nogo” to the right of it.

So… As bitwise corelations have a habit of being grouped in adjacent bits we can say outside of the top and bottom layers the majority of issues will occur around the edges of the dx box and the flip decider bit.

However, the layers are also not of “uniform hight” (dy), but “uniform area” to the left of the dx box. Outside of the issue of the top and bottom layers the other layers have a subtle bias due to what is effectively quantization noise on the left edge of the dx box.

I’m not going to go into the maths of what goes on at the left hand edge of the dx box as it’s both dull and lengthy. However it does show sensitivity to adjacent bit correlations. The solution to this whilst simple in hardware is not in software which is to permutate the bits as they come out of the “bit source” such that that adjacent bits are suitably seperated across the 64bits of the random number output.

metaschima March 11, 2021 4:14 PM


Thank you for the link to the original article, I really enjoy reading about and studying RNGs.

@Clive Robinson

You have covered pretty much everything I’ve wanted to say, and I agree with you. Although the paper is not an easy read I believe this is not a whole lot better than tuning your radio to an unused channel and then postprocessing the data to make it appear random when it actually does not contain as much entropy as they would have you believe. The quantum level emissions give it the entropy, the rest is just chaotic noise same as you would get from a transistor. If you want to use this system for cryprograpic purposes then as Bruce says you have to feed it into a CRNG/CS-RBG. Also, I don’t believe in TRNG/TRBGs. There is no way to prove that a generator is truly random. There definitely are processes in nature that are known to be truly random, on the quantum level, on the astrophysical level etc. but to create a device to fully isolate such phenomena, be consistent in this and not be influenced by outside non-random phenomena is the holy grail of RNGs, which will probably never be achieved, but maybe quantum computers could eventually do it…

lurker March 11, 2021 10:35 PM

@Corine &

Is it just because running Fortuna or similar on a non-speculating coprocessor wouldn’t be novel enough to produce citable academic papers?

tsk, tsk, that’s enough to call for the mouthwash in some quarters 😉

Winter March 12, 2021 2:37 AM

“I’m trying to fathom how such a gushing fire hose of random digits could/would be used in practice.”

With quantum cryptography, you want to establish a secret key over a very, very inefficient quantum link. Detector efficiencies for long links are in the (sub) percent ranges. And you would like to change the keys regularly.

Whether this could mop up all the random bits, I would not know. But if the laser gushes out too many bits, you can downscale it, use inefficient (cheap) detector setups, use it only intermittently etc. Too many random bits are like too much money, a first world problem.

wumpus March 13, 2021 1:20 PM

@Clive Robinson

Note that being certain your biased random numbers are otherwise perfectly random makes for pretty math but seems unlikely in practice.

If you didn’t want to use Fortuna, and you had an otherwise good random data source with a bias such that you would expect to need N bits per output bit, I’d suggest taking N*256 bits and running them through SHA256. This not only leaves you with a steady output of bits, it also removes the requirement that all bits are completely uncorrelated and otherwise perfect aside from bias.

Technically speaking, I’m pretty sure you shouldn’t need anything but avalanche in your hash if you throw away enough of the output (the ratio discarded vs. kept ratio equaling the signal/noise ratio), but I’d still think that something like SHA256 should be used to remove any impurities in the non-cryptographically secure hash (just SHA256 the output after the discards).

Clive Robinson March 13, 2021 5:07 PM

@ wumpus,

I’d suggest taking N*256 bits and running them through SHA256.

I generally try not to encorage people to use crypto algorithms on the output of an entropy source…

Whilstvit is fine if you know what you are doing, others reading about it fall into “magic pixie dust thinking” or they in effect forget crypto algorithms are fully determanistic, thus in no way create “entropy”.

Further way to many “entropy source” designers come up well short of ideal, thus use crypto algorithms to hide thrir failings. Intel are a major source of such behaviour.

But there is another reason you should be able to get at the raw output of an “entropy source”, they are very fragile and easily led. That is if you direct energy at them they become not just heavily biased but the entropy effectively diminishes to near nothing[1]. If you don’t have access to the raw entropy source to continuously test then you will not know it is broken in some way as the use of crypto easily hides it from those not in the know.

[1] Back in the early 1980’s when still a young engineer and 8bit CMOS CPU chips were the latest thing, I discovered they were very easily made to badly misbehave with a hand held two way radio pushing out less than a watt of RF in the VHF or UHF bands. I then did further research which alowed me to chose how I modulated the carrier to actively inject controlable faults into circuits. I took this up into using X-Band on smart cards and was able to get rather more usefull information than the decade later worries over Power Analysis.

What was especially sensitive to RF attacks that could be both remotely read and effectively write and thus give synchronized attacks were low level analog electronics like the so called “true entropy sources” such as “noise diodes”.

Much more recently a couple of student researchers from the UK’s Cambridge Computer lab got a best paper award when they demonstrated that an unmodulated 10GHz (3cm) source pointed at an IBM TRNG took it’s output range down from around 4billion to under 120… Not exactly desirable in a TRNG…

MarkH March 14, 2021 12:21 AM

@wumpus, Clive:

Sources of truly random numbers with bias are actually very common. Think of a real-world gambling die. It may be made as uniform as practical, but over a sufficiently large number of rolls, a slight bias might be found to one face or corner.

It’s biased in the sense that all outcomes are not equally probable, and also truly random in that once the bias is taken into account, nothing about its history of past rolls makes it possible to improve the prediction of its next roll.

In some kinds of hardware random generators (noise diode, for example) controlling bias at the raw-bit level is extremely difficult: but the bits are otherwise a practical impossibility to predict.

What wumpus proposes — using a hash as an entropy concentrator — is perfectly valid. For a true hardware random source, the hash function doesn’t need most of the properties of crypto hashes, because there’s no seed or state that could leak. The fundamental property that changing any input bit should toggle about half of the digest bits is sufficient.

Clive’s important caution is that hashing is dangerous if the integrity of the data pre-hash is not assured.

I recently recommended that for critical applications, each hardware generator should be monitored (a small dedicated computer would usually suffice) that can “sound an alarm” if the statistical character of the data stream runs askew. This control must be applied before any filtering to reduce bias. If the generator is soundly designed to make good quality output under normal conditions, then it’s very unlikely that any failure mode — whether spontaneous, or induced by external interference — will not make a large difference in statistical distributions of its output stream.

metaschima March 14, 2021 1:15 PM


I don’t think homebrew crypto is a good idea in general. You have to take into account many many things. It’s safer to use known good crypto like Fortuna. However, I do enjoy creating homebrew crypto just for fun. What I created is a program that mimics what you’re talking about, but with some addons. I use a 512 bit hash (whirlpool) but you can use any, although I recommend 512 bit. I designed it to grab a seed value from a separate decent source of entropy like /dev/urandom and use that, in combination with a counter as well as a chunk of imput data determined at runtime based on testing of the input stream. All of the above is hashed and outputted, then counter incremented, an new data replacing the previous data block. The output is checked to make sure it is not the same as the previous otherwise no output is output. The seed is reseeded at a set interval based on when the counter rolls over. Doing it this way ensures that even if the input fails the output data will still be cryprograpically secure. You can also use an encryption algorithm in OFB to build a similar structure. Anyway, this is for testing purposes only, not for production use.

Leave a comment


Allowed HTML <a href="URL"> • <em> <cite> <i> • <strong> <b> • <sub> <sup> • <ul> <ol> <li> • <blockquote> <pre> Markdown Extra syntax via

Sidebar photo of Bruce Schneier by Joe MacInnis.