In the fascinating book _Of Whales and Men_ (Dr. R. B. Robertson, Knopf, 1954), Robertson mentions the practice of whalers having golf bags made from the same article.

Check my math, but I think that the expected number of times a 5-digit sequence would appear in 1m numbers is 10, so 8 is not out of line.
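A quick way to check the arithmetic is to treat occurrences as approximately Poisson. The sketch below assumes the search runs over every (overlapping) 5-digit window of a stream of 1,000,000 digits; if only the 200,000 aligned five-digit groups were searched, the expected count would be 2 instead. Overlapping windows are not quite independent, so the Poisson model is an approximation.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_cdf(k: int, lam: float) -> float:
    """Probability of at most k events."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

# Assumption: every overlapping 5-digit window in a 1,000,000-digit stream is tested.
n_digits = 1_000_000
windows = n_digits - 4      # number of starting positions for a 5-digit window
lam = windows * 10 ** -5    # each window matches a given sequence with probability 1e-5

print(f"expected occurrences: {lam:.2f}")
print(f"P(exactly 8):  {poisson_pmf(8, lam):.3f}")
print(f"P(8 or fewer): {poisson_cdf(8, lam):.3f}")
```

With an expected count of about 10 and a standard deviation of about sqrt(10), roughly 3.2, seeing 8 occurrences is well within normal variation: there is roughly a one-in-three chance of 8 or fewer.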

The requirements for cryptographic random numbers are much, much more serious than the requirements for statistical random numbers. Excel's function is fine for Monte Carlo simulations and the like, but it is awful for cryptography.

You have to be very careful when using an electronic roulette wheel, as there are some interesting side effects due to sampling, both in theory and in real circuits.

Another thing is that radioactive sources contain bias, not just from the Poisson distribution effects but also from the half-life of the source. That is, if you get an average of 1000 counts per second today, the average will have dropped to 500 per second one half-life later. For some sources with half-lives measured in thousands of years this might not seem important, but it is.
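The drift follows the usual exponential decay law, R(t) = R0 · 2^(-t/T½). The sketch below uses an illustrative half-life of 1,600 years (roughly that of radium-226); the function names are hypothetical.

```python
import math

def count_rate(r0: float, elapsed: float, half_life: float) -> float:
    """Expected count rate after `elapsed` time, given initial rate r0 (same time units as half_life)."""
    return r0 * 2 ** (-elapsed / half_life)

def fractional_drop(elapsed: float, half_life: float) -> float:
    """Fraction by which the mean count rate falls over `elapsed`."""
    return 1 - 2 ** (-elapsed / half_life)

# Illustrative values: 1000 counts/s today, half-life of 1600 years.
r0, half_life_years = 1000.0, 1600.0
print(count_rate(r0, half_life_years, half_life_years))    # rate one half-life later
print(f"{fractional_drop(1.0, half_life_years):.2e}")       # exact drop over one year
print(f"{math.log(2) / half_life_years:.2e}")               # small-drift approximation ln2/T
```

Even with a 1,600-year half-life the mean rate falls by about 0.04% per year, so a 1000 counts/s source loses roughly 0.4 counts/s annually; small, but a systematic drift rather than random noise.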

Back to using two oscillators and some of the problems involved; I'll just quickly outline some of them ;)

If you use two oscillators and sample one with the other, you are basically making a heterodyne mixer of the kind you can find in any radio. The usual assumption is that you end up with four frequencies (to a lesser or greater extent) at the output of the mixer:

F1, F2, F1+F2, F1-F2

You then filter out the one you want (normally F1-F2) and put it through an amplifier etc. to get it to a useful level.
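The origin of those products can be checked numerically. An ideal multiplier only produces F1+F2 and F1-F2 (since cos a · cos b = ½[cos(a-b) + cos(a+b)]); F1 and F2 themselves leak through because practical mixers are imperfect. The sketch models that imperfection, as an assumption, by adding a small offset `leak` to each input, and measures the resulting spectrum with a plain single-bin DFT. The sample rate and frequencies are arbitrary illustrative values.

```python
import cmath
import math

FS = 1000   # sample rate (Hz), illustrative
N = 1000    # one second of samples, so DFT bin k corresponds to k Hz
F1, F2 = 110, 100

def mixer_output(leak: float) -> list[float]:
    """Product of two cosines, each with a small offset modelling an imperfect mixer."""
    out = []
    for n in range(N):
        a = math.cos(2 * math.pi * F1 * n / FS) + leak
        b = math.cos(2 * math.pi * F2 * n / FS) + leak
        out.append(a * b)
    return out

def bin_magnitude(x: list[float], k: int) -> float:
    """Magnitude of DFT bin k, normalised by N."""
    s = sum(v * cmath.exp(-2j * math.pi * k * n / N) for n, v in enumerate(x))
    return abs(s) / N

y = mixer_output(leak=0.1)
for f in (F1 - F2, F2, F1, F1 + F2):
    print(f"{f:4d} Hz: {bin_magnitude(y, f):.3f}")
```

With `leak=0.1` all four frequencies appear: the sum and difference terms at 0.25 each, and F1 and F2 at 0.05 each. With `leak=0.0` only F1+F2 and F1-F2 remain.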

In an ordinary portable radio, the radio station is (we assume) of high stability, and the oscillator in the radio of low stability and probably of high noise as well. There is a well-known problem in RF engineering circles of oscillator noise messing up the desired signal to the point that it is unintelligible (the eye diagram is used to show this on digital systems). In the case of your roulette wheel, the output signal contains the noise and frequency variation of both oscillators.

What is not immediately obvious to most people (and yes, that includes a lot of design engineers) is that the four frequencies at the output of the mixer have real energy that has to go somewhere. Unless you are very, very careful it will end up where you don't want it, i.e. in with the desired signal...

This happens because when you select the desired frequency with an ordinary filter and reject the others, the rejected ones bounce back into the mixer and generate even more frequency components. Some of these new frequencies end up back in the same frequency range you are looking at and add to the desired frequency, causing its zero crossing point to move. Some go back out to the oscillators, causing more frequency generation in the oscillator circuits. All of these new frequencies tend to bounce back into the mixer and around they go again.

Also, if a generated frequency gets reflected back into a variable frequency oscillator close to the frequency it is operating at, it can pull the oscillator onto its frequency (see loosely locked oscillators such as those used in PAL and NTSC chroma circuits).

The result is that a mixer can be regarded like a pressure vessel: eventually the pressure is going to go out of the only exits available to it, unless it has a safety mechanism to remove it safely. If you block up the exits going to the oscillators, then the energy has only one place to go, and that is onto your chosen signal as some form of modulation....

Another problem is the filter. As part of its function it contains energy storage components (capacitors and inductors); all of these have defects and in combination have resonant frequencies. The result is that the amplitude of the signal will be changed by the filter, not just at the filter band edges but within the pass band. So as your "random signal" moves up and down the filter bandwidth, it is Amplitude Modulated (AM) by the filter.
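A simple model shows how frequency wander turns into amplitude modulation. The sketch assumes an idealised single-resonator (series RLC) band-pass response normalised to 1 at its centre frequency f0; real filters add pass-band ripple on top of this roll-off. The Q and frequencies are illustrative.

```python
import math

def bandpass_gain(f: float, f0: float, q: float) -> float:
    """Normalised magnitude of a second-order band-pass response (1.0 at f0)."""
    detune = f / f0 - f0 / f
    return 1.0 / math.sqrt(1.0 + (q * detune) ** 2)

f0, q = 10.0e6, 20.0   # illustrative 10 MHz centre frequency, Q of 20
for offset in (-0.01, -0.005, 0.0, 0.005, 0.01):
    f = f0 * (1 + offset)
    print(f"{offset:+.3f}: gain {bandpass_gain(f, f0, q):.3f}")
```

With these values a 1% wander in frequency changes the output amplitude by roughly 7%, which a later limiter or threshold detector can then convert into timing jitter.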

Now what you may not know is that an AM signal also consists of four different frequency components: two at the same frequency but with different phases, plus the two "sideband" frequencies. The phase difference does not usually matter unless one or both of the frequencies is changing; then you get a rotational effect. This is due to vector addition, which causes further AM effects on the frequency components, etc., etc., etc.

Also a changing phase means that the resulting signal is being Phase Modulated (PM) which means the zero crossing point of the signal you want is being moved...

Guess what: to drive a counter you need to have a reasonably accurate edge. Sine waves just don't hack it; the signal needs "squaring up". The easy (and wrong) way to do it is to put the signal through a limiter.

A limiter is usually an overdriven amplifier where the output signal crashes (and bounces) into the supply rails. I'll leave you to imagine what this does in terms of cross-modulation between the signal and the supply line noise and vice versa; let's just say it's messy...

Also, guess what: squaring a sine wave up by limiting its amplitude moves the energy within the signal around, and again the energy has to go somewhere; the result is AM to PM conversion. To put it more simply, the variation in amplitude of the signal caused by the filter moves the zero crossing point of the signal on passing through the limiter. Also, the amplifier has its own frequency characteristic, and the signal smacking into the rails causes ringing at one of the amplifier's resonant frequencies...

The right way to do it is through a zero crossing or threshold detector, where the zero crossing / threshold point is derived from the signal average (but this also has its own problems).
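The link between amplitude and edge timing is easy to see for a sine wave A·sin(2πft) crossing a threshold V_th on its rising edge: the crossing happens at t = arcsin(V_th/A)/(2πf). If V_th sits exactly at the signal average (zero here), amplitude changes do not move the edge; any threshold offset turns AM into timing jitter. The sketch below is an idealised model with illustrative values.

```python
import math

def crossing_time(threshold: float, amplitude: float, freq: float) -> float:
    """Time after the upward zero crossing at which A*sin(2*pi*f*t) reaches the threshold."""
    return math.asin(threshold / amplitude) / (2 * math.pi * freq)

freq = 1.0e6  # illustrative 1 MHz signal
for amplitude in (1.0, 0.9):
    t_zero = crossing_time(0.0, amplitude, freq)
    t_offset = crossing_time(0.1, amplitude, freq)
    print(f"A={amplitude}: zero threshold {t_zero * 1e9:.2f} ns, "
          f"0.1 V threshold {t_offset * 1e9:.2f} ns")
```

Here a 10% drop in amplitude moves the edge by almost 2 ns when the threshold is offset by 0.1 V, and not at all when the threshold is at zero.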

Also there is the not-so-minor problem of how much the electronics distort the signal during amplification, before it is converted to a square wave. The answer is a lot: your average active component has a power-law curve, and the design engineer biases the device to get a desired effect, either gain or minimum distortion (but not both). Again this leads to cross-modulation of the signal with power supply noise and the generation of harmonic components. One trick engineers use to try to reduce distortion is feedback, but, due to time delays amongst other things, this has frequency-selective problems, giving other AM and PM type effects to a variable frequency signal...

There are a whole load of other effects you can bump into as well but you would need several books to describe them.

The net result of all of this is that your original signal now has lots and lots of noise on it at the zero crossing point. If you analyse it (and people have), you will find that in most cases it consists of a number of predominant frequency components moving the crossing point back and forth in a predictable manner. Oops: this is a decidedly non-random effect, which is not only going to be visible as such in the generator output but can also add an offset or bias. This effect is well known to engineers who have tried to design Base Band Output Direct Conversion Receivers (one of the reasons heterodyne receivers with Intermediate Frequency (IF) circuits were designed in the first place).

In general the bias can be removed by the use of a simple digital circuit; however, the non-random modulation signal cannot be so easily removed.
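The comment does not say which circuit it has in mind; one well-known technique for removing bias is von Neumann's corrector, sketched below. It looks at non-overlapping pairs of bits, outputs the first bit of each unequal pair (01 gives 0, 10 gives 1), and discards 00 and 11. It only yields unbiased output if the input bits are independent, which is exactly what a periodic modulation of the crossing point violates.

```python
import random

def von_neumann(bits: list[int]) -> list[int]:
    """Debias a bit stream: emit the first bit of each unequal pair, drop equal pairs."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:
            out.append(a)
    return out

# Illustrative biased source: independent bits that are 1 with probability 0.8.
rng = random.Random(2024)
raw = [1 if rng.random() < 0.8 else 0 for _ in range(200_000)]
clean = von_neumann(raw)
print(f"raw mean {sum(raw) / len(raw):.3f}, corrected mean {sum(clean) / len(clean):.3f}")
print(f"kept {len(clean)} of {len(raw)} bits")
```

The price is throughput: with p = 0.8 only about 2 × 0.8 × 0.2 = 32% of pairs survive, i.e. about 16% of the input bits.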

So as you can see designing a simple "true random" generator is not an easy engineering task.

Then there are other difficulties to do with selecting the bandwidths of the signal path and the effects the filters have. That is, they turn a Gaussian White Noise Signal into a bandwidth-limited signal; those in the audio engineering game refer to this as white to pink noise conversion.... Bandwidth-limited signals from sampled signals have their own problems to deal with, as DSP engineers are well aware.
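One consequence of band-limiting is easy to measure: samples of filtered noise are no longer independent of their neighbours. The sketch below uses a crude two-tap moving average as a stand-in for a low-pass filter (an assumption, chosen for simplicity); its lag-1 autocorrelation is 0.5 in theory, versus 0 for the unfiltered white noise.

```python
import random

def lag1_autocorrelation(x: list[float]) -> float:
    """Sample correlation between each value and the next one."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

rng = random.Random(7)
white = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
# Two-tap moving average: a very simple low-pass (band-limiting) filter.
filtered = [(white[i] + white[i + 1]) / 2 for i in range(len(white) - 1)]

print(f"white:    {lag1_autocorrelation(white):+.3f}")
print(f"filtered: {lag1_autocorrelation(filtered):+.3f}")
```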

I could go on with other effects but I think you get the idea.

When faced with designing a "True Random" generator, a lot of people decide that, to save cost and effort, they will just fudge the whole issue: take the not-quite-random output and feed it through a crypto or hashing function, so that it is (they think) not a problem...
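For illustration, such a "conditioning" step might look like the sketch below, which hashes each fixed-size block of raw generator output with SHA-256 (the block size and function name are hypothetical choices). Its output always looks random, but it cannot contain more entropy than went in: a completely stuck source still produces random-looking bytes, and identical input blocks give identical output blocks.

```python
import hashlib

def condition(raw: bytes, block_size: int = 64) -> bytes:
    """Hash each full block of raw samples with SHA-256 and concatenate the digests."""
    out = bytearray()
    for start in range(0, len(raw) - block_size + 1, block_size):
        out += hashlib.sha256(raw[start:start + block_size]).digest()
    return bytes(out)

stuck_source = bytes(128)            # a "broken" generator emitting only zero bytes
output = condition(stuck_source)
print(output.hex())                  # looks random...
print(output[:32] == output[32:])    # ...but both blocks are identical
```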

> Has anyone analysed the RAND corporation's Random number list to determine its randomness?

Yes; in the book's introduction, the results of some of RAND's own validation tests are given, and so it was known to be less-than-perfect even when it was published. However the biases are too small to matter for most purposes.

> I assume that its small size would not allow you to determine its period.

As it is generated from a hardware source of randomness which--under present understanding of quantum mechanics--is believed to be truly random, the sequence should be aperiodic.

> What about their technique? Can it be faulted?

There are a couple of possible criticisms.

First, they are really vague about the "random frequency pulse source", which is the real heart of the machine. It is widely believed to have been a Geiger counter mounted near a radioactive source of suitable size (presumably somewhere on the order of 30 microcuries), but this doesn't seem to have actually been recorded, and references to the machine's statistical quality "running down" after "one month of continuous operation without adjustment" tend to suggest otherwise. It is possible that the pulse generator was actually some sort of astable oscillator. In that case, it may well have been chaotic rather than random, making analysis of the rest of the processing, and the consequent quality of the output "randomness", much more difficult. Additionally, a chaotic oscillator might tend to end up synchronising with some accidental external driving field, which would make its output highly predictable.

A second point is that a 5-bit counter was driven from the pulse source and sampled once per second. In other words, each sample was the 5 least significant bits of the count of events in the last second. However, a count of randomly distributed events over a set period is not uniformly distributed; it follows a Poisson distribution. Reducing the count modulo 32 will largely correct this bias, but perhaps not entirely; the standard deviation in this case is about 320, only ten times the modulus, so one can easily see residual biases (on the order of a few percent) remaining.

Next, there is a curious anomaly in the process of converting 5-bit numbers to base-10 digits. Basically the numbers are reduced modulo 10, but because 32 is not evenly divisible by 10, some samples have to be discarded to avoid bias. Only 2 values (e.g. 30 and 31) need to be discarded for that, yet they actually discard 12. Since this would increase the duration of an operation which must have taken on the order of a fortnight by 50%, it must have been reasonably important; but so far as I know there has never been an explanation why. What was wrong with counts in the twenties?
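The arithmetic is easy to check by counting how many of the 32 possible 5-bit values land on each decimal digit. The sketch (function names hypothetical) compares keeping all 32 values, keeping 0 to 29 (discarding 2), and keeping 0 to 19 (discarding the 12 values from 20 to 31), along with the expected number of 5-bit samples consumed per output digit under rejection sampling.

```python
from collections import Counter

def digit_histogram(keep_below: int) -> Counter:
    """How many 5-bit values (0..31) map to each digit when values >= keep_below are discarded."""
    return Counter(v % 10 for v in range(32) if v < keep_below)

def samples_per_digit(keep_below: int) -> float:
    """Expected 5-bit samples consumed per accepted digit (rejection sampling)."""
    return 32 / keep_below

for keep in (32, 30, 20):
    hist = digit_histogram(keep)
    print(keep, [hist[d] for d in range(10)], f"{samples_per_digit(keep):.3f}")
```

Keeping all 32 values makes 0 and 1 more likely than the other digits; keeping 0 to 29 already gives a flat histogram, and keeping only 0 to 19 raises the cost from about 1.07 to 1.6 samples per digit, which is the 50% increase.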

Finally, to correct for biases that were detectable when the machine had "run down", they summed pairs of digits modulo 10 (this is what is meant by "rerandomization of the basic table"). The reason was that this "transformation was expected to, and did, improve the distribution in view of a limit theorem to the effect that sums of random variables modulo 1 have the uniform distribution over the unit interval as their limiting distribution."

This is correct so far as it goes, but it contains two hidden assumptions. Firstly, that theorem is only true if the two random variables are independent. That would seem like a reasonable assumption if they had, say, generated two tables of a million digits each and then summed them digit by digit. But what they actually did is produce one table of a million digits and then sum each digit with the one 50 places behind it (what they did with the first 50 isn't mentioned; I presume they wrapped around). So if any correlations in the device were capable of lasting across 50 seconds, the procedure is not mathematically valid, even if the output "looks" more random. Worse, because this trick produces the same number of output digits as input, it follows that the entropy per digit in the output cannot be any higher than the input. Since the entropy per digit of a biased distribution is less than that of a uniform one, it follows that the output sequence is just as biased as the input, only in a more complicated way.
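The procedure, and the limit theorem it relies on, can be sketched as follows. `rerandomize` adds each digit to the one 50 places earlier, modulo 10; the wrap-around at the start is an assumption, as in the text. `sum_mod10` computes the exact distribution of the sum of two digits, and it is only valid when they are independent, which is precisely the hidden assumption discussed above. The bias used is an illustrative value.

```python
def rerandomize(digits: list[int], lag: int = 50) -> list[int]:
    """Add each digit to the one `lag` places earlier, mod 10 (assumes wrap-around)."""
    n = len(digits)
    return [(digits[i] + digits[(i - lag) % n]) % 10 for i in range(n)]

def sum_mod10(p: list[float], q: list[float]) -> list[float]:
    """Distribution of (X + Y) mod 10 for INDEPENDENT digits X ~ p, Y ~ q."""
    r = [0.0] * 10
    for a in range(10):
        for b in range(10):
            r[(a + b) % 10] += p[a] * q[b]
    return r

def max_bias(p: list[float]) -> float:
    """Largest deviation of any digit's probability from the uniform 0.1."""
    return max(abs(x - 0.1) for x in p)

biased = [0.12] + [0.88 / 9] * 9   # illustrative: digit 0 is 20% too common
print(f"before: {max_bias(biased):.5f}")
print(f"after:  {max_bias(sum_mod10(biased, biased)):.5f}")
```

Under independence the worst-case bias drops from 0.02 to roughly 0.0004; if the two digits being summed are correlated, `sum_mod10` no longer describes the output and no such improvement is guaranteed.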

In practice they seem to be plenty good enough for most purposes.

> it seems to me that I'd just use Excel and the randbetween() function if I needed a five-digit random number.

Robert, random numbers are a much bigger topic than that!

The problem which the RAND table was addressing is that computer "random number" functions actually return "pseudo-random numbers", that is, they are generated by a mathematical formula which is intended to produce a series that "looks random" and has no obvious structure.

However there are several significant problems that can occur with these pseudorandom number generators, or PRNGs.

* Firstly, given the same "seed" or starting value, they always produce the same output sequence. This can usually be mitigated by using the system clock as the seed.

* Secondly, having a finite state, they are necessarily periodic, i.e. will eventually repeat. Many have a period on the order of 2^32, which is good enough for many purposes but not for large scale simulations or computer security. Cryptographic ones usually have a period of 2^64, 2^128, or even more. Some inferior ones have a period of the order of 2^15 which is really only good enough for games.

* Thirdly, while they may not have any "obvious" structure, the fact that they are generated from a formula means that they must have some type of structure. Often this will not correlate in any significant way with the simulation you are running, but sometimes it will, and then seriously pathological results will arise. An infamous example of this is that one type of very popular PRNG, the linear congruential random number generator or LCRNG, tends to produce successive values which define points lying on a small number of (n-1)-dimensional hyperplanes in n-dimensional space. If an LCRNG is used in a careless manner to simulate random positions in space or space-time, it can produce totally spurious results. In another example, the FFT of many PRNG sequences shows obvious "spectral lines".
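The three problems above can be demonstrated with a few lines of code. The sketch shows (1) a linear congruential generator repeating its output for the same seed (the constants are illustrative), (2) a deliberately tiny LCG (modulus 256) cycling with a period of exactly 256, and (3) the classic RANDU generator (x → 65539·x mod 2^31), whose successive outputs always satisfy x[k+2] = 6·x[k+1] - 9·x[k] (mod 2^31), which is why its triples fall on just 15 planes in three dimensions.

```python
from itertools import islice

def lcg(seed: int, a: int, c: int, m: int):
    """Linear congruential generator: x -> (a*x + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def period(seed: int, a: int, c: int, m: int) -> int:
    """Length of the cycle the LCG eventually enters from `seed`."""
    seen = {}
    x, i = seed, 0
    while x not in seen:
        seen[x] = i
        x = (a * x + c) % m
        i += 1
    return i - seen[x]

# (1) Same seed, same sequence.
run1 = list(islice(lcg(42, 1103515245, 12345, 2**31), 5))
run2 = list(islice(lcg(42, 1103515245, 12345, 2**31), 5))
print(run1 == run2)                      # True

# (2) A tiny generator repeats after at most m values.
print(period(0, 5, 1, 256))              # 256

# (3) RANDU: every triple satisfies a linear relation mod 2**31.
randu = list(islice(lcg(1, 65539, 0, 2**31), 1000))
print(all((randu[k + 2] - 6 * randu[k + 1] + 9 * randu[k]) % 2**31 == 0
          for k in range(len(randu) - 2)))  # True
```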

When we ask for random numbers there are at least six types of problems we commonly want to solve:

* in some security protocols, we often require a number that must be unpredictable even by a very smart opponent. In this case, the number must be generated with a significant amount of "truly random" entropy. The usual process is to combine every available source of difficult-to-guess (high entropy) data, using an entropy-preserving formula (i.e. one that makes the final number as difficult to guess as correctly guessing _all_ of the input numbers). There also exist hardware RNG modules which generate "true" random numbers directly from unpredictable physical processes, like thermal noise in a transistor.

* in other security protocols, we require a large number of random numbers, such that the opponent cannot guess the next one after having seen all the previous ones. The solution to this is a "cryptographically strong pseudorandom number generator" or CSPRNG. Finding any structure in a good CSPRNG sequence is as difficult as breaking an underlying cipher. However, CSPRNGs tend to be much slower than regular PRNGs. For more information see:

http://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator

* in many applications, we only require that the numbers be "well distributed" across their range, and not repeat within the application domain. An LCRNG of adequate period is usually sufficient for this, which--along with simplicity and speed--is why they are so common in computer software.

* for statistics and simulations, we require that successive numbers do not correlate in any measurable way with parameters under test. It may also be desirable that the sequence can be easily reproduced by colleagues. This is the problem which the RAND tables were intended to address.

* in a very small number of security protocols, we don't care about the opponent predicting the number, but we want to be able to prove to him that WE couldn't predict it. The RAND tables have also been used for this purpose. Numbers generated in this way are called "nothing up my sleeve numbers". An interesting article on this topic can be found at:

http://en.wikipedia.org/wiki/Nothing_up_my_sleeve_number

* finally, in large scale statistics and simulations, we need numbers without pathological correlations, but we need a really enormous number of them. CSPRNGs have been used for this purpose but provide additional unnecessary assurances and are often too slow. The state of the art in this area is the Mersenne twister, see:

http://en.wikipedia.org/wiki/Mersenne_twister

The Mersenne twister is not cryptographically secure, but it has a huge period, very little correlation, and is pretty fast.
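In Python, for example, the split between these use cases maps onto two standard-library modules: `random` (a Mersenne twister, reproducible from a seed, not for security) and `secrets` (backed by the operating system's cryptographically secure source). A minimal sketch; the function names are hypothetical.

```python
import random
import secrets

def simulation_draws(seed: int, n: int) -> list[float]:
    """Reproducible uniform draws for a simulation (Mersenne twister)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

def five_digit_number(rng: random.Random) -> str:
    """A five-digit string 00000-99999 for non-security use."""
    return f"{rng.randrange(100_000):05d}"

def session_token() -> str:
    """Unpredictable token for security use (32 hex characters)."""
    return secrets.token_hex(16)

print(simulation_draws(1, 3) == simulation_draws(1, 3))   # True: colleagues can reproduce it
print(five_digit_number(random.Random(1)))
print(session_token())
```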

Prior to 2003 the rand() generator in Excel was a defective modification of an LCRNG, known to have a number of quite serious defects and a period of only 2^24. A much improved version was shipped in Excel 2003 (although with a bug not fixed until Jan 2004). In this version, rand() is the mod 1.0 sum of three 15-bit LCRNGs scaled to [0,1), and is fairly good as such things go: good enough for most non-security purposes except large scale simulations. The period is just under 2^43.
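The "sum of three 15-bit LCRNGs mod 1.0" description matches the classic Wichmann-Hill (1982) generator, which is widely reported to be what Excel 2003 adopted; treat that identification, and the seeds below, as assumptions. Each component is a multiplicative congruential generator with a prime modulus just under 2^15, and the combined period is the least common multiple of the three component periods (`math.lcm` with several arguments needs Python 3.9 or later).

```python
import math

MODULI = (30269, 30307, 30323)
MULTIPLIERS = (171, 172, 170)

class WichmannHill:
    """Sketch of the Wichmann-Hill combined generator (three small LCGs summed mod 1)."""

    def __init__(self, s1: int = 1, s2: int = 1, s3: int = 1):
        # Each seed must lie in 1 .. modulus-1.
        self.state = [s1, s2, s3]

    def random(self) -> float:
        total = 0.0
        for i, (a, m) in enumerate(zip(MULTIPLIERS, MODULI)):
            self.state[i] = (a * self.state[i]) % m
            total += self.state[i] / m
        return total % 1.0

# Assumes each component has full period m - 1 (the published multipliers were chosen for this).
combined_period = math.lcm(*(m - 1 for m in MODULI))
print(combined_period, f"~2^{math.log2(combined_period):.1f}")

gen = WichmannHill(1, 2, 3)
print([round(gen.random(), 6) for _ in range(3)])
```

The product of the three component periods is just under 2^45, but because those periods share factors of 2, the combined cycle is about 6.95 × 10^12, just under 2^43.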

I do not know the function behind randbetween() but one source states it was not updated in 2003 and is still based on the older, defective rand() function.

I teach this course that involves the study of probability and the book makes a really big deal out of how to get random numbers off a printed table, but it seems to me that I'd just use Excel and the randbetween() function if I needed a five-digit random number.

