Pseudo-Random Sequence Generator for 32-Bit CPUs
A fast, machine-independent generator for 32-bit Microprocessors
Dr. Dobb's Journal, v. 17, n. 2, February 1992, pp. 34-40.
Does the computer world really need another random sequence generator when there’s one built into most every compiler, a mere function call away? Just use it and be done with it. Unfortunately, if randomness plays a large role in your program, you’d better pay attention to the code that generates the random sequence. Considering the average quality of the standard random sequence generators, it is probably smart to ignore the RND function that comes with your compiler and roll your own. The generators described here are almost definitely better and probably faster than the one that came with your compiler. But first, I’ll explain why I even bother.
How Random is Random?
Most random sequence generators are not very random. Simple applications such as computer games need so few random numbers they hardly notice, but large-scale Monte-Carlo simulations that use millions or even billions of random bits to model complex systems are extremely sensitive to the properties of random number generators. Use a poor random sequence generator, and you start getting weird correlations and strange results.
Why? Because the generator doesn’t produce a random sequence. It probably doesn’t produce anything that looks remotely like a random sequence. Of course, it is impossible to produce something truly random on a computer. Computers are deterministic beasts: Stuff goes in one end, completely predictable operations occur inside, and different stuff comes out the other end. Put the same stuff in on two separate occasions and the same stuff comes out both times. Put the same stuff into two identical computers, and the same stuff comes out of both of them. There are only a finite number of states a computer can be in (a large finite number, but a finite number nonetheless), and the stuff that comes out will always be a deterministic function of the stuff that went in and the computer’s current state. That means that any random sequence generator on a computer (at least, on a Turing machine) is, by definition, periodic. Anything that is periodic is, by definition, predictable. And if something is predictable, it can’t be random. A true random sequence generator requires some random input; a computer can’t provide that.
In the military, where these things take on a degree of seriousness not found anywhere else, random sequence generators tap the natural randomness of the real world. Noisy diodes, devices that measure atmospheric static, and Geiger counters all can serve to produce random bit sequences. However, your typical computer programmer will find this sort of specialized hardware out of reach.
But even if you can plug your Geiger counter into your computer, you’re going to have a problem with repeatability. You might get a beautiful random sequence, but unless you saved every bit of that sequence there’s no way to reproduce the simulation. The random number generator that came with your compiler may have lousy statistical properties and repeat after only 16,000 bits, but at least it can reproduce the same sequence on demand.
Where have we gotten? We can’t have a true random sequence generator, and even if we could we couldn’t reproduce a given sequence anyway. So if we’re stuck with a periodic and deterministic “pseudo-random” sequence generator, we might as well choose a good one.
What does a Good Pseudo-Random Sequence Generator Look Like?
Many people have taken a stab at defining this formally (see Knuth’s Semi-numerical Algorithms for an example), but an intuitive understanding should suffice here. The sequence’s period should be long enough so the finite sequence actually used is not periodic. That is, if you need a billion random bits for a simulation, don’t choose a sequence generator that repeats after only 16,000 bits. These relatively short, nonperiodic subsequences should be as indistinguishable as possible from random sequences. For example, they should have about the same number of 1s and 0s, about half the runs should be runs of 0s and the other half should be runs of 1s, half the runs should be of length one, one quarter of length two, one eighth of length three, and so on. These properties can be empirically measured and then compared to statistical expectations using a chi-square test. (This, of course, assumes that the sequence has a flat distribution. If you want a sequence which is 0 three-quarters of the time and 1 one-quarter of the time, my advice is to start with a flat sequence and then manipulate it—it’s much easier that way.)
A lot of effort has gone into producing good pseudo-random sequences on computers. Generators abound in the academic literature, along with various tests of randomness. All of these generators are periodic (there’s no escaping that), but with potential periods of 2256 bits and higher, they can be used for most simulations that expect to terminate before the next ice age.
In the October 1990 issue of Communications of the ACM, a detailed and comprehensive article by Pierre L’Ecuyer discussed a family of linear congruential generators and other pseudo-random sequence generators based on them. The simplest of these have the form: Xn=(A * Xn-1+C) mod m, where Xn is the nth number of the sequence, Xn-1 is the previous number of the sequence, and A, C, and m are large constants chosen to make everything just so. (m should be a prime number, for example.) Many of the generators in this family can seriously bottleneck a complex program. They can require a large number of multiplications and divisions per cycle.
The pseudo-random sequence generator described in this article is both fast and statistically sound. Its period is long enough for most applications, and it has been optimized for fast execution on 32-bit microprocessors. In addition, it has no machine-dependent operations, so a specific sequence generated on one machine will be exactly the same as a sequence generated on another. The generator produces a pseudo-random sequence of bits. If you need larger random numbers, take a series of bits and combine them. Three sequential bits is a random number between 0 and 7. If you collect 4 bits in sequence and try again if you get a number greater than 1001, then you have a random number between 0 and 9.
The generator is based on a Linear Feedback Shift Register, or LSFR. Feedback shift registers have been generating random sequences for a long time. They’re discussed in Numerical Recipes in C and by Knuth. Basically, they consist of a shift register and a feedback sequence. Everytime a random bit is needed, all the bits in the register shift 1 bit to the right (the low-order bit falls into the bit bucket), and a new high-order bit is calculated as a function of the other bits and appended to the left side of the register. The generator returns the low-order bit. An LSFR is a special case of a feedback shift register, where the generator calculates the new high-order bit by taking the XOR of some subset of the bits in the register (see Figure 1).
In theory, an n-bit LSFR can generate a 2n-1-bit pseudo-random sequence before repeating. To do this, the shift register must cycle through all 2n-1 combinations. (It’s 2n-1 and not 2n because a shift register filled with 0s will cause the LSFR to output a never-ending stream of 0s, which is not particularly useful.) Only certain feedback sequences produce LSFRs with this maximum-length sequence. I’ll spare you the number theory in this article (see almost any cryptography text for details), and have taken the liberty of choosing three maximum-length LSFRs for my generator, one each of 32 bits, 31 bits, and 29 bits. The 32-bit LSFR is described in the function RANDOM in Example 1; it has a period of 232-1, or about 4*109 (four billion).
Example 1: Generating random bits with an LSFR
int RANDOM (){ static unsigned long register; /*Register must be unsigned so right shift works properly.*/ /*Register should be initialized with some random value.*/ register = ((((register >> 31) /*Shift the 32nd bit to the first bit*/ ^ (register >> 6) /*XOR it with the seventh bit*/ ^ (register >> 4) /*XOR it with the fifth bit*/ ^ (register >> 2) /*XOR it with the third bit*/ ^ (register >> 1) /*XOR it with the second bit*/ ^ register) /*and XOR it with the first bit.*/ & 0x0000001) /*Strip all the other bits off and*/ <<31) /*move it back to the 32nd bit.*/ | (register >> 1); /*Or with the register shifted right.*/ return register & 0x00000001; /*Return the first bit.*/ }
LSFRs are competent random sequence generators all by themselves, but they have some annoying nonrandom properties. Sequential bits are linear, which makes them a poor candidate for encryption. And large random numbers generated from sequential bits of this sequence are highly correlated and, for certain types of applications, not very random at all.
The algorithm VERYRANDOM in Example 2 uses three LSFRs, combined in such a way to produce a nonlinear sequence of bits. Two of the LSFRs provide inputs to a 2:1 multiplexer; the third LSFR chooses which of the inputs to output. The length of the three LSFRs are relatively prime to each other; this ensures that the sequence will not repeat before its maximal length, which is (2{32}-1) * (2{31}-1) * (2{29}-1), which equals about 2{92} or about 10{27} (one billion billion billion). Three-quarters of the time the output of VERYRANDOM is equal to the output of each of the LSFRs that act as inputs to the multiplexer, but only half the time is it equal to the output of the LSFR that switches the multiplexer. This fact is useful to cryptanalysts trying to break ciphers based on this generator, and in fact makes the sequence generator all but useless for encryption. Still, it should be perfectly acceptable as a random sequence generator for simulations and the like.
Example 2: Combining three LFSRs to increase the sequence length
int VERYRANDOM () { static unsigned long regA, regB, regC; /*regA, regB, and regC should be initialized with some random value.*/ regA = ((((regA>>31)^(regA>>6)^(regA>>4)^(regA>>2)^(regA<<1)^regA) & 0x00000001)<<31) | <regA>>1); regB = ((((regB>>30)^(regB>>2)) & 0x00000001)<<30) | (regB>>1); regC = ((((regC>>28)^(regC>>1)) & 0x00000001)<<28) | (regC>>1); /*regB is a 31-bit LFSR. regC is a 29-bit LFSR.*/ /*Both feedback sequences are chosen to be maximum length.*/ return ((regA & regB) | (!regA & regC)) & 0x00000001; /*Above is equivalent to: if A then return B else return C.*/ /*Variants: return ((regA & regB) | (regA & regC) | (regB & regC)) 0x00000001; Above variant returns the majority of A, B, and C. return (regA ^ regB ^ regC) & 0x00000001; Above variant returns the XOR of A, B, and C. */ }
Two variants of VERYRANDOM are also provided in the source code. Both modify the way the three LSFRs interact to produce the output bit. In the first variant, the output bit is the XOR of the three LSFR inputs. In the second variant, the output bit is the majority of the three LSFR inputs. Note that these two variants also produce a nonlinear sequence, and also have a period of 292-1.
The random bits produced by RANDOM and VERYRANDOM are repeatable; the same input seeds will produce the same output sequences. For applications where this is not a requirement, XORing the output of this generator with some system-dependent register (the low-order bit of some clock or garbage collection register, for example), will produce a sequence so close to random for most applications that it might as well be.
Other variants are also possible. You could decimate the sequence; that is, only use some of the bits produced. Collecting, for example, only every third bit of the sequence will produce a different sequence. And if 2n-1 is not divisible by 3, then the decimated sequence will have the same length as the original sequence. There are various other decimation techniques as well. None of them really do much to salvage LSFRs for encryption purposes, though.
With any simulation, it is always wise to check a few different generators. Sometimes you’ll get strange correlations with a particular generator and a particular application. Make sure that any output from your program is real, and not just an artifact of the pseudo-random number generator.
One final note of caution: There are many more feedback arrangements for various-length LSFRs that produce maximum-length sequences, but don’t fiddle with the feedback sequences without the proper mathematical theory. The particular bits that are XORed together may seem arbitrary, but they are chosen to ensure that the sequence takes 2n-1 bits to repeat. Blindly choosing a different feedback sequence can easily make the output sequence repeat after only a couple of hundred bits, and you would be better off sticking with your store-bought RND function.
PSEUDO-RANDOM SEQUENCE GENERATOR FOR 32-BIT CPUs by Bruce Schneier
[LISTING ONE]
int RANDOM () { static unsigned long register; /*Register must be unsigned so right shift works properly.*/ /*Register should be initialized with some random value.*/ register = ((((register >> 31) /*Shift the 32nd bit to the first bit*/ ^ (register >> 6) /*XOR it with the seventh bit*/ ^ (register >> 4) /*XOR it with the fifth bit*/ ^ (register >> 2) /*XOR it with the third bit*/ ^ (register >> 1) /*XOR it with the second bit*/ ^ register) /*and XOR it with the first bit.*/ & 0x0000001) /*Strip all the other bits off and*/ <<31) /*move it back to the 32nd bit.*/ | (register >> 1); /*Or with the register shifted right.*/ return register & 0x00000001; /*Return the first bit.*/ }
[LISTING TWO]
int VERYRANDOM () { static unsigned long regA, regB, regC; /*regA, regB, and regC should be initialized with some random value.*/ regA = ((((regA>>31)^(regA>>6)^(regA>>4)^(regA>>2)^(regA<<1)^regA) & 0x00000001)<<31) | (regA>>1); regB = ((((regB>>30)^(regB>>2)) & 0x00000001)<<30) | (regB>>1); regC = ((((regC>>28)^(regC>>1)) & 0x00000001)<<28) | (regC>>1); /*regB is a 31-bit LFSR. regC is a 29-bit LFSR.*/ /*Both feedback sequences are chosen to be maximum length.*/ return ((regA & regB) | (!regA & regC)) & 0x00000001; /*Above is equivalant to: if A then return B else return C.*/ /* Variants: return ((regA & regB) | (regA & regC) | (regB & regC)) & 0x00000001; Above variant returns the majority of A, B, and C. return (regA ^ regB ^ regC) & 0x00000001; Above variant returns the XOR of A, B, and C. */ }
Categories: Pseudorandom Number Generators