Doing it Randomly: Probabilistic Algorithms in Programming
The approach to using probability algorithms is a powerful and innovative way to solve sharing problems.
By Bruce Schneier
It may seem strange that programming, which has long been a bastion of exact algorithms behaving in precisely the same manner every time, occasionally turns to probability to solve some of its more difficult problems.
In some people's minds, algorithms should be proveably correct at all times and for all inputs (as with defect-free programming and formal methods). Probabilistic algorithms give up this property. There is always a chance that the algorithm will produce a false result. But this chance can be made as small as desired. If the chance of the software failing is made smaller than the chance of the hardware failing (or of the user spontaneously combusting, or whatever), there's little to worry about.
The Dining Philosophers Problem
Imagine a bevy of philosophers sitting around a large, round table. In the middle of the table is a bowl of spaghetti. Between each pair of philosophers is a single fork. (If you prefer your philosophers Chinese, substitute a bowl of rice and a single chopstick.) All these philosophers do is one of two things: think and eat. (They already have tenure, so publishing is not important.)
To eat, they need to use the fork on either side of them, which they can only pick up if their neighbor isn't using it. (Hygienists are requested to keep their hangups to themselves.) There is no communication between the philosophers. (Arguments are not conducive to their digestion.) Is there any protocol that can be implemented to ensure that all the philosophers can eat, and that none ever dies of starvation? Imagine that the philosophers have this simple eating protocol:
IF hungry REPEAT (take left fork if available) UNTIL (holding left fork) REPEAT (take right fork if available) UNTIL (holding right fork) REPEAT eat UNTIL NOT hungry (drop right fork) (drop left fork)
This is all well and good. When philosophers are hungry, they start grabbing forks. When they have both of them, they start eating. When finished eating, they drop both forks. No problem, until everyone gets hungry at the same time. In that case, everyone grabs for their left forks and waits for their right fork to become available. But since that fork has been grabbed by another philosopher, the system locks up forever. All the philosophers starve while waiting for the colleague on their right to drop his or her fork. In fact, you can prove that any deterministic algorithm for the philosophers has the potential to lock up the system. There are solutions that involve additional resources: buying more forks, limiting the number of philosophers at the table so there is always an empty seat, or hiring a waiter to make sure that everyone gets the chance to eat. Some solutions also require each philosopher to follow a different algorithm. But no fully distributed, fully symmetric solution to the problem exists that does not involve additional processors. To solve the problem we need something more.
D. Lehman and M.O. Rabin found that something more in probability. Each philosopher randomly picks up either the left fork first or the right fork first and drops the fork if the other is not available. The procedure is repeated until the philosopher has both forks. Then he or she eats until content and drops both forks. This method ensures that every philosopher gets the chance to eat. Although theoretically possible, in reality, no philosopher will ever starve:
IF hungry (flip a coin to choose left or right at random) IF left THEN REPEAT (take left fork if available) UNTIL (holding left fork) ELSE REPEAT (take right fork if available) UNTIL (holding right fork) IF (other fork is available) THEN (pick up other fork) REPEAT eat UNTIL NOT hungry (drop left fork) (drop right fork) ELSE (drop fork) (exit and re-enter procedure)
Big deal, you might rightly say. We now know how to hold dinner parties for philosophers. Send your solution off to Miss Manners. What in the world does it have to do with programming?
Plenty. This kind of problem comes up all the time in computer networks, where different processors compete for shared resources. How do you make sure that every processor can access a file server or printer in such a way that the system doesn't lock up? Probabilistic algorithms.
Finding Large Primes
Everyone has written a factorization program. It works great for small numbers, but tends to bog down with thousand-bit numbers. However, if you're only interested in whether or not a number is prime (and not what its prime factors are), there are ways to use probability and number theory to quickly check if a number is prime.
There are two probabilistic primality testers: Gary Miller and Michael Rabin's method and Robert Solovay and Volker Strassen's method. The Miller-Rabin method uses results from number theory to generate a series of "tests" of primality. The chance of a number passing a test and not being prime is less than one half. And each test is orthogonal, so the chance of a number passing n tests and not being price is 1/(2").
To test a number, n, for primality, first generate a random integer k, such that 2 < k < n -1. Perform the test on n with k. If n passes the test, k is said to be a "witness" of n's primality (the probability of this happening if n is not prime is less than 50 percent). If k fails the test, then n is definitely not prime. If there are 100 witness to k's primality, the probability of n not being prime is less than the probability of random quantum fluctuations crashing your program.
Listing 1 is the algorithm with 100 witnesses. Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest go into the number theory in detail. Remember, to generate prime numbers of any useful length for cryptography, you will need some kind of package that works with very large numbers.
Miller-Rabin (n) FOR i := 1 to 100 DO a := (a random number between 1 and n-1) d := 1 FOR j := (number of binary digits of n-1) DOWNTO 0 DO x := d d := (d*d) mod n IF d = 1 AND x<>1 AND x<>n-1 THEN RETURN "n is definitely composite" IF (the ith binary digit of n-1) = 1 THEN d := (d*a) mod n IF d <> 1 THEN RETURN "n is definitely composite" RETURN "n is almost definetely prime"
If you're searching for a long pattern string inside an even longer text string, comparing the two strings bit for bit can get tedious. Use a CRC funtion to hash the pattern into a 32-bit number. Then, hash each of the possible text substrings using the same CRC function. Now, all you have to do is compare the pattern's hash with each of the substrings' hashes. If the hashes are the same, then the text substring equals the pattern.
The chance of two different strings hashing to the same 32-bit value is 1 in 2^32. If you're really paranoid, you could compare matching strings bit for bit. But since you're more likely to get killed in a car accident on the drive home than to have two different strings hash to the same value, I would save my worrying for more important things.
The other benefit of this algorithm over other techniques is that there are no tables to precompute. You don't need to keep the entire text is memory. The algorithm can even run in real time, constantly checking text data as it comes in over a communications port or is read in from tape, possibly looking for some obscure high-energy physics event.
Still More Randomness
There are other applications of randomness in computer science. Cryptography is just teeming with them. Probabilistic algorithms are used in fault-tolerant systems. Some LAN protocols are based on them. They might seem like interlopers in the nice, deterministic world of programming, but they are here to stay.
Cormen, Thomas H., Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. New York, N.Y.: MIT Press/McGraw Hill, 1990.
Harel, David. The Science of Computing, New York, N.Y.: Addison-Wesley Publishing Co., 1990.
Lehman, D. and Michael O. Rabin. "On the advantages of free choice: A symmetric and fully distributed solution of the dining philosophers problem," Proceedings of the Eighth ACM Symposium on the Principles of Programming Languages. (1981): 133-138.
Miller, Gary. "Riemann's hypothesis and tests for primality," Journal of Computer and System Sciences, Vol. 13. (1976):300-317.
Rabin, Michael O. "Probabilistic algorithms for testing primality," Journal of Number Theory, Vol. 12. (1980):128-138.
Schneier.com is a personal website. Opinions expressed are not necessarily those of Co3 Systems, Inc.