Lousy Random Numbers Cause Insecure Public Keys

There’s some excellent research (paper, news articles) surveying public keys in the wild. Basically, the researchers found that a small fraction of them (27,000 out of 7.1 million, or 0.38%) share a common factor and are inherently weak. The researchers can break those public keys, and anyone who duplicates their research can as well.

The cause of this is almost certainly a lousy random number generator used to create those public keys in the first place. This shouldn’t come as a surprise. One of the hardest parts of cryptography is random number generation. It’s really easy to write a lousy random number generator, and it’s not at all obvious that it is lousy. Randomness is a non-functional requirement, and unless you specifically test for it—and know how to test for it—you’re going to think your cryptosystem is working just fine. (One of the reporters who called me about this story said that the researchers told him about a real-world random number generator that produced just seven different random numbers.) So it’s likely these weak keys are accidental.

It’s certainly possible, though, that some random number generators have been deliberately weakened. The obvious culprits are national intelligence services like the NSA. I have no evidence that this happened, but if I were in charge of weakening cryptosystems in the real world, the first thing I would target is random number generators. They’re easy to weaken, and it’s hard to detect that you’ve done anything. Much safer than tweaking the algorithms, which can be tested against known test vectors and alternate implementations. But again, I’m just speculating here.

What is the security risk? There’s some, but it’s hard to know how much. We can assume that the bad guys can replicate this experiment and find the weak keys. But they’re random, so it’s hard to know how to monetize this attack. Maybe the bad guys will get lucky and one of the weak keys will lead to some obvious way to steal money, or trade secrets, or national intelligence. Maybe.

And what happens now? My hope is that the researchers know which implementations of public-key systems are susceptible to these bad random numbers—they didn’t name names in the paper—and alerted them, and that those companies will fix their systems. (I recommend my own Fortuna, from Cryptography Engineering.) I hope that everyone who implements a home-grown random number generator will rip it out and put in something better. But I don’t hold out much hope. Bad random numbers have broken a lot of cryptosystems in the past, and will continue to do so in the future.

From the introduction to the paper:

In this paper we complement previous studies by concentrating on computational and randomness properties of actual public keys, issues that are usually taken for granted. Compared to the collection of certificates considered in [12], where shared RSA moduli are “not very frequent”, we found a much higher fraction of duplicates. More worrisome is that among the 4.7 million distinct 1024-bit RSA moduli that we had originally collected, more than 12500 have a single prime factor in common. That this happens may be crypto-folklore, but it was new to us, and it does not seem to be a disappearing trend: in our current collection of 7.1 million 1024-bit RSA moduli, almost 27000 are vulnerable and 2048-bit RSA moduli are affected as well. When exploited, it could act the expectation of security that the public key infrastructure is intended to achieve.

And the conclusion:

We checked the computational properties of millions of public keys that we collected on the web. The majority does not seem to suffer from obvious weaknesses and can be expected to provide the expected level of security. We found that on the order of 0.003% of public keys is incorrect, which does not seem to be unacceptable. We were surprised, however, by the extent to which public keys are shared among unrelated parties. For ElGamal and DSA sharing is rare, but for RSA the frequency of sharing may be a cause for concern. What surprised us most is that many thousands of 1024-bit RSA moduli, including thousands that are contained in still valid X.509 certificates, offer no security at all. This may indicate that proper seeding of random number generators is still a problematic issue….

EDITED TO ADD (3/14): The title of the paper, “Ron was wrong, Whit is right” refers to the fact that RSA is inherently less secure because it needs two large random primes. Discrete log based algorithms, like DSA and ElGamal, are less susceptible to this vulnerability because they only need one random prime.

Tags: academic papers, cryptanalysis, cryptography, keys, random numbers

Posted on February 16, 2012 at 6:51 AM • 49 Comments

Comments

D0R • February 16, 2012 7:02 AM

Very interesting analysis of the paper, thanks.

Sadly, as usual, popular newspapers have reported it with too much hype — one of them alerted readers that “now the bad guys can crack any key in minutes”.

Brian Raaen • February 16, 2012 7:29 AM

You have the wrong link for Fortuna. It should be http://en.wikipedia.org/wiki/Fortuna_(PRNG)

GregW • February 16, 2012 7:38 AM

There’s some independent simultaneous research on this topic from some other security researchers who just rushed out a blog post. Their findings appear to be– at a high level– similar, although as I read it, it seems clearer from these second researchers that more of the problems may be from embedded devices (VPNs, firewalls, etc), and thus perhaps due to low entropy rather than bad PRNGs per se:

https://freedom-to-tinker.com/blog/nadiah/new-research-theres-no-need-panic-over-factorable-keys-just-mind-your-ps-and-qs

Lloyd • February 16, 2012 7:38 AM

The link for Fortuna is now OK: I added a redirect on the Wikipedia side. No need for it to be updated here now.

Carlo Graziani • February 16, 2012 7:45 AM

Without venturing into conspiracy theories about deliberate weakening of RNGs, I think it’s worth pointing out that RNGs are used for (at least) two distinct purposes, and that good design features and practices for one use are terrible features for the other.

In computational science, RNGs are used to power Monte Carlo simulations along. For this application, “true” randomness is of course important, but so is repeatability. You need to be able to replicate a Monte Carlo simulation bit-for-bit, for debugging purposes and for the sake of scientific repeatability. So even though the numbers are pseudo-random, the seeds and sequence starts must be recorded, and the RNG had better make the same stream of numbers given those initialization values.

In security applications, repeatability is not only unnecessary, it’s downright dangerous, as exemplified by the research above. RNGs for this class of applications should give some strong guarantee of non-repeatability, by gathering entropy from system electronic noise, keypress rates, etc. Which would be a terrible idea for scientific applications.

It’s a subtle distinction, possibly lost on some security coders (and sadly also on some computational scientists). Selection of an RNG algorithm and/or initialization practice from the wrong application category can produce insecure code in one case, scientific malpractice in the other.

Gweihir • February 16, 2012 8:09 AM

There is a trade-off in an entropy-starved situation: You can let the user wait (or enter a few random keystrokes) or you can be insecure. On the technical side, if it is Linux just read the number of bits you need from /dev/random (not /dev/urandom!) and you are fine. This may take a while though.

My guess is that the developers responsible are either incompetent and had no idea what they were doing (would not be a surprise) or decided to go for convenience instead of security (would not be a surprise either).

Side note: I know of cases where the CPU used in a device actually had a good true-RNG, but it was not used in key-generation or used wrongly due to developer incompetence. “Incompetence” here means they failed to read the half-page in the documentation that states that the bits delivered do not have 1 bit/bit of entropy but a lower value.

This is really not a technological problem.

Danny Moules • February 16, 2012 8:13 AM

It’s hardly surprising that machines that generate RSA keys straight out of factory have predictable predictability issues.

What is surprising is that no security personnel at the device vendors thought about shouting about this at any point – entropy, and its consideration in security assessments, shouldn’t be a black art.

Will • February 16, 2012 8:20 AM

Were there any weak debian keys among them?

(Or home routers and appliances with debian baked into them?)

Jakub Narębski • February 16, 2012 8:25 AM

What suprised me is that Arstechnica article states that sharing prime factor is worse for security of RSA, than for DSA.

Can anyone tel me why it is so?

llewelly • February 16, 2012 8:26 AM

Only 0.38%? Good news if you ask me, Much better than I would have expected. Very few people know how to test an RNG. Worse, it’s easy to screw up; common preconceptions about “random” will confuse at every step of the way.

And even among crypto geeks who /know/ how to test (not me; I once learned the theory behind testing RNGs, but determing the output of Mersenne twister was better than a linear congruential was the farthest I ever went ), how many of them test more than one or two of the RNG reliant pieces of software they use?

Further, I’m convinced (and I doubt many here will disagree) this is a general problem with security related software; it’s prone to extremely subtle bugs, which he vast majority of users, even most experts, are unlikely notice, which are nonetheless potential security problems.

But there is good news: This particular weakness is unlikely to be exploited; there are host of easier-to-exploit weaknesses out there.

mikeazo • February 16, 2012 8:47 AM

@Will, they removed any of the weak debian keys with known factors from the dataset.

Adam • February 16, 2012 9:05 AM

Surely a homegrown RNG would help here given that this appears to be a class break of some RNG being used in the wild. Not that I advocate using a homegrown RNG, just saying.

James Sutherland • February 16, 2012 9:08 AM

A lousy random number generator must be one of the harder vulnerabilities to check for – particularly as a concealed fault. Easy to imagine someone having slipped in “if timezone==Iran return 1;”…

Back as a student, I remember writing our own entropy source – getting the user to click on a series of pseudo-randomly-located boxes on screen. We hashed the exact timestamp, x and y position within the box and a random number from the existing PRNG together.

@Gweihir: the ‘just wait’ for more entropy is fine for some situations, but a busy server doing SSL could find it a bottleneck – particularly if an attacker knows that and opens a load of junk sessions to drain the entropy pool. I seem to recall entropy starvation becoming a performance bottleneck somewhere: on a headless server, entropy sources like keyboard and mouse movements don’t apply – and of course network timing can be externally influenced, making it suspect.

Mike T • February 16, 2012 9:23 AM

Heh. Randall got this one too: http://xkcd.com/221/

Mike

Brian • February 16, 2012 9:25 AM

Fortuna is definitely a good bet for secure RNG design. But it still relies on collecting sufficient entropy to seed the generator, which is a non-trivial problem. The NEED to do so is frequently brought up, but getting good entropy values for seeding still seems to be hard (particularly for server type machines).

Fortuna basically solved the problem of how to generate cryptographic pseudo random numbers from small amounts of entropy as far as I’m concerned (as long as the underlying crypto remains secure). Figuring out how to get good initial entropy still seems like an open problem.

Clive Robinson • February 16, 2012 10:20 AM

I would strongly suggest people read the paper because they make other statments about the number of keys with issues such as,

It pales, however, compared to the situation with RSA. Of 6.6 million distinct X.509 certificates and PGP keys (cf. [1]) containing RSA moduli, 0.27 million (4%) share their RSA modulus, often involving unrelated parties. Of 6.4 million distinct RSA moduli, 71052 (1.1%) occur more than once, some of them thousands of times..

Now as regular readers of this blog know “diceware” gets mentioned often when random number generation is discussed you can read the diceware authors take on this at,

http://diceware.blogspot.com/2012/02/was-whit-really-right.html

He thinks the research points to a possible signiture to “malware” fritzing the calls to the random number generator which Bruce has also mentioned.

Now this is far from a new idea Adam Young and Moti Yung proposed this and other ideas to fritz PQ pairs back last century under their ideas for “Cryptovirology”,

http://www.cryptovirology.com/

Whilst I would not rule it out (I’ve done similar things myself as I’ve indicated in the past on this blog) it’s more likely to be poor programing in some code library software engineers are using virtualy sight unseen.

Why do I think this, well Adam and Moti published information that showed how to easily hide your fritzing into the selection of one of the PQ primes that would be undetectable to detect with only the PQ pair or the resulting Pub/Pri keys.

Essentialy what you do is use the immense redundancy in the resulting PQ pair to hide a shortend (about 1/6th of total PubKey length) hidden RSA message for which you only have the PriKey, this message contains an offset number as to where one of the PQ primes was selected from. Thus you cut your search space down to a trivial number to find one of the primes (the other being genuinely random drops out).

Now as Bruce notes generating genuinely random numbers is at best a difficult problem (you first have to realise that there are no random numbers only random sequences and random is such a nebulous term you can only wave your arms around 😉

However the usual place to start looking is at what are quaintly called True Random Number Generators (TRNG’s). The way they work is to take input from a physical world source (I’ll leave quantum out of this simple explination). Now one the consiquences of physical world sources is their output is effectivly modelled as two or more signals combined (by addition, multiplication or both in one or more dimensions such as amplitude, time or sequence). One is in some way predictable and we call it “bias” and the other we cann’t detect predictability and that’s the one we want to use as our source of entropy.

Now seperating the bias from the desired unpredictable signal is even at the best of times wastefull and can appear as some kind of “voodoo ritual” method.

So many designers don’t bother, they figure jusst putting it through a crypto or hash algorithm will solve the problem. It doesn’t yo have no more entropy in the result than you had before, it just “looks” more random… The problem with this is you can get sequence correlation between two or more generator outputs, which can in some cases cause symptoms like those that have been found in those RSA Certs.

Now one of the problems that arises is the value of each bit of entropy. Depending on the source and how the bias is removed you could be getting only a very small percentage of the bits coming out of the source. The more complex the bias removal process the more bits you get but there is only so far you can go. Thus the designer has to make a trade off between CPU cycles and lines of code used to remove bias and the number of bits required.

Now back in the last century one or two people suggested a heretical idea of “entropy spreading”, and it caught on. Unlike hashing the output which does not change the entropy the idea was to apply each bit of entropy to a pool of bits in such a way that it spread it’s value across many of the bits in the pool and in some systems over time as well.

On further reflection you can come up with a different way to spread entropy, you take a stream cipher like RC4 or other reasonably efficient “pot stirrer” and apply each bit of entropy as it occurs to “jog the pot”. Like throwing a stone in a pool the effect of the bit ripples out across the entire surface and folds back and mixes unpredictably with the other pot jogging bits. The advantage however is that the number of unpredictable bits you can get out of the stream cipher is considerably higher (look up unicity distance to see by how much).

A further thought on this is that of “non determanistic sampling” that is on many systems it is not possible for a piece of code to regularly sample the output of the stream cipher, therefor this adds further to the unpredictably of the generator output as you can use it as another pot jogger as well.

However there is a catch, on certain systems and certain operating system types there is little or no uncertainty. This happens on many embbeded systems especially those that have a Real Time OS (RTOS) or use fixed low latency design for Real Time response (ie you want your foot going down on the break peddle in your car getting through the ABS system as rapidly as possible).

Also no matter how fast your source of entropy is it still takes time to build up to a usefull number of bits, and rapidly sampling the entropy pool can rapidly bring down the effective level of entropy in the pool.

Thus one solution is to “block” a requesting process untill such time as sufficient entropy has built up. This can cause significant problems in that the process can be highly non linear and this in many systems is highly undesirable (imagine what would happen if the brake peddle signal got blocked?). And I’ve seen some designs where the blocked processes were also the ones adding entropy to the pool so dead lock could be just a breath away followed by total system lock up, not good on a PC but in an embedded system easily fatal.

Not knowing much of the above before you build a secure system that needs a suitably specified TRNG means that the chances are high that you can and in all probability will shoot yourself in the foot and some of your systems users in the head.

John Kelsey • February 16, 2012 10:25 AM

Jakob:

DSA is designed so it’s perfectly okay to share the system parameters
(g,p,q) among many users, and also that there’s no secrecy in these values. If any users of the same parameters get the same secret key x and public key y, that’s a big problem, but it doesn’t look like they saw any of that.

I’m not sure why this happened. One guess is that if you generate your own (g,p,q) randomly, that gives you enough time to get from a relatively predictable system state to a more unpredictable one.

My intuition here is like this: On a closed-off computer with no user interface and no physical hard drive and nothing going on in terms of network traffic, when you start up, your processor is in one of a fairly small number of possible states. Techniques for drawing entropy from the OS are ultimately just measuring this state, and so there’s just not much entropy to be found.

Over time, I assume a certain amount of unpredictability sets in. If the device is plugged into a network, it may be processing some packets from time to time. There may be some other sources of unpredictability–for example, the device may have to manage its temperature by turning on/off a fan based on a thermostat, and that may affect the behavior of the system somehow.

If you’re generating your DSA parameters first, at least using the technique for generating a provably ok set of parameters, there’s a fair bit of processing going on. (You have to generate a 160-bit prime and a seed, then use that to generate a 1024-bit prime so that p%q==1.) My intuition is that during all this churning, there is enough time for the machine’s state to get somewhat more unpredictable–by the end, it can be in a larger set of possible states. (For each SEED choice, we get a different number of iterations of the process, meaning a different set of possible machine states.)

So I have this mental picture that what we’re really doing is:

RSA:
[small set of starting machine states]
Generate p
[slightly larger set]
Generate q
[slightly larger set]

DSA:
[small set of starting machine states]
Generate q and seed, and do lots of spinning and back-and-forth to get (g,p,q).
[much larger set of machine states]
Generate x and y=g^x mod p

I am not at all sure about this–I could be completely wrong. But this would explain something about that pattern we’re seeing here. Though it’s also quite possible that the DSA implementations are just better about getting fresh entropy, or that DSA implementations are done by higher end/more capable devices with more going on.

John Kelsey • February 16, 2012 10:37 AM

Bruce,

I think if you were installing an intentional back door, you’d have included the devices’ ethernet addresses in the RNG seed. That would make the weakness invisible to this kind of examination, while still letting you brute force the low-entropy prime factors cheaply as needed.

I think this problem comes entirely down to assessing entropy correctly and waiting to generate RNG outputs till you have enough of it. I don’t think Fortuna would help much–this is a place where what’s needed is dumping every bit of entropy in your possession into the pool just before you generate your high-value keypair, not holding some back in hopes that you’ll eventually reseed with enough entropy to get to a secure state. If your RNG isn’t in a secure state once you generate your keypair, it probably doesn’t matter whether you get into one later on.

Personal OTP • February 16, 2012 10:53 AM

Thank you for the Fortuna reference, particular given the links to code; however,
1) recommended code for entropy generation on various systems would be exceptionally useful

and

2) even more useful would be recommendations on existing random number generators, such as
gpg –gen-random 2 (for a given version)

openssl rand (for a given version/platform)

Truecrypt’s keyfile generator (for a given version/platform)

Microsoft .NET System.Security.Cryptography.RNGCryptoServiceProvider (for a given version/platform)

(hardware) Araneus Alea I hardware (possibly used in a reduced RF environment)

Users, unfortunately, can be told “Use a good random number generator” and “make sure the entropy pool is sufficient”, but at the user level (even highly technical users), that’s not very useful.

Gary • February 16, 2012 11:40 AM

Anyone have experience with these? http://www.idquantique.com/true-random-number-generator/products-overview.html

Petréa Mitchell • February 16, 2012 11:57 AM

“2) even more useful would be recommendations on existing random number generators”

In addition to the above, here’s an online service for random number generation.

Robert Thille • February 16, 2012 12:29 PM

Does anyone have comments on good hardware RNGs? Particularly the VIA C3, on chip RNGs?
http://www.via.com.tw/en/initiatives/padlock/hardware.jsp

Tony • February 16, 2012 1:19 PM

Coming soon to the X86 processor (Ivy Bridge):

http://en.wikipedia.org/wiki/RdRand

Patches are already in Linux to make (some) use of it.

Oracle • February 16, 2012 1:28 PM

Fortuna has one big problem – it’s a “proprietary” algorithm. There is no spec available online…

The Wikipedia article does not count; or rather it counts as the best spec. Which is obviously deficient for proper implementation – there are various details to get right, various tradeoff to be made.

And it’s not to useful to look at various implementations lying around – there is no way to understand what code is part of Fortuna and what is random hack.

So Bruce, if you really want to promote Fortuna, please make the spec public…

Dan • February 16, 2012 2:48 PM

This may be due to bad entropy sources in embedded devices. Servers are starting to see the same issue. SSDs have no rotational latency to use as an entropy source. They have no keyboards or mice to get entropy from the user. If running in a VM they may have biases in their view of event times so that multiple VMs could get the “same” entropy data. Hence the need for HW RNG or DRNG (Intel’s solution in Ivy Bridge).

Thunderbird • February 16, 2012 3:24 PM

In addition to the above, here’s an online service for random number generation.

Of course, just in case it isn’t obvious to anyone, letting someone else generate a random number you use to generate a key is probably a really bad idea if you are at all concerned with the key being, you know, secret. I suspect this is obvious to most all people that might comment here, but might not be to someone that runs across this conversation through the wonders of Das Intartubes.

SSL Servers • February 16, 2012 4:21 PM

There are quite a few public random number servers. Consider:

openssl s_client -connect google.com:443 < /dev/null | sha256

Using something like this–ideally using multiple servers–as input to an entropy pool may be useful for situations where an attacker cannot see all of the network traffic. Many systems generate keys when they boot the first time (e.g,. ssh); this is also the worst-case for available pool entropy…

Of course, proper hardware support is the only decent answer (VIA’s Padlock or Intel’s RdRand).

Clive Robinson • February 16, 2012 5:19 PM

@ Oracal,

Fortuna has one big problem…

Have a look at ,

http://th.informatik.uni-mannheim.de/people/lucks/papers/Ferguson/Fortuna.pdf

It’s a PDF of a powerpoint presentaiton by Niels Ferguson going into the background of Fortuna’s design and the things considered.

Although not giving you a full specification, it contains more than sufficient info for you to build your own version (and you could always go and buy the book 😉

Essentialy the working end of Fortuna is AES (or other block cipher) in counter mode. The initial seed value is the AESkey and value for the counter. It is updated by changing the counter or key and sometimes both. The update is done by K entropy pools where the pool used is based on 2^k, thus the current suggestion of K=32 should give a number of years (see Yarrow for an earlier idea). When either 1MByte of update or output is read from the generator the key value is updated. This is done by clocking the counter twice and using the two values to get the new key value.

The way the updating of the entropy pools is done is a little more complex. Have a look at,

http://rennes.ucc.ie/~robertmce/presentations/Fortuna_McEvoy.pdf

As far as I’m aware there is opensource code in atleast C++ or Python easily available that is sufficiently well written to be readily understandable.

On a more general note about randomness and it’s uses have a look at,

http://www.iro.umontreal.ca/~lecuyer/

His papers are definatly worth a look through if you have the time.

paul • February 16, 2012 6:13 PM

I remember reading a couple of years back that various gov’t agencies had compromised some aspects of various FOSS tools but never found out what was targeted or if the damage had been found/undone.

Perhaps RNGs were the target.

Brian • February 16, 2012 6:35 PM

@Clive Robinson

“However there is a catch, on certain systems and certain operating system types there is little or no uncertainty. This happens on many embbeded systems especially those that have a Real Time OS (RTOS) or use fixed low latency design for Real Time response (ie you want your foot going down on the break peddle in your car getting through the ABS system as rapidly as possible).”

That’s basically what I see as the fundamental problem with cryptographically secure random number generation. Computers, by design, are usually pretty good at having predictable behavior. Entropy in computer systems often seems to boil down to measuring external things like user behavior (which is not always practical) or behavior determined by a sufficiently large number of factors that predicting the outcome is difficult.

One thing I think might be worth exploring is the idea of maximizing use of entropy in random number generators. Decades of crypto research have given us tools that transform relatively short values into long unpredictable outputs (stream ciphers, for example). Random number generators could potentially get a lot more cryptographically random output for each bit of true entropy input than they do right now. The end result would be an ability to be a lot more careful in entropy collection without an appreciable drop in random number generation speed.

Marian Kechlibar • February 17, 2012 3:32 AM

Interesting.

I like Fortuna and I have implemented a C++ port of it for mobile devices. (On smartphones, entropy sources like the camera are very useful. Tons of entropy per single frame, even in viewfinder regime.)

Attacking the PRNG is very advantageous for adversaries, as it does not create any compatibility issues.

If you’re not sure about your particular PRNG, I would suggest running the battery of NIST tests on the PRNG output from time to time. They are easy to implement, but they do take some CPU cycles.

Patrick • February 17, 2012 11:58 AM

“The majority does not seem to suffer from obvious weaknesses”..

But :
“containing RSA moduli, 0.27 million (4%) share their RSA modulus, often involving unrelated parties. Of 6.4 million distinct RSA moduli, 71052 (1.1%) occur more than once, some of them thousands of times..”

What the researchers seem to have found is the RNG leaking it’s non-randomness .
ATM I see no reason not to assume that ALL RSA-keys are equally non-random .

Peter • February 17, 2012 4:08 PM

I stopped dead in my tracks the day it occurred to me that the Debian key generation bug could have been intentional sabotage. I’ve thought about every bug announcement differently since.

Alan Kaminsky • February 18, 2012 4:55 AM

The repeated RSA modulus factors might not be due to a bad random number generator or to bad seeding of a good random number generator.

Here’s another scenario: The software picks a random number, tests it for primality, and finds it isn’t prime. But then — rather than pick another random number, which might take a long time waiting for the entropy pool to replenish — the software just repeatedly increments and tests the previous number until it hits a prime. Since primes of the requisite size are (relatively) rare, there’s a small chance that two runs of the software will hit the same prime — even with a cryptographically strong random number generator seeded from a true random source.

Boneheaded? Of course. But it could have happened that way.

Clive Robinson • February 18, 2012 5:50 AM

@ Alan Kaminsky,

Boneheaded? Of course. But it could have happened that way

Err no trust me on this I’ve seen it done that way many times, from a software developers reasoning it makes one whole load of sense for a couple of reasons.

Firstly because it’s guaranteed to find a prime within a short period of time. And secondly because it “stays in chache” and is thus very fast.

It’s also partly done that way because some people incrorrectly designed de-bias systems for post processing TRNG output to work that way.

This is because the reasoning is somewhat as follows,

1, All primes (except 2) are odd.
2, Therefore the LSBit of a prime is always 1.
3, Half the numbers from a TRNG are even.
4, Make TRNG output odd.

There are three basic ways to do step four,

1, Test if even and increment.
2, Just set the LSB.
3, Rotate TRNG “word” till LSB is set.
3, Shift left one bit and set LSB.

The nominaly correct way is the last way (usually no branch or attendent cach issues), but many programers do it the first way.

And the “Add 2 and retest as prime” also makes a whole load of sense if your plan is to fritz the key pair as it reduces your search space a great deal (see my post above).

As “entropy” is like “fine wine” it “takes time to mature” and is thus “an inordinatly expensive commodity”. Another way you sometimes see is generate a whole stream of small TRNG numbers ( ie a word in size) then concatinate to make the large prime, if not a prime then swap bottom word with another new TRNG word or use a circular buffer and write over the bottom number and re-test.

Then there is “binary swap” or it’s legion of variations, generate a large random number as above if no good then swap within the array of words in a binary (or other) pattern.

But… At the end of the day jprimes are a ‘sparse data set’ so ust generating more and more TRNG output is NOT guaranteed to ever finish or even get close… So pragmaticaly you do have to switch to incrementing something at some point.

As I described above it’s a way I’ve used in the past to fritz PubKeys amongst other things (to demonstrate a point to managment about code review process). The only extra I used was to use a BBS random number generator as well (which conveniently hid the shorter PubKey used to encrypt the start point info).

MarkH • February 18, 2012 4:23 PM

@Alan Kaminsky:

“Since primes of the requisite size are (relatively) rare, there’s a small chance that two runs of the software will hit the same prime — even with a cryptographically strong random number generator seeded from a true random source.”

I’m not confident that I follow the reasoning here — is the idea that the increment-and-test algorithm increases the probability of repeating primes, versus generating a new pseudo-random bit sequence for each trial? It seems to me that the probability of repeated primes is not affected, but I’m interested in seeing any argument to the contrary. [Note: if the pseudo-random bit source is broken, the likelihood of repeated primes might well vary between these algorithms, depending on the nature of the defect.]

When I wrote an RSA key generator, I used the increment-and-test algorithm, on the premise that entropy is rare and valuable, and that numerous repeated draws from the entropy-pooling random generator would decrease the entropy of the resulting key.

By my estimate, generating a 2048-bit RSA modulus with a new pseudo-random candidate for each trial would consume on average of more than 350 kbits (versus 2 kbits for the increment-and-test approach). I suppose it boils down to how good one presumes the source to be. For me, that entropy is best at first and may diminish with repeated draws seems the more conservative assumption.

Alan Kaminsky • February 18, 2012 6:45 PM

@MarkH

Okay, let me elaborate. We are searching for a prime in a given interval. Method A is to generate random numbers repeatedly until the primality test passes. Method B is to generate one random number, then increment it repeatedly until the primality test passes. We assume the random number generator’s output is uniformly distributed in the given interval.

With Method A, the probability of picking a particular prime is the same for every prime in the interval, namely the reciprocal of the number of primes in the interval.

With Method B, the probability of picking a particular prime is not the same for every prime in the interval. In this case, the probability of picking a prime p is the length of the gap between p and the previous prime, divided by the length of the whole interval. But because the gaps between successive primes are not all the same, Method B picks some primes with higher probability than others. This makes it slightly more likely that two runs of Method B will pick the same (higher-probability) prime, compared to Method A.

It makes no sense to use a true random entropy source to pick security parameters, such as RSA primes, with biased probabilities. Instead, use a true random entropy source to seed a cryptographically strong random number generator. Then use the generator with Method A to pick RSA primes. That way, you don’t have to consume “valuable” entropy from the entropy source each time you generate a random number.

MarkH • February 18, 2012 10:43 PM

@Alan Kaminsky:

“It makes no sense to use a true random entropy source to pick security parameters, such as RSA primes, with biased probabilities. Instead, use a true random entropy source to seed a cryptographically strong random number generator.”

I agree fully with the logic of that — if the generator is unbreakable, and there is no risk that seed might be otherwise compromised, then a seed should be usable indefinitely.

The objection to Method B is interesting, and for me a novel one. In a brute force (trial division) attack on an RSA modulus, it would be advantageous (if Method B were suspected) to first test “long primes” (those following longer gaps) before short primes.

I examined an interval of small (non-cryptographic) primes (the 50th million of primes), and found that about half the primes expected from Method B would come from the “longest” 20 percent of primes in the interval. Now, I don’t know to what extent the distribution I saw applies to cryptographic-size primes, but if it did, I reckon a 2.5 times speed-up of the trial division attack, equivalent to a reduction of the modulus length by 3 bits.

Of course, noone has suggested that such a bias toward long primes might lead to any useful attack against a particular RSA modulus.

But could Method B bias account, to any extent, for the repetition of primes observed in the referenced research?

Suppose the distribution of prime gaps had such an extreme bias, that half of Method B primes came from only one millionth of the primes in the interval from which factors of an RSA modulus are drawn*. 1024-bit moduli are typically composed by finding primes whose representation requires 512 bits, of which there are about 2 x 10^151. The assumed severe “Method B Bias” would, accordingly, frequently draw primes from a population of, say, 10^145.

Applying the famous “birthday” approximation yields an estimate of more than 10^70 1024-bit RSA moduli that must be computed (under this extreme assumption of Method B bias) before the probability of a duplicated factor approaches 1/2 between any pair in this superastronomical population.

Those who suggest Method B as an algorithm for finding primes, sometimes also suggest setting a limit on the number of increment-and-test cycles before selection of a new pseudo-random odd. This is based on considerations of speed rather than security, because prime gaps are sometimes very large. However, applying this modification would reduce the Method B bias toward “long” primes.

*Formulae believed to well approximate both the density of primes, and the maximal gap between primes, suggest that the maximal gap is about ln(N) times the mean gap, where N is the upper bound the interval. For 1024-bit RSA moduli, this indicates a crude upper bound for Method B bias more like 100 than 1,000,000.

RobertT • February 20, 2012 12:23 AM

Someone is having difficulty with a TRNG, I can’t imagine why….All you need is a simple circuit with PSRR (power supply reject ratio) of greater than 150dB and absolutely no DC or AC bias.

Unfortunately most methods for removing a DC bias create a signal bias, or a “Tone” as it is called in the Audio Sigma Delta world.

Also the RNG should be super fast, because I need to throw away lots of results that fail some test of randomness (???) or Key strength / suitability.

Oracle • February 20, 2012 7:13 AM

Another fun aspect of the proprietary-ness of Fortuna that I did not think earlier, but ties nicely to this article:

Let’s say I want to implement Fortuna – with some effort, I can get access to the book, one way or another.

Let’s say I want to implement CSPRNG – probably I will skip Fortuna, because it’s not publically available.

But now the interesting scenario – somebody else has implementation called “Fortuna”, and I need to evaluate whether it works and has not been weakened somehow – intentionally or not. Seems I’m out of luck then…

MarkH • February 20, 2012 8:46 AM

@Oracle:

The world has changed! “Proprietary” used to refer to technology for which was secret, and/or for which some kind of license was required.

Apparently, the term has been redefined: “proprietary” is now any technology for which I need to pay more than $0.00, and/or requires that I move my bottom from my chair.

Bruce’s “Practical Cryptography” can be got used from Amazon for less than $16.

I’m old enough to remember making trips to the library, in order to learn cool and useful stuff.

Sincerely,

A. Fossil

Damon • February 20, 2012 2:42 PM

apropos Dilbert comic:

http://dilbert.com/strips/comic/2001-10-25/

Gilbert Midonnet • February 20, 2012 6:54 PM

@ A. Fossil,

Amen to that.

Carson • February 22, 2012 4:05 PM

@A. Fossil: In some places, it’s even worse than that; in at least one US state, “proprietary software” legally means pretty much any software that’s covered by a license of any sort. Which sort of makes it meaningless, I think.

_Arthur • February 23, 2012 10:42 AM

Kaminsky is correct.
To “generate” the prime number necessary for encryption, you start at a “random” point within a specific range, say, 200-digits numbers.
You then test the candidates for primality, sequentially.
If the range where you search for big prime numbers is too narrow (it is highly likely that everyone uses the same range), or the random number used for your “random” start point is weak, it becomes increasingly likely that you will end up picking the same prime number than someone else. Everyone is using the exact same primality test, after all.

Clive Robinson • February 23, 2012 2:44 PM

@ _Arthur,

Everyone is using the exact same primality test, after all

Err no they are not…

In the not so distant past, asside from the “Brut Force” method, there was not a diffinitive test for primality in very large primes, just a probabilistic test, which had the advantage of being very quick.

Then three graduate level students in India came up with a diffinitive test, which whilst not quite as quick as the probabilistic method was “none to shabby”.

There are also other ways of finding primes more quickly than sequential search from a random point.

One of which is based on the simple knowledge that the two numbers either side of multiplying the sequence of primes (including 2) has a better than average chance of being “twin primes” (2x3x5=30, 29/31). Also contrary to what is often said primes do have a fairly regular pattern because they effectivly “reflect” around these points and each sub-multiple down towards the next lowest point.

To see this think about each prime being the fundemental of a harmonic generator, each harmonic strikes out all it’s harmonics from further consideration, this is the basis for the sieve method. Now write out all the numbers from 0 to 60 and underline those that are prime. Now examin the pattern either side of 30 notice any similarities around the point, how about 60-90, 90-120, etc?

Thus if you were lazy you could encode the ofsets from a suitably high reflection point and the first couple of hundred primes. Ask how long a prime the user wanted find the nearest reflection point and use the offsets to give you a much greater chance of finding two primes very quickly.

Oh these reflection point twin primes have one prime with the two Least Significant Bits being both 1, the other 01, which means that the result of multiplying them is 3 mod 4. However as waluable as this may appear to be I would not use them 😉

MarkH • February 25, 2012 12:04 PM

@_Arthur:

[1] If “the random number used for your ‘random’ start point is weak…”

I take “weak” to mean here, that the number is to any significant extent predictable to an adversary. In such case, the resulting key is insecure, no matter how else the rest of the procedure is done. The initial “guess” must be unpredictable, or it’s game over.

[2] “…it becomes increasingly likely that you will end up picking the same prime number than someone else”

Yes — but all the same, very very very unlikely indeed. Typically, 1024 bit keys are based on randomly chosen 512 bit primes. The “narrow range” of 512 bit integers contains (if I did my calculation correctly) about 2 x 10^151 primes.

If (say) one trillion trillion (that is, 10^24) such keys were generated, the probability that there would be even a single duplicated factor among ALL of them, is less than 10^-100.

[3] “Everyone is using the exact same primality test, after all.”

No. There are three well-known tests for pseudoprimality (that can tell whether a number if probably prime), and two practical tests for primality (that can prove a prime to be so), which might be used in real-world cryptosystems. Most of these tests have parameters that must be chosen by the designers, and so may exist in numerous variants. Also, two of the pseudoprime tests require “random” numbers, and so might never run the same way twice.

When RSA keys are based on pseudoprimality testing, there is a risk that a modulus factor will be composite, which of course results in an insecure key. The system designer can choose the test parameters to set the probability of this disaster to a level that is judged to be acceptable. However, aside from this risk, the test chosen doesn’t affect the probability of duplicated factors.

Mario • March 22, 2012 11:43 PM

Hi !

Although I am bit late for this discussion, I do not understand why all this enornmous intellectual strain to figure out new PRNGs when a even a lousy hardware generator would give each bit its own piece of entropy ? It is true that hrng bits will be somewhat correlated and somewhat biased (without postprocessing which is not a good idea) but it is still infinitely better than a handfull of perfectly correlated bits produced deterministically by any prng.

It is sad that peolpe are spending so much time and research effort on futile topic especially for crypto purposes where speed and perfect de-biasness are of no importance.

Truong • May 26, 2012 5:30 AM

RSA weakness. duplicate publickey. What is the odds of that in practive. too small.

So just keep using RSA.

I am wondering that is there any way to improve to create primes . I mean there is no duplicate of public key left.

Schneier on Security

Lousy Random Numbers Cause Insecure Public Keys

Comments

Leave a comment Cancel reply