A Broken Random Number Generator in AMD Microcode

Interesting story.

I always recommend using a random number generator like Fortuna, even if you're using a hardware random source. It's just safer.

Posted on October 31, 2019 at 6:24 AM • 29 Comments

Comments

David KowisOctober 31, 2019 7:49 AM

Anyone got a link for how I would practically use Fortuna on my linux hardware? I'm having some trouble finding anything about implementation of Fortuna. I've got haveged running, which helps keep my entropy pools full, but I can't find anything on Fortuna.

LisaOctober 31, 2019 9:15 AM

I am curious as to how Fortuna compares to the 3 standardized DRBG (CTR, Hash, HMAC) from NIST SP800-90A.

I would assumed one is typically better off using standardized crypto which has been extensively vetted for many years.

With regard to PRNG/DRBG, does the AMD chipset provide any hardware source of entropy, such as ring oscillators? And if so, has this entropy source been vetted under various environment conditions, such as temperature, humidity, electrical power, neighbouring components, etc.?

And does the AMD chipset preform any "live" health check on the quality of the entropy collected prior to PRNG/DRBG seeding or reseeding?

Gunter KönigsmannOctober 31, 2019 10:18 AM

If you use the standard DRBG from the NIST the NIST would have been beenstupid if they hadn't kept the backdoor you get when you create it, see also: https://en.m.wikipedia.org/wiki/Dual_EC_DRBG
I once heard that Juniper once discovered that their DRBG was exchanged by a different one with a different backdoor.

Basically the DRBG is (AFAIK) an encryption algorithm that encrypts the memory that keeps the internal state of the random number generator. If you don't know the encryption key it is near impossible to find out the internal state and therefore the next number you'll get. If you know the encryption key you let a server generate one long encryption key using the DRBG, which basically is a random number, decrypt the random number using the DRBG's key and tell your own DRBG to start with this internal state. Once you have done that all future keys your DRBG generates are identical to the keys the server will generate.

Gunter KönigsmannOctober 31, 2019 10:29 AM

Relecant xkcd: https://www.xkcd.com/221/

And reading again what I wrote I mixed the encryption and the decryption key of the DRBG: The encryption key is what the generator uses for getting a new state. The decryption key is the random me for determining the current state from the DRBG's output.

LisaOctober 31, 2019 11:21 AM

@Günter, it is widely acknowledged that Dual_EC_DRBG is backdoored, which is why it was removed from any standards. In fact it was the vetting by skilled cryptographers which resulted it being removed.

However the other 3 remaining: CTR_DRBG, Hash_DRBG, & HMAC_DRBG, from NIST SP800-90A have been extensively reviewed by the global cryptographic community and no issues or weaknesses have been found.

So I don't see any problem with trusting these 3 NIST DRBGs.

However it would be interesting to see if Fortuna has any properties which provide additional advantages over the 3 NIST DRBGs.

Just because something is trustworthy, doesn't make it the best.

Clive RobinsonOctober 31, 2019 12:51 PM

@ Bruce and the usuall suspects,

I always recommend using a random number generator like Fortuna, even if you're using a hardware random source. It's just safer.

Hardware "True Random Bit Generators" (TRNGs) used to be the "gold standard" more years ago than I care to remember and now Quantum Generators are considered the best.

The problem is that "hardware random source" is not something you actually find on modern consumer hardware. As for "ring generators" and such like on silicon chips, they are by definition untrustworthy because the manufacturer in many cases does not provide access to the raw generator output. Worse they hide it behind some Crypto Algorithm to give it a pointless sprinkling of "magic pixie dust". Which means a crapy generator with maybe 2^8 bits of real entropy is made to look like it has 2^32 or much greater entropy when infact it is probably just driving an LFSR that gets it's entire state encrypted by AES or what ever other hardware crypto is on chip.

The other problem which is using external sources of input, such as network packet arrival times, head turbulance on hard drives, delay times on user key presses or mouse movments only works when,

1, You actually have such inputs.
2, They are actuall used .

After all if you have an embedded appliance / device that does not have a keyboard or mouse for user input, no physical hard drive and is on an effectively "not being used" network. It's "going to be a very chilly day in Haydes" befor you get even a few bits to extract even one or two bits of real entropy.

One way to get something in that situation is to use a crypto algorithm running in the interrupt structure in counter mode. If it runs in both the "idle process" and a fast interupt handler, although mainly determanisic the two will quickly cause chaotic --not random-- behaviour that then goes into the entropy pool and it's stirring.

Sometimes you either have to live with what you get, or fake it in a way to difficult for an adversary to make use of.

David LeppikOctober 31, 2019 1:46 PM

Unfortunately, the kernel needs high-quality random numbers at boot time for address randomization. PRNGs like Fortuna need to accumulate entropy over time. I'm not sure how you reconcile those needs when you have a hardware RNG that returns 0xFFFFFFFF all the time.

Clive RobinsonOctober 31, 2019 2:14 PM

@ Lisa,

However the other 3 remaining: CTR_DRBG, Hash_DRBG, & HMAC_DRBG, from NIST SP800-90A have been extensively reviewed by the global cryptographic community and no issues or weaknesses have been found.

A determanistic crypto algorithm is not in any way random. If you are lucky what you get is a determanistic output that an observer can not tell from random by any test they have.

The problem is that all such algorithms have "state" that is of limited size. If via some side channel, penetration, or other attack an attacker can determin some or all the bits of the state information they will eventually --in theory-- be able to reproduce the same output.

Thus where possible you run the algorithm that "stirs the state" as fast as possible but interupt or change it's function by whatever chaotic or random events you have available no matter how little. If you do it properly the attacker can not maintain a copy of the state information.

Fortuna whilst it uses a block cipher it is not realy a determanistic system it uses a largish "entropy accumulator" that consists of several entropy pools which get selectively stired by what ever sources of non determanism you have available.

It uses the block cipher in counter mode, but changes the key used after every request by hashing the entropy accumulator. In effect this becomes a form of "perfect forward secrecy".

To get a grip on how it works you need to look at a diagram of it's structure. As they say "A picture is worth a thousand words" ;-)

I've had a quick scan arroind but unfortunatly I haven't found one.

Frank WilhoitOctober 31, 2019 2:23 PM

~~25 years ago, I worked a bug report, on a platform that was very fashionable for a long time but is now essentially extinct. The problem only manifested in the output of a particular version of a COBOL compiler -- and only on certain machines. It came down to microcode for a CPU that was a third-party clone (the platform being already in its last stages of life). The string-compare-equal instruction always returned the cookie for "equal", even if the strings were not equal.

None of the above details are relevant. The point is that the platform gurus were sincerely, deeply, and innocently shocked by the notion that microcode (of all software!) might have been released feature-incomplete. They confronted the vendor, whose response was to the effect of "Damn, you caught us." Somebody learned something that day and it had nothing to do with microcode.

Clive RobinsonOctober 31, 2019 4:28 PM

@ Frank Wilhoit,

~25 years ago, I worked a bug report, on a platform that was very fashionable for a long time but is now essentially extinct.

Would it by any chance be a PerkinElmer?

PerkinElmer CSD made amongst other things IBM370 look alikes that had a Cobal compiler that was the bain of a mentor and friends early working life...

So much so Annette used "Perkin-Elmer" in place of a more obvious two word expletive. A habit I picked up from her and still occasionaly use in her memory.

PerkinElmers one claim to fame was "University of Wollongong Unix" it was the first port of Unix off of PDP hardware onto a new architecture. And yes PerkinElmer couldn't leave it as was, they had to improve it and it became Edition 7 Unix, thus it had issues just as the Cobal Compiler did.

MarkHOctober 31, 2019 5:42 PM

@Frank, Clive:

The candidate I guessed was Sparc ... but surely a lot of systems have fallen by the wayside during the past quarter century.

"Perkin-Elmer" calls to my mind a long-ago association with a posse of refugees from Perkin-Elmer Data Systems, which included some of the craziest people I have encountered in my working life. There was some crazy-funny, but more than a little crazy-scary.

They were brought in to make everything ultra successful, and managed to accomplish a shocking degree of damage in a pretty short time frame (less than a year, as I recall).

A propos of incompleteness, a late friend was told by one of his former colleagues who had gone on to Microsoft that his group was ordered to ship their code-under-development. It wasn't just feature-incomplete; they hadn't even gotten through test and debug. My friend was told that nonetheless, the software was shipped as it was ...

Frank WilhoitOctober 31, 2019 5:45 PM

@ Clive,

Sorry, nothing to do with the University of Woolloomoolloo. Apologies for excessive coyness; the faulty microcode was for a clone of the DEC PDP-10. Now look up the most successful vendor of such clones and you won't be far off.

The COBOL compiler itself wasn't at fault. Whoever built it with the switch enabling its output to use the EXTEND instructions shares a sliver of the blame. But who among us has never succumbed to the temptation of believing that a Processor Reference Manual might mean what it said?

SpaceLifeFormOctober 31, 2019 5:56 PM

@Clive, @Frank Wilhoit

Yes, I'm old. Dealt with DOS360.

https[:]//alt.folklore.computers.narkive.com/K6M48NSO/was-mvs-se-designed-to-confound-amdahl

WeatherOctober 31, 2019 10:05 PM

Hmac well someone else got that, for RNG, just know windows is clock tick for 64bit hash, but need more input for data.

Clive RobinsonOctober 31, 2019 11:32 PM

@ Frank Wilhoit,

Apologies for excessive coyness; the faulty microcode was for a clone of the DEC PDP-10. Now look up the most successful vendor of such clones and you won't be far off.

Agh time to rake through the silt at the bottom of my memory to the early 1980's when Digital had PDP 11's and VAXen into every university with a spare broom cupboard ;-)

Where access to the Internet was rare from this side of the puddle. Whilst many remember AOL and their "landfill project" of three floppies on the front of every magazine, less remember Compuserve.

Compuserve were the last big users of DEC PDP-10/20s that I remember. They even resorted to making their own hardware into the 1990's which in effect was,a clone of the SC-30 that was it's self a clone of the PDP-10, not bad for an architecture effectively from the 1960's.

But I think "Systems Concepts" with their SC-20/25/30 machines were still effectively making clones to around 2002. In amongst that though was the legand of the "Foonly-1" which was reputed to be one heck of a firerod.

But PDP-10 and 20 were effectively dead by Digital's own hand. They had decided their future was in the direction of the VAX. Which I've been told is still actively in use in some banks around the world.

DaveNovember 1, 2019 12:17 AM

@Gunter: Yes, this is exactly it. Never, ever rely on a single source of randomness, because when that fails, all the crypto you're ever going to use will also fail.

I would also add that any application that relies entirely on RDRAND is buggy/broken and needs to be fixed. Maybe it's a good thing that AMD broke RDRAND, because it points out all the applications using it in unsafe ways.

FaustusNovember 1, 2019 11:17 AM

Very interesting. I have a non-deterministic system for AI that never ran well on windows but provided great results on Linux. Until a few months ago, when it stopped. I suspected that the random number generator had been degraded, perhaps intentionally by an 3 letter org because that is one stop shopping for enabling many attacks on encryption and security. The fact that identical code never worked on windows speaks for itself.

Of course, this seemed like a tin-hat idea until this article, which, however, does not apply directly to my configuration.

I do note that the article seems to suggest that we need to worry about false positives in using 20 64-bit words of FF's as an indicator for determining whether this specific problem exists, which is way off. Most modern encryption, and security and other algorithms ignore the possibility of events like this, at a much lower threshold. Any predetermined 1/2^(20*64) probability event is just not going to happen in the history of the universe. That's less than 1/google^10 probability per test!

I would like to test my system with Fortuna under Fedora. Are there any reputable C/C++/golang sources or libs with dev info? (Fortuna doesn't look trivial and any attempt by a non-expert team is most likely to be wrong).

SpaceLifeFormNovember 1, 2019 1:10 PM

@Clive

On HID-less devices, I use timer_entropyd because, it's better than /dev/random being empty, which leads to failures.

But, you need root.

SpaceLifeFormNovember 1, 2019 1:55 PM

@MarkH

"who had gone on to Microsoft that his group was ordered to ship their code-under-development"

Question: Was this 'ship too soon' done by Microsoft, or at a previous employer?

Early non-QA ship of software usually points to 'deliverables' (milestone payments) or extremely serious emergency bug fixes that have no work-around.

Curious if MS or not.

I ask, because as a dev, I also did my own IT/QA/DBA on the fly, as fast as I could, working bizarre hours, to a DOD customer.

Platform *nix, C, Oracle. Millions of LOC.

Each day (while I was sleeping), they would report a new issue (that I had already encountered in my testing), and the next night I would fix what I could, and deliver to them via uucp over dialup, to other side of planet.

The next day, IT/QA would do their gig on what I rolled the prior night.

I had zero time for paperwork.

Essentially, this DOD facilty was BETA testing.


MarkHNovember 1, 2019 2:26 PM

@SpaceLifeForm:

Bearing in mind it was third-hand by the time I heard it, this incident was Microsoft in Redmond WA.

As I understood it, the release date had been scheduled, and some middle-manager decided to command "ship whatever you got". The devs were still in the thick of creation. Essentially, the bread was sent from oven ... baked, or not.

SpaceLifeFormNovember 1, 2019 4:17 PM

@MarkH

As I smelled. The Magic Yeast was in place.

It was likely NT getting the lame Orange.

MichaelNovember 2, 2019 4:04 AM

The take that I had from the article was that modern CPU's had RNG's, these RNG's could be updated by a package and that they were used in a VPN kernel module. I wonder if the microcode packages get special attention, apart from RDRAND the PCLMULQDQ seems interesting to modify.

On another note, infinite loops are always a bug.

JonNovember 2, 2019 1:48 PM

@ Michael:

On another note, infinite loops are always a bug.

Not always. Embedded systems very frequently are structured as an infinite loop of 'read input (if any)', 'act upon it (if any)', 'repeat forever'.

and I've built quite a few more that will wait forever for some thing to happen, and if that thing never happens, that's fine by me (imagine a fire alarm). They're designed that way... :-)

Jon

(Point taken, however, for anything in a boot process(!!)). J.

SpaceLifeFormNovember 2, 2019 2:18 PM

@Michael:

"On another note, infinite loops are always a bug."

Actually, that is normal in *ANY* kernel.

It is normally referred to as the idle loop.

Got plenty of machines doing that right now.

That is the code where the kernel spins it wheels until some kind of interrupt occurs.

Historical note:

On Tandem machines, that particular piece of code was written thusly:

(It was written in TAL, but the C equilvalent follows. TAL very much like C)

#DEFINE Bicycling 1

while (Bicycling)
{
/* Spinning Wheels */
}

/* NEVERREACHED */


Ed GardnerNovember 15, 2019 2:28 PM

Back to the original article.

What disturbs me is that if a BIOS patch can fix a so-called "hardware" RNG, then a different BIOS patch can break the same RNG, and probably in ways that might not be easily detected. Corrupt BIOS patches can be propagated across networks, after all that is how "good" patches are distributed. A disturbing attack vector.

Clive RobinsonNovember 15, 2019 10:28 PM

@ Ed Gardner, Michael,

What disturbs me is that if a BIOS patch can fix a so-called "hardware" RNG, then a different BIOS patch can break the same RNG, and probably in ways that might not be easily detected.

Yup, that's the short version of it.

The long version is Intel has had RNG's on chips for a very long time. And as far as I can remember has not allowed you access to the "raw bit generator" inside their chips. That is they hide the --alleged-- "True-RNG" what ever it might be, behind an unknown set of layers of hardware and software before applying a crypto algorithm of some form as "magic pixie dust".

Thus not only can you not check the True-RNG is working correctly, you can not see at all if it is a True-RNG or some kind of Pseudo-RNG that is driving that crypto algorithm that gives you your RNG output.

Some what simplistically here are a number of "noise" signals that could be used,

1, Fully determanistic.
2, Semi-determanistic.
3, Chaotic noise.
4, Thermal noise.
5, Quantum noise.

You can envision them on a line from bad to good[1]. Now whilst you can via statistical tests check these sources and tell good from bad to some level, after the "magic pixie dust" crypto algorithm you have little or no chance of determaning good from baf, unless it's very bad and the way they used the crypto was equally as bad.

Thus even if a thermal noise source or better is used after the magic pixie dust nor can you tell if the RNG is being influenced or entirely swampped in some way by "power supply noise" or other signal.

Thus only those that "have the key" to the magic pixie dust crypto algorithm can tell. Which is not what you want for you security. Which is why people have realised that they have to in effect build their own mixing algorithms to add as much non-determanism as possible, whilst extracting every drop of real entropy they can and spreading it across all the bits in the entropy pool.

[1] Thus "fully determanistic" could be a counter or Linear Feedback Shift Register. "Semi-determanistic" could be CPU noise where the code that is running is in effect unknown and further gets task switched and interupts which mix things up further. "Chaotic noise" is in effect determanistic, but is so sensitive to it's input conditions that after a few cycles around the loop it becomes effectively unpredictable. "Thermal noise" is present in all electrical circuits that contain resistance, however it is extreamly low level thus subject to external interferance that could swamp it via it's power supply, circuit traces etc. Then there are various types of Quantum sources but again care has to be excercised in preventing interferance.

Leave a comment

Allowed HTML: <a href="URL"> • <em> <cite> <i> • <strong> <b> • <sub> <sup> • <ul> <ol> <li> • <blockquote> <pre>

Sidebar photo of Bruce Schneier by Joe MacInnis.