Proof that HMAC-DRBG has No Back Doors

New research: “Verified Correctness and Security of mbedTLS HMAC-DRBG,” by Katherine Q. Ye, Matthew Green, Naphat Sanguansin, Lennart Beringer, Adam Petcher, and Andrew W. Appel.

Abstract: We have formalized the functional specification of HMAC-DRBG (NIST 800-90A), and we have proved its cryptographic security—that its output is pseudorandom—using a hybrid game-based proof. We have also proved that the mbedTLS implementation (C program) correctly implements this functional specification. That proof composes with an existing C compiler correctness proof to guarantee, end-to-end, that the machine language program gives strong pseudorandomness. All proofs (hybrid games, C program verification, compiler, and their composition) are machine-checked in the Coq proof assistant. Our proofs are modular: the hybrid game proof holds on any implementation of HMAC-DRBG that satisfies our functional specification. Therefore, our functional specification can serve as a high-assurance reference.

Posted on August 30, 2017 at 6:37 AM

Comments

wumpus August 30, 2017 12:02 PM

The title appears to be clickbait; I don’t see anything about back doors in the abstract.

I don’t even see a reason to believe that RSA(sequential numbers) wouldn’t pass the tests mentioned.

moz August 30, 2017 1:13 PM

@wumpus

I’m not qualified to comment on whether the proofs prove what they claim to prove. However the lack of backdoors is in this:

NIST SP 800-90A generates pseudorandom output, assuming that HMAC is a pseudorandom function

In other words this doesn’t prove that HMAC has no backdoors, it proves that HMAC-DRBG doesn’t add any backdoors to HMAC.

Aside from the question of whether the proof really proves what it claims and the question of the safety of HMAC, the headline is correct and less misleading than most headlines you will see in the media today. In fact, it’s really only our own ignorance of what HMAC-DRBG is that leads us to be tricked at all.

wumpus August 30, 2017 8:24 PM

Ok, so while it is relatively limited, it at least limits any shenanigans to an algorithm which appears to be fairly well known. I was also wondering about the “ideally random seeding” the proof requires, and how badly that works in practice.

Anura August 30, 2017 9:21 PM

@wumpus

As long as your computer isn’t compromised, and it has sufficient sources of randomness, and the system RNG has been seeded with enough of these random bits that the attacker can’t brute force it, then you probably don’t have much to worry about (and “probably don’t have much to worry about” is about as good as you can get when it comes to the practical, rather than just the theoretical). With a desktop, the time between keystrokes and mouse movements, as well as a plethora of activity from all of the applications, provides more than enough unpredictable data.

Ideally, your system RNG will have a proof to show that once it achieves an unguessable state (i.e. it’s been properly seeded with at least 128-256 unpredictable bits that haven’t been leaked to anyone) then it will remain in an unguessable state regardless of what is used to reseed it.
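
A toy sketch of that property in Python (illustrative only; the structure and names here are made up and not any particular system’s RNG): reseeding always folds new material into the old state through a hash, so even attacker-chosen reseed input cannot make the state guessable.

    import hashlib, hmac, os

    class ToyPool:
        """Toy entropy pool, illustrative only; not a real system's RNG."""
        def __init__(self, seed: bytes):
            # Assume the seed carries 128-256 bits no one else knows.
            self.state = hashlib.sha256(seed).digest()

        def reseed(self, material: bytes) -> None:
            # New material is folded in through a hash of (old state || input),
            # so even attacker-chosen material cannot erase the unknown old state.
            self.state = hashlib.sha256(self.state + material).digest()

        def output(self, n: int) -> bytes:
            # Derive output with HMAC keyed by the state, then ratchet the state
            # forward so earlier outputs cannot be recomputed from a later leak.
            out = hmac.new(self.state, b"output", hashlib.sha256).digest()[:n]
            self.state = hashlib.sha256(self.state + b"ratchet").digest()
            return out

    pool = ToyPool(os.urandom(32))             # properly seeded once
    pool.reseed(b"attacker controlled bytes")  # cannot make the state guessable
    print(pool.output(16).hex())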

Clive Robinson August 31, 2017 3:28 AM

@ Anura, wumpus,

… and the system RNG has been seeded with enough of these random bits that the attacker can’t brute force it

And thereby hangs the rub…

It’s been known for years that there is a “chicken and egg” problem with security, which shows up really quite badly in embedded devices.

1) Getting “enough” is a real problem, especially with low-entropy devices; thus it can take a very long time (hours or days in some cases).

2) The number of random bits at first turn on is usually zero and does not build until the system is in use.

3) For a system to be in use it should be secure, but this requires a large number of random bits to generate certificates etc.

It’s a problem SigInt agencies are more than happy to exploit.

But the problem gets worse…

Having had this failing made public a few years ago, various embedded software engineers came up with ideas based around the likes of “pre-seeding” or “quick and chaotic”.

One such set of reasoning is,

    AES-CTR is a secure algorithm, thus if we encrypt the unit’s serial number with AES that will be secure to use as the pre-seed value.

I suspect you both know why that is just another form of “magic pixie dust” or “magic umbrella” reasoning.

Getting good and sufficient entropy at start up is actually a very very hard problem to solve.

ATN August 31, 2017 4:22 AM

To follow the current conversation, and in short:
Same devices should get enough random numbers, but should not get the same random numbers…
IMHO devices should also keep their random numbers secret; attacks after the event are a lot easier if you know the exact list of random numbers generated by the random number generator.
It may be difficult to detect that, while a secure application has read the random number generator 300 times, the operating system service has been intercepted and the data stored in some unused portion of a 4-terabyte hard drive…

ab praeceptis August 31, 2017 5:07 AM

Clive Robinson

Oh, how I love the random field. Unfortunately, most (even developers) do not understand the field at all. Neither do they understand that good random is one of the very cores of good crypto, nor do they understand the properties of crypto.

To offer a typical example: One can read it anywhere -> “Do not use prngs but CSprngs!!”.

Unfortunately that often is simply stupid, the reason being that most csprngs do not offer good random properties. They are (somehow) secure, yes, which usually means highly unpredictable but random is so much more than being unpredictable. One desires, for example, a good distribution, a long cycle, etc.

Plus, of course, as you correctly mentioned, there usually is the hen and egg problem. After all, we want our systems to start up in a predictable way and to a predictable status. So, where is the unpredictable life cycle starter to come from? Usually from not very random sources like (typ. small) timing differences, cpu built in rngs (by definition prng), network traffic “hazard patterns” – plus lots of fairy magic.

So, let us be smart and buy a Trng – which, of course, is then connected to our system, say by usb … which pretty directly translates to visibility to opponents (i.a. thanks to intel amt and similar plunder).

Just lovely that whole field (if one likes to be at the mercy of unseen third parties).

neil456 August 31, 2017 7:40 AM

I try really hard not to be a conspiracy theorist, but it’s getting really hard.

The facts:

According to the research, “This research was supported in part by DARPA agreement FA8750-12-2-0293 and by NSF Grant CCF-1521602.”

Both of these agencies are part of the government that wants back doors.

In other areas, like medical and psychology research, universities routinely find the results they were paid to find in published research that is not reproducible. This primarily comes from the taught misuse of statistics. Search for “replication crisis”. Some of the top statistics professors think that 50% or more of the medical drug studies reach false conclusions about the effectiveness of the drug being studied. They also think that as much as 80% of the psychology research is not reproducible. The problem here is that the universities don’t care if the research is correct, don’t know how to assure the research’s correctness, or don’t make sure they hire people that can be trusted. They turn a blind eye as long as they are getting paid.

The lone non-university author is from Oracle, which is a huge government contractor.

I could not find, in a quick reading, where the authors provide access to the research data and programs such that a third party could analyze and replicate the findings.

Opinion:

Today we live in an environment where politics and money invades everything. You can no longer trust that people, companies, universities, or researchers will do the right thing. There is a lot of government power at stake based on back door tracking of information and a lot of money to be made by universities and companies providing support for that power.

I don’t think that anyone should believe this research as it stands. Maybe it will prove to be true over time as the research is replicated, but maybe not. It is time to ignore the sound bites (hey, it’s secure, trust us) and insist on all of the details.

Clive Robinson August 31, 2017 7:41 AM

@ ab praeceptis and the other “usual suspects”,

So, let us be smart and buy a Trng – which, of course, is then connected to our system, say by usb … which pretty directly translates to visibility to opponents (i.a. thanks to intel amt and similar plunder).

Many years ago (think last century) I used to design TRNGs as there were no adequate ones around that did not need “a mortgage sized loan” to buy[1].

What I realised then is as true now as it was then. People do not want TRNGs; they are too abstract an idea for most to get their head around. What they really want is a “push button” “Key / Certificate Generator” that, back then, you talked to via a serial port and terminal / computer.

Whilst the market is still small, a device that you could put in your pocket, with two or three buttons and slots for uSD cards or smart cards, that creates certificates or key files written to the root directory either in plain text or via an internally held cert, as “an HSM or CA in your pocket”, would have a market.

[1] The reason they were so expensive is an abject lesson in pricing by supply and demand. The actual demand for TRNGs is really, really low, thus the price is inordinately high to recover design, production run and storage costs. Because you might make only a hundred and sell them over five to ten years, you need to recover the upfront cost over five or six units.

Bob Paddock August 31, 2017 7:43 AM

Anyone have any insights to Quantum Tunneling based Random Event Generators?

The devices sold by Psyleron are considered the ‘Gold Standard’ in Parapsychology Research, where researchers are trying to ‘prove’ the Mind can change the outcome.

Alas they never go into details of how exactly their devices work.
Searching for “Quantum Tunneling Random Number” turns up many possible ways.

Are there any papers that discuss a comparison between various hardware methodologies?
Chua’s chaotic circuit, avalanche noise in reverse-biased p-n junctions, optical ‘noise’, thermal ‘noise’, etc.

Isn’t any truly random event generator also going to be a really good thermometer?

wumpus August 31, 2017 1:11 PM

@Clive Robinson “It’s been known for years that there is a “chicken and egg” problem with security, which shows up really quite badly in embedded devices.”

I’ve seen enough updates around here to know that mistakes in random number generation are easy to make and hard to fix. Some really nasty and overdue patches have been needed on widely trusted software.

Intel included an RNG that should have been enough, but we know that it can’t be trusted (I suspect that calling it 16 times and hashing the result via SHA256 should work, but I would call it more and prefer adding entropy from other sources).

When I first encountered the problem, most PC motherboards already had microphone ports, and I thought that placing a microphone near a fan would make a great source of random numbers (or a great bug, take your pick…). The picture quality of nearly any phone camera implies a ton of entropy, but hashing across sufficient data might use more power than you want (and again, the privacy issue).

As far as a possible market for a fob capable of producing secure random numbers, that is simply the ideal example of a “lemon market” (textbooks will only use less ideal examples due to the theoretical difficulties in explaining this one).

tyr August 31, 2017 5:46 PM

Time to ask a dumb question because someone might know.

Many years ago I ran some computer programs which were seeding a random number generator from a keyboard entry. This generated a sequence that the program used. What I found strange was that if you ran the programs again with the same keyboard seeding you got identical results. My interpretation at the time was that what computers call random wasn’t random at all. It seemed to be a case of the language mutating until things became their opposite without anyone bothering to notice. The repeatability was too handy to discard, so it became embedded and coloured the concept.

Is this a valid viewpoint? Does comp use a concept called random that is not random at all in the previous sense of a non-connected number series?

If both send and receive have a non-connected series to use as a key which cannot be made by any method, it would seem to be a lot more work to crack (you’d have to use side channel analysis to get any information at all).

John Kelsey August 31, 2017 6:38 PM

wumpus:

Out of curiosity, how do you know that the Intel hardware RNG can’t be trusted? It’s clearly possible for any hardware RNG to be backdoored in some hard-to-detect way, but your statement seems to suggest a lot more than acknowledging a possibility.

Software entropy sources based on collecting noise from the operating system are pretty hard to meaningfully evaluate, and we know some cases where they don’t get enough entropy (the “Mining Your Ps and Qs” paper shows a bunch of examples where this happened to appliance firewalls and such based on Linux). A hardware RNG solves that problem, but leaves you with the possibility that the guy who designed the hardware RNG has backdoored it. The combination of the two sources, though, works out well: use the hardware RNG to get (say) 512 bits of entropy input for SHA256_HMAC_DRBG, then use the OS’ random sources (interrupt timings, mostly) to collect some more bits, and feed them into the DRBG as well. If the OS didn’t collect enough entropy, you’re still covered unless your hardware RNG was also buggered. If the hardware RNG is buggered, but your OS sources collected enough entropy (more than a hundred bits or so, in practice), you’re covered.
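
For concreteness, a rough Python sketch of that combination (a simplified HMAC-DRBG in the style of SP 800-90A, not the mbedTLS code; os.urandom stands in here for both the hardware RNG and the OS timing source, and reseed counters and additional-input handling in generate are omitted):

    import hmac, hashlib, os

    OUTLEN = 32  # SHA-256 output length

    def _hmac(key, data):
        return hmac.new(key, data, hashlib.sha256).digest()

    class HmacDrbg:
        """Simplified HMAC-DRBG sketch (SP 800-90A flavoured), for illustration only."""
        def __init__(self, entropy: bytes, nonce: bytes = b"", personalization: bytes = b""):
            self.K = b"\x00" * OUTLEN
            self.V = b"\x01" * OUTLEN
            self._update(entropy + nonce + personalization)

        def _update(self, provided: bytes = b"") -> None:
            # The Update function from SP 800-90A, folding provided data into K and V.
            self.K = _hmac(self.K, self.V + b"\x00" + provided)
            self.V = _hmac(self.K, self.V)
            if provided:
                self.K = _hmac(self.K, self.V + b"\x01" + provided)
                self.V = _hmac(self.K, self.V)

        def reseed(self, entropy: bytes, additional: bytes = b"") -> None:
            self._update(entropy + additional)

        def generate(self, n: int) -> bytes:
            out = b""
            while len(out) < n:
                self.V = _hmac(self.K, self.V)
                out += self.V
            self._update()
            return out[:n]

    # Combine two independent sources: a hardware RNG reading plus OS-collected bits.
    hw_rng_bits = os.urandom(64)      # pretend this came from the hardware RNG
    os_timing_bits = os.urandom(32)   # pretend this is OS interrupt-timing entropy
    drbg = HmacDrbg(entropy=hw_rng_bits, personalization=b"example")
    drbg.reseed(os_timing_bits)       # the second source is folded in as well
    print(drbg.generate(32).hex())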

Clive Robinson August 31, 2017 7:02 PM

@ tyr,

Does comp use a concept called random that is not random at all in the previous sense of a non-connected number series?

A computer is deterministic in operation, not random. It works on the inputs it receives to produce output. Think of it, if you want, as a wheel spinning: it will (Newton’s first law) remain spinning at the same rate unless subject to a force. The force is in effect the input and the spin rate is the output.

Most RNGs are not real TRNGs, due to the problems with true entropy sources. So, as our host once designed, you have a block of memory, known as the “entropy pool”, whose contents are changed randomly by the TRNG. However the entropy pool is also accessed by the output algorithm that gives the RNG output to your application etc. One of the reasons it’s called a pool is that it “gets stirred” by the output algorithm to “spread the entropy”. Put simply, it runs a hash or other crypto algorithm against the entropy pool to change the values. But this is a deterministic process. So without the TRNG entropy input the entropy pool becomes like the state array (Sary) in a stream cipher, and you would expect exactly the same number sequence from the same initial state of the entropy pool.

The idea in general is that the output function acting as a stream cipher will have a long enough cycle time that the TRNG will update the entropy pool / Sary into a sufficiently out-of-sequence state that, even knowing the crypto algorithm and the Sary state, an attacker cannot gain sufficient information to be able to predict the output. As a rough rule of thumb you’d be looking at having an RNG output rate to average TRNG input rate of not more than 1000:1, to allow for future attacks on the crypto algorithm.
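
To make the determinism point concrete, a tiny Python sketch (purely illustrative; this is not any real system’s pool design): with no TRNG input, the same initial pool state replays exactly the same output sequence.

    import hashlib

    def stir_and_output(pool: bytes, rounds: int):
        """Stir a pool with a deterministic hash and derive outputs; no TRNG input."""
        outs = []
        for _ in range(rounds):
            pool = hashlib.sha256(pool + b"stir").digest()        # deterministic "stirring"
            outs.append(hashlib.sha256(pool + b"out").hexdigest()[:16])
        return outs

    # Same initial pool state gives exactly the same output sequence, as tyr observed.
    print(stir_and_output(b"same keyboard seed", 3))
    print(stir_and_output(b"same keyboard seed", 3))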

I hope that helps.

Clive Robinson August 31, 2017 7:57 PM

@ John Kelsey, wumpus,

Out of curiousity, how do you know that the Intel hardware RNG can’t be trusted?

The fact they chose to hide it behind the “magic pixie dust” of a hash algorithm says that they have something to hide…

Do you waste time trying to find out what or do you just move on to something you can actually test and verify?

It’s a complaint I have consistently raised since Intel first brought out their white paper on their on chip RNG.

Look at it as an addition to,

Q : What do chefs do with sauces,
That lawyers do with words,
Doctors do with dirt
And Intel engineers do with hashes?

A : Cover up their mistakes.

Harsh maybe, but in serious security “you take no chances”. As was joked about the CIA motto of “In God we trust”, it really means “all others we check”. And “if you can not check, do not trust”.

Large Mouth Bass August 31, 2017 10:54 PM

I just don’t like the way these “proofs” are being presented. Yes, the idea of an automated checker (Coq etc.) for a formal proof is helpful, but there is clearly no human-readable presentation of the main ideas and significance of the formal proof that was allegedly passed by “Coq.”

We need a car analogy:

It’s like some of those Haynes manuals they sell at the auto parts store: they tell you how to use a micrometer to measure the races of the journal bearings in case you are completely overhauling the engine, but some of the mundane details about wiring, switches, tubes, and hoses outside the engine itself, or differences from model to model, are missing, incomplete, or incorrect in the manuals. Almost as if the guy who wrote the manual rather resented doing so, in the hopes that “consumers” would be “encouraged” to hire a professional mechanic, rather than do it themselves.

The “guy.” It’s even worse for ladies. You can’t even cut your own fingernails or toenails. You have to hire a “professional” for that, too. And don’t even think about working on your car.

I just hate the New World Order service economy. Guns are banned and they just don’t let you make anything or do anything anymore.

COYNE TIBBETS September 1, 2017 12:29 AM

Is it possible to prove a negative?

The authors obviously think so. Clearly they never read, “Reflections on Trusting Trust.”

It seems to me to hardly be necessary to consider much beyond that thought.

Large Mouth Bass September 1, 2017 12:39 AM

Bears repeating.

Guns are banned, people steal shit, they don’t let you make or do anything, they run you out of town from city to city and from state to state, they screw up your birth certificate so you can’t even get a fscking passport, they got that white nationalist hate thing going, you can’t fight city hall because some village elders are telling you you can’t do this or that or the other thing when they haven’t even heard the full story, and pretty soon it’s a freaking town council, and next thing you know they’re aldermen like it’s damned Chicago, all telling you the same old lies, fsck, dude, you’re mentally ill, and you “need” to be on medication.

Oh, can’t you just get a job and work in this country? Oh, sure, they’d hire you on the spot if you got a haircut — ha, ha, fooled you.

/transgendered

@COYNE TIBBETS

Who knows what those authors are thinking? English doesn’t even sound like their first language, and they just have too many names on that damned paper.

gasche September 1, 2017 4:38 AM

@neil456:

I could not find, in a quick reading, where the authors provide access to the research data and programs such that a third party could analyze and replicate the findings.

The correctness proof is expressed using a “proof assistant”, which is a program that checks the correctness of mathematical proofs expressed in a certain constrained way (rather similar to a program’s code in a programming language).

The proofs (the checkable code of the proofs) of this publication are publicly available online:

https://github.com/PrincetonUniversity/VST/tree/master/hmacdrbg

For example, the claimed correctness specifications of the HMAC-DRBG functions are available in this form at

https://github.com/PrincetonUniversity/VST/blob/master/hmacdrbg/drbg_protocol_specs.v

and the (machine-checkable) reasoning steps constituting the proofs are available at

https://github.com/PrincetonUniversity/VST/blob/master/hmacdrbg/drbg_protocol_proofs.v

Running the proof-assistant program they used (Coq, which is developed in the open as free software by an independent team: https://en.wikipedia.org/wiki/Coq_(proof_assistant) ) constitutes a way of “reproducing” the result.

Of course, you do have to evaluate the proof in detail to make sure that the assumptions are as claimed, that there is no difference between what is claimed to be proven and what actually was proven. To be able to do this, you have to be knowledgeable in both cryptography and mechanized proof / proof assistants (including being familiar with the specific proof assistant (“proof language”) used, Coq, and the general framework/methodology of C verification by this research group, http://vst.cs.princeton.edu/ ).

I believe that the reproducibility and self-verifiability status of this work are extremely high, probably among the very best in all of science. Not only do you have a publicly available full mathematical proof, but the proof was expressed in a high level of detail that makes it verifiable by a machine — in particular, no reasoning step is left to the deduction of the reader (as is common in mathematical proofs circulated on paper / in LaTeX), everything can be unfolded to the level of detail you would desire.

ab praeceptis September 1, 2017 6:31 AM

@the usual suspects

Impressions from the paper:

I’m torn; On the one hand it’s highly laudable work and well done wrt academia standards and common practice.
On the other hand I (personally, subjectively, and maybe wrongly) consider it next to worthless from a technical and pragmatical pov (I do see, however, many good sides such as e.g. an increasingly strong tendency to have sharp eyes on nsa/nist).

My major complaints:

  • The title is misleading. What they did verify is a sha-256 based variant and not “HMAC-DRBG” (in general).
  • Too much was assumed (which I can understand but which is questionable and puts the proof on somewhat wobbly ground)
  • I’m not sure the whole chain has been properly verified and proven correct. Note that (already a positive assumption) “a is proven, b is proven, c is proven” does not necessarily mean “a -> b -> c is proven” (‘->’ indicating tool chain flow).
  • CompCert. Wonderful work, love it, but: Such proof is not of a general validity. It’s merely a statement about CompCert’s interpretation of C.
  • (more general version of above) C is not provable, period. It’s a “standard” that is ambiguous and leaves much (of which much may not be of concern in general app. dev. but is of great concern in crypto dev.) at the pleasure of an implementor.

  • The algorithm is largely simply assumed to be of good quality. It is, however, well known that most CSprngs do not generally have good random properties. Many failed or performed poorly in, for instance, BigCrush tests.

  • The authors do understand (they mention it explicitly) that frequent re-seeding (no further qualification) does not provide better attack resistance.

I would like to add to the last point, which imo is of high importance, that we need more work done regarding re-seeding and the ideal fraction of the period that is actually used.

Reason: The relevant issue is which constellation gives Eve the best chance to attack. One tradeoff of major relevance is the following: the longer the actually used part of the algo’s period, the better for Eve (due to having a long series in the generator’s target domain). Re-seeding after, e.g., period/2 (let alone the full period) provides a highly attractive situation for Eve. On the other hand, i.a. for reasons of practicality and performance, the other extreme of reseeding every 2^1 cycles gives away next to nothing (it provides no basis for series analysis to Eve) but highly increases the DOS attack surface, particularly with high-cost CSprngs.
Based on (very primitive, due to diverse personal limitations) research and analysis, a number of cycles in the range of cube_root(period)/2 seems to be the sweet spot (rough numbers sketched below).
Moreover I suggest using ratchets with a high-quality (non-CS) prng offering excellent random properties, driven by a CSprng for (re)seeding.
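
A back-of-the-envelope illustration of what that sweet spot means numerically (my own arithmetic, assuming a period of 2^128 purely for illustration; nothing here comes from the paper):

    # Rough numbers for cube_root(period)/2 with an assumed period of 2^128.
    period = 2 ** 128
    reseed_interval = round(period ** (1 / 3)) // 2
    print(f"reseed roughly every 2^{reseed_interval.bit_length() - 1} outputs")  # about 2^41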

Anonymous++ September 1, 2017 10:06 AM

Of course, you do have to evaluate the proof in detail to make sure that the assumptions are as claimed, that there is no difference between what is claimed to be proven and what actually was proven.

Right. That’s exactly what I’m really trying to get at. A high-level expository write-up of exactly what, in human terms, you are attempting to prove, and how the mechanized formalization of that provable statement corresponds to what you say you are proving.

Not a dump of technical details, and, “We proved [something] in Coq!”

Large Mouth Bass (Anonymous++) September 1, 2017 10:28 AM

Interesting. I had parked in a gravel lot for a long time, but I had my cell phone turned off. No one came near or bothered me. As soon as I turned my phone on, a man rode by in a bicycle, and then another man drove up in a pickup, made some sexual comments, and told me I was on private property.

Stalking with a sexual motivation. You can’t take my guns and my livelihood away for alleged “mental illness” and then tell me I’m on “private property” after tracking me down by the location of my cell phone. You don’t own shit when you’re perpetrating a Holocaust.

ab praeceptis September 1, 2017 10:42 AM

Anonymous++

Well, they kind of do offer that.

But as so often the trouble is in the details. One I mentioned already is that proving some property of “HMAC-DRBG” and of an sha-256 implementation is not really the same. On the other hand I understand that the authors had to choose; they could either limit themselves to HMAC-DRBG or they could examine a widely used implementation (they chose the latter).

Another issue I see (I didn’t mention yet) is the problem of proving a random generator. Just try to think about an approach in terms of a postcondition; what and how would you formulate that? To make it more difficult there is more than one relevant property, each of which is difficult to examine. So they chose a game theory approach. That has certain advantages, the major one being feasibility but it’s not a full proof in the strict sense.

So, does that work really – in the strict sense – deliver what its title promises? No, it does not. Is it worthless then? No, absolutely not. It does, for instance, contribute a lot in terms of understanding the problem field and to gain some more trust.

That said, do I personally now trust HMAC-DRBG any more? No, absolutely not. Spotting “game theory” to me translates to “probability games” and “not trustworthy”.
But I’m still glad that Bruce Schneier brought that work to our attention and I do see quite some value in it (albeit not the advertised one).

As for the complaint: That field is difficult, even more so as that work actually combines multiple (albeit loosely related) fields, each of which isn’t exactly simple. It’s next to impossible to make a sufficiently precise and correct statement that at the same time is understandable to Jane and Joe (even if Jane and Joe are not at all stupid and seriously interested).

John Kelsey September 1, 2017 5:11 PM

Clive Robinson:

First, the Intel RNG doesn’t use a hash function for cryptographic post-processing–they use a block cipher (AES). (Maybe you’re thinking of the much older Intel HWRNG, from a couple decades ago, which did use SHA1 for postprocessing?) The Wikipedia writeup on the current RDRAND design is pretty decent.

Second, cryptographic post-processing is a way to guarantee that your outputs are good even when your source is imperfect. This seems 100% reasonable as a design, and it’s very common in hardware RNGs.

The alternative is that you could have an entropy source that you read yourself, and then feed it into some cryptographic postprocessing later to get good statistical properties and such. That gives the user of the thing one more job to do to use it, though, and I’m not sure how much it improves security. It’s not clear to me that you could tell, from the outside, whether the entropy source you were looking at was really getting a lot of entropy, or was just doing some moderately complicated internal thing that gave good statistics. For all you know, inside the black box, the entropy source is just a simulation with a fixed seed burned in at the factory. How would you tell?

John Kelsey September 1, 2017 5:16 PM

ab praeceptis:

I just skimmed the paper, but as I understood it, they did two different kinds of proof:

a. A proof that assumes HMAC is a PRF (that is, a kind of ideal keyed function–if you don’t know the key, every new input gives you a random number as an output). That proof shows that HMAC_DRBG is secure given that assumption.

HMAC being a PRF is a really common assumption in the world–it’s used all over the place in practical crypto protocols. If you think SHA256 is a good hash function (which it seems to be), then you more-or-less should believe that HMAC-SHA256 is a PRF.

b. A proof that a particular implementation follows the specification of HMAC_DRBG using SHA256.

This proof guards against screw-ups in the implementation of the algorithm, but doesn’t say anything about the algorithm itself being secure. You could do this kind of proof of a SHA1 implementation, and it would let you know that a given implementation of SHA1 had implemented the algorithm as specified, but it wouldn’t keep SHA1 from being a broken hash function!

Clive Robinson September 2, 2017 1:35 AM

@ John Kelsey,

Maybe you’re thinking of the much older Intel HWRNG, from a couple decades ago, which did use SHA1 for postprocessing?

Yes, it’s what I meant to convey with,

    It’s a complaint I have consistently raised since Intel first brought out their white paper on their on chip RNG.

I was meaning the whitepaper they brought out with their first on chip TRNG line up.

But even though they now use AES the purpose is the same as the hash, which is to hide what is on the other side of the curtain.

Having designed all-electronic TRNGs, and a few electromechanical-electronic ones as well, in the past, I have first-hand knowledge of just what crud hides behind the curtain, which is why in my designs I always made it available at a “test point” and often via a BNC connector on the front panel.

To see why I call it “magic pixie dust” or “magic umbrella thinking”, a little thought experiment.

The output of the TRNG source, such as thermal noise, is amplified to a usable level and then “sampled”; the sampling can be either “predictable” or “unpredictable”[1]. The result is a stream of bits that looks like the ramblings of a mad oracle.

The ramblings may or may not be a message, some recognisable noise or something completely unrecognisable.

From a more practical viewpoint, all physical sources show bias, faux entropy and real entropy. Bias is something that is simply recognised and almost as simply removed by a “debias circuit”. The problem is faux entropy, as it’s often hard to recognise and harder still to remove.

For instance, take two very, very high stability oscillators, square them up, and put one into the D input of a D-type latch and the other into the CLK input. If you look at the Q output on an oscilloscope, close in it looks very random. However if you put it through an integrator, you will end up with a very high quality sine wave at the difference frequency of the two oscillators… So not random at all; hence that Q output signal is faux entropy. Likewise a switch-mode power supply produces a noise signal that looks random but is mainly not. That is, the bulk of it is faux entropy, with a tiny fraction being real entropy caused by quantum effects.

Thus the hard problem is getting the real entropy without the faux entropy. Why is this important? Well, two reasons. Firstly, an attacker may be able to use various techniques to lock onto[2] the faux entropy, and thus to them it is close to being a fully deterministic system. Secondly, due to insufficient protection they may well be able to inject their own faux entropy signal and thus make the supposed TRNG very much deterministic[3]. Either way it reduces the entropy to negligible amounts, such that a simple value/range search becomes not just possible but trivial[3].

Which brings me to,

Second, cryptographic post-processing is a way to guarantee that your outputs are good even when your source is imperfect. This seems 100% reasonable as a design, and it’s very common in hardware RNGs.

This is a common belief and it is totally wrong, hence “Magic Pixie Dust Thinking”.

To see why, imagine that instead of chaotic ramblings the oracle puts out a very clear plaintext message, like reading out the entirety of the “Harry Potter” books. Once you have heard a few words you can very accurately predict the rest of the words. That is, there is zero entropy in the plaintext message. Now we encrypt the message into ciphertext using AES. The message now appears to be random according to all the statistical tests. However we know it’s not random at all; that is, there has been no increase in real entropy whatsoever in the message, it has just been subject to a fully reversible deterministic process. If you know the AES key then you can decrypt it to get the original zero-entropy message.
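
To put numbers on the thought experiment, a toy Python sketch (SHA-256 over a plain counter standing in for AES over a known plaintext): the output looks statistically balanced, yet it is completely predictable to anyone who knows the input.

    import hashlib

    def predictable_stream(start: int, blocks: int) -> bytes:
        """Hash a plain counter: the output 'looks random' yet carries zero entropy."""
        out = b""
        for i in range(start, start + blocks):
            out += hashlib.sha256(i.to_bytes(8, "big")).digest()
        return out

    stream = predictable_stream(start=0, blocks=1000)
    ones = sum(bin(b).count("1") for b in stream)
    print(f"fraction of one-bits: {ones / (8 * len(stream)):.4f}")   # about 0.5
    # Anyone who knows (or guesses) the starting counter reproduces it bit for bit:
    print(stream == predictable_stream(start=0, blocks=1000))        # True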

As I occasionally say about “security myths”,

    If it looks like a duck, quacks like a duck and waddles like a duck, that does not mean it is not a goose.

Hashing / encrypting an entropy source does not increase entropy; all it does is smear it out a bit. To believe otherwise is to believe in pixies and magic dust. At best it’s security by obscurity, with an obscurity that can easily be stripped off.

Which is a point I suspect you really know in your subconscious, because you say,

That gives the user of the thing one more job to do to use it, though, and I’m not sure how much it improves security.

All it does is change the problem to one of “also knowing the encryption key”, which could easily be leaked in a whole host of ways. For instance, you could simply replace the MSB in each encryption output with a bit of the key. An attacker in the know, after having recovered the leaked key, would then have the most trivial of search spaces (i.e. 2) to get the supposed TRNG output back. If the TRNG is nothing but a couple of ring counters or high-stability oscillators fed into a D-type latch… then it’s game over in its entirety.
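
A toy sketch of that covert channel in Python (pure illustration, not a description of any real device): a hash of a counter stands in for the “encrypted” output, and the MSB of each byte is overwritten with a key bit that an attacker in the know simply reads back.

    import hashlib, os

    KEY = os.urandom(16)   # the 128-bit secret a subverted device wants to leak

    def key_bit(i: int) -> int:
        return (KEY[i // 8] >> (7 - i % 8)) & 1

    def subverted_block(index: int) -> bytes:
        """Looks like whitened RNG output, but each byte's MSB is a key bit."""
        block = bytearray(hashlib.sha256(index.to_bytes(4, "big")).digest()[:16])
        for i in range(16):
            bit = key_bit((index * 16 + i) % 128)       # walk through the key bits
            block[i] = (block[i] & 0x7F) | (bit << 7)   # overwrite the MSB
        return bytes(block)

    # An attacker in the know reads the MSBs of 8 blocks and has the whole key:
    leaked = [b >> 7 for idx in range(8) for b in subverted_block(idx)]
    print(leaked == [key_bit(i) for i in range(128)])   # True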

My advice to people wanting to use a TRNG is “If you can not get at and test the actual noise source output, don’t waste your money”.

It’s not just that it might not be a noise source at all that makes this a requirement. It allows you to continuously “test the health” of the noise source. They are, at the end of the day, extremely sensitive to interference or any failure within the unit, such as a dry joint on a power supply decoupling cap etc. And it’s not a stretch to say that a user’s wellbeing is critically reliant on the health of their TRNG, so they should require that they be able “to feel its pulse”.

[1] By this I mean from the point of view of an external observer. If the sampler is driven by a reasonable quality clock derived from an XTAL then it is predictable to the observer. However if the sampler is driven by an unstable or chaotic oscillator then it is unpredictable to the observer. The reason for using an unpredictable oscillator is to try and hide the “wagon wheel effect” you see in films and “stroboscopic timing”. One TRNG source favoured by chip makers is a “ring oscillator”; in theory they “drift randomly”, in practice they are very susceptible to “injection locking” from any signal of nearly the same frequency, multiple or submultiple. The most well known “injection locked” oscillator used to be the NTSC or PAL “chroma oscillator” in the old analogue television signal.

[2] They could use TEMPEST techniques to passively listen for a signal such as the CPU clock, the edges of which get phase modulated by the PSU noise. The same PSU noise also gets into the TRNG’s thermal noise source bias circuit and its amplifier prior to sampling.

[3] Although I’ve been waving a red flag about passive[2] and active EmSec attacks, especially the active “fault injection attacks”, for a very, very long time (i.e. since the 1980s), I’ve kind of been a lone voice in the wilderness. As far as the academic community was concerned there was no mileage in investigating it. Eventually a couple of grads at the UK Cambridge Computer Labs did their own experiment on a very high-end commercial TRNG from IBM. By simply hitting it with an unmodulated RF signal they took the output entropy down from 2^32 to less than 2^7… which makes the search space for an attacker trivial.

Wael September 2, 2017 2:42 AM

@Clive Robinson,

Eventually a couple of grads at the UK Cambridge Computer Labs did their own experiment on a very high-end commercial TRNG from IBM. By simply hitting it with an unmodulated RF signal they took the output entropy down from 2^32 to less than 2^7

I thought frequency injection attacks have existed for a long time. I would also differentiate between fault injection attacks and frequency injection attacks, so perhaps my dates are off because of that…

Are you referencing this paper or its slide deck presentation? Although both are from Cambridge, I have a feeling you’re referencing older work. It would be nice to share a link.

Clive Robinson September 2, 2017 10:01 AM

@ Wael,

Are you referencing this paper or its slide deck presentation? Although both are from Cambridge, I have a feeling you’re referencing older work.

I’ve not seen that paper before, and from a quick scan it makes the points I’ve made here.

You are correct that I’m referencing an earlier paper; I guess I’ll have to dig out the link. But I’m away from the cave this weekend as there is an SSB contest on.

Wael September 2, 2017 10:13 AM

@Clive Robinson,

I’ll have to dig out the link

Wanted to see two things: the correlation between the incident RF frequency and the amount of entropy reduction, and the relationship between power and effectiveness. I also wanted to see how much advancement (at least publicly disclosed) in that field happened over the past 50 years or so.

There is more to say about RF frequency injection attacks, but perhaps this isn’t the right time…

But I’m away from the cave this weekend as there is an SSB contest on.

Upper sideband or lower? 😉

Rachel September 2, 2017 10:19 AM

Clive
SSB contest – simplex side band radio?
who can use skip/sun spot activity to broadcast the furthest?
can’t wait for your response to 66535’s satellite query 😉 a teenager actually did that very thing to a NASA craft once

ab praeceptis September 2, 2017 10:32 AM

Clive Robinson

Well and entertainingly put.

I’m sometimes under the impression that many problems stem from an unholy marriage between a darwinian/techno-religious world view, way too much trust in big names, and some carelessness thrown in.

For a start, we should always keep in mind what “random” is. The very definition of random (or hazard – but not coincidence) says it pretty clearly: random is what we can’t predict. Of course, science has worked on something more detailed and qualified, but in the end the basic definition is true and sufficient enough for general consideration.

Weather has been a classical example; one couldn’t predict it – but with an interesting twist: One could predict it over a long period (>= 1 year) but only vaguely; One could also predict it over very short terms, say milliseconds but not for the very periods of the highest interest in between. The reason I like the weather example, however, is that it very nicely demonstrates the core of the issue: random is some kind of output of a mechanism of sufficiently high complexity so as to be unpredictable. As soon as we understand the mechanisms behind it, which we nowadays do to a degree wrt weather, the randomness vanishes (or considerably decreases).

What is the ultimate driver? Today we have reason to think that it’s the realm of quanta.

That’s one reason why I like what you elaborated. In a way you describe the underlying problem in a practical and good way. We might assume that the source is “perfect random” (a wrong assumption but good enough for our purpose), but less than optimal understanding and fumbling on the way to a product one can solder to a board brutally taints, and often even subverts, the whole thing, and one ends up with something that isn’t “true random” at all.

While one might call this somewhat complicated or on the edge of philosophy and physics, the second element of the unholy marriage is almost obscenely simple. Chip and electronics manufacturers want to make money; that is their primary purpose and goal and the decisive factor for them is not whether their product is good but whether customers believe so (which usually is cheaper to achieve through marketing/PR, etc. than through technical facts).

There is some kind of unloved stepson of that marriage, too, namely the belief that a) a brand name indicates some kind of quality, and b) and worse, that knowledge and understanding can be bought in the form of chips or devices. While that might be the case sometimes, very often it is not. The well designed TRNG Fred is building in his basement hobby workshop might quite well offer far better random properties than some chip or part thereof.

And, of course, there is still the basic problem when dealing with quanta due their very nature. Plus, btw., the question whether quanta are the “final” source or whether they again are just a part of a mechanism that is delicate and complex enough to be considered (i.a.) a perfect random source (similar to a thousand years ago and the then most up to date understanding of the building blocks of matter which today we know and understand (oh well, to a degree)).

Another part of what you wrote which I liked was what I like to jokingly call (in a rothschild allusion) “let me set the voltage levels that define which analog signal your circuitry takes as 1 and which as 0 and I care not who your CEO is” *g
But granted, many can’t know better with many books and teachers telling them that digital is “the other” electronics. Nevertheless we would be well advised to keep in mind that DE is but one part of analog electronics and a small one at that.

As for “spreading TRNG by crypto devices” I’m somewhat more tolerant than you. True, aes (symmetric!) is a questionable choice but (and the topic paper is about that, not aes) good crypto hash algos aren’t that bad due to irreversibility. Actually I think that hash algos generally get somewhat of a stepson treatment and are often undervalued (and badly or misunderstood). The problem I see (in agreement with you) anyway, however, is that the best hash algo will produce biased not-random if fed with poor quality “trng” as you described.
One of the points some seem to get wrong is that “hash algo xyz produces pseudo random output” does not mean “throw in whatever and you’ll get good random”. It rather means that the (deterministic!) algorithm per se doesn’t introduce bias. Of course one could (and probably should) look closer at that for cases where really high quality pseudo random is needed. After all, “random” is a gem with many facets and the ruler along which hash algos’ unbiasedness is measured is a simpler one than the one used for random algorithms.

Clive Robinson September 3, 2017 1:54 AM

@ Wael,

Upper sideband or lower?

Depends on how high you go 0:)

@ Rachel,

SSB contest – simplex side band radio?
who can use skip/sun spot activity to broadcast the furthest?

Yes, you bounce HF “sky wave” signals off the various layers of the ionosphere for long distance / DX contacts and use the “ground wave” to get local contacts. The various layers exist all the time; however, as with the “northern lights”, solar disturbances make things “interesting”. However if we get a direct strike from a “solar ejection” it will quite literally be the “world’s biggest EMP attack” and things will be quite a bit more disturbing than just “interesting”. And rather like big meteor impacts, supervolcanoes and major “ring of fire” earthquakes, we are currently “long overdue” for one. Whilst it would not be a “mass extinction” event for “organics”, it might well be so for power grids, microchips, semiconductors etc… Which begs the question of what we will do with all those effectively redundant “code cutters”: will there be enough ditches to be dug to keep them occupied 😉

As for SSB, technically it’s “Single Side Band”. To imagine it, think of the frequency spectrum of an amplitude modulated signal: it has two sidebands and a carrier. This is wasteful of both bandwidth and power, so you first remove the carrier to get Double Side Band Suppressed Carrier (DSBSC), then you remove one or other of the sidebands to give you SSB, hence the “Upper Side Band” (USB) and “Lower Side Band” (LSB) @Wael was asking about (the convention is LSB below 10MHz, USB above, and as Wael appears to know it is considered a bit contentious these days https://www.eham.net/articles/5913 ).

However, having taken things out of AM to get SSB, you have to add them back in at the receiver for most receivers. To do this you inject a carrier back in with what is often called a Beat Frequency Oscillator (BFO) at the last “Intermediate Frequency” (IF) mixer and detect the result in the usual way with a diode detector or better.

One side effect of this is you get “pitch shift” on the recovered audio if you are not correctly tuned in, so people can sound like they are breathing helium. As at one point somebody thought that sounded “groovy”, the transmit and receive process became used as an audio “special effect”. It was also used in some early commercial “audio encryption” products for “band shifting encryption”. These were however not very secure, because with languages like English the information content is in the signal envelope, not its frequency/pitch. So what you would do is detect the audio signal envelope and use it to modulate another sound source. This technique also ended up in the music industry toolbox as the vocoder, which most people will have heard clearly at the end of ELO’s “Mr Blue Sky”, which was the last track of the LP’s A side, which is why it says “Now Please Turn Me Over”.

These audio encryption techniques, poor as they were, carried on into early video encryption used by both cable companies and later satellite broadcasters. And it was these TV transponders in satellites that used to be the easiest to hide a Direct Sequence Spread Spectrum (DSSS) signal in.

Coincidentally, you make a DSSS transmitter in almost exactly the same way you do an SSB transmitter, though instead of just removing the carrier and upshifting the frequency using a Double Balanced Mixer as you do for SSB, you modulate the upshifted carrier frequency with the “chip code”, usually by “phase reverse keying” in simple DSSS systems…

So now you see the “interconnectedness” of the apparently unrelated parts of your posted questions B-)

Wael September 3, 2017 2:02 AM

@Clive Robinson,

Depends on how high you go 0:)

These days are over after the recent loss of my itchy foot apparel 😉 I need to find a new pair.

Clive Robinson September 3, 2017 2:23 AM

@ Wael,

I need to find a new pair.

Well, you could ask his brother “reefer sailor monkey footed primate” for assistance. After all, rolling those huge bags up whilst high up, hanging on with your feet for dear life, does give you a different perspective on life.

Clive Robinson September 7, 2017 7:15 PM

@ Jim Lippard,

Yes, it’s possible to prove a negative:

Oh dear Prof Steven D. Hales needs to adjust his thinking.

He has fallen for the trap of bivalent thinking, that is, something is either true or false. The real world does not work that way, as it is in most cases multivalent.

The reason mathematicians use the bivalent logic of Boole and his Victorian colleagues is that it’s a very handy tool: under bivalent systems a statement can only be True or False, thus if you prove one the other must be false; there are “no unknowns” to cause problems.

Life is by and large multivalent, especially when time is considered. For instance, take his argument about unicorns and the fossil record. It’s easy to see why it is a strawman argument.

Firstly, prior to the first fossil bones being found / recognised, mankind was unaware of them. That is, they existed but we did not know about them. Therefore, according to the Prof’s bivalent argument, dinosaurs “did not exist”. We are now aware of “some but not all” dinosaur bones, that is, we can demonstrate the fossil record is “incomplete” by simple induction. Thus his argument fails again, because we know that there are at least three states, i.e.,

1, Known to exist and found.
2, Known to exist but not found.
3, Unknown.

Point three is important: every year we discover new species which we had no reason to know existed, thus they are not covered by points 1 & 2.

Thus at best the Prof has created a “strawman argument” based around a restricted version of “logic”, or he genuinely does not understand that multivalent systems exist that cannot be reduced to bivalent systems…

There is an old saw/truism that in the real world only three numbers make sense: “zero, one and tends to infinity”. That is, something does not exist, something is unique, or there is an unknown multitude of equivalent things.
