Random Number Bug in Debian Linux

This is a big deal:

On May 13th, 2008 the Debian project announced that Luciano Bello found an interesting vulnerability in the OpenSSL package they were distributing. The bug in question was caused by the removal of the following line of code from md_rand.c

	MD_Update(&m,buf,j);
	[ .. ]
	MD_Update(&m,buf,j); /* purify complains */

These lines were removed because they caused the Valgrind and Purify tools to produce warnings about the use of uninitialized data in any code that was linked to OpenSSL. You can see one such report to the OpenSSL team here. Removing this code has the side effect of crippling the seeding process for the OpenSSL PRNG. Instead of mixing in random data for the initial seed, the only “random” value that was used was the current process ID. On the Linux platform, the default maximum process ID is 32,768, resulting in a very small number of seed values being used for all PRNG operations.
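To see why this is fatal, note that 32,768 seeds is only about 15 bits of entropy. Here is a back-of-the-envelope sketch of the resulting attack (my illustration; the commented-out helpers are hypothetical, not OpenSSL code):

    /* Illustration only -- hypothetical helpers, not OpenSSL code.
     * With the mixing line removed, the PID is the only thing that
     * varies between processes, so every generated key comes from
     * one of at most 32,768 PRNG states (~15 bits of "entropy"). */
    #include <stdio.h>

    #define PID_MAX 32768   /* default maximum process ID on Linux */

    int main(void)
    {
        long candidates = 0;
        for (long pid = 1; pid < PID_MAX; pid++) {
            /* seed_prng(pid);    -- hypothetical: recreate the weak state  */
            /* derive_keypair();  -- hypothetical: regenerate candidate key */
            /* ...compare against the victim's public key...                */
            candidates++;
        }
        printf("only %ld candidate seeds per key type and size\n", candidates);
        return 0;
    }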

More info, from Debian, here. And from the hacker community here. Seems that the bug was introduced in September 2006.

More analysis here. And a cartoon.

Random numbers are used everywhere in cryptography, for both short- and long-term security. And, as we’ve seen here, security flaws in random number generators are really easy to accidentally create and really hard to discover after the fact. Back when the NSA was routinely weakening commercial cryptography, their favorite technique was reducing the entropy of the random number generator.

Posted on May 19, 2008 at 6:07 AM • 88 Comments

Comments

ChoJin May 19, 2008 6:55 AM

Well, we could also argue this line should have been properly commented considering how important it was…

IMHO, the so-called “self-documented code” is pure madness, since it never documents the intent. But that’s another debate, I guess.

Anyway, somehow, it’s a beautiful bug 🙂

Anonymous May 19, 2008 7:14 AM

Maybe the documentation could have been better, but someone whose thought process includes “Hm, this line causes a warning. Oh, well, it’s probably nothing important, I’ll just comment it out” doesn’t deserve commit privileges, as far as I’m concerned.

Of course, this whole thing required a string of failures (reviews, regression tests etc) to grow into the fiasco it has become.

Nice detail: Even if YOUR keys are secure, just the fact that you communicated with someone with an insecure key could compromise your key. So, in effect, to be really safe, we should regenerate every private key in the world.

Merijn Vogel May 19, 2008 7:15 AM

In the last week, Ubuntu and Debian have both shipped patches to their users fixing this issue, along with a few informational messages about it.

Personally, I consider that a quick response 🙂

clvrmnky May 19, 2008 7:25 AM

@Merijn Vogel: yes, easy to fix, but the problem is that any server key you have generated before the fix has to be retired, and new keys distributed. In some deployments this is seriously non-trivial.

Andrey Sverdlichenko May 19, 2008 7:43 AM

Using uninitialized data as an entropy source is an AMAZING idea in the first place. “Fixing” or “unfixing” doesn’t make it any better.

Not Bruce May 19, 2008 7:44 AM

Lessons learned:

The implementation of encryption in computer hardware and software is so complex that you can never be certain that no vulnerability has been introduced.

The open source approach is not inherently more secure.

Expertise in cryptography is rare among computer programmers.

clvrmnky May 19, 2008 7:45 AM

Take a look at the OpenSSL bug report history (link is in OP). Sure, we can argue that using uninitialized data to help seed your PRNG is weak, but the code in question /was/ documented very clearly, at least to anyone who codes in C for a living. And, technically, this is not wrong to do. Tools like Purify and Valgrind assume the use of /any/ uninitialized data is wrong. This assumption is wrong.

The question posed was, “can I just remove this obviously broken line permanently?” and the answer was “no, the reason Purify/Valgrind is complaining is well-known, hence the ifdef. It will not hurt actual running code, and /may/ help a little in terms of seeding values. Don’t do this.”

The solution is to change a compile flag and update the FAQ and manual pages. The code has not been otherwise changed, AFAICS.

This was an obvious error committed by a coder who assumed that a band-aid solution was the correct way forward. Not only was the real reason for the Valgrind/Purify errors not understood, the solution of commenting out the code to clear the warning was never properly thought through.

I’m sure the coder in question is properly chastened, and we have all made bone-headed choices like this. But the fault lies squarely with the person who made this change for Debian, not OpenSSL.

Stine May 19, 2008 7:46 AM

I’m going to get it out of the way (and be extremely pedantic) because I know someone else would have if I didn’t … it’s called “Debian GNU/Linux” not “Debian Linux”.

🙂

betabug May 19, 2008 7:48 AM

@Merijn Vogel: judging from the running about and nervous activity of the debian/ubuntu crowd, just installing the update is the smallest part of the cleanup. Replacing every key and every password that was transmitted over a connection with one of those keys kept them much more active.

Nicholas Weaver May 19, 2008 7:59 AM

Actually, it’s worse. The first reference, the one with the #ifndef PURIFY, is the one that valgrind/purify complain about, and only mixes in uninitialized memory.

It was the other reference, which purify does NOT complain about (why do you think there is no #ifndef PURIFY?) that was NEEDLESSLY removed.

Rob Funk May 19, 2008 7:59 AM

It’s important to note that the fatal flaw here wasn’t the removal of the “purify complains” line that opportunistically adds uninitialized memory to the entropy pool. It was removing the similar line earlier that was used to add all other entropy.

That more important line happened to be called with potentially-uninitialized memory at some point elsewhere in openssl, causing the valgrind warning.

Sorry, I’m just getting tired of seeing comments about the non-wisdom of using uninitialized memory as an entropy source. Uninitialized-memory-as-entropy is a total red herring here.

Rob Funk May 19, 2008 8:05 AM

@Nicholas Weaver, you have it partly right and partly backwards.

The line with #ifndef PURIFY and “purify complains” comment is the one that could optionally have been removed, and was also the one generating most of the warnings. The obvious solution to that, of course, was to define PURIFY when checking with valgrind.
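For illustration, the guard mechanism Rob Funk refers to boils down to this (a toy paraphrase, not the verbatim md_rand.c):

    /* Toy demonstration of the PURIFY guard (illustrative, not OpenSSL
     * source). Build normally, or with: cc -DPURIFY ... */
    #include <stdio.h>

    int main(void)
    {
    #ifndef PURIFY
        puts("opportunistic hashing compiled in (normal build)");
    #else
        puts("opportunistic hashing compiled out (-DPURIFY, for valgrind runs)");
    #endif
        return 0;
    }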

Matt May 19, 2008 8:06 AM

This is a situation in which open-source is LESS secure than closed-source. Experienced developers created a program that worked. Less experienced system integrators with access to the source created a version that did not, and released it to end users. If they had not had access to the source the bug would never have been introduced.

Sejanus May 19, 2008 8:33 AM

Matt,
I can imagine the opposite: not-so-experienced programmers created a program with a security hole. Experienced programmers had access to the source and therefore quickly discovered and fixed it.

Colossal Squid May 19, 2008 8:51 AM

Debian really, really screwed the pooch on this one. More worrying than the existence of the exploit is that the patch was submitted to their repository a number of days before the vulnerability was announced.
http://svn.debian.org/viewsvn/pkg-openssl/openssl/trunk/crypto/rand/md_rand.c?rev=300&view=diff&r1=300&r2=299
“If they had not had access to the source the bug would never have been introduced.”
No, but instead black hats could just reverse-engineer the algorithm and we could sit around twiddling our thumbs until the devs decided to fix it:
http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9048438

Dash May 19, 2008 9:04 AM

What I don’t understand is why a package maintainer with no cryptography background thinks it’s his business to “correct” experts’ code. After the fact, all kinds of justifications have appeared, but I still haven’t read an actual apology, just blaming of the OpenSSL developers. Sigh.

Carlo Graziani May 19, 2008 9:06 AM

Unfortunately, this sort of thing is not that uncommon when Linux distribution packagers (like Debian, but also Red Hat/Fedora, SuSE, etc.) take application source and start applying their own patches. It’s just that it gets more attention when it’s a security bug, rather than just misbehavior by an application.

Frequently, source patches made by distributors are not passed back upstream to the original code maintainers (licenses like the GPL insist on source code release downstream, not upstream). So the people who are best qualified to vet those changes never see them until after a distribution release.

Not infrequently, those changes are poorly conceived and poorly tested, and lead to unexpected behavior by the application for some fraction of users. Then the application’s mailing lists (as opposed to those pertaining to the distribution) get flooded with reports of ‘bugs’ that the application maintainer is powerless to fix. This happens regularly to the pilot-link maintainer, as an example from a mailing list that I happen to follow.

Unfortunately — I say this as a Linux-exclusive computing person — the FOSS culture’s values of forward redistribution, while generally positive, have the pathological aspect that anyone who can modify the code for the benefit of downstream users thinks that they ought to. This idealism, coupled with the driving pace of distribution release schedules, means that changes are often not fed back to the original developers for approval before redistribution, with the consequences that attend this sorry story.

dmytry May 19, 2008 9:08 AM

It is important to provide more context when discussing source code.
Those two lines were located in different functions, which do very different, in fact opposite, things.

The first (by order) was in
static void ssleay_rand_add(const void *buf, int num, double add)
This function adds entropy to the pool. The “const void *” means that it is an input parameter, and incidentally, the commented-out line is the only line that uses buf.
Commenting out MD_Update(&m,buf,j); here turns the whole function into almost a no-op!

This line does not cause any problem with the debugging tools; ssleay_rand_add is meant to be called with an initialized buffer (though calling it with an uninitialized one doesn’t do any real harm).

The second is in
static int ssleay_rand_bytes(unsigned char *buf, int num)
This function outputs random bytes into buf, “taking” entropy from the pool.
It is sometimes called with an uninitialized buf, and sometimes with a buffer that contains previous random numbers. It makes perfect sense here to hash buf to increase randomness; it doesn’t do any harm when buf is uninitialized, and it improves randomness when buf contains old key data.

Nonetheless, since debugging tools complain about any use of uninitialized data, this function has #ifndef PURIFY around that call.
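Putting the two call sites side by side shows the asymmetry described above. Below is a condensed, compilable paraphrase (stub types and bodies are mine, not the verbatim md_rand.c):

    /* Condensed paraphrase of the two call sites -- illustrative only. */
    #include <stddef.h>

    typedef struct { int dummy; } MD_CTX;              /* stand-in type */
    static MD_CTX m;
    static void MD_Update(MD_CTX *ctx, const void *buf, size_t n)
    { (void)ctx; (void)buf; (void)n; /* real code hashes buf into the pool */ }

    /* Mixes caller-supplied entropy into the pool. The flagged call is
     * the ONLY use of buf; deleting it turns seeding into a no-op. */
    static void rand_add_sketch(const void *buf, int num, double add)
    {
        (void)add;
        MD_Update(&m, buf, (size_t)num);    /* <-- the fatal removal */
    }

    /* Fills buf with output. Hashing buf's prior contents first can only
     * add entropy; it is harmless on an uninitialized buffer, hence the
     * debug-build guard. */
    static int rand_bytes_sketch(unsigned char *buf, int num)
    {
    #ifndef PURIFY
        MD_Update(&m, buf, (size_t)num);    /* <-- safe to compile out */
    #endif
        /* ...then extract num pool bytes into buf... */
        return num;
    }

    int main(void)
    {
        unsigned char b[16] = { 0 };
        rand_add_sketch(b, (int)sizeof b, 16.0);
        return rand_bytes_sketch(b, (int)sizeof b) == 16 ? 0 : 1;
    }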

The distro maintainer probably assumed that the original developers were dumb/whatever and had just forgotten the #ifndef PURIFY in the first function.
He didn’t even bother to look at what the first function is for and what it does. He just commented it out right away.

That kind of modification is a recipe for disaster, regardless of whether it’s the Linux kernel, some file system module, or SSL. Regardless of the amount of comments in the file (you can’t comment every line in internal functions, can you? and even if you do, that is near useless).

It is completely wrong to assume that the same line does the same thing, especially when it is used inside different functions with very different (opposite) purposes.

Sejanus:
That is indeed possible, but it is highly uncommon for distro maintainers to be better programmers than the original developers. Most often, the distro maintainer is not even a programmer, or is not familiar with the language the source is written in.
It does happen that a highly experienced user (who also happens to be a programmer) fixes something. But the chances are low; the code bases are big, and you don’t look at them except to fix a crash.

M Welinder May 19, 2008 9:09 AM

Note that the original OpenSSL code was buggy and could, depending on the C compiler’s mood, give you a constant key. Or worse. It appears that the developers did not think the C standard applied to their code.

But, yeah, the Debian team screwed up the fixing.

Anonymous May 19, 2008 9:16 AM

“the only “random” value that was used was the current process ID”

Wait, what? Why are they not using the kernel entropy pool? Maybe this is because I’m not an expert, but I don’t see any reason not to use /dev/random or /dev/urandom.

lol May 19, 2008 9:16 AM

Bruce, this is old news already. It sure took you long enough to post this to your blog.

LOL.

grendelkhan May 19, 2008 9:20 AM

Not Bruce: “The open source approach is not inherently more secure.”

It certainly points out a weakness in the distribution chain; one wonders how on earth someone ended up poking around in cryptographic code. I suppose that somewhere, Dan Bernstein is having a great big I-Told-You-So moment.

On the other hand, patching the vulnerability once it was found was very quick; there was a very similar bug in the Windows 2000/XP PRNG which led to the creation of weak SSL keys. (Similar in effect, not in cause; as nobody outside Redmond can see the source, there’s no way to know what the cause was.) However, the time between its disclosure (2007-11-04) and the release of a fix in Windows XP (with SP3, 2008-05-06) was far greater than the disclosure-to-patch time for Debian. (Unless you upgraded to Vista, maybe.) I don’t think there’s a patch available for Windows 2000 at all. The moral of the story is that there’s no rush to patch security issues, unless you break their DRM.

Additionally, the bug itself was in the Debian sources from 2006 to 2008; it was apparently in the Windows codebase since Windows 95.

While the open source model (especially distributors sticking their fingers in critical code they don’t understand) isn’t inherently superior to the closed-source model, the response is quite different.

Carlo Graziani May 19, 2008 9:24 AM

@Anonymous asks, “…don’t see any reason not to use /dev/random or /dev/urandom”

The OpenSSL code is not a Linux-only system. It is cross-platform. /dev/random is not a POSIX thing, so there is no guarantee that it will be available on other platforms, e.g. Solaris, HP-UX, AIX, the BSDs, etc.
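A minimal sketch of that portability constraint (the probing policy and helper name are my own illustration, not OpenSSL’s actual logic):

    /* Illustrative only: portable code cannot assume a kernel RNG
     * device exists, so it must probe and be ready to fall back. */
    #include <stdio.h>

    static FILE *open_system_rng(void)      /* hypothetical helper */
    {
        const char *candidates[] = { "/dev/urandom", "/dev/random" };
        for (int i = 0; i < 2; i++) {
            FILE *f = fopen(candidates[i], "rb");
            if (f)
                return f;
        }
        return NULL;  /* no device: gather entropy some other way */
    }

    int main(void)
    {
        FILE *f = open_system_rng();
        puts(f ? "kernel RNG device found"
               : "no kernel RNG device -- need another entropy source");
        if (f)
            fclose(f);
        return 0;
    }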

Gweihir May 19, 2008 9:25 AM

Debian is generally pretty good in the fixes they apply to upstream code. But sometimes they mess up badly, and I have, on several occasions, encountered a “will not fix, and will not warn” attitude.

The last instance for me was rdiff-backup, where Debian stable (etch) uses a version that is experimental and is incompatible with both the stable and the development version of rdiff-backup. The absolute least thing to do would have been to add a strong warning to the package description; however, nothing was done, and people find out about this when their backup fails. This is at best very low amateur level. At worst it is malicious.

I believe that Debian has an issue identifying and getting rid of maintainers that far overstep their level of competence. The OpenSSL mess seems to be another one of these. Messing with the random number generator in crypto software requires both deep understanding of the issue and several competent people who independently check the fix. Seems both were completely missing.

Marcus Meissner May 19, 2008 9:31 AM

openssl uses /dev/urandom if necessary.

However, the problem was that it read the entropy from there but, after the change, never added it to the pool.

And please, everyone: two lines of code were removed where just one should have been (see the post from dmytry).

Sejanus May 19, 2008 9:47 AM

I didn’t mean distro maintainers in particular. I just meant this: open source can be reviewed (and is reviewed) by many people. This increases probability of finding mistakes, security holes, et cetera.

Colossal Squid May 19, 2008 10:34 AM

Further to Eli’s link, here’s some stuff from one of the OpenSSL devs:
http://www.links.org/?p=328

which highlights some problems with OpenSSL’s communications, i.e. that the mailing list the OpenSSL documentation points to for application-related questions is not actively monitored by the devs.

Plenty of blame to go round, it seems.

Rob Funk May 19, 2008 10:37 AM

@Eli, as I commented at LWN, I don’t think the LWN writeup gave nearly enough blame to the Debian maintainer for trying to fix things without understanding them first, and for not telling the OpenSSL people that he was proposing changing production Debian code, not just debugging.

I think the most telling part of all this is the Debian guy’s original email to openssl-dev:
http://marc.info/?l=openssl-dev&m=114651085826293&w=2
He starts out with “When debbuging applications that make use of openssl using valgrind…” and concludes by saying, “What I currently see as best option is to actually comment out those 2 lines of code. But I have no idea what effect this really has on the RNG.”

The response was, “If it helps with debugging, I’m in favor of removing them.”
And from someone else, “There’s your first clue, build with -DPURIFY :-)”

derf May 19, 2008 10:46 AM

Microsoft had a loophole in its RNG in 11/07, and it was recommended by a few groups that Microsoft publish its source so it could be analyzed for further problems. I don’t think that was done.

The Linux issue was fixed quite a bit quicker (days). The Windows issue was supposedly fixed in XP SP3 (6 months later) and ignored for Windows 2000.

Gweihir May 19, 2008 10:46 AM

If you do not get conclusive feedback of the form “Yes, I completely understand the issue and I am an expert on it” from more than one person, you DO NOT mess with cryptographic randomness generation. Everything else is pure incompetence. Yes, the Debian maintainer tried to get feedback, but he obviously does not have the required level of understanding to be maintaining a security-critical component, and he had no business changing anything with the level of feedback he got.

I think this guy should be removed as maintainer of any crypto or security software immediately.

There still is blame for OpenSSL, I agree. It usually requires a mess-up in more than one place for something this serious to happen (and this is a catastrophe, make no mistake). Still, the Debian maintainer tried to fix something he did not understand. That makes him unacceptable as maintainer. It also makes the Debian process of assigning maintainers highly suspect, if somebody who is not a crypto expert gets to maintain crypto software.

andrew May 19, 2008 10:58 AM

I don’t mean to be polemical, but I can’t help asking myself: is there any hope that this will teach us something? Whoever you choose to blame, if anyone, can’t we do better than depend on a single line of code?

Carlo Graziani May 19, 2008 10:58 AM

On the subject of “well, the OpenSSL people made it hard to figure out whom to contact, their -dev mailing list was not for actual developers, etc.”

The OpenSSL source code comes with a README that explicitly states where to send bug reports. It says:

” Report the bug to the OpenSSL project via the Request Tracker
(http://www.openssl.org/support/rt2.html) by mail to:

openssl-bugs@openssl.org"

The file containing the affected functions (md_rand.c) contains additional contact information, in the form of two e-mail addresses of the people who actually wrote the code. If for some reason the Debian developer thought that a problem meriting a code change somehow did not rise to the level of a bug report, he could have at least informally contacted the actual developers, to find out directly what they thought.

Remember, the developers are operating on the source code, so they can easily find out from the source who to get in touch with should the need arise. The “We Didn’t Know Who To Contact” defense is the weakest excuse in this entire sorry mess. It was straight-up laziness, by people for whom contact with upstream developers is simply not a high priority.

Joe May 19, 2008 11:04 AM

Bruce, rather than hear more about the bug itself, I’d be interested in your analysis of the steps Debian took to distribute fixes. As well as fixing the code, they’ve also released a list of blacklisted keys (those that could be generated by the crippled software) and a tool to find if any keys installed in standard places on the system are in the blacklist. Is this enough? Is there anything else they could do to support the monumental task of regenerating all possibly compromised keys?

Shane May 19, 2008 11:09 AM

Luckily, this doesn’t affect those of us who take the time and care to compile OpenSSL via the source package, straight from the source.

Over the years I’ve gotten plenty of flak here and there for the time and (allegedly misspent) energy put into compiling many of my applications directly from source, but this vulnerability alone has already made years of this practice worthwhile, in my mind.

FDHY May 19, 2008 11:19 AM

Wow, that’s too bad. Seems like a good security code review could have caught this…

Gweihir May 19, 2008 11:44 AM

I think any halfway competent security review of the patch would have found the problem. Apparently none was done.

Pat Cahalan May 19, 2008 11:50 AM

This isn’t the first time a source package has been altered by a distribution in such a way that the source package now no longer works as advertised.

Many Linux distributions have “broken” OpenSSH by changing the default configuration without changing the documentation (see http://padraic2112.wordpress.com/2007/07/09/bad-security-201-remote-x-sessions-over-ssh/ for the details). As a result, the behavior of the application no longer matches what you think it is doing if you skim over the man pages.

This was done for functionality purposes. Nobody has really freaked out over this (and it still hasn’t been changed), but it represents exactly the same mindset that caused the Debian bug over which everyone and their grandmother is throwing conniption fits.

If you’re going to alter the way a package works when you include it in your distro, either by changing the code or the default configuration, you have to document the change properly, using some method that is subject to review.

Pat Cahalan May 19, 2008 11:57 AM

@ FDHY, Gweihir

That’s because the security review is done by the source maintainers, and the functionality review is done by the distribution maintainers.

There’s a fundamental disconnect there. It’s going to be more pronounced in the free distributions (very generally) because the level of rigor and the areas of application of review are different.

This is the one major area in which the BSDs are actually better than any of the Linux distributions (IMO). Of course, it’s also the reason why your level of expertise needs to be generally higher to enable things to work on BSD. Linux distributions are designed and built to be used, BSD distributions are designed to work. There’s a subtle difference there.

2arandom May 19, 2008 12:03 PM

I’m glad that Bruce posted this article; the audience here is a good resource.
Can’t wait for the movie, Linux Casino: “we were doing short counts, uninitialized memory, etc…” Read: “selling code in and out.” “What a perfect business, ROI is easy.”
Coders are the new lawyers for business.

sam May 19, 2008 12:19 PM

Shane,

But compiling it yourself doesn’t help much with the fallout from this. Sure, your keys were generated with enough randomness to be secure, but if you admin any machines then the user keys are suspect.

If you authed with a debian box at some point in the last couple of years, your dsa key is now suspect.

So you end up with as much work as everyone else, except I guess your host key (the simplest thing to fix) is fine 🙂

Alex May 19, 2008 12:43 PM

This is a major PITA, and it doesn’t end with Debian by far.

Many, many systems are affected, including VeriSign and Tor, to name a few.

And this will be exploited.

Nix May 19, 2008 4:53 PM

@Rob Funk, your repeated comments that anyone using valgrind should rebuild openssl betray ignorance of how valgrind is typically used.

Because valgrind is a dynamic code instrumentation engine, it is generally run over unmodified binaries (at most they’ll have debugging info available). If a distro told me that I had to rebuild critical packages like openssl in order to get a warning-free valgrind run, I’d be disgusted with the distribution. (Further, the nature of the failure is such that the valgrind warning-suppression system would be unlikely to correctly quash this warning, and it’s very hard to see how to fix that system so as to quash it.)

It’s pretty damning that uninitialized data was being intentionally used, with no comments to suggest that this was intentional and no comments to suggest that the other identical-looking line was not doing an unimportant uninitialized-junk-mixing job but was the primary point of entropy-mixing in OpenSSL.

If OpenSSL had been maintainably written, I can’t imagine this bug would have appeared. (I doubt very much that anything like it would appear in GnuTLS, not least because the maintainer is responsive, but also because the code doesn’t engage in downright obfuscatory ‘clever tricks’ like this.)

jan May 19, 2008 6:04 PM

Bruce, can you verify the claim that DSA keys are compromised if used in communication with an insecure server? I still cannot believe this, because if this is true – wouldn’t it mean that any server you use could break your key by deliberately using a special “non-random number generator”? (Wouldn’t this mean that DSA is to be considered completely broken?)

Smee Jenkins May 19, 2008 6:11 PM

Some arguments that are NOT excuses and should NEVER be used as excuses, not even as mitigating factors:

  1. The code was badly documented.
  2. The upstream maintainers did not respond.

If you patch and distribute something important like OpenSSL, you deal with the above and assume the responsibility if things go wrong. Some people can handle this. Don’t want the responsibility? Don’t touch it.

Doug Coulter May 19, 2008 7:00 PM

A lot of folks here aren’t understanding something pretty important. As someone who did embedded code for decades in small machines, we got used to looking at every bit of memory to find bugs. In those machines it was typical for memory to assume a quite predictable pattern on power-up — the only truly uninitialized state — as the flip-flops (for static RAM) or caps (for dynamic RAM) had ever-so-slight built-in imbalances due to process variables. We had bugs at various times that didn’t show up until the manufacturer made some change in production, and that we never found up front, as the uninitialized RAM was anything but random in a given run.

Modern language tools nearly always initialize RAM before a process gets it anyway — any Microsoftie will tell you about 0xDEADBEEF, for example. Other reliable bug indicators are there outside of debug mode, if you know what to look for.

So if this were a bug, it was there before too.
The only way the RAM was “random,” if at all, is if some other process put things in it and the system never re-initialized it before putting it back in the pool (assuming allocation from a heap at all)…

The fix is a really good random number generator, with entropy gathered by other methods. Oops!

Anderer Gregor May 19, 2008 7:16 PM

…so most people seem to prefer that a major Linux distro be distributed in full knowledge that exactly the server code most likely to be open to the whole net (namely, ssh), and which has full privileges, actually causes a QC tool to raise a big red flag, indicating sloppy programming at least, and the potential for buffer overflows and all the other fun in the worst case?
Especially given that one of the maintainers said (or rather, seemed to say) that it was okay to remove the two lines that seemed to cause the warning?

Rob Funk May 19, 2008 9:33 PM

@Nix, I fully understand the issue of recompiling for valgrind. The line causing major valgrind warnings can be removed without major consequences — and is still missing in the fixed Debian package. And yes, there were comments on that line about Purify not liking it. The other line was the major line in its function; you can’t expect people to add comments like “this is important” on every line that shouldn’t be deleted.

And if valgrind was warning on the important line, then the maintainer should’ve tracked it back to where that function was being called with potentially-uninitialized memory.

@Doug Coulter, see my earlier comments about the uninitialized memory being a red herring.

@Anderer Gregor, that maintainer said it was OK to remove the lines “if it helps with debugging”; there was no implication about putting it into production like that. And the stuff you say about “a qc tool raising a big red flag” demonstrates a fundamental misunderstanding of the problem.

Dierdre Tobin May 19, 2008 10:25 PM

@Anderer Gregor

It was in OpenSSL, not OpenSSH.

As Rob Funk indicated, the possible removal of the lines was for debugging only, not for final release.

Vance May 19, 2008 10:26 PM

Two questions for the knowledgeable:

  1. Is it possible to include code in OpenSSL that tests the strength of the random numbers it generates?
  2. If so, why not include such code?

It seems that there could potentially be a number of reasons other than a mistake like this (e.g., a fault in a hardware RNG or a poor /dev/random implementation by the operating system) which would result in less entropy being added to the pool than expected. Including a test for RNG strength would seem to be a wise fail-safe.

Lussy Pips May 19, 2008 11:51 PM


Regarding the recent SSL bungle:

I’m not placing blame on anyone, but let us consider for a moment:

How long would it take a member of a rogue organization, a company such as Microsoft, or an intelligence agency to land a spot in such a role as a code monkey at Debian.org, under the guise of a pro-FOSS person? You do know all three examples above are quite savvy when it comes to infiltration; mafias, corporations, and intelligence agencies do this all of the time. So let us suppose this is what happened here; considering the wide range of impact of this issue, I believe this is exactly what may have happened.

What checks and balances are in place to weed out potential moles? Any? And would you really know what to look for even if such a policy is in place? Perhaps this question is worthy of an “Ask Slashdot” submission?

How many Tor hidden services (.onion) were taken down because of MITM attacks related to this issue? Fucking moles!

The Debian OpenSSL flaw: what does it mean for Tor clients? \ May 13th, 2008

https://blog.torproject.org/blog/debian-openssl-flaw%3A-what-does-it-mean-tor-clients%3F

Remember:

Linux: Kernel “Back Door” Attempt \ November 6, 2003

http://kerneltrap.org/node/1584

I don’t care how much a person insists they can track the people making the changes; all it takes is either a stooge who will easily take the fall, and who perhaps knows nothing of their puppet master’s true identity as they may be paid by proxy, or a person trained to disappear after doing the dirty work, who does just that.

Most geeks are easy enough to bullshit, especially if you have (or claim to have) a vagina.



Paeniteo May 20, 2008 1:56 AM

@Vance: “Is it possible to include code in OpenSSL that tests the strength of the random numbers it generates?”

It is not that the random numbers themselves were “weak” in any way. Every random number stream that the RNG produced was really random-looking.
The problem was that there were only 32k different streams of random numbers available.

Think of it like AES encryption: No matter whether you just use a single letter as your password, or a 100-letter passphrase, the ciphertext will look like complete gibberish.
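A toy sketch of why output inspection can’t catch this (my own illustration; libc’s rand() stands in for any deterministic PRNG): every stream looks fine on its own, but there are only as many distinct streams as there are seeds.

    /* Toy analogy: a deterministic PRNG seeded from a tiny space.
     * Each stream looks random; there are still only 32,768 of them. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        for (unsigned seed = 1; seed <= 3; seed++) {  /* show 3 of 32768 */
            srand(seed);
            printf("seed %u:", seed);
            for (int i = 0; i < 8; i++)
                printf(" %02x", rand() & 0xff);
            printf("\n");
        }
        return 0;
    }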

Juergen May 20, 2008 2:04 AM

“Back when the NSA was routinely weakening commercial cryptography, their favorite technique was reducing the entropy of the random number generator.”

Do you have links to any concrete examples of that?

Martin May 20, 2008 2:06 AM

Why is it that I’ve seen hundreds of posts, even from Debian project members, trying to shift blame for their mistake to the OpenSSL devs, while I have yet to see a single Debian proponent ask the question “What can we do to make sure something like this never happens again?”

If the Debian project were serious about security, this should have been the first question asked after the initial clean-up. I guess the conclusion is that proactive security is not a high priority for the Debian project, and that PR and marketing (e.g. trying to shift blame) have a higher priority. Even the Debian wiki explaining the vulnerability and mitigations contained several paragraphs of excuses and blame-shifting. But a conclusion that proactive security is not a high priority makes sense; the Debian project’s main CVS server has been compromised twice in the last five years:

http://www.debian.org/News/2003/20031121

http://www.debian.org/News/2006/20060713

They respond quickly when they screw up, though. That point is used extensively to market Debian.

In my opinion, there are no excuses for modifying cryptographic code without understanding the consequences. The only acceptable response would have been: “We made a major mistake. We take full responsibility; no one else is to blame for this. We will review our policies and practices to make sure this cannot happen again. We are sorry for all the trouble we have caused for both users of Debian and others out there.”

Martin May 20, 2008 2:43 AM

@Jan

“Bruce, can you verify the claim that DSA keys are compromised if used in communication with an insecure server?”

No, they are not. The DSA key might be compromised, however, if the DSA signature was generated on a Debian or Ubuntu server. So, if you used ssh from a secure OS to a Debian or Ubuntu server using DSA authentication, your key was not compromised. If, however, you used a Debian or Ubuntu server to generate a DSA signature (e.g. ssh from a Debian or Ubuntu server using a DSA key), your key might be compromised.

The reason is that the secret random value k used for the signature is generated by the signature maker. So, if the signature is generated on a server with a good source of randomness, the key can not be recovered.
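For reference, the underlying math (standard DSA, stated here to make the mechanics concrete): a signature on message m is the pair (r, s) with

    s = k^-1 * (H(m) + x*r) mod q

where x is the private key and k is the per-signature secret nonce. If k is known or enumerable (as with a PRNG that has only 32,768 states), the private key follows by simple rearrangement:

    x = r^-1 * (s*k - H(m)) mod q

So a single signature made with a predictable k reveals x, which is why signatures generated on an affected box are the dangerous case, exactly as Martin says.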

Martin May 20, 2008 2:47 AM

@sam

“If you authed with a debian box at some point in the last couple of years, your dsa key is now suspect.”

Not true, see my previous comment to Jan.

Sakuraba May 20, 2008 3:11 AM

The Debian “developer” who removed the code without understanding what the code did is indeed not a competent (security) software developer but just a Debian packager. Security-critical software packages like openssl should be maintained in Debian by true software developers working very tightly with the original openssl developers.

So in my view, Debian should call most of its volunteers “Debian Packagers” instead of “Debian Developers”. There are a few true Debian Developers (e.g. Joey Hess, Ian Jackson, etc.), but the majority aren’t and should not touch security-critical software.

Werner May 20, 2008 4:49 AM

@Martin
“The only acceptable response would have been: ‘We made a major mistake. We take full responsibility; no one else is to blame for this. We will review our policies and practices to make sure this cannot happen again. We are sorry for all the trouble we have caused for both users of Debian and others out there.’”
Sounds like what we would get from a politician, and proclamations like this is the least we need. “Taking responsibility” means to me:
– make the mistake public
– correct the mistake
– try to minimize the damage done by the mistake.
That’s exactly what Debian is doing. Additionally: Debian developers are working on the processes to assure quality all the time. If you have a concrete proposal for how to do this better, post it to Debian, but don’t ask them for meaningless statements for the press.
There were a lot of proposals like
– communicate with upstream
– only change code when you are competent
– review the code
– …
All this is stated policy of all the GNU/Linux distributions. Every software developer will agree. But it almost never is done sufficiently. Why is that so?
There has been a lot of flaming, at least in Germany, against Debian because of their slow release cicle. The only advice I can give to the Debian developers: please stick to your policy of “release when ready” and take all the time you need.

Werner May 20, 2008 4:57 AM

This ill-fated change could also have been given a full review much sooner if one had tried to contribute it back upstream.

I don’t know why this hasn’t happened in this specific case, but I notice a general tendency of distribution maintainers to be sloppy about that and to amass distribution-specific patch collections that not only grow in size but also in scope.

So getting proper upstream review is one more reason for making the (sometimes considerable) effort of contributing one’s changes back – even if the license doesn’t require this.

Werner May 20, 2008 5:06 AM

Sorry for my English. I made a major mistake. I take full responsibility; no one else is to blame for this.

What I wanted to say is:
Proclamations like this are the last thing we need.
BTW, it should have been “release cycle”.

Werner Baumann May 20, 2008 5:15 AM

Another example of bad random name generation?
There are three postings by “Werner” in a row (4:49 AM, 4:57 AM, 5:06 AM), but only the first and the third are from me; the second one is from another Werner.

Werner Almesberger May 20, 2008 5:33 AM

Hi Werner 🙂 A good example of unlikely collisions happening in real life.

Martin May 20, 2008 5:47 AM

@Werner

“That’s exactly what Debian is doing. Additionally: Debian developers are working on the processes to assure quality all the time. If you have a concrete proposal for how to do this better, post it to Debian, but don’t ask them for meaningless statements for the press.”

I didn’t ask for a statement for the press. I just observe that the attitude seems to be one of finger-pointing and blame management, rather than taking full responsibility and making sure that this will not happen again.

“If you have a concrete proposal for how to do this better, post it to Debian, but don’t ask them for meaningless statements for the press.”

Yes. My proposal is this: Never ever modify cryptographic code unless you know exactly what you are doing, and make absolutely sure that all your project members follow this religiously. This last point is, as far as I can see, the major deficiency here. The fact that Debian ended up in this mess shows that the policies you mention are empty talk and not taken seriously. That needs to change.

Please note that I am not trying to blame the package maintainer here, this is a failure of the entire process and security mentality of the Debian project, and another major security failure is bound to happen in the future unless this changes.

Werner Baumann May 20, 2008 7:21 AM

@Martin
I don’t know why you conclude about Debian “that the policies you mention are empty talk and not taken seriously.”
No. In my experience, these policies are real.

One example: your proposal (“unless you know exactly what you are doing”) and this serious Debian bug are closely related to the qualification of the maintainers. New Debian maintainers must go through a process of qualification, tutored by an experienced Debian developer, before they are allowed to do commits on their own.
Now, you may say: this does not work well enough. Ok. Experienced Debian developers should take more time to qualify new maintainers. They should raise the level of required skills for some packages.
What I want to hear from you too is: yes, I agree with the consequences: less manpower to prepare the next release, less manpower to integrate the latest release of some application software, less manpower to add the latest exciting features. I would agree to this. But as far as I can see, there is no common agreement on this, quite the contrary.
And this is not a Debian-specific problem. From my experience with Apache/mod_dav I would say this standard Apache module is simply unmaintained. It is a common problem. The balance between consolidating what exists and creating new projects and new features is lost. We are getting just as bad as commercial software in this respect.

Rob Funk May 20, 2008 7:37 AM

Since the conversation has turned to the question of Debian’s maintainer policies, some might be interested in this:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=332498
The original Debian openssl maintainer requested help. The maintainer responsible for this mess volunteered, and ended up doing most of the work by default. Others volunteered to help but (judging by further comments on this bug) didn’t really do much.

So it looks like the solution is for qualified people to both volunteer and actually spend time on package maintenance.

Carlo Graziani May 20, 2008 9:40 AM

For the life of me, I cannot understand why the Debian OpenSSL maintainers believed that they had a bug on their hands.

I could understand if Valgrind had indicated the presence of a memory leak. That would probably justify some kind of intervention, although as a matter of policy the intervention should be “Go find an upstream developer and don’t stop trying until you find one and talk to him and get a definite recommendation”, not “tweak code you haven’t read and don’t understand until the symptoms appear to go away, then do no rigorous validation testing to make sure you’ve not broken anything”.

But what Valgrind indicated was not a bug. It was at worst a poor coding practice, and that appears to be a matter of debatable opinion. The fact that it is acceptable under project guidelines for a package maintainer (and not a developer, as has been pointed out above) to reach in and change the code under such circumstances is a scandal, one that reflects very poorly on Debian as an institution.

Not that (to my knowledge) the other distributions are necessarily better in this regard. They just haven’t broken anything as important as SSL. Yet. As far as we know.

Richard Braakman May 20, 2008 10:11 AM

@Carlo:

The bug was that the Valgrind warnings were showing up in programs that were simply using openssl, thus making it more difficult to debug those programs.

Carlo Graziani May 20, 2008 10:43 AM

But Richard, that makes it even loonier. Valgrind has a facility for fine-grained error suppression. That noise from libssl could easily have been shut off in valgrind directly, without modifying the code at all. The fact that this guy felt it necessary to tweak the openssl code anyway is even more damning of his competence.

sam May 20, 2008 10:55 AM

@Martin,

Well, that’s good. I have an emotional attachment to my DSA key, generated back last millennium…

Richard Braakman May 20, 2008 4:07 PM

@Carlo: Not in this case, apparently.

From the original bug report on this issue:

“Suppressions don’t seem to be good enough to eliminate this unfortunately – the uninitializedness taints all the users of the openssl random number generator, producing valgrind hits throughout your program, making it unnecessarily difficult to see the wood for the trees.”

I’ve linked my signature to the bug report, in case you’d like to take a closer look.

Martin May 20, 2008 5:22 PM

@Werner Baumann

“Now, you may say: this does not work well enough. Ok. Experienced Debian developers should take more time to qualify new maintainers. They should raise the level of required skills for some packages.”

Yes, that is exactly what I mean. Catastrophic failures like this (and the cron modification in 2001) indicate an organization that does not foster a security mindset among its members.

“What I want to hear from you too is: yes, I agree with the consequences: less manpower to prepare the next release, less manpower to integrate the latest release of some application software, less manpower to add the latest exciting features. I would agree to this.”

It is not really relevant whether I agree to that or not; I’ve just replaced my Ubuntu installations with OpenBSD. I did that in 2004 as well (after the series of root holes in the Linux kernel; I used Debian sid at the time), but I was regrettably stupid enough to start using Ubuntu again last year. I should have known better. OpenBSD has come a long way since then, so I doubt that I’ll use Debian or Ubuntu any more.

“But as far as I can see, there is no common agreement on this, quite the contrary.”

Exactly. OpenBSD, on the other hand, has security as their #1 priority.

Pat Cahalan May 20, 2008 5:47 PM

@ Martin

OpenBSD, on the other hand, has security as their #1 priority.

Theo, on the other hand, has failed to follow his own rules for vulnerability disclosure.

Yes, Debian screwed up badly here. Yes, it’s a process problem. Yes, it’s probably going to happen again at some point. That doesn’t mean that overall it is a worthless distribution. It means that you need to be careful about using it. Just like anything else.

Blanket trust in OpenBSD is just going to come back and bite you in the rear end later.

Martin May 20, 2008 6:19 PM

@Pat Cahalan

“Blanket trust in OpenBSD is just going to come back and bite you in the rear end later.”

Good point, I don’t trust them blindly. There are certainly flaws in OpenBSD, and the developers have made mistakes. It is, however, easier to control an OpenBSD system IMHO, and the defaults are more sane than in any Linux distro that I’ve used, not to mention that protection against a whole range of trivial attacks is enabled by default. If you’ve tried to use the grsecurity patches for Linux, you’ll know how maintainable that is. I’m not saying that Debian and Ubuntu are worthless, but as you say, something like this will probably happen again. My concern is the direction that vulnerability research is going, towards automated generation of exploits based on patches/updates:

http://isc.sans.org/diary.html?storyid=4310

In a couple of years, we might see automated attacks within hours or minutes of the patch release. Those who focus primarily on reactive security, with short time to patch release as the sole metric of success, will, to use your words, probably get their rear ends bitten. Those who also focus on defense-in-depth and proactive security might not be such an easy target.

Patrick Cahalan May 21, 2008 1:12 AM

@ Martin

Ja, I noticed that story. All the more reason to keep your Internet-facing services to a minimum, and your incoming files sequestered as much as possible 😉

ajt May 21, 2008 5:50 AM

Code has bugs in it, people make errors, and the maths is fallible… I think Bruce once said.

I use Debian GNU/Linux and was affected, having to regenerate some keys and then redeploy them. It’s a pain, but I’m not qualified to test all the code myself, so I have to trust someone else to do it, and accept their errors.

It’s a pain in the bottom having to fix things, but arguing about open/closed source is pointless. The error should not have been made and it should have been corrected earlier, but it wasn’t. If it had been closed, no one would have known and it could be there for ever. Open is better in theory but not always in practice. Closed is closed, so it’s impossible to comment.

dmytry May 21, 2008 8:05 AM

Getting back to the topic: it seems a lot of people are completely missing the point about uninitialized memory…
To repeat:

The OpenSSL random generator code in question per se does NOT use uninitialized memory in any way.
(see my earlier message)

There are two different functions, and both have the “MD_Update(&m,buf,j);” line. The first one is ssleay_rand_add and the second is ssleay_rand_bytes.
“buf” is a parameter in both functions, not some local uninitialized variable.

The functions’ purposes are really pretty obvious from their names and parameters.

ssleay_rand_add “seeds” the generator; on Unixes it is called with data from /dev/[u]random (and in portable apps with entropy gathered from events).
The Debian change turns it into a no-op, entirely disabling the seeding.

ssleay_rand_bytes outputs the data into a buffer. The buffer might be uninitialized, but in many applications it contains old random data, except on the first call. That’s why ssleay_rand_bytes uses the data already in the buffer before writing to it: to preserve the entropy. Debugging tools give false warnings about this. If that bothered the Debian maintainer, he should’ve changed the makefiles to have -DPURIFY by default (as he was told to).
Also, when asking upstream, he just posted line numbers. No context whatsoever. That’s more like “covering your ass” or “wasting others’ time” than “asking”.
Assuming that was not intentionally malicious, it looks like either he himself didn’t even look at the context, or he can’t read C code at all. It does not take a security expert to understand the code.

Regarding OpenSSL code quality, and the whole “it should have been documented” issue:
Commenting out almost any line causes a compile error or some sort of bug.

How would the documentation have to look to prevent that kind of “mistake”?

	.... ; /* commenting out this line is going to seriously weaken security */
	.... ; /* commenting out this line is going to seriously weaken security */
	.... ; /* commenting out this line is going to cause a compile error */
	.... ; /* commenting out this line is going to cause a memory leak */

You cannot prevent idiocy by documenting code. There was an ifdef and endif and a /* purify complains */ comment.
It should’ve been obvious that he needed -DPURIFY rather than a source-code modification.

However, I must agree the OpenSSL code is rather bad. It’s a dumb idea to hash the PID (or any other very-low-entropy value like the date/time) into the random number generator at start. If OpenSSL hadn’t used the PID, or had added it using ssleay_rand_add, the bug would have caused broken OpenSSL to always output the same pseudo-random sequence, which would have been immediately obvious.

In my own software, I hash /dev/[u]random (or CryptoAPI on Windows; btw, the CryptoAPI randomness was not as broken as OpenSSL on Debian),
and then on x86 platforms I sample the CPU tick count 1024 times: the first 512 samples separated by mallocs and sleep(0) (the precise timing of which is always different because of the CPU cache), the next 512 separated by free()s of the malloc’d data. This is merely a safeguard in case /dev/urandom or Microsoft CryptoAPI is found to be really, really badly broken (I only use that as a “seed”).
I never hash tiny-entropy, guessable values like the PID into my PRNG.
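A hedged sketch of the kind of supplementary sampling described above (my own illustration; assumes GCC/Clang on x86 or x86-64, and treats the tick-count jitter strictly as a mix-in over the primary seed, never a replacement for it):

    /* Illustration of tick-count sampling as a *supplementary* entropy
     * safeguard, XORed over a /dev/urandom seed. Not production code. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    static uint64_t rdtsc(void)            /* x86/x86-64 timestamp counter */
    {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    int main(void)
    {
        unsigned char seed[32];

        /* Primary source: fail hard if it is unavailable. */
        FILE *f = fopen("/dev/urandom", "rb");
        if (!f)
            return 1;
        if (fread(seed, 1, sizeof seed, f) != sizeof seed) {
            fclose(f);
            return 1;
        }
        fclose(f);

        /* Supplement: timing of heap calls varies with cache state,
         * so XOR the low tick bits over the seed (mix, don't replace). */
        for (int i = 0; i < 1024; i++) {
            void *p = malloc((size_t)(16 + (i & 255)));
            seed[i % sizeof seed] ^= (unsigned char)rdtsc();
            free(p);
        }

        printf("seed ready: %zu bytes\n", sizeof seed);
        /* ...feed seed[] into a hash-based PRNG here... */
        return 0;
    }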

Samuel May 22, 2008 10:40 AM

There is a lot of fuss about the weak keys, but what seems worse in my limited understanding is that Diffie-Hellman exchanges in IPsec, SSH and SSL are also compromised. That’s far from perfect forward secrecy.

Nat May 22, 2008 4:10 PM

I understand few programmers enjoy testing their own software. But a crypto library is different.

Did anyone do the normal statistical tests on this software? Probably not.

An open-source project with the goal of testing/proving/attacking openssl would be welcome.

2closeTLS May 23, 2008 9:52 AM

OpenSSL has a lot more issues than just code that will bite poor programmers trying to improve it.
OpenSSL abuses the name Open. Time for a new modular OpenTLS, with solid code.
I’d bet that many have done extensive tests on ANY common element, like OpenSSL, Flash, Firefox.
One can do crypto with math programs, then take the key, etc., into the program/OS. Only trust what you can verify. The need for simple, solid crypto/math is priceless and yet fleeting.
People are too busy and stressed in IT. All on purpose. Government only makes it worse…

2Late4Crying May 25, 2008 9:34 PM

OpenSSL with FIPS 140-2 Cert #733 had bad PRNG issues in self-test and seeding!
See the D-kriptik website; google/yahoo/your local disk archives are your friend.
The entire CERT/gov apparatus knows, and these issues are still here.
Full disclosure is dead. The CERT $ game failed. Incentives and exploits, hum…
