Was the iOS SSL Flaw Deliberate?

Last October, I speculated on the best ways to go about designing and implementing a software backdoor. I suggested three characteristics of a good backdoor: low chance of discovery, high deniability if discovered, and minimal conspiracy to implement.

The critical iOS vulnerability that Apple patched last week is an excellent example. Look at the code. What caused the vulnerability is a single line of code: a second “goto fail;” statement. Since that statement isn’t guarded by any conditional, it always executes, jumping the function to its cleanup code with a success status still in hand and skipping the final signature verification.
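
For reference, here is the relevant excerpt from SSLVerifySignedServerKeyExchange in Apple’s published sslKeyExchange.c, abridged:

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;    /* duplicated line: always taken, with err still 0 */
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;    /* never reached, nor is the signature check below */
    ...
fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;       /* err is still 0: the exchange is reported as valid */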

The flaw is subtle, and hard to spot while scanning the code. It’s easy to imagine how this could have happened by error. And it would have been trivially easy for one person to add the vulnerability.

Was this done on purpose? I have no idea. But if I wanted to do something like this on purpose, this is exactly how I would do it.

EDITED TO ADD (2/27): If the Apple auditing system is any good, they would be able to trace this errant goto line not just to the source-code check-in details, but to the specific login that made the change. And they would quickly know whether this was just an error, or a deliberate change by a bad actor. Does anyone know what’s going on inside Apple?

EDITED TO ADD (2/27): Steve Bellovin has a pair of posts where he concludes that if this bug is enemy action, it’s fairly clumsy and unlikely to be the work of professionals.

Posted on February 27, 2014 at 6:03 AM

Comments

Michael P February 27, 2014 6:15 AM

I disagree that this is particularly easy to overlook when reading the code. That it survived tells me that either Apple as an organization did this intentionally or that they have neither good code review practices nor remotely useful static analysis tools, even for security-critical code. Even a halfway decent compiler, using the flags that are appropriate for this kind of code, should flag the unreachable code with a warning (promoted into an error by the appropriate flag).

Chris W February 27, 2014 6:29 AM

The flaw is subtle, and hard to spot while scanning the code.

Actually, it isn’t hard: plenty of compilers give warnings (or errors) when they can statically determine that specific code is never executed.
This is a prime example of that and should have been easy to spot.
I don’t know a thing about iOS development, so I don’t know which compilers are used. I do know plenty of developers ignore warnings. But that’s no excuse for security-critical code.

Additionally, these things should have test suites, unit tests, the whole nine yards.

Just checked the comments on the link, apparently I’m not the only one with these conclusions.

Bruce Stephens February 27, 2014 6:42 AM

Looking at the dozen or so lines of code presented, the bug is obvious. Has anyone seen this in the context of the whole change (which I presume would be changes to several files, and quite likely more changes to this one file)?

In such a context (presuming it is larger than what people are showing) I could easily imagine such a bug getting through code review. I imagine it’s commonplace at Apple and elsewhere. What’s different in this case is just that it happens to have a catastrophic impact.

Likely similar bugs (causing odd minor corruption of iCloud-saved Pages documents and the like) just get dismissed as ordinary bugs. Even if we had access to the source nobody (outside Apple) would care enough to look for the cause.

What’s harder to explain is how this would get past automated (or other) testing. That’s got to feel embarrassing to the team responsible, though test suites are pretty much guaranteed to be incomplete; maybe this functionality just wasn’t part of the testing. (Unless the flaw was deliberate, obviously.)

Bob S. February 27, 2014 6:46 AM

I would think the code failure would show up in testing and/or packet inspection. No?

Is it really that easy?

And as usual, complete silence from Apple and the government as to how this happened.

felix February 27, 2014 6:49 AM

@Chris W: the Clang compiler does have a warning for dead code, but it is not included in -Wall and -Wextra. GCC did have such a warning, but it was removed a couple of years ago. So the chance of noticing this due to a compiler warning is pretty small.
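
To make that concrete, here is a minimal reproduction (a sketch; the exact diagnostics depend on the compiler version):

    /* dup.c: the duplicated-goto pattern in miniature */
    #include <stdio.h>

    static int check(int err)
    {
        if (err != 0)
            goto fail;
            goto fail;        /* always executes; the line below is dead */
        printf("verified\n");
    fail:
        return err;
    }

    int main(void)
    {
        return check(0);      /* "verified" is never printed */
    }

“clang -Wall -Wextra dup.c” compiles this without complaint, while “clang -Wunreachable-code dup.c” warns that the printf call will never be executed.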

john February 27, 2014 6:56 AM

If they wanted to make it difficult to discover, wouldn’t it have made more sense to put some different line of code in place of the first “goto fail;”? Indented code that looks like part of a conditional block but isn’t, is easy to miss. Two consecutive goto’s? Not so much.

OTOH, this version is marginally more deniable.

Ruben Schade February 27, 2014 6:57 AM

The use of this block shortcut in C-derived languages really needs to be discouraged. It does make code cleaner if all you have is one line, but it just invites trouble like this. I’m only just starting my professional coding career, but I’ve already encountered similar issues where an extra line was inadvertently added outside a block. We were just lucky that appropriate procedures (compiler defaults, testing) caught them.

Dre February 27, 2014 7:32 AM

Apparently neither GCC nor Clang (the compiler from Xcode) throws a warning about unreachable code, even with the “warn all” flag enabled. That’s right, “warn on unreachable code” is specifically excluded from the “warn all” setting. Clang does have a separate flag for it, though.

Name February 27, 2014 7:34 AM

This bug would have been impossible in Python. I know that a lot of C fans hate Python’s significant whitespace feature, but it completely eliminates this family of errors.

I’m just saying.

Stephen Thomas February 27, 2014 7:41 AM

Merge conflict or copy-and-paste gone awry. If someone had wanted to camouflage the error, they could have done a much better job.

Poul-Henning Kamp February 27, 2014 7:44 AM

I’m with Bruce on this one:

This smells so much that it’s hard to believe it is an accident.

However, I don’t think the NSA would be behind this; they have other leverage with Apple (i.e. PRISM “direct access to servers” and the CA/PKI system as such).

I think it is far more likely that this was done on behalf of somebody Apple would never “assist with their inquiries”.

But I’m not going to speculate who that might be, since there are far too many candidates.

Bobby February 27, 2014 7:47 AM

There are so many questions here – many asked above (apparent lack of testing, who did what and when in this code module, not enabling Clang unreachable code warnings and so on). I find it unacceptable for Apple not to respond professionally on this issue. They should do a full investigation and present their findings. We cannot let them get away with keeping quiet on this issue.

Woo February 27, 2014 7:49 AM

Looking at the diff makes me wonder how that line ended up there.
I don’t see any reason to add it, considering the changes in the lines above it. It doesn’t match up, syntactically.
It’s also too obvious to assume it was added deliberately – the NSA or whoever could have done a much better job at plausible deniability (e.g. add a new if-clause above, together with the goto fail, commit that, and then remove the if-clause in a second commit and “forget” about removing the goto fail.)

Alan Kaminsky February 27, 2014 7:49 AM

How well I remember Edsger Dijkstra’s 1968 letter to Communications of the ACM titled “Go To Statement Considered Harmful.” Back then Dijkstra was pointing out how using the goto statement leads to unstructured, difficult-to-maintain code. Now we see how using the goto statement leads to insecure code.

Have we learned nothing about programming in the last 50 years?

Why would anyone in this day and age use goto statements?

John February 27, 2014 7:55 AM

Name: It would not have been impossible in Python at all; it would just look slightly different. Say that during a code edit a level of indentation is mistakenly removed: since whitespace is semantically significant, that changes the functionality, and the line no longer belongs to the if clause, just like in this case.

if 1 == 2:
   print "Hello "
print "world"

felix February 27, 2014 8:01 AM

@Alan Kaminsky: if they had used “return 0” or “throw SSLError()” instead of “goto fail”, they would still have the same bug.

folbec February 27, 2014 8:01 AM

To me this looks like a botched patch-merge job by some automated tool.

I see this kind of thing occasionally, even with modern tools like git or Mercurial. However, it should have been caught in code review and/or automated testing.

Gweihir February 27, 2014 8:05 AM

To any halfway competent C programmer, this flaw will be glaringly obvious. Sure, if the code was written once, badly tested and never read by anyone, the flaw has some (low) level of plausibility. But even a single code walk-through or even cursory but complete read-through would immediately have found it.

My money is on “deliberate, low saboteur competence”.

Andrew February 27, 2014 8:05 AM

The source control system should of course have full accountability for who made the change, including back-tracing to the bug report that spawned it and the code reviewers who approved it. It’s hard to believe something like that could get through code review but some pretty obvious things do get missed. One would think that security code would get far more formal inspection (it does at my company).

And, let’s not forget the Great Linux Backdoor Attempt of 2003.

David Conrad February 27, 2014 8:11 AM

A merge conflict that was resolved badly makes a lot of sense to me. If it were intentional, I think either the first goto would be an unrelated line of code, or the if would have a semicolon after it and there would be just one goto. I agree that Apple needs to explain why an unreachable-code warning wasn’t turned on and paid heed to, and how this escaped testing. But, pace Dijkstra, goto is not the problem here. You would have the exact same problem if a flag were set and later tested, and the assignment of the flag were duplicated in the same way.

Scott Jordan February 27, 2014 8:13 AM

The testing is what has my eye.

How could such a fundamental function in a modern OS get short-circuited in this way and not be detected? …It’s not a rhetorical question; if I were Tim Cook it would be the thing I would want investigated with hot pokers and tongs, right now dammit.

It’s as if a file-write function failed to write data to disk: a fundamental failure. If it was missed in testing then either the testing was lame to a literally unbelievable degree …or something else is going on.

Keep in mind that Apple itself may have been a victim of this. It’s a company whose devotion to secrecy is well known, and a company that eats its own dog-food. This vulnerability potentially exposed all manner of internal conversations, data, passwords and transactions as employees work around the globe and converse with partners and suppliers around the globe.

Cui bono?

Alan February 27, 2014 8:14 AM

You might be able to trace the change to a username, but you would have no idea who actually used that username to make the change. The user’s account could have easily been hacked or stolen and used to make the change.

Wayne February 27, 2014 8:17 AM

With some existing revision control systems this is very possibly a merge conflict that was auto-merged, so the result of the merge was never examined by a human. Since the resulting code compiled and passed regressions, no one noticed. And if this was legacy code, no one bothered removing all compiler warnings, so new warnings wouldn’t be noticed.

Disclaimer: I am a developer of BitKeeper who has obsessed over software merging for years. Forgive me if I say our merge is still significantly better than current open-source code.

Dre February 27, 2014 8:26 AM

Incidentally, I end up doing this all the time in Notepad++. Ctrl-D means “duplicate current line” and Ctrl-S means “Save”. Sometimes when I go to hit Save via keyboard shortcut, my finger clips the D on the way…

Of course, I’m not Apple, I’m just a dude. So that’s no excuse.

paul February 27, 2014 8:28 AM

How incomplete a test suite would you need to miss this case? There aren’t a whole lot of cases at the top level for secure connections.

Tamas Tarjanyi February 27, 2014 8:31 AM

I doubt we can ever get the answer, but a VCS (Version Control System/Software) can easily answer when it was introduced and by whom. Via commit? Via merge? By human or by automated process?
The big problem is lack of trust. Whatever they say, nobody believes anymore. That is really sad, and the real problem, I think.
So many lies around.

Benni February 27, 2014 8:33 AM

Actually, exactly this kind of bug helps the NSA when they are running operation “FLYING PIG”, where they impersonate Google and Yahoo using SSL encryption: https://www.schneier.com/blog/archives/2013/09/new_nsa_leak_sh.html

I wonder whether Microsoft behaves the same way in similar situations.

In these slides https://firstlook.org/theintercept/2014/02/24/jtrig-manipulation/ it is revealed how GCHQ may put pressure on company employees.

It is possible that the NSA just approached the responsible developer at Apple with a honey trap and then blackmailed him.

For example, leading developers of Microsoft’s crypto API were stunned when they learned from outsiders that the Windows CryptoAPI contained not only a second but a third NSA key:

http://www.heise.de/tp/artikel/5/5263/1.html

These things could be introduced by just one developer who was pressured that way.

Benni February 27, 2014 8:52 AM

One should also note that, as far as I know, there are export regulations which basically mean that Apple and Microsoft must hand over the code of their operating systems to be reviewed by the NSA (at least this was Microsoft’s answer when asked about the NSA key: that they were forced to give the source code to the NSA for review). Under these circumstances, it is very easy for the NSA to blackmail some devs into introducing one small line of code.

It is also easy to bug servers at Google this way. (At least Bruce Schneier says they have, apart from PRISM, where they ask the companies directly, and MUSCULAR, where they sit, with help from the fiber provider, on the rented fibers between the Google servers, a third program, where the NSA introduced bugs into individual Google servers in order to collect address books and WLAN passwords.)

diydsp February 27, 2014 9:01 AM

I’m with those who blame a merge issue…

…but I also blame lax practices, including: 1. not using curly braces, 2. using goto instead of a flag (actually they had a flag and didn’t use it properly…), 3. not using / ignoring proper compiler warnings. All three of these use more hard drive space b/c larger code, but dammit, they lead to reliable code. Security is one place where you want reliable code, it seems to me.

While we’re at it: What about mixing uint8_t notation and UInt16? What about no comments at all? What about mixing the type OSStatus with zero? How about the unnecessary/misleading comparison to zero? If OSStatus is an enumerated type (it appears to be), how about labeling it with _e? How about putting the constant on the left side of the comparison (yes, I know it’s not ==, but how about preparing for the future in case new return codes are added)?

And that’s just good syntax practice. I just realized the function is only using /one/ of the /five/ parameters passed to it. wtfbbq!

Yes, I’ve been coding for >0x14 years and I realize none of my suggestions is critical, but they’re all part of writing reliable, high-quality code. And this security breach is evidence that minimal-quality crap code will come back and bite you in the arse. C’mon guys, you’re /Apple/. You get paid well. Write good code. If this is typical code quality, I wouldn’t be surprised to find a fountain of errors.

Kevin Lyda February 27, 2014 9:08 AM

To me it demonstrates the flaw of throw-it-over-the-wall open sourcing. Apple releases all of this code from time to time, but without all the commits that got it there.

That’s unfortunate.

Craig February 27, 2014 9:24 AM

I think a merge error (possibly not even flagged by the revision control system as a conflict) may be to blame for this. I don’t know what Apple’s practices are like with regard to having developers working on different features in different branches, or merging fixes from older release branches to newer ones, but if you’re merging thousands of files all at once, you tend to trust the software to do it right and call out conflicts where necessary. Otherwise the merge process would take forever.

If this was instead the result of an individual carelessly editing the file, then that person ought to be taught proper practices for submitting changes. After I have edited one or more files, I always review my changes line-by-line using a visual diff tool before submitting anything. Something like this would have been caught that way. A required code review by another engineer would also have caught it, assuming the reviewer actually took the time to examine the changes in detail and understand them.

While I consider the use of the “goto” keyword to be a hallmark of incompetence in C and similar languages, I don’t think that it, or the lack of curly braces, really has anything to do with this. There are lots of ways that similar errors could have been introduced by an accidentally duplicated or omitted line. If someone says that this example proves anything about goto or C’s optional curly braces, they’re just taking advantage of the situation to argue for their own personal preference. It’s sort of like saying, “See, fast food really is bad for you” after hearing about some deranged gunman shooting up a McDonald’s. No one with a decent grasp of logic will be convinced by that sort of argument.

Jimfive February 27, 2014 9:26 AM

I’m sure all of the programmers among us have seen diffs where it looks like a line was removed and then the same line was added.

I wonder how hard it would be to fool the automated merge into thinking that a duplicate line added by different commits was different and happily add it twice.

I agree that coding standards that required either braces or a single line would have made this easy to pick out in code review. e.g.
if (0==x) goto fail;

or

if(0==x) {
goto fail;
}

On a vaguely related note, I think == for comparison was the biggest mistake made in the C language. All of the other comparison operators are single characters (!, <, >); add = and mix and match at will. I would have preferred Pascal’s := for assignment.

JimFive

Peter February 27, 2014 9:36 AM

@felix: surely if they used “throw SSLError()” instead of “goto fail” then the bug would reject all SSL authentication attempts, and would have been found a lot earlier?

dandraka February 27, 2014 9:46 AM

@all the people that point to a possible merge error:

Seriously ?????

Of course this can happen in software, be it a merge error, an accidental copy-paste or what have you. Any junior developer knows that.

THAT’S WHY THERE ARE MANY LAYERS OF DEFENCE AGAINST THAT.

Let me name a few, others can certainly complete the list:
1. Pair programming
2. Code review
3. Compiler warnings
4. Check-in rules
5. Unit tests (i.e. tests that check specific code functionality)
6. Integration tests (i.e. tests that treat the system as a black box)
7. Verification and validation (which is basically testing done by someone who is not the developer)

I cannot, do not believe that a halfway competent company writing software, let alone Apple, failed to spot this in one layer or the other. I do not believe that there is no unit or integration test that checks the system behaviour when presented with a valid and an invalid certificate.

So, again: seriously ?

Benni February 27, 2014 9:50 AM

By the way: GCHQ strikes again:
http://www.theguardian.com/world/2014/feb/27/gchq-nsa-webcam-images-internet-yahoo

The GCHQ program “Optic Nerve” covertly saved webcam images of 1.8 million private users in six months. Sexually explicit webcam material proved to be a particular problem for GCHQ, as one document delicately put it: “Unfortunately … it would appear that a surprising number of people use webcam conversations to show intimate parts of their body to the other person. It appears sometimes to be used for broadcasting pornography.”

name.withheld.for.obvious.reasons February 27, 2014 9:53 AM

Couple of things…

1.) Change and revision control is not a panacea; unit tests should have caught this. Relying on compilers, lexical analysis, or code reviewers to do 100% code coverage is a mistake.

2.) Configuration management should have thrown a flag: a checksum or the actual object-code size of the SSL library could have been used to trigger a specific review during the test phase, forcing a thorough unit test.

3.) A mature, possibly ISO 15408-compliant, process would have caught this; this suggests that these processes are still immature. Or at least the people in charge of these processes are…

Peter February 27, 2014 9:55 AM

I second Jimfive’s comment. It’s very easy to get duplicate lines of code when doing a merge. If the merge was from a release branch back to main, it’s even easier to see how it might be missed. After that, all you need is lax compiler settings and you’ll never notice the problem.

I also wouldn’t be surprised if Apple isn’t using static code analysis for their C/C++ code. Static code analyzers for C and C++ tend to generate huge numbers of false positives. This makes it easy to miss problems, even if they’re flagged.

Uh, Mike February 27, 2014 10:11 AM

My iPhone offers to free me from passwords.

“Trust me.”

All devices that have multiple security levels are untrustworthy. Just treat them that way.

Thierry February 27, 2014 10:25 AM

It is a shame that Apple engineering does not have good – or even basic – coding rules.

Such basic coding rules usually impose:

1) Always use {} even for one-line blocks. That would have prevented the bug.

2) Use the most paranoid warning mode of the compiler and turn warnings into errors. Compilation would have failed with a “dead code” warning/error.

Writing security code this way is pathetic.

JohnParry February 27, 2014 10:31 AM

Take off your tin hats. Not deliberate. This would be an odd way for Apple to cooperate with the feds. Furthermore, it would jeopardize Apple, the company, if the cooperation ever came to light; i.e. Apple would go under. It would devastate the tech industry worldwide. The world economy would shudder. The list goes on and on.

Chelloveck February 27, 2014 10:35 AM

Having seen similar things in the past I can easily go with “mistake” rather than “malice” here. It’s perfectly obvious once you know it’s there, but we’ve all seen the repeated-word trick — show someone a sentence like “A bird in the the bush.” and see how long it takes them to notice the repeated “the”. Sometimes we just see what we expect to see, even when the problem is blindingly obvious once it’s been pointed out. Yeah, code review helps, static analysis helps, comprehensive testing helps… But sometimes the reviewers miss things, the analyzers have checks disabled, the regression test isn’t as complete as it should be… It happens. I’ve seen it happen. It famously happened circa 1990 when a missing “break” statement in AT&T’s 5ESS code knocked out long distance service for the entire east coast. This instance could be malice, but I think a mistake is far more likely.

Benni February 27, 2014 10:40 AM

@JohnParry:
I think it is clear that there is no “contract” between Apple or Google and the NSA.
What is not so clear is whether they have approached some developer and pressured him to do this.
Or, if the NSA can hack into the computers of Belgacom engineers to get into the GSM mobile network, it is even conceivable that the NSA could have hacked the computers of the responsible engineers at Apple and planted this bug in the OS X source code. (The malware capabilities of the NSA explicitly include adding material, according to the slides.)

Name February 27, 2014 10:40 AM

@Peter
If they used “throw SSLError()”, they would have received a compiler error, of course. This is C, not C++ or ObjC.

The “goto error_handler;” idiom is fairly common in C, when you want to clean up before returning but you don’t want your if statements to be indented twelve or fifteen layers to the right. Take a look at, e.g., the Linux kernel source; it’s loaded with this.
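
The shape is roughly this (a generic sketch, not Apple’s code):

    #include <stdio.h>
    #include <stdlib.h>

    /* Every failure path jumps forward to a single label that releases
       whatever has been acquired so far, avoiding deep nesting. */
    int load_file(const char *path)
    {
        int err = -1;                      /* pessimistic default */
        FILE *f = fopen(path, "rb");
        char *buf = NULL;

        if (f == NULL)
            goto out;
        buf = malloc(4096);
        if (buf == NULL)
            goto out;
        if (fread(buf, 1, 4096, f) == 0)
            goto out;
        err = 0;                           /* success only after every check */
    out:
        free(buf);                         /* free(NULL) is a no-op */
        if (f != NULL)
            fclose(f);
        return err;
    }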

@dandraka
The commit bypasses code review because it’s an automated merge (or the reviewers don’t catch this because it’s part of a 5000-line merge review and things fall through the cracks), the compiler doesn’t warn because clang doesn’t warn on this issue even when you have -Wall turned on, and the unit test only tests the happy path. Easy. I could see a shop with half-decent process (emphasis on “half”) falling into this bug.

CallMeLateForSupper February 27, 2014 10:41 AM

As a gray-beard who, in a former life, was a Product Test engineer for a major (cough) business machines producer, I am in company with @Scott Jordan and the several other posters who fault the test regimen that failed to catch this coding error. Compiler checks are fine, as far as they go, but they are only one arrow in the development/test quivers.

This piece of code does not serve its intended purpose. Devising a functional test for it is trivial: feed it “bad” input and ensure that it sounds the klaxons; feed it “good” input and ensure that it says “all clear”. As released, this code says “all clear” regardless of input, so it was either poorly tested, at best, or not tested at all.
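
In code, the functional test being described is about as small as tests get. A toy sketch (verify() here is a stand-in, not Apple’s API):

    #include <assert.h>
    #include <string.h>

    /* Toy stand-in for a signature check: 0 = accept, nonzero = reject. */
    static int verify(const char *data, const char *sig)
    {
        return strcmp(data, sig) == 0 ? 0 : -1;
    }

    int main(void)
    {
        assert(verify("hello", "hello") == 0);   /* good input: all clear */
        assert(verify("hello", "forged") != 0);  /* bad input: klaxons */
        return 0;
    }

The shipped code would have failed the analogue of that second assertion.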

Lou February 27, 2014 10:53 AM

I agree with everyone else that this most likely smells like a merge error caused by an automated merge gone wrong. I’ve seen this sort of thing myself in my career all too often.

The big issue is with Apple’s QA department for not catching this in whatever testing they do.

Ozzie February 27, 2014 10:56 AM

For me, an important question has not been asked:

Why did this module need to change between iOS 5 and iOS 6/7?

If it didn’t, then the odds are the change was a deliberate malicious act.

Alan Bostick February 27, 2014 11:07 AM

As an aside: Bruce, it appears that notices for your blog posts aren’t showing up on Twitter. The latest blog notification on the @schneierblog feed is dated Feb. 21. (I was looking for the tweet for this post in order to retweet it, and couldn’t find it.)

Warning Track February 27, 2014 11:09 AM

I’m no programmer, as I’m sure I will make obvious.

I’m a schmuck consumer user.

I am supposed to believe that, in released software, whether an invalid security certificate will nevertheless be accepted is not tested?

I don’t believe it.

The stench is overwhelming.

Calyth February 27, 2014 11:10 AM

The high deniability is a nice touch here. This can happen easily as a mistake when someone’s doing a 3-way merge manually. It’s doubly devious if this was intentional.

I wouldn’t go so far as to say outright that this is a backdoor. There’s no proof (and we’re probably never going to find any).

John Campbell February 27, 2014 11:13 AM

The problem I see with the “function test” and “product test” regimen is this: is there a test case that looks for the “right” problem?

Tests for the kinds of problems that let this bug sneak through are obvious in hindsight… and, unfortunately, some organizations have QA departments that are exceptionally optimistic.

Also, never forget that there are folks who are under political pressure to accept a deliverable by a certain time and will ignore the testing protocols in order to get it deployed… and, frankly, I have run into systems where the minor design flaws hid the major design flaws.

No system is immune from failure because, somewhere, there is meat in the loop.

Testing is no more immune from failure than any other part of an organization.

Finally, IMHO, a “merge error” would make this a plausibly deniable exploit, wouldn’t it? It’s just a coincidence that it’s a USEFUL exploit, right?

Lou February 27, 2014 11:14 AM

@Ozzie you can see a diff of the code here to show what changed for the entire file: https://gist.github.com/alexyakoubian/9151610/revisions

I only skimmed the diff so far, and I’m no crypto expert, but it looks like they added support for PSK client keys to the module and, in the process, apparently cleaned up and simplified some of the other code so that some of the function calls weren’t always slinging around an SSL context pointer when it wasn’t really necessary.

It really does look like some sort of merge screw up to me.

The difficulty with looking at this diff is that Apple doesn’t publish their incremental changes, only massive code-dumps, so it’s hard to know what exactly was the state right before the bad line was inserted to see if anything else was inserted at that particular time.

George Rogers February 27, 2014 11:17 AM

This bug means that APPLE DOES NOT TEST SECURITY CODE. The first test is to give it a valid certificate and the second one is to give it an invalid certificate. If Apple had tested the code even once with an invalid cert they would have found this bug before shipping it. The fact that it shipped is proof that they DO NOT TEST AT ALL. This shouldn’t be surprising, we’re talking about a company that shipped an auto-update system for years with zero authentication until a third party researcher called them out on it.

Don’t attribute to malice what can be explained by simple incompetence. Apple is proven to be incompetent with security. They just don’t give a damn.

Lou February 27, 2014 11:19 AM

@jshell Or even why bother putting it in an open source portion of your code where the entire world will catch it?

Ray February 27, 2014 11:21 AM

I don’t think a bug that anybody could have discovered by just redirecting and echoing traffic counts as low chance of discovery.

Nick P February 27, 2014 11:26 AM

This comment jumped out at me:

“Why did this module need to change between IOS 5 and IOS 6/7? If it didn’t, then the odds are the change was a deliberate malicious act.”

I’d like to know the answer to that.

Peter Marreck February 27, 2014 11:29 AM

It looks like an errant merge conflict resolution, to me.

I second the need to unit-test EVERY code path, especially in ANY security-related code.

Impossibly Stupid February 27, 2014 11:35 AM

People need to stop pretending that simply using braces is a cure-all here. It’s really about code standards, since there is no difference between:

if (foo)
goto fail;

and

if (foo)
{goto fail;}

The second one can just as easily lead to the same duplication “bug”. What is really needed to address the problem is a higher-level policy. Even then, subtle sabotage is not going to be completely stopped, just made harder to explain away as a simple mistake.

What I really would like to see is the change history of the file. The given code is so shoddy that you have to wonder how it got that way. Was this a long time in coming, or did some moron just jump in and start breaking up one-liners and adding goto statements? Apple needs to start firing people, and maybe a lot of them.

Anura February 27, 2014 11:38 AM

“Never attribute to malice that which is adequately explained by stupidity.”

Someone writing that code and paying attention wouldn’t make that error, but going through and making lots of code changes, e.g. deleting and moving code, and not paying close attention, you could easily introduce that problem.

To me, this is about coding standards, compiler warnings, unit testing, and proper code review. If you always required braces for your control statements, you would avoid these errors. That requirement can be missed in code review, of course, but if it’s not in place, it should be. Unit testing and compiler warnings about unreachable code would have stopped the bug if it somehow made it past code review and enforced coding-style guidelines.

The gotos in that code can actually be removed easily (resulting in less code, in fact), and the label should really be named cleanup instead of fail (since it’s always reached).

if ((err = ReadyHash(hashRef, &hashCtx)) == 0)
if ((err = hashRef->update(&hashCtx, &clientRandom)) == 0)
if ((err = hashRef->update(&hashCtx, &serverRandom)) == 0)
if ((err = hashRef->update(&hashCtx, &exchangeParams)) == 0)
if ((err = hashRef->final(&hashCtx, &hashOut)) == 0) {
    //success

}

//cleanup

Of course, I have superior powers of hindsight.

Daniel February 27, 2014 11:41 AM

It seems to me the real lesson here, the one that makes everyone vulnerable, is that complex systems fail in complex ways. The result is that it might never be possible to know whether something was a genuine bug or a deliberate hack. My computer exhibits behavior all the time that I cannot explain. Even if one can trace the error back to a specific person at Apple, it may be difficult to determine whether that person is a “bad actor” or simply incompetent.

John Gilmore February 27, 2014 11:43 AM

Umm, do the test suites for all the other implementations of SSL and TLS catch this error?!

Umm2, does the browser you are reading this web site with catch this error? Try following this link: https://www.imperialviolet.org:1266/. My Firefox says “Secure Connection Failed: A PKCS #11 module returned CKR_DEVICE_ERROR”. My Crackberry says “The selected server returned an error when attempting to fulfill your request.” Both of them correctly failed to establish the SSL connection and failed to hand you a web page purporting to be from that server. This is good! Because that server simulates what happens after NSA, Joe Blow, or the Starbucks WiFi man-in-the-middles you with an encryption key that doesn’t match the signature sent.

Czerno February 27, 2014 11:46 AM

The dangers of the GOTO statement have been well understood since E.W. Dijkstra pointed them out back in his famous 1968 letter. The less widely known solution is the elegant COME FROM construct:

http://www.fortran.com/come_from.html

yesme February 27, 2014 11:51 AM

A lot has been said here already. I don’t know whether the bug is deliberate or not, but if not, well, it still is very crappy code. For a piece of Transport Layer Security code, that’s bad.

It reminds me of the FOSDEM video “NSA operation ORCHESTRA: Annual Status Report” (http://video.fosdem.org/2014/Janson/Sunday/NSA_operation_ORCHESTRA_Annual_Status_Report.webm). A good and quite funny presentation about the alleged activities of the NSA, seen from the POV of the FOSS community. And as Bruce Schneier said before, this is exactly how they would do it.

Programmer February 27, 2014 12:00 PM

The code is difficult to read, so finding this specific error by auditing an ocean of millions of lines of code is against the odds, whatever some people here pretend and would like to think.

However, what would have been trivial, and should have existed, is a test system which quite simply guarantees that the code does what it is supposed to do, no matter how it is coded.
Such tests are supposed to exist and be performed at three levels (unit tests, function tests, system tests). Nothing that blatant is supposed to get through.

That this particular error could get through the entire automated non-regression test toolchain raises questions about Apple’s organization.
Either they are unable to test their software correctly, which I find difficult to believe and which would largely destroy confidence in Apple software, or the bug was planted intentionally; in which case not only the code was modified, but also the (unpublished) test suite…

Apple is in the difficult position of selecting its poison: plead incompetence, or conspiracy.

Dan February 27, 2014 12:12 PM

We all know it was intentional. We can’t trust any of these companies or the internet as it stands any longer. It’s time to collectively scrap this build before it’s too late and becomes autonomous. Hear me? Do you agree? If not, why?

David Leppik February 27, 2014 12:15 PM

@Anura: there’s a similar but more language-independent pattern you see all the time in Scala. It’s the dispatcher pattern, with what I refer to as table-of-contents formatting.

function foo() {
   if (condition1)  return  action1();
   if (condition2)  return  action2();
   return defaultAction();
}

Rules:

  • A function using table-of-contents styling is dominated by that style, so nobody gets confused.
  • The function contains no side-effects: it’s either a dispatcher or returns a value.
  • It’s a table, so format it with rows and columns.
  • If a condition doesn’t easily fit on the line, it gets a private function.

Advantages:

  1. Easy to comprehend.
  2. Decisions (policy) are separate from actions (behavior), making debugging easier.
  3. Even more compact than two-line, no-curly-braces.

Disadvantages:

  1. Looks scary if you’re not used to it.
  2. Is scary, if a developer doesn’t follow the rules and the dispatcher has side effects.
  3. Lots of short single-use functions, which some developers hate.
  4. Discourages functions with lots of parameters. Wait, that’s an advantage.

Of course, as others have pointed out, the real problem here was that Apple didn’t test its code.

Alex February 27, 2014 12:23 PM

As many others have already posted, the lack of testing is the suspicious thing. You’d expect them to build a test when they coded the logic in the first place. If they had, the bug would never have survived.

So one interesting thing would be to hear from Apple folks about the way the company uses tests. Does the lack of a test in this case seem unusual? Or does the code base have lots of examples of missing tests for important functionality?

Dan February 27, 2014 12:27 PM

Does all the conjecture really matter? I know it’s interesting to speculate and expound on the issue, but why aren’t we talking about circumventing this cyber trap by building a new infrastructure? How about that?

Gareth Owen February 27, 2014 12:35 PM

Presumably it would have been better to just change the if condition, because I assume the hash functions always succeed unless something is seriously wrong. That would be a lot harder to spot than an extra goto.

Anura February 27, 2014 12:46 PM

@David Leppik

There are not multiple actions here, just one that occurs if all conditions are true. If there is an issue with the certificate, the intention is that the “success” code doesn’t run and only the “fail” code does. If there is no problem with the certificate, the intention is that the success code runs, followed by the “fail” code. The “fail” code is really just cleanup code.

John February 27, 2014 12:57 PM

@Anura: But these tests occur in two places with code in between, so you would need to duplicate the cleanup section and wrap it in an else clause. And that is only for this particular example. I quite like the use of macros for a “test and goto” in one; see the sketch below.
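
Something like this (a sketch of the idiom, not Apple’s code); the do/while(0) wrapper makes each check a single statement, so an accidentally duplicated line repeats the whole test instead of leaving behind a bare goto:

    #include <stdio.h>

    #define CHECK(expr)                 \
        do {                            \
            if ((err = (expr)) != 0)    \
                goto fail;              \
        } while (0)

    static int step_ok(void)  { return 0; }
    static int step_bad(void) { return -1; }

    int main(void)
    {
        int err;
        CHECK(step_ok());
        CHECK(step_bad());      /* jumps to fail with err == -1 */
        puts("never reached");
    fail:
        return err;
    }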

IfIWanted February 27, 2014 1:01 PM

This looks like incompetently sloppy coding combined with incompetently sloppy testing, rather than deliberate. A competent evildoer could do something far more obscure. (I have been programming since the 1970s, if it matters, but I do think I would have been able to construct something far less obvious had I been so minded.)

Brian M. February 27, 2014 1:05 PM

Czerno is right, GOTO has been well-known as a problem for quite some time. I’ve seen “do {} while(0)” with break statements used as an alternative to the SSL sample.

What it looks like is that someone did a copy-and-paste of the code, then simply screwed up. And if a code review is done fast and sloppily, as many are, then something like this gets missed.

Many times, the programmer is simply the guy who bullshits himself into the position, not the guy with the best expertise or concern for quality. Couple that with stupid managers, and stuff like this is normal.

Facts are hard February 27, 2014 1:08 PM

Just to clarify, since a lot of people are missing the point here: this is NOT a case of “just throw an invalid security certificate” at it. It’s not about self-signed certs, or certs signed by untrusted keys, or anything like that.

It’s about the SSL handshake itself. The cert is valid. The signatures on the cert are correct. The cert is the correct cert. The problem here is that after presenting the cert, the client and server engage in an SSL handshake, and in this handshake the server should present the client with data signed by the server certificate, and then the client should validate that signature against the public certificate they already received earlier.

That is the part that’s missing. The client receives the cert, the SSL handshake begins, and the client does not validate whether the signed data in the SSL handshake was signed by the same cert previously received. So anybody who can sit on the network, see your client establishing an SSL handshake, and send forged responses back to you before the remote server can respond, can nearly effortlessly MITM your communication.

Brian M. February 27, 2014 1:10 PM

If the Apple auditing system is any good, they would be able to trace this errant goto line not just to the source-code check-in details, but to the specific login that made the change.

This is called “source code control.” This stuff has been standard for decades. It is standard for a change to contain the time and date, along with the name of the person who made the change.

Anura February 27, 2014 1:16 PM

@John

You’re right, I was looking at the wrong function. They actually have that code copy-pasted 6 times in the file. They should really make that a function. If they did, the code I modified could be used; it would just have a single return statement at the end (return err), and the last condition would become just a statement.

Spaceman Spiff February 27, 2014 1:16 PM

Our group (performance, quality, analytics) at a tier-1 mobile device manufacturer were all banging our heads when we looked at this code. This is NOT a mistake, unless Apple is totally incompetent! Personally, I don’t think they are, which means this is deliberate… 🙁

Impossibly Stupid February 27, 2014 1:18 PM

@John

I’m not sure what you’re talking about. I never present any “idealized” anything, nor do I make any particular point about goto that makes your link relevant. My point does remain that the code (not just the line with this particular error, but in many other functions and probably many other files) does not appear to be written to be maintained by or shared with others.

Apple deserves all the hassle it’s finally getting for this style of coding. If they don’t do some major firings and clean up this garbage code, I think this incident is only going to mark the tip of the iceberg. If Linux does things wrong, too, they’ll have their own issues to deal with. It absolutely does not excuse either party to point fingers at each other; neither is following reasonable practices (let alone best practices), and failure to address that problem will just lead to other problems down the road.

Anura February 27, 2014 1:18 PM

To clarify my above post, you would still have gotos in your code, but it would be cleaner and less likely to result in an issue, and if you make a change you have to make it in one place and not 6.

Anderer Gregor February 27, 2014 1:29 PM

What strikes me is that people complain either about “goto” or about the missing braces, but everyone seems to agree that it was a good idea to (a) set “err” to a new value in each of the conditions (i.e. have a condition with troublesome side effects), and (b) not assert that err != 0 when the “fail” label has been reached. The error would have been caught way, way earlier if the “fail” section had refused to return a “0” value; that return is what caused all the problems, because it made the whole function report “all is good” when only half of the tests had run. And using the “return value” of an assignment is not possible in many modern languages (e.g. Python), seemingly for a reason…

0day February 27, 2014 1:29 PM

To all who say “it looks like an actual mistake”:

That is exactly why this would be the right way to embed an intentional bug.

After a job like this, NSA would be glad if no one believed it to be intentional.

Anura February 27, 2014 1:36 PM

@Impossibly Stupid

I find that horrible code and a lack of standards are very common in a lot of open source projects, sadly. The larger they are, the worse they tend to be in that regard. Code review tends not to include things like making sure the code actually follows an organization-consistent style.

This is by no means limited to open source, of course; it’s just that I have access to more open source code than closed source code. I think a lot of large companies probably skip code review altogether.

tim February 27, 2014 1:54 PM

Based on many of the comments, I can only assume that most here have never worked for a large engineering organization. Engineers make mistakes. Tools fail to find errors. QA teams miss obvious unit tests. Code review misses a bug because the reviewers were arguing for an hour over whether the tab setting should be 3 or 4 spaces. That’s the nature of managing a large engineering organization.

This is a stupid error. But it’s neither a sign of gross incompetence nor a conspiracy. I suspect that many within Apple are taking a hard look at their processes. As they should be.

Johan Bentzen February 27, 2014 1:55 PM

First of all, the real culprit here, to me, is the lack of a full-coverage unit test suite for security code. It would not only guard against this specific problem, it would make it very difficult to sneak in an “accidental” merge error of this kind, because you would have to modify the test as well.

But secondly, I am presently making my living doing code review, and when I look at the diff presented in the first comment in this thread, it is blatantly obvious something is wrong. There are three major changes going on in this change-set (a context has been made obsolete, a return code has been made more specific, and a method has been renamed). If you do a lot of review, you learn to filter out such “expected” changes as you go through the code, and then the offending line becomes not one of a lot of changes, but one of a handful.

My conclusion is that Apple neither does unit testing nor code review professionally.

Mike Amling February 27, 2014 2:08 PM

Not that it makes a lot of difference, but …

The code is not checking for “an invalid certificate”. Since the code is hashing things called “serverRandom” and “signedParams”, it’s not checking the certificate per se, but instead ephemeral data generated by TLS after the certificate is checked.

martinr February 27, 2014 2:17 PM

As has just been mentioned, this double “goto fail;” is in the code that verifies the digital signature on the ServerKeyExchange contents, i.e. on the ephemeral Diffie-Hellman parameters and server key (DHE or ECDHE). With this bug, a man-in-the-middle attacker can substitute his own key for the (EC)DHE key and successfully impersonate the server in the handshake.

The many gotos in the code and the single cleanup handler are perfectly OK, but not initializing the return value of a security-critical function to an error, and ABUSING the function return value to store the interim results of subfunctions, is extremely dumb.

Such a function MUST NOT assign “success” to the function return value before the very final verification step has determined a positive verification result.

This code is poor because it does not fail safe. Fault-tolerant programming and fail-safe behaviour are very important for security-critical functions.
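
A fail-safe version would look something like this (hypothetical names, a sketch rather than Apple’s code): the verdict starts as a failure and is assigned success exactly once, after the last check, so a skipped step cannot read as a pass.

    #include <string.h>

    /* Returns 0 only if every check passed; anything else is a reject. */
    int verify_digest(const unsigned char *expected,
                      const unsigned char *computed, size_t len)
    {
        int verdict = -1;                  /* pessimistic default: reject */

        if (expected == NULL || computed == NULL)
            goto out;
        if (memcmp(expected, computed, len) != 0)
            goto out;
        verdict = 0;                       /* success assigned exactly once */
    out:
        return verdict;
    }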

Mike Amling February 27, 2014 2:30 PM

As long as we’re bashing the language…

In C, SSLHashSHA1.update(&hashCtx, …) returns a value apparently because it may, e.g., find something it doesn’t like about &hashCtx. In a better programming language, a hash update would return void. If update found something it didn’t like, it could throw an exception. There wouldn’t be any need for the ifs.

GOTO reconsidered February 27, 2014 2:42 PM

@Czerno “The dangers of the GOTO statement have been well understood since E.W. Dijkstra…”

There is nothing inherently wrong with GOTO statements.

Some people like Dijkstra simply find it impolite to point, being offended by directness itself.

This tends to be a disease of the dialectical mentality, where indirect, mediated experience is the norm.

Andrew Ludgate February 27, 2014 3:11 PM

Sniffing around, I found the following code, which appears to be an older C++ version of what they had and modified, at http://www.opensource.apple.com/source/Security/Security-54.1.3/SecureTransport/sslKeyExchange.cpp?txt

    if ((err = ReadyHash(SSLHashSHA1, hashCtx, ctx)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(hashCtx, clientRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(hashCtx, serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(hashCtx, tempPubKey)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.final(hashCtx, hashOut)) != 0)
        goto fail;

tempPubKey was changed to &signedParams, but there appear to be no extra tests in the older code. That makes it less likely that the error was just due to moving code around and forgetting a line. It also shows that the line wasn’t always like that — I’d like to know when the error was introduced. Is it in OS X 10.6, for example?

Impossibly Stupid February 27, 2014 3:34 PM

@Anura

I’ve also found that a lot of open source projects are organizationally . . . wanting. Whether it’s the lack of a common code style or higher-level issues (e.g., failure to isolate and refactor some things into an independent library), the same “brogrammer” chest-beating idiots often set the tone for the project, treating commits as the only thing that matters. That’s one of the main reasons I avoid contributing to most open source efforts.

But this is Apple we’re talking about here. They’re supposed to be all about clean and elegant things. They have a ton of cash they could be using to hire developers who would make their open source software the envy of every volunteer effort out there. Instead, we get this sort of crap, which may not be any worse than the things thrown together by hobbyists, but which is so far from expectations that it is baffling that Apple has given jobs to these people.

jones February 27, 2014 3:44 PM

The quality of Apple’s software engineering has been declining for some time. They get lots of media attention for their glossy interfaces, but they don’t even follow the human interface guidelines that made them a success anymore.

Czerno February 27, 2014 3:45 PM

@GOTO reconsidered:

“There is nothing inherently wrong with GOTO statements.” The real point is structured vs. “spaghetti” code, and while it’s certainly possible to produce well-structured programs using GOTOs, excessive and/or lazy use of them tends to favour bad, unmaintainable, unprovable code.

However, my earlier post was not meant to be taken so seriously, as you’d certainly have realised had you followed the provided link (the “come from” statement; ’twas an April Fools’ article in CACM ca. 1974, which I remember made me LOL back then).

Sean Riley February 27, 2014 3:45 PM

I’m not a security expert, but isn’t the argument here essentially, “Boy, this sure looks like a mistake, ergo, it might be a conspiracy!”?

I get the logic that if you were going to deliberately introduce a backdoor you’d want to do so in such a way that it looks like a mistake. But mistakes do happen, and leaping to the assumption that it’s deliberate precisely because it looks so perfectly like a mistake seems odd to me.

Nick P February 27, 2014 3:47 PM

@ tim

The only habits of a large organization that matter here are Apple’s. We already know how they approach security.

http://www.zdnet.com/blog/security/kaspersky-apple-10-years-behind-microsoft-in-terms-of-security/11706

http://www.macworld.com/article/1140793/macsecurity.html

Looking for critical flaws, I count at least one a year going back quite a while. These included easy ways to bypass passwords. Apple has added new security features like the Capsicum-derived sandboxing system. However, Apple’s problem is that their development process doesn’t care about security at all, from design to implementation to maintenance. That’s what the results of their process suggest, anyway.

Impossibly Stupid February 27, 2014 3:50 PM

@Anderer Gregor

Well, the code is so bad that you can’t blame everyone for not picking apart every last bit of stupidity in it. The use of assignment values is hardly the worst thing a language might allow, but it is generally considered bad practice to do assignments in conditionals at all, because anyone new who comes along and starts looking at the code has to stop and question whether or not a = should actually have been a ==. There are all sorts of constructs in all sorts of languages that allow you to do dumb things. The professional programmer strives not to hang themselves (or others) like that. Apparently Apple doesn’t hire that kind of developer.

Impossibly Stupid February 27, 2014 3:55 PM

@Sean Riley

The process leading to this “mistake” is such a Rube Goldberg setup that it is hard not to ascribe some agency to this level of incompetence.

Czerno February 27, 2014 3:57 PM

Re. Dijkstra: as you probably know, he didn’t /just/ show us how “harmful” GOTOs could be. He was a fierce proponent of procedural programming (after all, he had been one of the designers of Algol, which by the way /has/ GOTO) and of later languages which lend themselves naturally to abstract proofs of programs. His book “A Discipline of Programming” was very illuminating (to me, and I bet, many). It is somewhat sad that those ideas had so little influence over the world of commercial programming, and over the adoption of better programming languages (C and derivatives are terrible in this respect!)

Anura February 27, 2014 4:46 PM

@Impossibly Stupid

I don’t think it’s so much about hiring good programmers as it is about taking the time to recognize there is a problem and clean things up, which ends up being a management decision. Where I work now, we have a lot of messy code: inconsistent naming and casing, inconsistent indenting (sometimes things are indented, sometimes they are not, sometimes 4 spaces; sometimes I’ll have lines indented 4, 5, 7, and 9 spaces), mixed tabs and spaces for indents. The only thing that’s consistent here is inconsistency. Next to nothing is broken out into functions, copy-paste is everywhere, it’s just one gigantic mess.

The thing is, when I work, I mimic the existing standards in place. If the code is already a mess, I’m not going to go back and fix everything; I have tasks I need to complete. I would gladly go back and refactor the whole damned thing, component by component. There’s kind of an “if it works, don’t fix it” attitude, however, so it’s never going to happen.

I think we should have a standing order in the open source world: if the code is over 10 years old, freeze all non-critical changes, and refactor.

anon February 27, 2014 5:46 PM

The second “goto fail” provides plausible deniability, given compilers that don’t flag the dead code. However, the economic problem is huge.

Disclosing and patching is a code canary and another non-disclosure disclosure.

Buck February 27, 2014 6:17 PM

Plenty of Apple-directed flames to go around…
I wonder what the fact that this went unnoticed for so long says about the current state of our entire industry… :-\

Deliberate? Probably…
By whom? No saying…

And how many others may have noticed, but took a more tight-lipped approach..?

Did anyone happen to catch the timestamp of that commit?

Impossibly Stupid February 27, 2014 7:01 PM

@Anura

I just don’t think you can separate the two: bad management and bad programmers are a hand-in-hand dysfunctional pair, each enabling the other. There should be enough firings to go around, from the lowly grunt coder who typed in the code, to the senior developers who approved it, to the managers who cultivate that environment, to the officers who do the same. This is exactly the thing that people can and should point to for years to come as why you need to rock the boat like a proper (computer) scientist.

Nobody is perfect, but there has to be some accountability. And you’re responsible for your own professionalism, too. Every place I’ve gone into that had bad code readily admitted it, and I always tell them that if they don’t make cleaning things up a priority, they had better not plan on me sticking around for long. The good ones change, the bad ones make excuses. Apple is keeping quiet on the bigger picture, but for now they’re the bad guys here, from Tim Cook on down, until they make some serious QA changes.

Joe Buck February 27, 2014 7:11 PM

I like the merge bug theory. Some source control systems, Perforce in particular but potentially any system that uses line-based differences, can be confused when doing merges by repeated lines, and match up lines to be inserted or deleted to the wrong place when work on a development branch is merged to another branch. The many instances of the “goto fail;” line provide ample opportunities for this. One method to make this kind of failure less likely for C or C++ code is to require as part of coding standards that curly braces always be used with if-statements, even if there is one statement in the body. Had this been done, an extra “goto fail” would be unreachable code, probably would produce a warning, and would in any case be harmless.
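
For illustration, a minimal sketch of how that standard defuses the duplicated line:

if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
{
    goto fail;
    goto fail;  /* a merge-duplicated line lands inside the braces: unreachable, but harmless */
}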

James Card February 27, 2014 7:49 PM

I somehow doubt that this was deliberate. It looks more like a copy and paste error. If I were to do this deliberately I would have done the following:

if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0);
goto fail;
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
goto fail;

Assuming the same conditions (no unit tests), this is harder to detect through visual inspection than the original defect.

supersaurus February 27, 2014 8:06 PM

Flagging unreachable code and making warnings into errors in a reasonable and general manner is harder than one might think. Consider:

if(sizeof(int) == 4)
{
    do_something();
}

A compiler might contain code like that, and so might code that uses asm or intrinsics. Of course you can use the preprocessor to get around the code shown, but complex code (as opposed to the simple example above) littered with preprocessor conditionals or #pragma blocks has its own issues (hard to read, hard to review, hard to debug, etc…).

Note: I am not defending whoever didn't see that double goto in the code in question (assuming it was reviewed); I am commenting on the difficulty of having the compiler do the checking (one can easily find much more subtle cases than the above, and the compiler is not psychic).

65535 February 27, 2014 11:16 PM

I am at the bottom of the thread. I will make my comments short.

1] The code is copyright protected but open to the public “copyright (c) 1999-2001,2005-2012 Apple Inc. All Rights Reserved.”

2] If Apple actually paid to get it through the copyright process, which is not certain, one would assume the code was thoroughly tested.

3] The mistake looks innocent – but at an extremely critical juncture.

It could be a merge error (possibly a cut-n-paste error), but it is one big mistake. As others have noted, the NSA would cover their tracks with such a mistake, and the NSA has indicated that it could p0wn any Apple iPhone.

The 1,970 line file:

http://opensource.apple.com/source/Security/Security-55471/libsecurity_ssl/lib/sslKeyExchange.c

http://www.zdnet.com/apple-and-the-ssltls-bug-open-questions-7000026628/

Robin February 27, 2014 11:25 PM

I’ve got nothing of value to add to this already long discussion except:

This would never have happened if Steve was there!

name.withheld.for.reasons February 28, 2014 12:20 AM

This event brings the point I made in another post full circle. Vendors and suppliers cannot possibly meet their ISO compliance (15408/27000) requirements, or a half dozen industry standards, based on what we know and don't know (I bet you thought I was going to get all Donald Rumsfeld up in here). Even more damaging, companies that rely on their vendors for these products, like aerospace, energy, and transportation companies, will have to reexamine their procurement processes and procedures. Do they have mitigation and risk avoidance plans that address this type of product line failure? I bet not!

Clive Robinson February 28, 2014 1:58 AM

@ Anura,

    I don’t think it’s so much about hiring good programmers, as it is taking the time to recognize there is a problem and clean things up, which ends up being a managment decision.

The problem started with management, as I've commented on this blog and others in the past.

The problem stems from management not having a clue about software development, and software developers not having a clue about management, in what is effectively an adversarial environment…

As has been noted, this bit of code has been "Copy-n-Pasted" from another part of the code, and CnP of the same section has happened a half dozen times… And thus it should have been put into a single function [1].

Now ask yourself if it has anything to do with management deciding to use "lines of code committed" or a similar metric to work out in their minds who is or is not a good code cutter who gets a bonus at the end of the quarter/year… Other programmers quickly realise that it's "quantity not quality" that management reward, so they start CnPing, and a junior manager sees that "productivity has increased", which means they get a bonus as well… so each quarter/year it's "rinse, wash and repeat", otherwise nobody gets a bonus and instead they get a P45/pink slip or similar downsizing.

It's similar reasoning that produces "feature rich" code: marketing types tell ineffective management "we must have…", management issue an order, and the code cutters know few if any people will ever use the feature, and that neither marketing nor management can tell if it works, so they use it as an opportunity to "up the bonus" not "up the quality", especially as they know the number of "fault reports" will be too low to attract attention.

The problem of management deciding who is or is not a productive worker is an age-old problem; however, it became an issue as a side effect of industrialisation. Management would latch on to the latest "management guru" thinking and implement the process. The results would initially be good, but then the "workers" would work out how to "game the system". The classic example of this was "Time and Motion" back in 1960s and 70s manufacturing: management would send in the T&M consultants with their clipboards and stopwatches, and the workers would switch into "look busy mode", where they would make a simple, straightforward task look complex and time consuming. The result was a short-term benefit for the workers; however, in the long term it made automation by computers look more cost effective than it was, and in the 80s and 90s many jobs were either automated out of existence or outsourced to where labour was much cheaper.

With software the work is more cerebral than physical, and measuring the effectiveness of an employee is much harder, but it's the way management think and behave, so they have to have some metric to measure "productivity", and the workers still know how to game the metrics that most managers can conceptualise.

So yes, this "feature" in the code can be shown to have originated in the failings of management, and it is management's inability / incompetence to recognise their own failings and make meaningful changes that has allowed it to happen. But… you have to ask if managers want to change; after all, simple metrics the workers can game also mean a simple way for the manager to show they are "effective", and thus the "gaming" goes on up the ladder, where the executives play the same "game" with the shareholders.

With such a system in place it's easy for others to "game the system", and not only is it virtually impossible to stop them doing it, it's also near impossible to prove it is taking place, thus giving near perfect deniability.

[1] There are reasons why the "overhead" of functions might be better avoided, but they are few and often pointless in a very short time frame. There is an old story of K&R C and libraries: apparently "R" went around telling programmers there was no overhead and that they must use the functions in libraries; as one of the programmers later said, "He lied to us, but by the time we found out it was too late"…

dandraka February 28, 2014 6:05 AM

@Name:
Again, that's why there are many layers of defence.
THERE DEFINITELY IS AN INTEGRATION TEST FOR THIS.
To believe anything else is simply shortsighted.

@0day:
+1111111

@James Card: how can you assume no unit tests, let alone integration (also called system) tests? For God's sake, this is Apple we're talking about, not some 20-year-olds working in their mom's basement.

Adam February 28, 2014 6:09 AM

This sort of thing can happen when merging branches. I don’t know if it’s deliberate or not, but it’s potentially innocent. Anyway, this error is why a lot of coding guidelines say you must use curly braces for if / else. I’m sure it wouldn’t stop other kinds of exploit but it would negate this particular one.

Clive Robinson February 28, 2014 6:20 AM

With regard to the "curly brackets/braces" argument, others have gone into this a little further:

https://blog.codecentric.de/en/2014/02/curly-braces/

However, the last paragraph also reflects some of the comments here about Apple's QA and reputation for quality products…

Perhaps it's me –I stopped liking Apple in the 1980s after the Apple ][e– but as far as I'm concerned they are a "cult", with fanbois genuflecting before wooden altars laden with white plastic iconography in their hallowed stores, whilst the corporation is perhaps the world's worst offender at both tax avoidance and patent/copyright litigation.

Apple are thus exhibiting the worst tendencies in the electronic device arena to maximise profit; is it really that surprising that they have cut back on QA on what was formerly "hidden from view" software development?

Clive Robinson February 28, 2014 6:34 AM

@ Adam,

    I don’t know if it’s deliberate or not, but it’s potentially innocent.

That raised a wry smile and the thought of "how times change".

Less than a year ago such a statement would have caused many to think you "a tinfoil hat wearer"; now, post the Ed Snowden revelations, it just makes you look normally cautious…

dandraka February 28, 2014 6:44 AM

@Warning Track:

“I am supposed to believe that, in released software, whether an invalid security certificate will be nevertheless be accepted is not tested?”

It's a bit more clever than that; see, if this was the problem, it would have been found 10 days later by the first developer to test his/her iOS application with a homemade certificate.

As others have explained, this is designed to facilitate man-in-the-middle attacks –which of course the NSA is in a perfect position to execute. Of course Apple has to have (and in my opinion does have, though it was also subtly "turned off" somehow) adequate testing for that.

“I don’t believe it. The stench is overwhelming.”

100% agree.

Daniel Serodio February 28, 2014 7:54 AM

What about the timing of this bug's introduction and Apple's inclusion in the PRISM program? Does this mean simply that the NSA found out about this bug before Apple did?

Me February 28, 2014 8:54 AM

Everyone here seems to think the issue is not testing the code, am I the only one thinking WTF are they doing with gotos!?!?!

Benni February 28, 2014 9:06 AM

Here GCHQ describes how it uses sex traps to blackmail company employees:
http://www.eff.org/document/07022014-nbc-gchq-honey-trap-cyber-attack

and here they collect sexually explicit material from 1.8 million webcam chatters:

http://www.theguardian.com/world/2014/feb/27/gchq-nsa-webcam-images-internet-yahoo

They run face recognition on these webcam images.

So it is easy for them to blackmail an employee with that.

That GCHQ says in its slides that it blackmails employees with sex traps marks a whole new level of industrial espionage.

Now companies must fear that their most loyal employees are being blackmailed by some secret service into introducing backdoors into products.

dandraka February 28, 2014 9:13 AM

@Me: no, you're not, but look at the bigger picture. This is not a freelance developer doing work for a client paying him $1000. It's a big multinational company writing software used by millions of people.

The idea is that there have to be many layers of defence against these problems.

If Apple, or my employer (I work for a company that manufactures medical devices), or any company really, depends on each and every developer doing their work correctly all of the time, then they will go out of business very, very quickly.

And Apple knows this. The fact is that iOS is a good piece of software; sure, many people can say this has failed or that has failed, but overall it is used by many, many people every day and it performs extremely well.

That's why I do not believe the "ooops, honest mistake" story.

Felix February 28, 2014 9:34 AM

I’m so glad that Bruce has finally realized that Conspiracy Theories are useful and worthwhile.

Marc Espie February 28, 2014 9:54 AM

In fact, if you want to hide bugs, duplicating a goto is one of the smartest things you can do.

Remember the old Dijkstra article "goto considered harmful"? That was not just because gotos are harder to read for the programmer; it's because gotos are more difficult to analyse programmatically. Dijkstra (with Hoare) more or less founded the whole area of axiomatic semantics and automated code analysis. Gotos are harder to analyse because you have several points where the code can come from (especially in spaghetti code), so it's much harder to write invariants, and to try to prove fixed points.

I've just been bitten by… a goto bug in kernel code… that bug amounted to using a variable that was not actually initialized in the code path that was taken after the goto.

Guess what? gcc didn't help! It's usually rather good at finding this kind of bug. But it turns out that, since gotos are hard to analyse, it doesn't even try in such a case!

So yeah, peer harder around gotos… they're a very good hiding place.
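
For example, a minimal sketch of the shape of such a bug (hypothetical code, not the actual kernel change; whether a given compiler version warns depends on flags and optimisation level):

int f(int x)
{
    int v;

    if (x < 0)
        goto out;   /* jumps over the initialization below */
    v = 2 * x;
out:
    return v;       /* v is uninitialized on the goto path */
}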

MikeA February 28, 2014 11:39 AM

Yeah, "GOTO considered harmful". That's why the cool kids are all using COMEFROM, only it's been renamed "Exception". Coding is hard. Testing is harder, and way too many people, especially in "hardware" (i.e. VHDL) verification, test only "does it do the expected thing for 'normal' inputs" rather than "does it catch abnormal inputs".

“Nobody ever got fired for shipping on time”

Anura February 28, 2014 12:25 PM

@Patrick

You're too concerned with the use of goto. I don't think your code is any better. For one, if the hash function fails, you keep calling it; since you don't know what the hash library does internally, you should not assume it's still safe to call after a failure. On top of that, you return the most recent status code, so while one function call might return an error code, the final one might return success, which means the verification is skipped and the function returns 0.

You also don't deal with the biggest problem: the copy-paste code. With absolutely no gotos or extra variables to track success or failure, you can break the SSL code out into a function that returns a status code.

Any refactoring needs to focus not on removing goto, but on removing the copy-paste code and cleaning up the logic.
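
A minimal sketch of such a helper (assuming the HashReference, ReadyHash and SSLFreeBuffer interfaces from Apple's sslKeyExchange.c; an illustration, not Apple's actual fix):

static OSStatus HashHandshake(const HashReference *hashRef,
                              SSLBuffer *clientRandom,
                              SSLBuffer *serverRandom,
                              SSLBuffer *exchangeParams,
                              SSLBuffer *hashOut)
{
    SSLBuffer hashCtx;
    OSStatus  err;

    hashCtx.data = 0;
    if ((err = ReadyHash(hashRef, &hashCtx)) != 0)
        goto fail;
    if ((err = hashRef->update(&hashCtx, clientRandom)) != 0)
        goto fail;
    if ((err = hashRef->update(&hashCtx, serverRandom)) != 0)
        goto fail;
    if ((err = hashRef->update(&hashCtx, exchangeParams)) != 0)
        goto fail;
    err = hashRef->final(&hashCtx, hashOut);
fail:
    SSLFreeBuffer(&hashCtx);    /* single cleanup point */
    return err;                 /* first error wins; callers check != 0 */
}

Each hash sequence then becomes one call, so a stray duplicated line can only break one place instead of six.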

Nick P February 28, 2014 12:44 PM

@ Patrick

That refactored code simply doesn’t have enough goto’s to pass inspection at Apple.

@ All

Nadim Kobeissi did a blog post on this issue. Comments on his blog are interesting with one arguing that we’re seeing a code/design pattern common in Apple development. If true, their SDL has some real issues.

http://log.nadim.cc/?p=126

@ Marc Espie

You hit the nail on the head with regard to the "Goto considered harmful" piece. He wasn't necessarily against gotos; he just preferred that control flow use structured methods that could be analyzed. This is also the same guy that advocated the correct-by-construction approach to development, where specs/proofs and code were developed side by side. I can imagine dropping gotos into a theorem prover would be even more disastrous.

Dijkstra's dream is making a return. The SPARK Toolset, Design by Contract (Eiffel/Ada 2012), Perfect Developer, SCADE, etc. are all Correct by Construction approaches that include proving invariants in the development process. They've each been used on quite a few commercial products with good results. A pleasant side effect is that designing for verification forces one to get rid of risky programming practices such as throwing gotos all over the place.

Whit N February 28, 2014 2:42 PM

This is an interesting thread with a fairly limited number of theories. No theory or reason for this error removes the core problem.

The one undeniable issue is that Apple has for a long time been grossly incompetent in managing code related to customer security for what has to be close to a billion users. As an avid Apple consumer this scares me, and at the same time it pisses me off, given the prices charged, that essentially basic-level QA is not being done on the code that protects much of what I hold valuable.

I am a little surprised to see that a class action lawsuit has not come out of this. It always amazes me how Apple can escape unharmed/unfazed by these types of issues.

One can only hope that lessons are learned and changes are made going forward. -W

Clive Robinson February 28, 2014 4:10 PM

@ Me,

There is nothing wrong with the use of gotos; at the end of the day, the closer you get to the metal, the more of them you need (in RTL and microcode every non-sequential instruction contains a goto, and arguably even an increment of the PC register can be seen as the equivalent of a goto at the hardware level).

Thus the problem is better rephrased as being not about the gotos but about those using them.

And thus you realise the problem of gotos is a subset of a larger problem, which is that of complexity and the 99.999% of the population that cannot handle it to an acceptable degree for the designs the majority demand for the functioning of their everyday lives.

Thus the solution since the 1950s –when all programmers were engineers or PhD-level mathematicians– has been to manage complexity by shackling it in various ways we now generically call abstraction. The price of this is of course inefficiency, but until recently Moore's Law has shielded us from this…

But living with the consequences of trying to keep up Moore's observation has brought a further constraint due to the laws of physics, specifically the constraint of propagation speed: after all, light in a vacuum travels only 1 ft in a nanosecond, which means that with a 1 GHz clock a signal can make the round trip to a mirror only six inches away. Throw in a few gate delays and you are down into millimetric distances, or you have to skip clock cycles.

So the trick has been to design hardware to work with sequential addresses to fake shorter distances with multiple cache layers.

Thus the "real bad" of gotos is that of CPU stalls…

This actually became an issue long before on-chip caches, which is one reason we ended up with Complex Instruction Set CPUs. The joke of it was, and still is, that not only is CISC silicon real estate expensive, it's also very power hungry, so much so that it's limited not by device geometry but by "heat death". Luckily cache memory is close to being "dark silicon", so it's limited by device geometry instead, and our current CPUs are really "wolf RISCs in sheep's CISC clothing"…

And to further help prevent the stall issues, our compilers are being optimised to get rid of the stalls by adding complexity to the code that the programmer does not see (CISC machine code, which in turn becomes internal streamlined RISC instructions). The downside is that to get the streamlining advantages automatically, it puts constraints on what can be fed to the compiler.

Needless to say, this makes gotos further bad…

Benni February 28, 2014 4:10 PM

What this basically means is that
a) one should search through the entire Apple source code;
b) open source browsers such as Firefox or Chrome cannot really rely on the closed source operating system, but should implement most code, especially security functions, themselves in an open source way. All closed source code must be regarded as suspicious;

c) Microsoft should do the same as Apple and allow an inspection of its source. Apple did not get ruined by publishing the sources, so Microsoft should do the same. Then we could perhaps look for similar errors, or we would know what nsa_key is for.

DavidTC March 1, 2014 2:35 PM

This would be a strange way to hide a secret backdoor. As David Conrad said, the easiest thing to do would be to sneak in a semicolon at the end of the if, which is a lot harder to see at a casual glance.

Or do what the person who attacked the Linux kernel did, and put an ‘=’ instead of ‘==’ in a test. It wouldn’t work in this specific line, but it would be easy enough to find somewhere it does.

Incidentally, does anyone find it slightly confusing that going to fail results in success? The problem is, I guess, that err is set to 0 at that point. So the function returns 0, and hey, everything just worked, right?

This is where some sort of ‘can’t happen’ code would be useful…if we’re in the error handling section, and about to return an error code, we might want to check that we actually have an error code to return and aren’t accidentally returning ‘success’.

Instead of ‘return err;’, use ‘return (err ? err : 255);’, where 255 or whatever is a code meaning ‘I have no idea what’s going on.’.

A smart compiler would know that such a situation cannot happen anyway, and remove it. OTOH, apparently we’ve decided to dumb down compilers, and I’m shocked to learn that compilers don’t warn about unreachable code anymore. Seriously, there’s a lot of somewhat silly compiler warnings, but that’s not one of them.

DavidTC March 1, 2014 2:52 PM

Actually, you could in fact do the traditional trick of switching a == for an =, except backwards.

if ((err == SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)

Would be an incredibly subtle error and cause a false positive, if I've gotten everything right. (err is already 0, and SSLHashSHA1.update returns 0, so the parenthesised comparison is true, which does not equal 0; the goto is taken with err still 0, and the function returns success.)

So I’m going with not deliberate, because there was such an obvious place to better hide the mistake, and it would look just as accidental. (OTOH, maybe that’s just what they want us to think.)

Buck March 1, 2014 3:05 PM

@DavidTC

I agree, that is totally the worst part about the readability of this code!

How can ‘goto fail’ possibly mean ‘verification success’..? Barf!

The language used in one's code should describe the intended path to be followed by the flow of the code.

BruceL March 1, 2014 9:32 PM

Anyone ever accidentally checked in a print statement used for debugging? Could the developer have neutered cert validation in order to test other changes and forgotten to remove the neuter? Is that incompetence or plausible deniability?

re: PRISM and Apple, most likely, NSA noticed this flaw and simply kept quiet (um, celebrated).

What is astonishing is that this went undetected for so long.

–Bruce

Clive Robinson March 1, 2014 11:49 PM

@ Benni,

    open source browsers such as Firefox or Chrome cannot really rely on the closed source operating system, but should implement most code, especially security functions, themselves in an open source way

Back in 1995 I gave a talk to part-time MSc students at a London university about issues to do with browser security. Due to the nature of the course many students were from "the City" and were at the more senior levels of ICT, but still on the technical side.

I indicated back then what I thought was obvious:

1, Web browsers were not just going to be the new desktop but also the new "multi-service terminal" for work.

2, That multi-service also meant multi-tasking and multi-state.

3, That because browsers were stateless by design, state would have to be sent backwards and forwards across the network as "tickets" with no real security, and thus represented a severe security risk.

4, That because browsers ran in a single memory space for all "open windows", information that should be kept secure in one window would be visible in another window, as the browser had no security features to prevent it.

I raised a number of other issues as well before I got to the "meat of the problem", which was:

The change of use to "multi-service terminal" was a significant security risk, and the techniques used to develop proper multi-user multi-tasking OSs needed to be "lifted up into the application level".

The talk did not go down well; the feeling I got from them was that I was "a scaremongering unix man" who was "trying to kill the goose that was going to lay the golden eggs", or that "I was wrong to think the industry was going the web browser way"…

For many years I repeated my warning but was looked on as "slightly cranky", until Google announced Chrome and how they were in effect taking OS security up into the browser.

The simple fact is that browsers even now are still written as "applications with a bit of multi-tasking security bolted on", which still causes all sorts of problems today. They really should be written as multi-tasking multi-user OSs with a network-based display system put on top… Which is kind of where your comment was going.

Clive Robinson March 2, 2014 12:47 AM

@ DavidTC,

    A smart compiler would know that such a situation cannot happen anyway and remove it. OTOH, apparently we’ve decided to dumb down compilers and I’m shocked to learn that compilers don’t warn about unreachable code anymore.

The problem is not what compilers could or could not do, or even what they have done in the past…

The problem is "speed", not just of the resulting executable code but also of the compiler.

The simple fact is many "shop tools" are purchased because they "look cool" or have "new/neat technology", and are justified to management as "a boost to productivity".

Those designing the tools know they have to address both speed issues. The result is that, when combined with "poor programming practice" in "code shops", many useful but "resource intensive" functions get moved onto obscure switches or moved out into other tools (which some see as an opportunity to "double your money").

There is also a second, less well seen problem sneaking up, which is that Moore's little "law" is not a law. The reality of hardware design these days is how to cheat both the speed-of-light constraint and the heat-death constraint. The faster you switch signals the more power you need to use, which creates a lot of heat as a byproduct, but also the shorter the distance a signal can travel, which makes the density higher, which makes the heat problem worse.

The problem is many of the "cheats" in hardware are also power hungry, which means we have gone about as far as we can to up CPU speed, and the "cheats" are at the point where they only give benefit when the code is ordered and presented to the CPU in an optimal way. And the optimisation falls to the compiler to sort out, due to "programmer limitations".

But the issue has gone beyond single-CPU capabilities and we are now into the realm of multi-cores and "poor man's parallel programming", whilst the majority of programmers are stuck in the sequential-thinking programming model. Compilers are expected to "fill the gap", but even they have gone about as far as they can due to OS and "task sharing" issues.

But to compound the issues, as new code cutters learn how to use the "shop tools" "on their own dime" at home/college/uni, they tend not to learn the switches that "get in the way" of their small coding projects, and as has oft been said, "if you don't use it, you lose it!"

The programming world has long since passed the "one size fits all" paradigm, and trying to second-guess programmers to fit their incorrect world view onto hardware optimised for a different world view has turned compilers into tortured beasts chained by the limitations and expectations of their abusers…

grkvlt March 2, 2014 2:10 AM

@65535

    If Apple actually paid to get it through the copyright process, which is not certain, one would assume the code was thoroughly tested.

Except that copyright happens simply by adding the ‘Copyright DATE by ENTITY’ text to a piece of work, so Apple didn’t need to pay for anything. Also, there is no requirement that copyrighted code be tested. One can copyright untested, tested with failing tests, plain wrong, obfuscated, misleading and even non-compiling code, if desired…

Clive Robinson March 2, 2014 2:23 AM

@ DavidTC,

On thinking about it, I should have included a link to give another perspective on the problem,

http://highscalability.com/blog/2013/5/13/the-secret-to-10-million-concurrent-connections-the-kernel-i.html

Whilst the author does not specifically call out the programmer–hardware issue on the compiler, you can easily see how it is part of the problem. And whilst I appreciate not everybody is worried about trying to achieve 10 million concurrent connections at the CPU level, it's only marginally different from other issues to do with databases and algorithms.

Likewise I should have given you a link on "dark silicon" and the impact it has on the future direction of computers,

http://www.cc.gatech.edu/~hadi/doc/paper/2012-toppicks-dark_silicon.pdf

Likewise a link on the "cheats" used to try to get around the speed-of-light distance issues,

http://www.lighterra.com/papers/modernmicroprocessors/

Finally some links on compilers and optimisation issues; firstly a general overview,

http://en.m.wikipedia.org/wiki/Optimizing_compiler

Secondly a look at the methods of deciding which architecture and optimisations best suit a problem domain,

http://homepages.inf.ed.ac.uk/cdubach/papers/dubach12tecs.pdf

Thirdly (and lengthy) a look at architecture and compiler issues in a complex and resource-demanding problem domain (multimedia on a client end computer),

http://euler.slu.edu/~fritts/papers/fritts_thesis00.pdf

DavidTC March 2, 2014 7:25 PM

Clive…I have no idea what you’re talking about. I was not talking about ‘cheats’ in hardware, or, well, anything at all about what you’re talking about.

Code optimization via compiler is a fairly well-understood field at this point, and there's absolutely no reason not to optimize out unreachable code…in fact, all modern compilers do this. There is no reason to include a code section that cannot be reached. What they have apparently stopped doing, for no discernible reason, is throwing a warning when they do this. They still detect the code segment, and remove it; they just don't warn anymore.

Now, there are all sorts of compiler warnings for legitimate shortcuts in coding, and, yes, not having those by default can make sense. Many times you know damn well you just did an implicit cast or whatever and don’t need to be warned about it.

However, there is literally no reason you would ever want unreachable code sitting in your source…all that can do is confuse programmers. It does not accomplish anything at all, by definition. If someone is intending to leave unreachable code in a program, they need to comment it anyway so people know it’s not currently used…and the correct way to do that is to comment out the code itself, and, hey, look, it’s no longer ‘unreachable’. It’s like magic!

Leaving deliberately-unreachable but unmarked code lying around is such horrifically bad coding practice that, frankly, anyone who does that on purpose and then complains about compiler warnings should be heaved bodily out of the programming community and not allowed back in. That is not how you make code not execute, at least not in code you're going to show to anyone else. (We've all added a '0 &&' at the start of an if statement to test something, but we don't give that code out…and a compiler warning is very useful to remind us we still haven't undone that. We want to be reminded of that!)

And doing it accidentally is such an annoying error that programmers immediately want to know about it. Accidentally unreachable code isn't some trivial syntax mistake that the compiler is making allowances for; such a thing indicates a pretty serious programming screwup that is going to be hard to detect, and yet still compiles…exactly the thing that compiler warnings exist for. (And, of course, that's exactly what happened here.)

Seriously, I’d love to see the justification for compilers not including that by default.

Clive Robinson March 2, 2014 9:41 PM

@ DavidTC,

    Seriously, I’d love to see the justification for compilers not including that by default

Actually, history shows it is not a question of "not including" but of "removing". The question you have to ask is why?

As I indicated, finding code that is never entered consumes considerable resources to do reliably, one of which is time. This makes the compiler look slow, which the marketing department of the compiler development house would not be keen on.

There is also the age-old argument about what goes in "lint" and what goes in "cc". Some would argue that finding unused code is a "lint function" not a "cc function", because of the performance hit amongst other reasons. Personally I care not, I will work with it either way, but if pushed I could find reasons to say "lint function"; after all, it's the programmer's responsibility to know what they are doing…

Which brings us onto,

    Code optimization via compiler is a fairly well-understood field at this point…

Sorry, no it's not…

What is well understood is optimization of serial code to produce a single thread of execution on an assumed CPU architecture.

Compilers don't take into account a whole load of things which are known to kill scalability. It is generally implicitly assumed that the programmer knows how to structure their code to deal with this. Whilst the run-of-the-mill programmer generally implicitly assumes it's either not a problem or something "down stream" that profiling tools will resolve…

Which is why extending the optimisation out of just the CPU is a growing research area.

Which brings us to,

    I was not talking about 'cheats' in hardware, or, well, anything at all about what you're talking about.

Perhaps I did not make it clear: most compiler optimization currently is about making the serial code written by a programmer not trip over all the hardware cheats, by trying to prevent instruction queue stalls and cache misses etc. This is not an easy task and can take up a lot of resources.

Thus the compiler designer, guided by the marketing department, has choices to make.

The choices will usually reflect the desires of marketing for specmanship bells and whistles, and things seen as not mainstream will first get deprecated (given their own obscure flag) and then removed (and possibly promoted to another product).

With regards,

    …and there’s absolutely no reason not to optimize out unreachable code…in fact, *all* modern compilers do this.

Some do; others, I know from having walked the assembler output, don't; and some need you to use a compiler flag.

It depends on what the compiler designer chose to do.

As for,

    However, there is literally no reason you would ever want unreachable code sitting in your source…*all* that can do is confuse programmers.

Not true; many "old school" programmers do it all the time DURING development. A simple example is empty functions in an include file to keep the compiler quiet, a habit that started back in K&R C days due to resource issues. It was done to make development faster, in a way that we would now accept makes unit testing easier. You would put the code unit in one file and a simple "main" in a second file; you simply copy the include file and fill in the functions with appropriate stimuli and returns to test the code unit.

And when the code unit has passed the tests, the file the unit is in can be copied into the source tree or included in the usual way. This sort of development is still common amongst hardware engineers developing embedded systems and initial test driver code.

Autolykos March 3, 2014 3:32 AM

@martinr:

    The many gotos in the code and the single cleanup handler is perfectly OK, but _not_ initializing the return value of a security critical function with error, and ABUSING the function return value to store technical interim results of subfunctions is extremely dumb.

I agree. Most of the problems with this code are just bad habits and sloppy programming, but having security-critical stuff not failing in a safe way is so actively stupid it borders on malicious.
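
A minimal sketch of the fail-closed alternative (hypothetical names; errSSLProtocol stands in for whatever "generic failure" code the module uses):

typedef int OSStatus;
enum { noErr = 0, errSSLProtocol = -9800 };

static OSStatus verify_signature(int sig_ok)
{
    OSStatus err = errSSLProtocol;  /* default to failure */

    if (!sig_ok)
        goto fail;
    err = noErr;                    /* success is assigned in exactly one place */
fail:
    return err;                     /* a stray goto now returns an error, not success */
}

With the return value initialised to an error and never reused for interim results, a duplicated "goto fail;" would skip the verification but still return a failure code.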

Clive Robinson March 3, 2014 5:27 AM

@ Autolykos,

    Most of the problems with this code are just bad habits and sloppy programming, but having security critical stuff not failing in a safe way is so actively stupid it borders on malicious

Hmm, I disagree with you, not in sentiment or perception but in scope.

To see why, replace "security critical" with "safety critical"; I suspect you would agree that both are equally true?

That is, the "malicious" behaviour arises because of the incompetence you identify as "bad habits and sloppy programming".

This raises the question: when is such incompetence not "malicious"?

I would argue, and I suspect our legal brethren would concur, that within your meaning of "malicious" such incompetent behaviour is always "malicious" and thus actionable under many different parts of health and safety and merchantability legislation. Thus the only defence against the charge of malice would be "unknowing" incompetence, due to either mental incompetence or the cognitive dissociation of the Dunning-Kruger effect in the programmer. However, whilst that might excuse the individual, it does not defend against the lack of "Corporate Responsibility", because "employee appraisal", "test techniques", "code reviews" and "automated code checking/verification" are common Quality Assurance techniques way, way down the ladder of "Best Practice" as exhibited by others in the software industry.

Which raises questions about Apple's competence and corporate governance…

As I've argued in the past, "security" is a "Quality Process" and should, like QA, be in place before Day 0 of any project, as is the case with safety. Sadly many corporate viewpoints appear not to agree with mine, or I suspect yours.

CBGoodBuddy March 3, 2014 12:18 PM

Folks are saying that because it looks like a certain kind of "innocent" error, like a copy/paste error or a merge error, it's probably not deliberate.

That might actually be the case, but bear in mind that a three-letter agency is only going to do something like this if there is plausible deniability. The fact that the bug looks innocent is exactly consistent with the way an implanted security bug would have to look, in order to avoid arousing direct suspicion. So I don't think what the bug "looks like" has any bearing on whether it's deliberate or not.

If we hadn’t known about the NSA programs from Snowden’s leaks, we would have just attributed this to sloppy programming or testing, or whatever. Never attribute to malice what you can attribute to incompetence, etc. But we do know about NSA’s programs though, and it does raise our suspicions to a higher level.

I suspect that a lot of NSA’s surveillance capability is based on unpatched vulnerabilities. Which surveillance technique gets activated depends on the target and which technology (and associated vulnerabilities) the target uses. So the NSA would be happy to have this vulnerability in place for a large number of potential victims, er… targets, even if some people later patch and remove that particular vulnerability. The larger the menu of attack vectors, the better. So the fact that this bug is easily patchable also doesn’t argue against it being deliberate either, in my mind.

Here's why I think this bug is less likely to be a spy agency implant. The code itself might appear innocent enough, but the person who checked the code in is trackable. Whoever made this check-in is going to be under intense scrutiny now. Any private associations the person has could come out into the open. If it were a spy agency implant, that might shine a light too bright into the agency's inner workings. So to get some code like this implanted, an agency needs to find a shill distant enough from the spy agency to look innocent even under intense scrutiny, but gullible enough to be willing to risk his job for said spy agency. I'm not sure how many of these people exist, but generally you're going to want to look for people under financial or emotional distress.

DavidTC March 4, 2014 12:07 PM

Clive, I, again, suspect you’re just rambling, as very little of what you’re saying actually makes any sense at all.

    Perhaps I did not make it clear: most compiler optimization currently is about making the serial code written by a programmer not trip over all the hardware cheats, by trying to prevent instruction queue stalls and cache misses etc. This is not an easy task and can take up a lot of resources.

You don’t make anything clear. Luckily, I already knew that, and it is completely unrelated to any of this.

What is an easy task and does not take up a lot of resources is finding unreachable code, and if you think there is any optimizing compiler that does not optimize out unreachable and otherwise dead code by default, you are crazy. That is literally the very first thing that is optimized. (Because there’s no point in optimizing that code any further, so it should be thrown away as soon as possible.)

Incidentally, trying to avoid instruction queue stalls would involve figuring out what instructions would execute next, which pretty much automatically results in determining what instructions wouldn't execute at all.

The idea they’re running complicated branch predictions over code but then don’t bother to do anything when they have a probability of 0% of some code executing is nonsense. (Of course, that sort of thing happens well after dead code is removed.)

    Compilers don't take into account a whole load of things which are known to kill scalability. It is generally implicitly assumed that the programmer knows how to structure their code to deal with this.

Uh, no, it’s not. That is actually the opposite of what is assumed by optimizing compilers. Most of the work of optimizing compilers is, in fact, ‘restructuring’ things while compiling. Functions are moved inline, things are moved outside of loops, instructions are reordered, etc, etc…

I am completely baffled as to what you even think optimization is. You seem to think it’s all complicated low-level CPU stuff, when in actuality that’s the tiny 10% speed up that happens after the same, non-CPU-specific generic optimizations have been applied to code, which is like 90% of the speed up.

Almost all optimizations that compilers do are completely unrelated to what CPU they are on, or related only to the extent of ‘this CPU has X registers’. Actually worrying about pipelines and CPU cache is the newest innovation in optimization, and it’s something that happens after the normal optimizations have run. (In fact, it happens after the machine code has already been generated.)

    Not true; many "old school" programmers do it all the time DURING development. A simple example is empty functions in an include file to keep the compiler quiet, a habit that started back in K&R C days due to resource issues. It was done to make development faster, in a way that we would now accept makes unit testing easier. You would put the code unit in one file and a simple "main" in a second file; you simply copy the include file and fill in the functions with appropriate stimuli and returns to test the code unit.

Empty stub functions are not ‘unreachable code’. They are, in a metaphoric sense, the opposite of unreachable code…they are a lack of code at places where code is called, not actual code somewhere it is never called.

And your explanation of the origin of empty stub functions is, BTW, complete nonsense. Stub functions don’t make the compiler ‘quiet’, they make it work at all. Compilers, I don’t care how old they are, don’t compile if you call functions that don’t exist.

Alternately, it’s possible by ‘copying the include file’ you’re under the delusion that you could have the same function defined twice in the source, somehow.

On the third hand, it's possible that by "empty functions" you're talking about function declarations in an include, which means you understand nothing about coding at all. (Function declarations are not compilable code; they are merely instructions to the compiler as to the correct layout of function calls. And they certainly are not "unreachable code", and would have nothing to do with compiler warnings for unreachable code.) Also you appear to think programmers have stopped using them for some reason.

Michael Moser March 4, 2014 12:19 PM

I think this is a bug that they should have found via automated testing, so it is hard to tell if it was deliberate or not.

So most probably they just did not test at all, since they are doing a derivative of OpenSSL and were assuming that OpenSSL is good enough.

There are well-known test suites for SSL/TLS, for black box testing of the protocol implementation.

For example, the Netscape SSL test suite:

https://www.mozilla.org/projects/security/pki/pkcs11/netscape/ssltest.html

ssl_scan/ssl_test

https://www.owasp.org/index.php/Testing_for_SSL-TLS_(OWASP-CM-001)

Also OpenSSL itself does some regression testing on its own.

There are probably others that I am not aware of.

sooth_sayer March 4, 2014 9:31 PM

Bruce… you seem to be seeing your own ghost, and once you start carping on the NSA you are likely to get this way.

Apple, like many others, could be stupid (I never liked them much, though I still have 3 iPods and 2 iPhones), but you, sir, are a gone case of your own making.

There are too many here who egg you on, and they are the worst enemies of thine!

Mark in CA March 5, 2014 3:04 AM

More fuel for the conspiracy fire? Just a coincidence?

    Critical crypto bug leaves Linux, hundreds of apps open to eavesdropping

    Hundreds of open source packages, including the Red Hat, Ubuntu, and Debian distributions of Linux, are susceptible to attacks that circumvent the most widely used technology to prevent eavesdropping on the Internet, thanks to an extremely critical vulnerability in a widely used cryptographic code library.

http://arstechnica.com/security/2014/03/critical-crypto-bug-leaves-linux-hundreds-of-apps-open-to-eavesdropping/

T March 5, 2014 6:15 AM

It looks accidental, but:

if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0);
goto fail;

would have been far harder to catch by eye.

Benni March 5, 2014 4:58 PM

@Mark in CA

I think I've seen some diffs in a discussion that show the person who introduced that bug was and still is a leading GnuTLS developer. Perhaps this was accidental.

But the sad truth is that this may not matter. It could be that the NSA knows these vulnerabilities and uses them in its FLYING PIG program, where it impersonates Google in SSL connections, anyway.

It may be that the FLYING PIG program was created after the NSA learnt from some of its hackers that many browsers have flaws in their checks of SSL certificates. It can well be that they do not even have to insert anything, because the software that we use is hopelessly flawed anyway.

Frank March 5, 2014 5:04 PM

Management problem.
a) That bug is inexcusable. The code is crap. Not the programmer's fault, though.
b) Something as monumentally important as SSL security should be the subject of quality controls so tight that changes to source like that would be accompanied by religious ceremonies.
c) Deliberate? Quite probably. But only after the management around that area was relaxed and "destabilised".

OB March 10, 2014 10:00 AM

Doubled goto: easy to see, if the reader is not a programming newbie. (But Apple is "hip", and maybe they have some script kiddies as "experienced professionals"? Maybe they are good at self-PR, so they got the job?)

Unreachable code test?

SHOULD work, but wait…
“The -Wunreachable-code has been removed, because it was unstable”
( http://gcc.gnu.org/ml/gcc-help/2011-05/msg00360.html )

But isn’t Apple using clang for everything, and not gcc anymore?
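
For what it's worth, clang does still have a -Wunreachable-code warning (though, if I recall correctly, it is not enabled by -Wall). A minimal sketch of the pattern it should flag:

int check(int ok)
{
    int err = 0;

    if (ok)
        goto fail;
        goto fail;  /* unconditional: the line below can never run */
    err = 1;        /* an unreachable-code warning should point here */
fail:
    return err;
}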

OB March 10, 2014 10:51 AM

The code is ugly.
Many (most) programmers think that code layout has no influence on quality. (Same for comments.)

They are wrong.
Code readability does matter.
Write code for the human, not for the compiler.

I know that's "uncool", but "being cool" while generating crap is really "uncool" (IMHO).

I have seen this kind of ugly code in many places… open source, closed source, non-commercial and commercial.

People don’t learn….
Or maybe they just have the wrong teachers.
