Let's Encrypt Vulnerability

The BBC is reporting a vulnerability in the Let's Encrypt certificate service:

In a notification email to its clients, the organisation said: "We recently discovered a bug in the Let's Encrypt certificate authority code.

"Unfortunately, this means we need to revoke the certificates that were affected by this bug, which includes one or more of your certificates. To avoid disruption, you'll need to renew and replace your affected certificate(s) by Wednesday, March 4, 2020. We sincerely apologise for the issue."

I am seeing nothing on the Let's Encrypt website. And no other details anywhere. I'll post more when I know more.

EDITED TO ADD: More from Ars Technica:

Let's Encrypt uses Certificate Authority software called Boulder. Typically, a Web server that services many separate domain names and uses Let's Encrypt to secure them receives a single LE certificate that covers all domain names used by the server rather than a separate cert for each individual domain.

The bug LE discovered is that, rather than checking each domain name separately for valid CAA records authorizing that domain to be renewed by that server, Boulder would check a single one of the domains on that server n times (where n is the number of LE-serviced domains on that server). Let's Encrypt typically considers domain validation results good for 30 days from the time of validation--but CAA records specifically must be checked no more than eight hours prior to certificate issuance.

The upshot is that a 30-day window is presented in which certificates might be issued to a particular Web server by Let's Encrypt despite the presence of CAA records in DNS that would prohibit that issuance.
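The failure mode described above can be shown with a toy example. This is only a sketch of the class of bug (looping n times while inspecting a single name), not Boulder's actual code; the domain names, the `CAA_ALLOWS_LE` table, and both function names are invented for illustration.

```python
# Hypothetical data: whether each name's CAA records currently permit issuance.
CAA_ALLOWS_LE = {
    "a.example": True,
    "b.example": False,  # CAA now forbids issuance for this name
    "c.example": True,
}

def caa_allows(domain):
    """Stand-in for a real CAA lookup (assumption: a simple table)."""
    return CAA_ALLOWS_LE[domain]

def buggy_recheck(domains):
    # Bug pattern: the loop runs once per name but always inspects the same one.
    first = domains[0]
    return all(caa_allows(first) for _ in domains)

def correct_recheck(domains):
    # Intended behaviour: every name on the certificate is checked.
    return all(caa_allows(d) for d in domains)

names = ["a.example", "b.example", "c.example"]
print(buggy_recheck(names))    # True: b.example is never looked at
print(correct_recheck(names))  # False: b.example blocks issuance
```

With a single-name certificate the two functions agree, which matches the reports that only multi-domain certificates were affected.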

Since Let's Encrypt finds itself in the unenviable position of possibly having issued certificates that it should not have, it is revoking all current certificates that might not have had proper CAA record checking on Wednesday, March 4. Users whose certificates are scheduled to be revoked will need to manually force-renewal before then.

And Let's Encrypt has a blog post about it.

EDITED TO ADD: Slashdot thread.

Posted on March 4, 2020 at 6:46 AM • 24 Comments

Comments

Peter • March 4, 2020 7:15 AM

Let's Encrypt sent out mails yesterday titled "ACTION REQUIRED: Renew these Let's Encrypt certificates by March 4" to anyone affected. I wonder how many active certificates will be revoked. They only give about one day of time to renew the affected certificates.

Here is the content of the mail I received:

We recently discovered a bug in the Let's Encrypt certificate authority code, described here:

https://community.letsencrypt.org/t/2020-02-29-caa-rechecking-bug/114591

Unfortunately, this means we need to revoke the certificates that were affected
by this bug, which includes one or more of your certificates. To avoid
disruption, you'll need to renew and replace your affected certificate(s) by
Wednesday, March 4, 2020. We sincerely apologize for the issue.

If you're not able to renew your certificate by March 4, the date we are
required to revoke these certificates, visitors to your site will see security
warnings until you do renew the certificate. Your ACME client documentation
should explain how to renew.

If you are using Certbot, the command to renew is:

certbot renew --force-renewal

If you need help, please visit our community support forum:
https://community.letsencrypt.org/t/revoking-certain-certificates-on-march-4/114864

Please search thoroughly for a solution before you post a new question. Let's
Encrypt staff will help our community try to answer unresolved questions as
quickly as possible.


Your affected certificate(s), listed by serial number and domain names:

REMOVED

If you are receiving this email in error, unsubscribe at:
http://mandrillapp.com/track/unsub.php?u=REMOVED
Please note that this would also unsubscribe you from other Let's Encrypt
service notices, like expiration reminders.

Pete • March 4, 2020 7:17 AM

Read that only certs with multiple domains were impacted.

So, if you deploy 1 cert for 1 domain, should be fine.

We didn't get any emails from LE about issues.

Vincent Archer • March 4, 2020 7:29 AM

The validation code concerns CAA records. If you required a certificate for N domains, instead of checking the CAA to verify if letsencrypt was allowed to issue for each of the N domains, it checked... the first domain N times.

So, it was theoretically possible to issue a certificate covering domains for which the CAA check would have failed, which makes that certificate invalid.

Robert de Bath • March 4, 2020 7:39 AM

I am seeing nothing on the Let's Encrypt website. And no other details anywhere. I'll post more when I know more.

FYI It's under "About Us->Service Status".

Peter • March 4, 2020 7:42 AM

True, the website doesn't mention the issue at all. However, on the status page of Let's Encrypt it is mentioned. There is also a subscribe button for anyone that wants to make sure they get notified about the next incident.

Impossibly Stupid • March 4, 2020 10:39 AM

@Peter

I wonder how many active certificates will be revoked.

It's probably very close to the ~2 million non-duplicate valid certificate count. Their certs are valid for just 90 days, and they say the buggy code was deployed on 2019-07-25.
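The date arithmetic behind that estimate can be checked quickly. This is a minimal sketch using only the figures quoted in this thread (90-day lifetime, deployment on 2019-07-25, revocation on March 4, 2020). It shows only that every unexpired certificate was issued while the buggy code was live, not that every such certificate was actually affected.

```python
from datetime import date, timedelta

BUG_DEPLOYED = date(2019, 7, 25)    # deployment date quoted above
REVOCATION_DAY = date(2020, 3, 4)   # announced revocation date
CERT_LIFETIME = timedelta(days=90)  # Let's Encrypt certificate lifetime

# Earliest issue date a certificate can have and still be unexpired on March 4.
oldest_unexpired_issue_date = REVOCATION_DAY - CERT_LIFETIME
days_bug_was_live = (REVOCATION_DAY - BUG_DEPLOYED).days

print(days_bug_was_live)                           # 223
print(oldest_unexpired_issue_date)                 # 2019-12-05
print(oldest_unexpired_issue_date > BUG_DEPLOYED)  # True
```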

I have about a dozen certs issued by them, 3 were affected, 1 is a dup. The biggest hitch in the process was the errors introduced by their Multi-Perspective Validation process. They use Amazon as their main cloud provider, and my servers have been attacked by those IP ranges too often not to have them widely blocked at the firewall. That resulted in errors when I attempted to replace the bad certs, which resulted in a rate-limiting block on top of it! I eventually did get them replaced (after a bit of a wait and temporarily whitelisting a large chunk of AWS IPs), but LE really should be more mindful/helpful when it comes to providing security services to anyone who is serious about security.

Peter • March 4, 2020 11:03 AM

@Impossibly Stupid

I wonder how many active certificates will be revoked.
It's probably very close to the ~2 million non-duplicate valid certificate count. Their certs are valid for just 90 days, and they say the buggy code was deployed on 2019-07-25.

I should have phrased this more clearly. What I actually meant is how many certificates will be revoked while still being in active use. It's really easy to miss the deadline just because you didn't read your mail, specified no mail address, the person in charge has a day off, or for some other reason.

hild • March 4, 2020 12:16 PM

LE really should be more mindful/helpful when it comes to providing security services to anyone who is serious about security.

The whole process of "verification" using data vulnerable to the very attacks it tries to protect against (BGP attacks, MITM, etc.) is absurd. DNSSEC is the only verifiable way to confirm domain ownership. The ACME spec. should say (and browsers should require) that CAs MUST obtain, verify, and store a valid, signed, DNSSEC opt-out record; or a DNSSEC-signed CAA record allowing insecure verification; before allowing insecure methods.

Brian • March 4, 2020 1:55 PM

They should have revoked possibly-affected certificates immediately. Why wait at all? That’s the kind of move money-making organizations do.

SpaceLifeForm • March 4, 2020 4:10 PM

@ Impossibly Stupid

"They use Amazon as their main cloud provider,"

[ Why? I've lost count of AWS leaks. ]

@ hild

The whole process of "verification" using data vulnerable to the very attacks it tries to protect against (BGP attacks, MITM, etc.) is absurd. DNSSEC is the only verifiable way to confirm domain ownership.

[ Are you sure you can trust DNSSEC? I have no reason to. And why has it been so slow to be adopted? ]

The only thing we know for sure is that LE has collected a lot of public keys.

But, via random bad luck...

My private key may actually be your public key.

My public key may actually be your private key.

Impossibly Stupid • March 4, 2020 4:20 PM

@Peter

What I actually meant is how many certificates will be revoked while still being in active use.

Ah, yes, there is that. There were already people on their site yesterday leaving comments complaining about being on vacation when this issue popped up. It'd be really nice if LE provided an after-action report to give everyone a sense of how well their servers held up given the massive rush to reissue certs, providing a count of how many were not renewed in a timely manner.

@hild

DNSSEC is the only verifiable way to confirm domain ownership.

Well, I give them some leeway because I don't think their aim is to establish "ownership" of a (sub)domain, merely to validate its control to the extent necessary to "trust" an encrypted connection (which, honestly, shouldn't even require a CA). It'd also be interesting to know how many people using LE even have a working DNSSEC setup already; my guess is that making that a prerequisite would reduce the usefulness of LE to nearly zero.

@Brian

Why wait at all?

It was just a vulnerability, not an exploit (presumably). It'd be a jerk move to shut everyone down if it amounts to a bookkeeping error. The certificates of mine that were affected were created years ago and simply renewed over and over again. I never even considered trying to slip in an attempt to get a cert for what would be some other random domain on the same shared server. My guess would be that nobody else would risk that either, and that LE would/did immediately revoke anything in their records that looked suspicious like that.

SpaceLifeForm • March 4, 2020 4:44 PM

@ hild, Clive

Old, but not stale.

Good Entropy is your Friend.
Bad Random is your Enemy.

hxxps://arstechnica.com/information-technology/2012/02/crypto-shocker-four-of-every-1000-public-keys-provide-no-security/

hild • March 4, 2020 8:14 PM

how many people using LE even have a working DNSSEC setup already; my guess is that making that a prerequisite would reduce the usefulness of LE to nearly zero.

If you don't have DNSSEC set up, your domain will have a valid opt-out record signed by the registry. A CA should then be permitted to fall back to the web-based check, provided they store the opt-out record for the lifetime of the cert.

Are you sure you can trust DNSSEC? I have no reason to. And why has it been so slow to be adopted?

Well, it could be combined with the web-based validation and then couldn't make things any worse. But would it make sense to distrust DNSSEC while trusting DNS? With the web-based method, an unsigned DNS response is really all that's protecting you.

DNSSEC has solid crypto at least. Lots of the same techniques, like multiple network routes, can be used to check DNS (and detect DNSSEC discrepancies as a side-effect). Other techniques, such as pinning each TLD's keys, could be added. For OCSP-stapling purposes, the CAs should be occasionally re-checking while the cert is valid, and maybe raising alerts on unexpected key/data changes.

Beth Macknik • March 4, 2020 11:28 PM

Your summary of the Boulder bug is missing a key element of the problem. The bug only occurred when the entire certificate creation process was not completed within 8 hours. (ArsTechnica also missed this caveat.) This is a minority of the certificates issued by Let’s Encrypt.

CAA records were properly checked at the beginning of the process, but if anything delayed the issuance by more than 8 hours, Boulder needed to check the CAA records again because its data was stale. It is this recheck code that had the bug. I’d like to see Let’s Encrypt report the number of affected certificates. And if they are able to identify the certificates that had greater than an 8 hour delay, it will be much fewer than the speculative estimates.

The certificates that were issued in error all had this pattern: there were multiple domains on the requested certificate; the issuance process verified the domain ownership of all of the domains and CAA records allowed Let’s Encrypt to issue a certificate; something delayed the issuance by more than 8 hours; a change was made to the CAA records of one or more of the domains that denied Let’s Encrypt the authority to issue certificates for that domain; Boulder rechecked only one of the domains’ CAA records and that domain was not one that had a CAA change; a certificate was issued that would have been valid a few hours to 30 days before, but was invalid because Let’s Encrypt was no longer authorized.
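The two time windows in that pattern can be sketched as a small decision function. This is an illustration built only from the figures quoted in this thread (validation reusable for 30 days, CAA results usable for 8 hours); the function name and return strings are made up, and this is not how Boulder is actually structured.

```python
from datetime import datetime, timedelta

VALIDATION_REUSE = timedelta(days=30)  # how long a domain validation is reused
CAA_MAX_AGE = timedelta(hours=8)       # how long a CAA check stays fresh

def issuance_plan(validated_at, issue_at):
    """Decide what must happen before issuing (hypothetical helper)."""
    age = issue_at - validated_at
    if age > VALIDATION_REUSE:
        return "revalidate"   # validation itself has expired
    if age > CAA_MAX_AGE:
        return "recheck CAA"  # the path where the bug is said to have lived
    return "issue"            # the CAA check from validation time is still fresh

validated = datetime(2020, 2, 1, 12, 0)
print(issuance_plan(validated, validated + timedelta(hours=2)))  # issue
print(issuance_plan(validated, validated + timedelta(hours=9)))  # recheck CAA
print(issuance_plan(validated, validated + timedelta(days=31)))  # revalidate
```

Only requests that land in the middle branch would have exercised the faulty recheck, which is why the delay of more than 8 hours matters to the count of affected certificates.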

This is a worrisome bug, but not an extremely dangerous one. I am curious as to how many folks were actually affected. Did Let’s Encrypt consider rechecking the CAA records for all of the certificates issued in this time period and only invalidate those that had a CAA problem?

Simon Leinen • March 5, 2020 1:31 AM

@Beth Macknik: Thanks for the explanations, I'm also trying to understand the actual extent of the issue.

> Did Let’s Encrypt consider rechecking the CAA records for all of the certificates issued in this time period and only invalidate those that had a CAA problem?

Maybe they did, at least I have the impression that they did *not* revoke all those 3 million certificates, even though the deadline they had announced for this (2020-03-05 03:00 UTC) has now passed. I did some spot checks using https://certificatetools.com/ocsp-checker , and some unrenewed certificates/domains from the list still report as "good". Well, maybe that's not the correct way to test this, and/or I'm too impatient with this "OCSP" thing, but then why is it called *O*CSP? :-)

Clive Robinson • March 5, 2020 5:33 AM

@ SpaceLifeForm,

Old, but not stale.

Ahh I remember it well, it's more than half a decade now... What's the betting on,

1, It's now almost unknown.
2, The problem's got worse.

My reasoning is not just that we have a very short memory in ICTsec, but those coming into the field don't get told these things, or it goes in one ear and out the other as they don't relate it to what they are doing (or don't care as money in pocket is based on the number of ticks on the list).

But also "IoT developers": they are definitely pushing stuff out as fast as they can, usually with somebody else's code. Such code is often "example code" showing how to get the minimum of functionality clearly, not securely...

And yes I'm as guilty as many others of using others' code when it comes to writing "python scripts" to do odd jobs of automation etc. But I keep such code operating in a "mitigated zone" where others can not get at it.

SpaceLifeForm • March 5, 2020 4:40 PM

Looks like 445 domains lost in a turtle race. So LE backs off to avoid disruption because they know the 445 that definitely have issues. Better than the millions that have not been renewed yet.

hxxps://arstechnica.com/information-technology/2020/03/lets-encrypt-holds-off-on-revocation-of-certificates/

hxxps://community.letsencrypt.org/t/baseline-requirements-revocation-requirement/114999/2

Me • March 6, 2020 9:05 AM

@Beth Macknik

I also was wondering why the idea of re-checking the certs at this point rather than just invalidating them wasn't explored (or at least talked about in the summary).

It does seem that this would be a trivial thing to exploit now, but there isn't evidence of an exploit.

I'm also really curious as to why they used different code to re-check than to perform the initial check. That seems very odd.

Mike D. • March 7, 2020 7:43 PM

I've got several LE certificates in service, but was not notified. Looking at the bug, it would not have resulted in an erroneous certificate being issued to my servers because, for each certificate, its CN and all its SANs are covered by the same CAA record.
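One way to test the "covered by the same CAA record" condition is to walk each name up the DNS tree until a CAA record set is found, which is how CAA lookup is specified. The following is a rough sketch under stated assumptions: the `CAA_RECORDS` table stands in for real DNS queries, and the names and helper functions are hypothetical.

```python
# Hypothetical zone data: only names that publish their own CAA record set.
CAA_RECORDS = {
    "example.org": ["letsencrypt.org"],
    "shop.example.net": ["otherca.example"],
}

def relevant_caa(name, records):
    """Walk from the full name toward the root; return (owner, issuers) or None."""
    labels = name.rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in records:
            return candidate, records[candidate]
    return None

def share_one_caa_set(names, records):
    """True if every name resolves to the same CAA owner (or all to none)."""
    found = [relevant_caa(n, records) for n in names]
    owners = {hit[0] if hit else None for hit in found}
    return len(owners) == 1

print(share_one_caa_set(["example.org", "www.example.org"], CAA_RECORDS))   # True
print(share_one_caa_set(["example.org", "shop.example.net"], CAA_RECORDS))  # False
```

When the result is true, checking any one name n times gives the same answer as checking each name once, so the bug could not have changed the outcome for that certificate.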

That wouldn't stop someone else from trying to get an illegitimate certificate by listing their own site as CN and mine as a SAN (and they'd have to compromise my site [or DNS for a wildcard certificate] to pass the ACME verification), but then they'd get the email because presumably they didn't use mine.

Of course, if they've compromised my site to pass ACME, they could probably just grab my private key, too.
