Podcast
Root Causes 383: Delayed Revocation Events by the Numbers


Hosted by
Tim Callan
Chief Compliance Officer
Jason Soroko
Fellow
Original broadcast date
May 2, 2024
An epidemic of delayed revocations has infected the public CA community. We look at delayed revocations from the beginning of 2021 onward, examine the trend line, and discuss root causes.
Podcast Transcript
Lightly edited for flow and brevity.
And what we wanted to do was a little bit of research to really put some numbers on this. So I want to thank my colleague, Martijn Katerbarg. Martijn is pretty well known inside the web PKI industry, and he did this research. I don't want to take credit for Martijn's research, but I do want to share it, because it's really, really interesting. What we did, Jason, is we went back and tried to put numbers on how often these failure-to-revoke episodes, as publicly reported and acknowledged on the Bugzilla platform, happen now compared with the normal baseline. That's what we're interested in understanding. Does that make sense?
Now we move on to 2022. Quarter one, three new bugs opened. Quarter two, zero new bugs opened. Quarter three, zero new bugs opened. Quarter four, six new bugs opened.
Now we move on to 2023. Quarter one, two new bugs opened. Quarter two, zero new bugs opened. Quarter three, four new bugs opened. Quarter four, six new bugs opened. Okay. So we see kind of, that's averaging like 24.
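As a quick check on the figures read out above, here is a small Python sketch that tallies the quarterly counts of new delayed-revocation bugs for 2022 and 2023. The numbers are copied from the transcript; the 2021 figures aren't part of this excerpt, so they are left out, and the variable and function names are purely illustrative.

```python
# Quarterly counts of new delayed-revocation bugs, as read out on the episode.
# The 2021 figures are not part of this excerpt, so they are omitted.
new_bugs: dict[int, list[int]] = {
    2022: [3, 0, 0, 6],  # Q1, Q2, Q3, Q4
    2023: [2, 0, 4, 6],
}


def yearly_totals(counts: dict[int, list[int]]) -> dict[int, int]:
    """Sum the four quarterly counts for each year."""
    return {year: sum(quarters) for year, quarters in counts.items()}


def mean_per_quarter(counts: dict[int, list[int]]) -> float:
    """Average number of new bugs per quarter across every listed quarter."""
    all_quarters = [n for quarters in counts.values() for n in quarters]
    return sum(all_quarters) / len(all_quarters)


if __name__ == "__main__":
    print(yearly_totals(new_bugs))     # {2022: 9, 2023: 12}
    print(mean_per_quarter(new_bugs))  # 2.625
```

The quarterly counts are lumpy, with zero-bug quarters sitting next to six-bug quarters, so a yearly total is a steadier baseline to compare against than any single quarter.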
That led to this giant surge in bugs in Bugzilla in general. I think that's a valuable and important point that we may want to return to. But the first thing that happened was that a bunch of new kinds of misissuance were being discovered, and those incidents were being found at a group of CAs that weren't used to having incidents reported against them. That's a valuable point, too. We're seeing names we don't usually see. These are very specialized CAs; I'll call them niche CAs. They serve some kind of niche market, usually a geographic segment, though not necessarily. And they're just not the sort of CAs that undergo daily scrutiny the way a very large CA like Sectigo or Let's Encrypt might. Right? So that was the first part: there are just more bugs. And because there are more bugs out there, if the rate of failure to revoke stayed consistent, you would to some degree expect the number of failures to grow. But there aren't six times more bugs, so you wouldn't expect it to grow six-fold, right?
So the second piece is what I just got to: the set of CAs that got these bugs written against them. It's not a group of CAs that typically deals with this kind of thing, and I think as a consequence we've seen a lot of these CAs have dysfunctional responses to getting this kind of report. Not all of them, not by any means; some of them responded very well. But we saw another rash of bugs against CAs that didn't respond to their certificate report correctly because they didn't know how, because they didn't have an established practice. They weren't expecting this. They were, you know, living their life out of the public eye, and they weren't really expecting to get a certificate report. When one came in, they didn't realize they'd gotten it, or they didn't know how to deal with it. So we saw a big bunch of failures to respond to certificate reports correctly. And then many of those CAs also turned around and failed to do the revocation correctly. So that's a valuable and important point, but I don't think it's the whole story.
And I think if we just left it at that, we'd be missing a very important part of the story, which is the reason they failed to revoke on time. What I mean by this is, there are a few reasons a CA might fail to revoke on time. It might have some kind of technical or procedural failure where it doesn't get it right. It doesn't know what to do, or it runs its software, but the software has a bug and the certs don't get revoked, or something like that. We've actually had that in the past. I remember a revocation event during my time in this job where there was a bug in our revocation engine: it was supposed to revoke the certs and it didn't. We realized it had gotten stuck, went and did them manually, and were late, and we wrote ourselves up. So that kind of thing can happen. But look at the open bugs. At present, there are 22 open delayed revocation bugs, and 18 of them were a deliberate decision by the CA not to revoke on time, as opposed to a mistake of a technical or procedural nature. These weren't CAs that intended to revoke on time and had it not work out. These are CAs that just plain decided to be late.
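The 18-of-22 figure is easy to lose in conversation, so here is a tiny Python sketch that expresses it as a share. The `DelayCause` categories and the `open_bugs` list are hypothetical stand-ins: only the 18/22 split comes from the episode, and the remaining four bugs are not broken down further here.

```python
from collections import Counter
from enum import Enum


class DelayCause(Enum):
    """Why a CA missed its revocation deadline, per the distinction drawn above."""
    DELIBERATE = "deliberate decision not to revoke on time"
    OTHER = "not described as deliberate (e.g. a technical or procedural failure)"


# Hypothetical stand-in for the 22 open bugs: only the 18/22 split comes from
# the episode; the remaining four are not broken down further.
open_bugs = [DelayCause.DELIBERATE] * 18 + [DelayCause.OTHER] * 4


def deliberate_share(bugs: list[DelayCause]) -> float:
    """Fraction of delayed-revocation bugs that were deliberate decisions."""
    counts = Counter(bugs)
    return counts[DelayCause.DELIBERATE] / len(bugs)


if __name__ == "__main__":
    print(f"{deliberate_share(open_bugs):.0%}")  # 82%
```

Put that way, roughly four in five of the open delayed-revocation bugs were choices rather than accidents.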
So, again, we're seeing this trend, right? We're seeing extreme homogeneity in the responses to these events, to the point where, more than half the time, we see a typical pattern, and it goes as follows. The CA has a certificate report made against it about a misissuance that it didn't self-report. The CA looks into it, writes up a Bugzilla bug, and determines that there is some positive number of misissued certificates. And by the way, Jason, just to be clear, sometimes these numbers are in the tens of thousands. Then the CA decides that it will not meet the revocation timeline, which in every one of these 22 bugs is a five-day revocation, by the way. There are no one-day revocations in this mix. So the CA will not get it done within 120 hours, even though it technically could, and its justification is that the misissuance does not have any security impact. And then, again, the common thread that you see throughout is that it will argue that the consequences to the ecosystem of doing this revocation before these certificates are replaced are greater than the consequences of sticking with the misissued certificates. So this is a very clear pattern.
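The five-day timeline is simple arithmetic (5 days = 120 hours), and a minimal Python sketch makes the "late or not" question concrete. It assumes `clock_start` is the moment the CA's obligation begins; the exact trigger is defined by the CA/Browser Forum Baseline Requirements, and the function names and timestamps here are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# The five-day revocation window discussed in the episode: 5 days = 120 hours.
FIVE_DAY_WINDOW = timedelta(hours=120)


def revocation_deadline(clock_start: datetime) -> datetime:
    """Latest permitted revocation time for a five-day revocation.

    Assumption: clock_start is the moment the CA's obligation begins. The
    exact trigger is defined by the CA/Browser Forum Baseline Requirements,
    not by this sketch.
    """
    return clock_start + FIVE_DAY_WINDOW


def is_late(clock_start: datetime, revoked_at: datetime) -> bool:
    """True if the certificates were revoked after the deadline."""
    return revoked_at > revocation_deadline(clock_start)


if __name__ == "__main__":
    # Hypothetical timestamps, in UTC.
    start = datetime(2024, 4, 1, 9, 0, tzinfo=timezone.utc)
    print(revocation_deadline(start).isoformat())  # 2024-04-06T09:00:00+00:00
    print(is_late(start, datetime(2024, 4, 20, tzinfo=timezone.utc)))  # True
```

In this sketch, revoking exactly at the deadline counts as on time; that boundary choice is an assumption, not a reading of the Baseline Requirements.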
So one of the things that's been discussed a lot publicly is that, traditionally, browsers have had only two options: continue to trust you, or distrust you. And distrust is the equivalent of the death penalty. The problem is, if the only punishment you have is the death penalty, and you don't want to be some kind of genocidal sociopath, then all kinds of crimes go unpunished, right? If you live in a sensible society that is not going to execute you for running a stop sign, then there's no consequence to running stop signs, and as a result a lot of people start running stop signs. Right? And this is what you're seeing now. You're seeing CAs look at this and say, you know, I don't think I'm going to get distrusted over this one incident, and there's no percentage, no upside, in revoking the certs on time and pissing off my customers, so I'm just not going to, because I'll get away with it. And sadly, unless something changes, they will. So there's been a lot of public dialogue suggesting that perhaps what we need is more granularity in the response. Perhaps we need a way to mitigate the damage of misissued certificates that aren't revoked on time, through a decision taken on the browser side that doesn't require revocation by the CA. At the same time, this might deter CAs from future misbehavior, if that makes sense.
So, in other words, misissuance is inevitable, and it happens at a fairly constant rate. With that in mind, the pain of mass revocation, which was the whole point of Episode 380, is something you can mitigate right now.
And I hope that's what the future is. Regardless of how this all plays out with browser trust in the different CAs that are playing these games, I think the end customers of publicly trusted certificates need to employ certificate lifecycle management, employ automation, and embrace shorter certificate lifespans.
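To make the automation point concrete, here is a minimal Python sketch of the kind of check a certificate lifecycle management (CLM) tool might run: it selects certificates that are either close to expiry or named in a revocation event, so routine renewals and a mass revocation go through the same replacement path. The `ManagedCert` inventory, the `replacement_queue` function, and the renewal window are hypothetical, not any product's API.

```python
from collections.abc import Collection
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ManagedCert:
    """One entry in a hypothetical certificate inventory."""
    hostname: str
    serial: str          # serial number as a hex string
    not_after: datetime  # expiry time, timezone-aware (UTC)


def replacement_queue(
    inventory: list[ManagedCert],
    now: datetime,
    renew_before: timedelta,
    revoked_serials: Collection[str] = frozenset(),
) -> list[ManagedCert]:
    """Certificates to reissue now: near expiry, or named in a revocation event.

    The inventory shape and the renewal window are illustrative assumptions;
    a real CLM tool would also handle issuance, deployment and verification.
    """
    return [
        cert
        for cert in inventory
        if cert.serial in revoked_serials or cert.not_after - now <= renew_before
    ]


if __name__ == "__main__":
    now = datetime(2024, 5, 2, tzinfo=timezone.utc)
    inventory = [
        ManagedCert("www.example.com", "0a1b", now + timedelta(days=60)),
        ManagedCert("api.example.com", "0c2d", now + timedelta(days=5)),
        ManagedCert("mail.example.com", "0e3f", now + timedelta(days=200)),
    ]
    queue = replacement_queue(inventory, now, timedelta(days=14), {"0e3f"})
    print([c.hostname for c in queue])  # ['api.example.com', 'mail.example.com']
```

The design idea is that if replacement is already routine, a five-day revocation stops being an emergency, which undercuts the argument that revoking on time would hurt customers.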

