Podcast
Root Causes 370: Drama on Bugzilla


Hosted by
Tim Callan
Chief Compliance Officer
Jason Soroko
Fellow
Original broadcast date
March 19, 2024
An evolving incident on Bugzilla has garnered a lot of attention and touches several important issues in the WebPKI ecosystem. We report what went on and unpack the issues involved.
Podcast Transcript
Lightly edited for flow and brevity.
So you know, Bugzilla is where people record these incidents and these incidents are really for purposes of the public CA community maintaining quality and getting better. And it's important to understand that there are a lot of extremely specific rules for CAs to follow. Those rules are evolving over time, and reported incidents in Bugzilla are very common things. There are new ones every week, and CAs self-report incidents or someone else discovers an incident and writes it up and it gets dealt with on Bugzilla. And this is common. It's normal. It's not a big deal for CAs to have incidents. There are many incidents that are open all at once at any given time and many CAs will have multiple incidents per year. If they are any kind of decent volume CA they'll have multiple incidents per year. This is just part of being a public CA. Part of being part of the web PKI is you work in Bugzilla and you deal with this stuff. So that's the first thing to understand.
And so this particular incident was posted by Entrust. Right there in the name and you know a little under two weeks ago as of the time of recording, it was opened on March 6, 2024, and it starts with an incident report, which is how these things often start, which is a specific, codified format that an incident report is supposed to follow with different things like a timeline, affected certs, lessons learned, etc. And these things are there in the incident and you're supposed to go through it, and it's published and the actual incident, the actual thing that occurs is very banal. Um, and what it is, is, there's - - in Extended Validation certificates - - according to this incident report, in Extended Validation certificates, there has been a failure to include a specific reference. It's a reference to the CPS and that's something that is required, and it's not there. And, in particular, it's required in the EV guidelines. The EV guidelines say that EV certs must do this. There is no equivalent of that in the standard baseline requirements. So there's no requirement to do that for an OV cert. There is a requirement to do that for an EV cert. And so that's the bug. And it's unambiguously a bug. And it's definitely not the way that the guidelines say and all of that starts out very clear.
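As a rough illustration of the rule described above (the CPS reference is required in EV certs, with no equivalent requirement for OV certs), here is a minimal Python sketch. It does not parse real X.509 certificates. The `CertSummary` type, its field names, and the example URL are all hypothetical, made up only to show the shape of the check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CertSummary:
    """Hypothetical, simplified view of a certificate (not real X.509 parsing)."""
    serial: str
    validation_type: str            # assumed labels: "EV", "OV", or "DV"
    cps_uri: Optional[str] = None   # the CPS reference, if the cert carries one


def missing_required_cps(cert: CertSummary) -> bool:
    """True when an EV cert lacks the CPS reference; other types are not checked."""
    return cert.validation_type == "EV" and not cert.cps_uri


# Illustrative data only; these are not real certificates.
certs = [
    CertSummary("01", "EV", "https://ca.example/cps"),
    CertSummary("02", "EV"),
    CertSummary("03", "OV"),
]
flagged = [c.serial for c in certs if missing_required_cps(c)]
print(flagged)
```

In this toy data set only the second certificate is flagged: it is EV and has no CPS reference, while the OV certificate without one passes.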
And this bug report gets written up and it sits there and for almost a week, nothing happens with it. There's no response. There's nothing in the community, anything. And then finally, six days later, a couple comments show up and the first one says are you going to be including - - they're basically asking for things that are supposed to be in a complete report that aren't there. By the way, this also is common. Sometimes all the information isn't available, CA puts up a report and they augment it later, and they augment it later and that's also considered to be normal and acceptable.
So in this case, these questions come when it's almost a week later saying, hey, do you plan on including this stuff? It is certificate data for the mis-issued certs. It is the number of certs mis-issued. Things along these lines. And then it follows pretty shortly with a question from another observer, which basically says, hey, do you guys ever stop mis-issuance? And so one of the things to understand here is there's an expectation that when the CA is mis-issuing certificates, what they're supposed to do is they're supposed to stop and fix the problem and then continue issuing certificates correctly. You're not supposed to just keep pouring non-compliant certificates out into the world. So these questions start coming: did you stop issuance? And hey, do you know that you're still actively issuing mis-issued certificates to this day? This is now almost a week after the original bug post.
And then where things start to heat up is there's a response from the CA, a response from Entrust, which basically says we have not stopped the issuance, and we are not going to revoke the affected certificates and the reason for that is that there's a conflict between the BRs and the EVGs. And the gist of this is, I said that the EV guidelines say that this particular field, this reference to the Certification Practice Statement, is required, right. But the BRs have a line in them that says that it is not recommended that you include the same reference. So you could argue in that way that the two of them are in conflict with each other. Does that make sense, Jason?
One is the idea that this rationale actually holds water. The other thing the community wants to push back on is this idea that the mis-issuance is still going on a week later, essentially, right? That a week passes and it's still going on, and that there's been a declaration that these certificates are not going to be revoked, even though they're mis-issued. And this picks up a lot of heat and we get a lot of people weighing in. As I said, there's something like 30 comments on this bug. A lot of bugs might live their whole life with three comments, right? There's something like 30 comments on this bug. It picks up a lot of heat in a very short time period. And everybody was very professional but, you know, it feels like these are people who feel very passionately about the viewpoint that this is not what a public CA should do.
And so, at this point, the narrative changes a little, which is that rather than arguing whether or not it's mis-issuance, Entrust shifts gears to say that we don't think it's in the best interest of the web PKI to revoke these certs. That the damage will be more than the benefit. And again, this is a debate we can have, right? Maybe it should. Maybe it shouldn't. But at this point, this debate is going strong. And it keeps right on going. And the response essentially is, well, number one, you don't - - this feels very convenient, right, that because it's inconvenient for you and your customers, you don't want to revoke the certificates. And on the other hand, when it's somebody else, you have a different perspective, and you think you should want to revoke the certificates. And then, you know, another point, again, is that people keep saying, but look, the number of mis-issued certs is growing. You still haven't fixed the problem.
And so, you know, a little bit of side activity that's going on adjacent to this, that probably matters, or parallel to this I should say, number one is that Entrust very quickly introduces a potential new ballot for discussion about changing the wording of EV guidelines to match what the Baseline Requirements say, which is fine and that's a good thing to do. But it's being held up as a remedy for this problem. And of course, then the community comes back and says, wait a minute, hold on. You changing the rules in the future doesn't change the fact that something was non-compliant in the past, right? Time only goes one direction. So you can't go and change the regulation tomorrow and say, okay, the fact that I was non-compliant yesterday is now erased. That is not how it works. You were still non-compliant yesterday, and a cert that was issued yesterday was still non-compliant and needs to be dealt with. And, you know, this isn't nitpicking. Rules change all the time and it's important to understand that the rules that were in force when the cert was issued are the rules that apply to that cert. So that's not actually a picayune point. It's a valid and important point.
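The point that a later rule change does not retroactively fix an earlier cert can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule history below is invented. In particular, the second entry imagines a future ballot taking effect on a made-up date, and no such ballot or date comes from the episode.

```python
from datetime import date
from typing import List, Tuple

# Hypothetical rule history: (effective date, is the CPS reference required for EV?).
# Both dates are invented for illustration; the second imagines a ballot passing.
RULE_HISTORY: List[Tuple[date, bool]] = [
    (date(2020, 1, 1), True),
    (date(2024, 6, 1), False),
]


def cps_required_on(issued: date) -> bool:
    """Return the requirement in force on the issuance date.

    Dates before the first entry fall back to False in this toy model.
    """
    required = False
    for effective, value in sorted(RULE_HISTORY):
        if effective <= issued:
            required = value
    return required


def was_misissued(issued: date, has_cps_reference: bool) -> bool:
    """Judge a cert by the rules in force when it was issued, not today's rules."""
    return cps_required_on(issued) and not has_cps_reference


# A cert issued before the imagined change stays non-compliant afterwards.
print(was_misissued(date(2024, 3, 1), has_cps_reference=False))
# The same cert profile issued after the imagined change would be fine.
print(was_misissued(date(2024, 7, 1), has_cps_reference=False))
```

The check takes only the issuance date as input, so changing the rules later adds a new entry to the history without altering the answer for certs issued earlier.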
Another thing that goes on is one of these commentators who's very active on this thread actually goes off and creates his own blog. The very first entry of the blog is a description of what's going on here on this particular thread. So you can see again, people feel passionately here and there's no real progress being made. And this goes on for almost two weeks. Mozilla itself weighs in with what its expectations are. And then finally, yesterday, the 18th of March, we get a message from the leader of the Google Chromium project that is very long and has a number of detailed comments about what Chromium’s expectations are and has a number of very detailed questions. And so at this point - - then that night, there is now an announcement from Entrust that says, and I'm gonna read this verbatim:
“We have stopped issuing mis-issued certificates and fixed the EV certificate profile. All impacted customers will be advised that their certificates will be revoked. We will create a delayed revocation bug and will follow up on other questions in the next few days.”
So like two weeks after the bug was created, and one week after this firestorm blows up, we get this declaration from Entrust and that more or less is where we stand right now.
The scope of this is certainly part of - - surely part of the factor. So Entrust reports that there's something north of 24,000, in the ballpark of 25,000-ish certs and the number is moving around a little - It's not quite nailed down but it's in that ballpark - that are affected. That basically are mis-issued according to this incident. And they're all Extended Validation certificates and there's an implication that isn't made for sure that many or most of them are in the hands of large enterprises. And this could be part of the trouble. And some of the dialogue here is around this is a large number of certs, these enterprises can't swap out the certs in the time that they have to do so. And not doing so is disruptive to the relying parties that ultimately depend on these services. And it's bad. It's bad for consumers. Not good for consumers.
And so that's also part of this dialogue, because then again, the commentators online come back and say, well, wait a minute. What do you mean you can't revoke these certificates? As a public CA, you may have to. Right? What if there is a giant private key compromise or a zero-day or the equivalent of Heartbleed, or the equivalent of some of these other things that have gone on? What if you do need to revoke hundreds of thousands or millions of certificates? What if - - and these enterprises, right? To say that, oh, well, these enterprises, they can't swap things out. They're not able to get it done in five days. The response to that is, well, you know what, this is a public-facing PKI that governs the crown jewels that your organization has. Your large bank or your large enterprise. You better be able to swap your certs out. If you can't swap your certs out, that's a pretty serious problem. So that's in the dialogue as well and that's part of what's going on here, too.
The need to be agile for cryptographic reasons, cryptographic agility. It all basically points down to me in my mind. Look, you said at the very top of this podcast, mis-issuances happen. There's all kinds of things on Mozilla that are discussed all the time. This isn't it, you know, we're talking about the story and how the response to this particular Bugzilla event went. But I think for those of you who are just thinking, what are the implications here, I think there are maybe two, at least, that come to mind.
And then number two, Tim, just because it's staring me right in the face. I'm assuming these are mostly one year certificates?
Now, you're also seeing here someone arguing on the other side, which is to say, well, this is difficult for these companies to do. And, you know, one of the commentators threw up - - this is just a commentator's list. So I'm not gonna - - I didn't validate this myself, but this is what the person put on Bugzilla by looking at the CT logs, and I'm just gonna rattle off some of the names on this list. JP Morgan Chase, Delta Airlines, Bank of America, Tesco, Fidelity Investments, American Airlines, Westpac Banking Corporation, ING Group, Experian, PricewaterhouseCoopers, Toronto Dominion Bank, M&T Bank, Citizens Financial Group, and it goes on. There’s a lot more. And so, you know, these are large organizations with a lot at stake and these are some of the ones who have affected EV certs. Now, this doesn't necessarily mean that these organizations that are listed here are incapable of doing a rapid swap out but there is a representation being made by the CA that their enterprise customers on the whole, their EV customers on the whole, are not going to be able to deal with this.
So if that applies to the people I just rattled off, then you've got to say, you know, wait a minute, guys, you people in the enterprise, like, this is risk. This is vulnerability. It's risk of outage. It's risk of essential services not working, and we see that here, because this kind of thing happens. In this case, all these guys are going to be revoked. Right? The final declaration that came out Monday night is that all of these certificates, 24,000+, are going to be revoked in a five-day time period and that means that these organizations, if they're not able to deal with that, they're going to have a bad day.
So this lack of certificate agility is definitely a theme that comes up here. There's this theme about doing things in a certain way and in the right order and then, you know, it's also illustrative I think of the rules around reporting and discussing these episodes, and following what happens when those rules don't get completely filled in, right, because there were these bits of information that were missing and that became part of the story here.
So part of the thing to emphasize is the actual error, I think, is very understandable. And CAs make errors, and this is an error where you see why it occurred. Somebody looked at, you know, some guidelines in the BRs, they didn't connect them back to what the EVGs said and that resulted in an error. That's kind of a mundane story. That's not an interesting story. It was about how this bug was reported and discussed and dealt with and the timeline around dealing with it that made this bug particularly interesting and illustrative and, dare I say, a little bit dramatic. And, it's still not done. There's still dialogue. The bug is not closed. We still don't have an exact list of affected certificates and so this might not even be entirely over.

