Podcast
Root Causes 465: Twelve Bugzilla Sins for CAs to Avoid


Hosted by
Tim Callan
Chief Compliance Officer
Jason Soroko
Fellow
Original broadcast date
February 7, 2025
In the wake of the Bugzilla Bloodbath, we list and describe twelve sins CAs commit on Bugzilla and its like, why they're detrimental, and how CAs should avoid them.
Podcast Transcript
Lightly edited for flow and brevity.
Here I have a list. I have 12 things. CAs, avoid these 12 things in your Bugzilla incidents. Here's what I was thinking, Jay. Let's go down the list and define them. Then let's go back through them and talk about why they're a problem and how to rectify them.
So there are a number of sets of governing rules that public CAs need to follow, and one of them is that every root program has some kind of root program requirements. One of the big sets of root program requirements that we really care about is Mozilla's. They were early on. They were kind of the first browser with a real codified, public set of root program requirements. And Mozilla's online properties that encourage online dialog are fundamentally open. There's an expectation that everything is done in public, because Mozilla is one of the world's largest open source projects. So, as a result, if you want to be in the Mozilla root store, you're expected to be very transparent and very public in your dealings as a public CA. Of the tools that support this, there are probably three that are important to us here.
One of them is called Bugzilla, and this is the Mozilla bug base. Every time a CA has an incident that meets certain criteria - that qualifies as an incident - that CA has to open an incident report, respond to public questions, and follow a certain set of codified behaviors, which include fixing the problem and demonstrating that the problem should be expected not to occur again. Then at the end of that, you close out the incident. All of this is done in public, and anybody with a browser can come create an account, ask questions, challenge assumptions, and do things along those lines.
There are a couple of other ones that are noteworthy. One of them is called MDSP, which is Mozilla's public message board. For things that concern the Mozilla root program, and Mozilla as a browser, that's where we go to have discussions that aren't CA incidents. Then there's another one, CCADB, which is the same kind of thing, but instead of being focused on Firefox specifically, it's focused on the whole body of CCADB-participating browsers, which includes Firefox, Chrome, and Apple. If you have a topic that's bigger than just Firefox, you bring it to CCADB. If you have a topic that's specific to Firefox, you bring it to MDSP. And if it is a misissuance or another CA operational error, you bring it to Bugzilla.
So I said avoid these 12 things in your Bugzilla incidents, just because Bugzilla is the most frequently used of these. But what we're talking about here applies to all of them. That's a good edit. So why don't we run down the list? We'll say what each one is, and then we'll come back and comment on them. And again, I've got thoughts on why each of these is bad and what to do about it.
I'll say a couple of other things before we get going. Just by the nature of a list like this, there's a certain amount of overlap. So if we look at any individual episode of bad behavior, there's a very good chance that it ticks more than one of these boxes. But by having these boxes, we've got a really useful codification of what to do and not to do. This is my first pass at it. Feel free to go on LinkedIn and tell me what I missed, and maybe we'll update this over time. But this is where we're sitting right now.
The second thing I'm going to say is that I tried to batch these 12 items. So the first three are close to each other. You shouldn't do any of these things, but the first three are really unforgivable because they reflect fundamental dishonesty in a CA's dealings with the WebPKI, and the last three are really unforgivable too. The ones in the middle - you still shouldn't do them.
Then the last point I'm going to make is that I have seen all of these things occur multiple times in the last year.
So number one on the list - obfuscation. People trying to look like they are telling everything, but really not telling everything. Carefully crafted statements, weasel-worded stuff, partial answers to questions. If there are four questions in a post, maybe you answer one, two, and four, and just kind of forget about three. I see a lot of this sort of thing going on.
So this is an integrity problem, and that's bad. Without your integrity, you can't be a steward of the WebPKI. Obfuscation - that's the first one. Trying to hide stuff.
Second one - obstruction. Trying to prevent the process from happening. This takes a variety of forms, like refusing to answer questions, or deliberately answering questions that were not the questions that were asked, or hiding behind excuses. Things like, well, we have a customer NDA, so we won't discuss that. Or, my legal team says I can't discuss that. Silly things that are fundamentally obstructionist.
Then the third one, and I think this is connected to the first two, is letting negative emotions take the driver's seat. And this happens. It's kind of crazy, but you can look at some of these posts from CAs, and they come across as defensive or angry or whiny, and that's not the point. If there is an error in your operation, that's a fact, and if it upsets you that other people can see that fact, you have to get over it. You're a public CA. This is our job. We're grown-ups. And in particular, I made a list of specific things I noticed.
First of all, being insulting to other community members. For the second one, I wrote churlishness. Just general churlishness, just nastiness. And the third one is pettiness. I'll give you an example of pettiness. I'm not going to name any names on this episode, by the way, but there's an ongoing set of incidents right now where we've got a CA that is entrenched in a position about a major browser's stated policy, and the position is as indefensible as positions come. The browser's stated policy is plain language, but this CA does not want to admit that it was wrong. So what they're claiming is that the browser's policy changed between when the bug was written up, almost a year ago, and today, as a way to get out of admitting they were wrong while also closing the bug. They're saying, well, the browser changed its policy, so moving forward we're going to do it the way they want. No, the browser always did it that way. You just screwed up. The only reason I can come up with for why they won't say so is that they just don't want to publicly admit they got something wrong. It's just petty.
So all of this stuff looks bad. These first three are a bad look. And the reason they're unforgivable, CAs, is that we have to question your commitment to the process when you do these things. When you do these things, I don't believe you're somebody who is genuinely, openly trying to improve.
Number four - lateness. There are a few expectations where things are supposed to happen in specific time frames. When you find out you have a misissuance, you're supposed to report it within 72 hours. Same for a Bugzilla bug that isn't a misissuance - you're supposed to report it within 72 hours. Not 73 hours. 72 hours. If there are questions on Bugzilla, you need to answer them within one week. Exactly seven days, not eight days. And you also need to maintain a weekly cadence of some kind of update. When an incident occurs, you have two weeks to produce an incident report. Other details may still need to be filled in later, but you need a complete incident report, to the full extent of your knowledge, within two weeks. And then the last one is what we call a next update date, which is where Mozilla can set a date and say, look, we don't need to hear from you until this day. Let's say I say, look, it's all done, but we've got an engineering project. We want to fix this one thing, and it's going to take 45 days, Mozilla. Mozilla will pick a date in the future and say, okay, next update here. When you have a next update date, you have to update by that date, even if it's only to say we're not done yet. These rules are well understood. They're well codified. There's no ambiguity on any of this. And we just routinely see people miss these deadlines.
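To make those cadences concrete, here is a minimal sketch in Python that computes those dates from the moment a CA becomes aware of an incident. It is illustrative only - the function and constant names are mine, and the authoritative obligations live in the CCADB incident reporting guidance and the root program requirements, not in this code.

```python
from datetime import datetime, timedelta

# Illustrative only: the authoritative deadlines are defined by CCADB and the
# root programs, not by this sketch.
INITIAL_REPORT = timedelta(hours=72)   # open the bug within 72 hours of awareness
FULL_REPORT = timedelta(days=14)       # complete incident report within two weeks
RESPONSE_WINDOW = timedelta(days=7)    # answer questions within one week
UPDATE_CADENCE = timedelta(days=7)     # post some kind of update at least weekly

def incident_deadlines(aware_at: datetime) -> dict:
    """Return the key dates that follow from the moment of awareness."""
    return {
        "initial_report_due": aware_at + INITIAL_REPORT,
        "full_report_due": aware_at + FULL_REPORT,
        "first_weekly_update_due": aware_at + UPDATE_CADENCE,
    }

def question_response_due(question_posted_at: datetime) -> datetime:
    """A question posted on the bug needs an answer within seven days."""
    return question_posted_at + RESPONSE_WINDOW

if __name__ == "__main__":
    aware = datetime(2025, 2, 1, 9, 0)
    for name, due in incident_deadlines(aware).items():
        print(f"{name}: {due:%Y-%m-%d %H:%M}")
```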
Number five - improper markdown, which can include no markdown at all. Now, these are text-based bulletin board systems, and sometimes we're putting up big, complicated messages. There is a set of markdown - very common, very HTML-ish - that you can use to get bold text, different font sizes, bullets, and all that good stuff. There are published rules for how the markdown works, so you can go read exactly what your options are, and there's a preview mode so you can see what your post is actually going to look like before you commit to it. Then, for certain actions, like an incident report or an incident closure, there is a specific format they want you to use, including the markdown, so that readers can navigate it. We frequently see people either write terrible, horrible markdown - like, what were you thinking - or none at all. Maybe that's a little bit of a judgment call when it's inside your own bug and you're just supposed to use the markdown to help people navigate it. But in these prescribed, formatted incident artifacts - a preliminary incident report, a final incident report, a bug closing statement - the markdown is codified, and if you don't use it, you're just making it harder for everybody. That might go back to obstructionism or obfuscation. It's a way to obfuscate.
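To illustrate what a prescribed, navigable format means in practice, here is a rough Python sketch that prints a markdown skeleton with one heading per section. The section names below are from memory and purely illustrative - the authoritative template and its markdown are published by CCADB, so copy that, not this.

```python
# Illustrative skeleton only: the authoritative section list and markdown live
# in the CCADB incident report template.
SECTIONS = [
    "Summary",
    "Impact",
    "Timeline",
    "Root Cause Analysis",
    "Lessons Learned",
    "Action Items",
    "Appendix",
]

def report_skeleton(title: str) -> str:
    """Build a markdown skeleton with one heading per expected section."""
    lines = [f"### {title}", ""]
    for section in SECTIONS:
        lines.extend([f"#### {section}", "", "TODO", ""])
    return "\n".join(lines)

if __name__ == "__main__":
    print(report_skeleton("Incident Report"))
```

The point of a skeleton like this is simply that readers can jump to the section they care about instead of fighting a wall of text.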
So again, some of that could be a bad attitude, like having negative emotions in the driver's seat. Some of that could be obfuscation. Or it could be the next one on the list, number seven, which is failure to understand the expectations of a public CA. And we might put this on the unforgivable list, too. The poster child for this one was one of the two CAs that was distrusted last year, eCommerce Monitoring, in, I'm going to say, the spring. Part of the incident report is a section called root causes. You like that, Jay. And what do you think the section called root causes is supposed to do?
That was number seven. Number eight - one of the things that the browsers expect, and that CCADB expects, is for CAs to follow and extrapolate from all Bugzilla bugs. As a CA you're supposed to follow the resources I mentioned before - Bugzilla, MDSP, CCADB - and not just the stuff that's about you, because the idea is that if CA-A has an error, CA-B is supposed to go look at its own operations and say, oh, we're making the same mistake. Let's go fix it. One of the things the browsers get really grouchy about, and commentators on the public lists get very grouchy about, is when the same mistake gets repeated, and they get extra specially grouchy when the same mistake gets repeated by the same CA. So not only do we see CAs that aren't learning from each other's bugs, we see CAs that aren't learning from their own bugs, who do the same thing again. I've got open bugs right now that I'm monitoring where that's going on, and it sort of undermines the purpose of the whole thing.
Number nine - and you see this a lot - shallow root cause analysis. You've got a software error, and it causes a certificate problem report to get lost, so it doesn't get dealt with. And the CA comes back and says, well, there was a software error that caused us to lose the report. The bug's been fixed. That's my action item. That's my root cause analysis. I'm all done. And then the browsers will come back and say, no - was this QA'd? There was some other flaw. Maybe your test beds weren't designed correctly. Maybe your initial architecture wasn't correct. How come this happened in the first place? What's going on in your process that allowed that software error to exist? You've got to go fix that. So what they're looking for is a deeper root cause analysis. They don't want to play whack-a-mole. They want a deeper root cause analysis that will cause meaningful quality improvement in the CA. We know there are still going to be bugs, but we want those bugs to happen for new, undiscovered reasons, not the reasons we knew about but didn't do anything about. So shallow root cause analysis - you see it a lot.
Now the last three - 10, 11, and 12 - once again, I'm back to unforgivable. And I think you'll see why.
Number 10 - lies and cover-ups. I can't prove it, but I follow Bugzilla very closely. We have resources. We mine CT logs. We look at other things like this. I am quite confident that in 2024 I'm aware of four outright lies that were told on these public forums by public CAs. Cases where I just know they were lying. Or cover-ups, where the attitude is, I don't think anyone's going to find out I did this bad thing, so I'm just not going to say it. And that's an integrity problem. We're not going to get into the specific lies, but lies and cover-ups - it's a real thing.
Number 11 - refusal to admit your errors. I was just describing that with an earlier example. They just won't admit it was an error. It's an error. Everybody knows it's an error. It's there in black and white. It's proven factually that it was an error, and you've got a CA that just won't admit it. Just won't. Keeps saying no, keeps saying no, keeps saying no. This is part of what got Entrust distrusted. Won't admit it's an error. And then, directly connected to that, number 12 - refusal to change. That was also part of what got Entrust distrusted. Just nope. We're not going to change. We're just not going to change. We won't change. So that's the 12.
So obfuscation and obstruction. Let's put those together. Why are these a problem?
Putting negative emotions in the driver's seat. Why is that bad?
Lateness. We kind of got into that one. The whole reason we have these rules around these cadences is because things have got to move forward. Things have got to get solved.
Improper markdown. We kind of got into the problem with it. The problem with improper markdown is that it makes it harder for the community to navigate and work with these bugs. Big walls of text are just hard to deal with. Perhaps it is deliberate obfuscation and deliberate obstruction, and I do think that sometimes goes on. Perhaps it's just laziness, or lack of understanding of how to use the tools, or lack of understanding of the expectations. But whatever the reason, at the end of the day it has the effect of impairing the public dialog, the public understanding, and the learning across the whole community. That ability for the community to learn is the biggest point - not the only point, but the biggest point - of Bugzilla and MDSP and CCADB.
Failure to follow and extrapolate from all Bugzilla bugs. We kind of discussed this. The point is that if another CA is having an error, I may be having the same error and not know it, and when I read that bug, it's a chance for me to go fix my own error. Maybe that means I get lucky and fix it before I have a misissuance incident. I say, oh, good, that would have gotten me too, but now I can fix it. If I try to fix it, but I fix it wrong and have an incident later, the community will be forgiving of that. But if I just didn't know about it, because I wasn't reading other bugs or asking how they apply to me, then the multiplying effect these tools are supposed to provide is lost. Shallow root cause analysis. What's the problem with that?
Lies and cover-ups. Where do we begin? You are a steward of the public trust. You are one of approximately 50 organizations on the globe that has been given the opportunity to vouch for public identity, and you're going to tell lies in public? Really? This is horrible. This is like being a crooked cop. This is like being a judge who takes bribes. It is the opposite of what you're here to be.
Refusal to admit your errors.
So Tim, I'd like to talk about being the best CA that you can be. We've both been around this industry a long time, and we've known some of the personalities behind the human side of this, because if it was truly an automated process -
I think, Tim, you reading off this list today and explaining it so well is useful for people who are in the compliance programs of other Certificate Authorities.
I'm going to suggest that perhaps it's time to organize your company, as a CA, so that there's an adult in the room overseeing this process for the benefit of your shareholders and other people who have a stake.

