Podcast
Root Causes 431: New Mozilla Proposal to Combat Delayed Revocation


Hosted by
Tim Callan
Chief Compliance Officer
Jason Soroko
Fellow
Original broadcast date
October 11, 2024
Deliberate delay of mandatory revocations has plagued the WebPKI in 2024. A new proposed policy from Mozilla stands to eliminate most of this behavior. In this episode we go over the proposal and explain its potential consequences.
Podcast Transcript
We've had a number of episodes just on this, so I don't want to belabor it too much, but this has been an ongoing problem that did not appear to be resolving itself in any meaningful way. Even after Entrust was distrusted, we've continued to see delayed revocations. Major delayed revocations. So this has been a big problem.
And so with that background, a message was posted on September 19 on what we call MDSP. MDSP, short for mozilla.dev.security.policy, is an archaic name; it made sense once upon a time. It's the Google group that Mozilla uses to discuss root program issues. On MDSP, there was a message from Ben Wilson, who is in charge of the root program over there at Mozilla. It's very interesting, and I would say very important. It's not super long, and I would like to read the whole thing to the listeners. Sounds good?
Hi, folks. We have discussed delayed revocation a bit internally, and wanted to come back to the community with some thoughts. We agree that expanding beyond the existing revocation timelines (24 hours/5 days) is undesirable. While we think some exceptional delayed revocations are necessary as a current practicality, we do want to eventually sunset this policy. To that end, we'd like to refine our existing policy so that it is more effective and equitable during the interim. That's point number one.
We're skeptical about proposals to pre-identify domains that will require delayed revocation. Point number two.
We expect that many sites might ask for such exceptions, and an extensive amount of deliberation would be required in order to process these requests. Worse, in practice, doubtless some sites impacted in a revocation event would not have followed the procedure, and CAs will still be left with a last-minute decision about whether a revocation will inflict substantial harm. Instead, we would like to seek the community's feedback on the following two proposals.
Number one, clarification of existing requirements. We would be more explicit about what would be required for delayed revocation. Some ideas include, and then there are six bullets here. We'll go through all six.
A. Explicitly clarifying that CAs must revoke certificates by default, and that any delayed revocation must be the result of an explicit request by the subscriber containing the necessary information and meeting the requirements under such interim policy.
B. That subscriber requests contain a clear claim or explanation about the critical nature of the system and why timely revocation is not possible. (More detailed requirements to be discussed.)
C. That the requests are signed by a company officer or similar legal representative, stating that the information in the request is accurate to the best of their knowledge.
D. That the information contained in the subscriber's request be accurate to the CA's understanding (e.g., not materially contradicted by other facts known to the CA).
E. That each granted request be published for the community and Mozilla to scrutinize (allowing CAs to redact PII prior to publication); and finally,
F. That CAs be required to produce summary statistics in their reports alongside the individual granted requests, detailing how many requests were received, how many were well formed, how many were granted, etc.
That was part one. Here's part two.
Consequences of delayed revocation. This one doesn't have bullets. It just has a couple paragraphs.
We believe that if a domain hosts critical infrastructure that cannot tolerate timely revocation, then it is deeply damaging to the WebPKI. In order to help these domains transition to effective certificate management practices and automated tooling, we propose that domains that are granted delayed revocation must then be limited to shorter lifetime certificates as a consequence of such decision. This also ensures that future revocations impacting such domains have much less impact. Concretely, the domains accepted for delayed revocation by a CA are already public. If this proposal were to be adopted, root programs would manage a shared list of such domains, e.g., via the CCADB - we had an episode on that in the past - and require CAs to limit the lifetimes of certificates issued to these domains (e.g., to 90 days).
Okay, now he's wrapping up.
We stress that both of these proposals are presented for early feedback, and we look forward to the community's thoughts on whether they are likely to be suitable and effective and how they might be improved. We would also look to align these policies across the root programs in order to provide clarity for the entire community. Thanks and best wishes. Ben.
Okay, that's all of the words of the email.
And if you are granted one - because there's definitely a note saying there are legitimate reasons for delayed revocation - if you're on that exceptions list, you're going to have to use the 90-day cert. I think that's really what it comes down to: that shift of responsibility, and that it - -
Why don't we drill down a little on some of these? Going back to the top: we agree that expanding beyond the existing revocation timelines is undesirable.
So there's another thread that has been running for about four years asking whether a 120-hour revocation window is really the right window for certain, let's call them clerical, problems with certificates. And those could be very explicitly defined. I am of the opinion that the WebPKI would be better off with three bands of revocation based on the nature of the problem, rather than two, and that there's room for a band for problems deemed much more trivial. And if you use analogies from other fields, like the common practice with security vulnerabilities: critical vulnerabilities have to be patched within a few days, and high-priority vulnerabilities have to be patched within a month. Those are still high-priority vulnerabilities, and they get patched within a month. So it is physically impossible for there to be a certificate misissuance that could tolerate more than 120 hours? I don't buy it. So there was talk of having a third tranche.
What Ben is saying here, which echoes things we've heard from other browsers, is that that's off the table right now. Stop talking about it. We're not even open to considering it at the moment. That's basically what the first paragraph says, and if it didn't make sense, that's why it's there.
The second paragraph: there were proposals floating around that people would build pre-approved lists of domains or certificates that would get an exception if a mandatory revocation event were to occur. The second paragraph, saying we are skeptical about proposals to pre-identify domains that will, etc., is basically shooting that down, saying that's not really going to work. And I completely agree it's not going to work, for the same reasons Ben gave. We don't need to repeat those.
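To make the windows concrete: the existing timelines Ben cites are 24 hours and 5 days, and 5 days is the 120-hour window discussed here. The Python sketch below is a minimal illustration, assuming a simplified model in which each problem is already tagged with a band. The band names and the 30-day "clerical" band are hypothetical, standing in for the third tranche floated in this thread (the month borrows from the vulnerability-patching analogy); no current policy defines such a band.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class RevocationBand(Enum):
    # Existing windows cited in the Mozilla message: 24 hours and 5 days.
    URGENT = timedelta(hours=24)
    STANDARD = timedelta(days=5)  # 5 days == 120 hours
    # HYPOTHETICAL third band for "clerical" problems, as debated in the
    # community; not part of any current policy. 30 days is an assumption.
    CLERICAL = timedelta(days=30)


def revocation_deadline(discovered_at: datetime, band: RevocationBand) -> datetime:
    """Return the latest time by which the certificate must be revoked."""
    return discovered_at + band.value


if __name__ == "__main__":
    found = datetime(2024, 9, 19, 12, 0, tzinfo=timezone.utc)
    for band in RevocationBand:
        print(band.name, revocation_deadline(found, band).isoformat())
```

The point of the sketch is only the arithmetic: a third band would not change the first two deadlines, it would add a longer one for a narrowly defined class of problems.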
Then we get to the proposal itself. So let's break down clarifications of existing requirements, because I think this is really interesting.
A: this is just explicitly clarifying that CAs have to revoke on time and that any delayed revocation must be the result of an explicit request by the subscriber.
So what we had was this practice of CAs just granting delayed revocation to huge tranches of misissued certs, or entire misissuance events. And we're talking about numbers that could be in the tens of thousands, hundreds of thousands, where they just decided, we're going to give everybody a delay. This is explicitly making that illegal, which, granted, it was illegal anyway, but everybody seemed to be getting away with it. And I think this language is there to make clear that you're not going to get away with it anymore.
The next one, B, says what you said. The subscriber must place a request, and it must have a clear claim or explanation about, number one, the critical nature of the systems, and number two, why timely revocation is not possible. This is extremely important, because right now we get these vague, hand-wavy claims: oh, babies are going to die. Wave your hands. Vague, vague, vague. Now the subscriber has to come back and say, this is very specifically the harm that will occur, it outweighs the harm to the WebPKI from not revoking misissued certs, and this is the reason we can't take care of it.
And I think that's important just because it requires a lot more specificity. It requires a lot more work on the part of the subscriber. I've been contending, Jay, that a lot of this is laziness on the subscriber's part. Like, why? I don't really want to do this. It sounds miserable, so I'm just not gonna. And this puts more work back on them. They've got to prove that it really makes sense to do, they've got to be explicit, and their name is going to be on it. So if you're a major financial institution and you can't deal with a certificate event, then you kind of deserve to be publicly shamed for that. Or if there are lives at risk and you can't handle basic IT operations like changing certificates, and people are going to die if you have to change certificates, then the public deserves to know that. So that's what happens there.
C. And then, connected to that, C: the requests are signed by a company officer or similar legal representative. So it's got to be the company's official position. The company is taking this position officially and living with the consequences.
D. That the information contained in the subscriber’s request be accurate to the CA’s understanding. Fine.
E. That each granted request be published for the community to scrutinize. So this is the visibility. The point is, you're gonna have to say - and note this is on a domain-by-domain basis. So they can't turn around and say, all the domains that belong to Tim's bank, Tim's bank is just getting an exception. They've got to list the individual domains at Tim's bank that they're going to give this exception to, and they've got to have a justification for each of them individually, because Tim's bank isn't one big homogeneous mess of certs. Different systems do different things, and maybe some of them are more critical than others, maybe some of them are less agile than others. So, again, he's taking a lot of the laziness and the thoughtlessness out of this, which I think is great. And then F: CAs be required to produce statistics.
So, like you said, the first part matters not only because the subscribers have to make this request, but also because it has to be really granular, down to a domain-by-domain basis, which is very important.
And then number two: we believe that if a domain hosts critical infrastructure that cannot tolerate timely revocation, then it is deeply damaging to the WebPKI, and therefore certificate lifetimes for that domain would be reduced. Now the 90 days is only an e.g.: it says to limit the lifetimes of certificates issued to these domains (e.g., to 90 days). So, according to this proposal, I suppose it wouldn't have to be 90 days. In the responses so far, as of a couple of days ago, which is the last time I looked, I did not see anybody proposing a different timeline. So it feels like people are taking 90 days as the timeline. And that makes sense: there's been so much focus on 90 days because of what Chrome has said and done that you would expect them to pick it. It'd be weird to pick something else. And there are a couple of other subtleties about number two that I'll point out.
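As a rough illustration of how requirements A through F shift the burden onto the subscriber, here is a minimal Python sketch of a CA-side screen for one delayed-revocation request. The `DelayRequest` and `Tally` types, their field names, and the pass/fail logic are all hypothetical; the proposal is a call for early feedback and defines no data format. Each request covers a single domain, mirroring the domain-by-domain point above, and the tally collects the kind of counts item F asks for.

```python
from dataclasses import dataclass, field


@dataclass
class DelayRequest:
    # HYPOTHETICAL request format; the proposal defines no schema.
    domain: str
    explicit_subscriber_request: bool   # A: asked for by the subscriber, not granted en masse
    criticality_explanation: str        # B: why the system is critical
    why_not_timely: str                 # B: why timely revocation is not possible
    signed_by_officer: bool             # C: signed by an officer or legal representative
    contradicted_by_known_facts: bool   # D: CA knows facts that contradict the request


@dataclass
class Tally:
    # F: summary statistics a CA would report.
    received: int = 0
    well_formed: int = 0
    granted: int = 0
    published: list[str] = field(default_factory=list)  # E: granted domains, made public


def screen(req: DelayRequest, tally: Tally) -> bool:
    """Return True if the delay is granted; the default is on-time revocation (A)."""
    tally.received += 1
    well_formed = (
        req.explicit_subscriber_request
        and bool(req.criticality_explanation.strip())
        and bool(req.why_not_timely.strip())
        and req.signed_by_officer
    )
    if not well_formed:
        return False
    tally.well_formed += 1
    if req.contradicted_by_known_facts:
        return False
    tally.granted += 1
    tally.published.append(req.domain)
    return True


if __name__ == "__main__":
    t = Tally()
    screen(DelayRequest("payments.tims-bank.example", True,
                        "Card authorization backend", "Cert pinned in terminal firmware",
                        True, False), t)
    screen(DelayRequest("www.tims-bank.example", True, "", "", False, False), t)
    print(t)
```

The sketch treats passing the checks as sufficient to grant; in practice the CA would still have to judge whether the claimed harm is real, which is exactly the part a code check cannot do.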
Number one, this isn't limited to the CA that granted the exception. That domain would be limited to 90 days for any CA. And I think this is very important, because you can't just go shop down the street. If you get this exception because you don't want to deal with it, you can't just turn around and buy certs from somebody else. You're done. You're at 90 days everywhere.
The other thing I'll point out is there's no end date. He doesn't say we're going to reduce you to 90 days for a year, or we're going to reduce you to 90 days until you prove you have the agility in place. We're just plain going to reduce you to 90 days. Like, until the sun goes supernova. That's it. We're at 90 days. It's done. And I think that's important as well, for two reasons. One is that it goes back, again, to the root cause: if these systems are so critical and so bad that we can't do a mandatory revocation, then they just need to be on 90 days. We're not going to change. It's not going to change in six months. The other point, though, is that I feel like this was written by somebody aware that it's not going to be long before everybody is down to that time period anyway.
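A minimal Python sketch of how an issuing CA might enforce the cap at issuance time, assuming the root programs publish the shared list (the proposal suggests the CCADB) and every CA loads the same set of domain names. Because every CA checks the same list, switching CAs does not avoid the cap, and because entries carry no expiry, the cap has no end date. `RESTRICTED_DOMAINS` and the function names are hypothetical; 90 days is the proposal's "e.g." value, and 398 days is the Baseline Requirements maximum for TLS server certificates that applied at the time. The sketch matches exact names only, since the proposal does not say whether subdomains of a listed domain would be covered.

```python
from datetime import timedelta

# Baseline Requirements maximum validity for TLS server certs at the time.
DEFAULT_MAX_LIFETIME = timedelta(days=398)
# Cap suggested ("e.g., to 90 days") in the Mozilla proposal.
RESTRICTED_MAX_LIFETIME = timedelta(days=90)

# HYPOTHETICAL shared list maintained by the root programs. Every CA loads
# the same list, and entries have no expiry date.
RESTRICTED_DOMAINS: frozenset[str] = frozenset({"payments.tims-bank.example"})


def max_lifetime(sans: list[str],
                 restricted: frozenset[str] = RESTRICTED_DOMAINS) -> timedelta:
    """Return the longest validity a certificate with these SANs may have.

    Exact-match only (case-insensitive, trailing dot ignored); subdomain
    coverage is not specified in the proposal.
    """
    names = {name.lower().rstrip(".") for name in sans}
    if names & restricted:
        return RESTRICTED_MAX_LIFETIME
    return DEFAULT_MAX_LIFETIME


def clamp_validity(requested: timedelta, sans: list[str]) -> timedelta:
    """Shorten a requested validity period to the applicable cap."""
    return min(requested, max_lifetime(sans))
```

For example, a one-year request covering both `payments.tims-bank.example` and `www.example.com` would be clamped to 90 days, because a single listed name on the certificate is enough to trigger the cap.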
I would say, when you and I bring up the term certificate lifecycle management, or when you've brought up the term certificate agility, and go back to Root Causes Episode 117 on that topic - -
It's going to be all the trust programs. And if you're a CA thinking about not playing ball with this: everybody's in on this. Everybody will have to be in on this.

