Podcast

Root Causes 431: New Mozilla Proposal to Combat Delayed Revocation

Hosted by
Tim Callan
Chief Compliance Officer
Original broadcast date
October 11, 2024

Deliberate delay of mandatory revocations has plagued the WebPKI in 2024. A new proposed policy from Mozilla stands to eliminate most of this behavior. In this episode we go over the proposal and explain its potential consequences.

Podcast Transcript

Tim Callan: There's a number of things that we've been talking a lot about recently that kind of come into our topic today, the main one of which is revocation delays - revocation and delayed revocation, and what I'll call deliberate delayed revocation, which is where the CA can revoke certs technically, but chooses not to. And there's been a very interesting and I think optimistic development in this area, but before we get into that, maybe let's take a minute or two, and just sort of ground everybody and frame everybody, in case you didn't get all those other episodes. Sound good?
Jason Soroko: That sounds great, Tim.
Tim Callan: So if you're a public CA, and you have public certs, then there are various forms of what we call misissuance, which is there are certs that, for some reason or another, do not conform to the rules around being a public CA. And some of that misissuance shows up in the actual certs themselves. Some of it shows up in the process that generated the certs. And in any case, when this occurs, a certificate must be revoked. We had a whole episode on this just a few weeks ago where we talked about the different certificate types and the rules. But for the most part, for leaf certs, they have to be revoked within either 24 hours or 120 hours. And it's usually within 120 hours. Sometimes it's 24. And I would say this is a problem that's been going on for years and years and years. But in particular, in the first half of this year, first nine months of this year, we saw a huge problem with CAs deliberately delaying revocation, with the flimsiest of excuses. And we had a bunch of episodes about that. But basically the basic excuse would be that these certificates are critical. If they're taken down, then critical services won't be available that society depends on, and there's no way that the subscribers can get these certs swapped out in time, which is a very problematic stance to take, because sometimes you have no choice but to swap out your certs. Sometimes your certs just are insecure. And if these subscribers are incapable of doing them when they're insecure, then it's a problem. And if, on the other hand, they're perfectly capable of doing them when they're insecure, then they should be capable of doing them now too. So it wound up being a very problematic position.
We've had a number of episodes just on this, so I don't want to belabor that too much, but this has been an ongoing problem that did not appear to be resolving itself in any meaningful way. Even when Entrust was distrusted, we've still seen delayed revocations occur subsequent to that. Like major delayed revocations. So this has been a big problem.
And so with that background, there was a message that occurred on September 19 on what we call MDSP. And MDSP, that's an archaic name. It made sense once upon a time. It's the Google group that Mozilla uses to discuss root program issues. And on MDSP, there was a message from Ben Wilson, who is in charge of the root program over there at Mozilla. And it's very interesting, and I would say very important. It's not super long, and I would just like to read the whole thing to the listeners. Sounds good?
Jason Soroko: Let’s do it, Tim.
Tim Callan: Hi folks. I may put pins in a few things on the way. Let's read it, and then we can go back and unpack it.
Hi, folks. We have discussed delayed revocation a bit internally, and wanted to come back to the community with some thoughts. We agree that expanding beyond the existing revocation timelines, (24 hours/5 days) is undesirable. While we think some exceptional delayed revocations are necessary as a current practicality, we do want to eventually sunset this policy. To that end, we'd like to refine our existing policy so that it is more effective and equitable during the interim. There's pin number one.
We're skeptical about proposals to pre-identify domains that will require delayed revocation. Pin number two.
We expect that many sites might ask for such exceptions, and an extensive amount of deliberation would be required in order to process these requests. Worse, in practice, doubtless some sites impacted in a revocation event would not have followed the procedure, and CAs will still be left with a last minute decision about whether a revocation will inflict substantial harm. Instead, we would like to seek the community's feedback on the following two proposals.
Number one, clarification of existing requirements. We would be more explicit about what would be required for delayed revocation. Some ideas include, and then there are six bullets here. We'll go through all six.
1. Explicitly clarifying that CAs must revoke certificates by default, that any delayed revocation must be the result of an explicit request by the subscriber containing the necessary information and meeting the requirements under such interim policy.
2. That subscriber requests contain a clear claim or explanation about the critical nature of the system and why timely revocation is not possible. (More detailed requirements to be discussed.)
3. That the requests are signed by a company officer or similar legal representative, stating that the information in the request is accurate to the best of their knowledge.
4. That the information contained in the subscriber's request be accurate to the CA's understanding (e.g. not materially contradicted by other facts known to the CA).
5. That each granted request be published for the community and Mozilla to scrutinize (allowing CAs to redact PII prior to publication.) and finally,
6. That CAs be required to produce summary statistics in their reports alongside the individual granted requests, detailing how many requests were received, how many were well formed, how many were granted, etc.
That was part one. Here's part two.
Consequences of delayed revocation. This one doesn't have bullets. It just has a couple paragraphs.
We believe that if a domain hosts critical infrastructure that cannot tolerate timely revocation, then it is deeply damaging to the WebPKI. In order to help these domains transition to effective certificate management practices and automated tooling, we propose that domains that are granted delayed revocation must then be limited to shorter lifetime certificates as a consequence of such decision. This also ensures that future revocations impacting such domains have much less impact. Concretely, the domains accepted for delayed revocation by a CA are already public. If this proposal were to be adopted, root programs would manage a shared list of such domains, e.g. via the CCADB - We had an episode on that in the past - and require CAs to limit the lifetimes of certificates issued to these domains. (e.g. to 90 days.)
Okay, now he's wrapping up.
We stress that both of these proposals are presented for early feedback, and we look forward to the community's thoughts on whether they are likely to be suitable and effective and how they might be improved. We would also look to align these policies across the root programs in order to provide clarity for the entire community. Thanks and best wishes. Ben.
Okay, that's all of the words of the email.
Jason Soroko: Tim.
Tim Callan: Initial responses, Jay?
Jason Soroko: I'm gonna let you take a breath, Tim. And I'm gonna attempt to bring that down to as few words as possible, at the risk of extreme oversimplification here. And I think what he's saying is we're gonna take the decision making and the request making of delayed revocation out of the hands of the CAs and put it to the subscriber, because they have to now submit, why - -
Tim Callan: I think maybe a different way to put it is they are going to codify a process whereby the subscriber must request the delay - -
Jason Soroko: That's right.
Tim Callan: And the CA still doesn't have to grant it, and that delay, there's much more process around requesting the delay.
Jason Soroko: Perfect. Therefore, that process is going to be a lot more in the hands of the real requester of delayed revocation, which is the subscriber, and not so much the CA. So to me, it's a shift of onus of responsibility here.
And if you are granted - because there's definitely a note saying there are legitimate reasons for delayed revocation - but if you are going to, if you're on that exceptions list, you're going to have to use the 90 day cert. So I mean that, I think that's really what it comes down to, is that shift of responsibility and that it - -
Tim Callan: And then the new and then the change, right, the new cap. At that point, 90 day certs become your new cap.
Jason Soroko: That's right. And you know what? I tell you what? So that's my thinking of what was said, and here's the reaction. I think this is completely rational.
Tim Callan: Absolutely.
Jason Soroko: I think it's completely smart.
Tim Callan: We posted a response to this, and our first, the first sentence of our response was something to the effect of, we applaud you and the CCADB committee for this good work. I think it's great. I think there are things to be figured out, and then we had detailed responses, and there's corner cases and stuff that needs to be discussed that are going to be out of scope for this episode today, but I think at a high level, it's really great.
Why don't we drill down a little on some of these? If we go back to the top. We agree that expanding beyond the existing revocation timelines is undesirable.
So there has been another thread that's been running for about four years to say is a 120 hour revocation window really the right revocation window for certain, let's call them clerical problems with certificates? And those could be very explicitly defined. And I am of the opinion that the WebPKI would be better off if there were three different bands of revocation based on the nature of the problem, rather than two, and that there's room for a band of revocation for something that is deemed to be much more trivial. And if you use analogies from other sectors - like the common practice with security vulnerabilities is that critical vulnerabilities have to be patched within a few days, high priority vulnerabilities have to be patched within a month. Those are still high priority vulnerabilities, and they get patched within a month. So is it physically impossible for there to be a certificate misissuance that could tolerate more than 120 hours? I don't buy it. So there was talk around having a third tranche.
What Ben is saying here, which echoes things we've heard from other browsers, is that's off the table right now. Stop talking about it. We're not even open to considering it at the moment. So that's basically what the first paragraph says. And if that didn't make sense, that's why that's there.
The second paragraph is there were proposals floating around that people would build these pre-approved lists of domains or certificates that got an exception if a mandatory revocation event were to occur. The second paragraph, saying we are skeptical about proposals to pre-identify domains that will etc., is basically shooting that down, saying that's not really going to work. And I completely agree it's not going to work for the same reasons that Ben said. We don't need to repeat those.
Then we get to the proposal itself. So let's break down clarifications of existing requirements, because I think this is really interesting.
A - A is just explicitly clarifying that CAs have to revoke on time and that any delayed revocation must be the result of an explicit request by the subscriber.
So what we had was, we had this practice of CAs just granting delayed revocation to huge tranches of misissued certs or entire misissuance events. And we're talking about things that could be in the tens of thousands, hundreds of thousands, where they just decided we're going to give everybody a delay. And this is explicitly making that illegal, which granted, it was illegal anyway, but everybody seemed to be getting away with it. And I think this language is to clarify that you're not going to get away with it anymore.
The next one, B, says what you said. So the subscriber must place a request, it must have a clear claim or explanation about, number one, the critical nature of the systems. And number two, why timely revocation is not possible. This is extremely important, because right now we got these vague, hand wavy, oh, babies are going to die. Wave your hands. Vague, vague, vague. And this is where now the subscriber has to come back and say, this is very specifically the harm that will occur. And it's greater than the harm to the WebPKI by not revoking misissued certs, and this is the reason we can't take care of it.
And I think that's important just because it requires a lot more specificity. It requires a lot more work on the part of the subscriber. I've been contending, Jay, that a lot of this is laziness on the subscriber’s part. Like, why? I don't really want to do this. It sounds like it's miserable, so I'm just not gonna, and this puts more work back on them. They've got to prove that it really makes sense to do, and they got to be explicit, and their name is going to be on it. So if you're a major financial institution and you can't deal with a certificate event, then you kind of deserve to be publicly shamed for that. Or if there are lives at risk and you can't deal with basic IT operations like changing certificates, and people are going to die if you have to change certificates, then the public deserves to know that. So that's what occurs there.
C. And then connected, C, that the requests are signed by a company officer or similar legal representative. So it's got to be the company's official position. The company is taking this position officially and living with the consequences of that.
D. That the information contained in the subscriber’s request be accurate to the CA’s understanding. Fine.
E. That each granted request be published for the community to scrutinize. So this is the visibility. The point is, no, you're gonna have to say - and note this is on a domain by domain basis. So they can't turn around and say all the domains that belong to Tim's bank. Tim's bank is just getting an exception. They've got to list the individual domains at Tim's bank that they're going to give this exception to, and they've got to have a justification for each of them individually because Tim's bank isn't a big homogenous mess of certs. Different systems do different things, and maybe some of them are more critical than others, maybe some of them are less agile than others. So they're forcing us to, again, he's taking a lot of the laziness out of this, which I think is great, and the thoughtlessness. And then F, that CAs be required to produce statistics.
So like you said, the first part is not only because the subscribers have to make this request, but also that it has to be really granular to the level of a domain by domain basis, which is very important.
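As a rough illustration of part one, here is a minimal Python sketch of how a CA might check one subscriber request, per domain, against the bullets read out above. It is only a sketch under assumptions: the proposal is early and says more detailed requirements are still to be discussed, and every name here (DelayRequest, problems, summarize and the field names) is hypothetical, not something from Mozilla's message.

```python
from dataclasses import dataclass


@dataclass
class DelayRequest:
    """Hypothetical shape of one subscriber request, covering one domain."""
    domain: str
    criticality_explanation: str   # why the system is critical
    why_timely_impossible: str     # why on-time revocation is not possible
    signed_by_officer: bool        # signed by a company officer or legal representative
    contradicted_by_ca_facts: bool = False  # CA knows facts that materially contradict it


def problems(req: DelayRequest) -> list[str]:
    """Return the reasons a request fails these sketched checks (empty means well formed)."""
    found = []
    if not req.criticality_explanation.strip():
        found.append("missing explanation of critical nature")
    if not req.why_timely_impossible.strip():
        found.append("missing reason timely revocation is not possible")
    if not req.signed_by_officer:
        found.append("not signed by an officer or legal representative")
    if req.contradicted_by_ca_facts:
        found.append("contradicted by facts known to the CA")
    return found


def summarize(requests: list[DelayRequest], granted_domains: set[str]) -> dict[str, int]:
    """Summary statistics of the kind the last bullet asks CAs to report."""
    well_formed = [r for r in requests if not problems(r)]
    return {
        "received": len(requests),
        "well_formed": len(well_formed),
        "granted": len(granted_domains),
    }
```

Note that the default is still revocation: a well formed request only makes a domain eligible, and the CA does not have to grant it.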
And then number two, we believe that if a domain hosts critical infrastructure that cannot tolerate timely revocation, it's deeply damaging to the WebPKI, and therefore we would reduce it. Now the 90 day is, e.g., it says to limit the lifetimes of certificates issued to these domains, (e.g. to 90 days). So it wouldn't have to be 90 days, I suppose, according to this proposal. In the responses so far as of a couple days ago, which is the last time I looked, I did not see anybody proposing that the timeline be different. So the timeline, it feels like people are taking 90 days as the timeline. And that makes sense, because there's been so much focus on it because of what Chrome has said and done, that you would think they would pick 90 days. It'd be weird to pick something else. And so a couple other subtleties about number two that I'll point out.
Number one, this isn't limited to the CA that granted the exception. That domain would be limited to 90 days for any CA. And I think this is very important, because you can't just go shop down the street. If you get this exception because you don't want to deal with it, you can't just turn around and buy certs from somebody else. You're done. You're at 90 days everywhere.
Jason Soroko: Can I put in a tongue-in-cheek statement there, Tim? You can't find a CA that will more easily cave than another might at your request.
Tim Callan: Absolutely. You can't shop around for the sucker CA. You also can't just go to the next one, go to the next one, go to the next one. Can't just keep kicking the can down the road. Both of those are eliminated. And also from the perspective of, why are we doing this in the first place? If these systems are sufficiently unagile, or are insufficiently agile, that you can't maintain basic certificate operations on them and are sufficiently critical that a lack of certificate, that an outage would have horrible effects on society, then you know what? That's true regardless of who your CA is, and you need to be on short lifespan certs no matter what. So, that's that.
The other thing I'll point out is there's no end date. He doesn't say we're going to reduce you to 90 days for a year, or we're going to reduce you to 90 days until you prove you have the agility in place. We're just plain going to reduce you to 90 days. Like until the sun goes supernova. That's it. We're at 90 days. It's done. And so I think that's important as well, for two reasons. One of which is it shows, again, the root cause: if these systems are so critical and so bad that we can't do a mandatory revocation, then they just need to be on 90 days. We're not going to change. It's not going to change in six months. The other point about it, though, is that I feel like this is written by somebody with an awareness of the fact that it's not going to be so long before everybody is down to that time period anyway.
Jason Soroko: Exactly.
Tim Callan: So why put an end date on it that's going to be after the day that it's going to be a requirement anyway?
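The cap described here can be sketched in a few lines of Python. This is a sketch under assumptions, not anything the proposal specifies: max_lifetime and the list argument are hypothetical names, 90 days is only the "e.g." figure from the message, exact name matching is a simplification (the proposal does not say how subdomains would be treated), and 398 days is the maximum validity for public TLS certificates at the time of this episode.

```python
from datetime import timedelta

DEFAULT_MAX = timedelta(days=398)  # maximum for public TLS certs at the time of this episode
CAPPED_MAX = timedelta(days=90)    # the "e.g. to 90 days" figure in the proposal


def max_lifetime(domains: list[str], delayed_revocation_list: set[str]) -> timedelta:
    """Longest validity any CA could issue for a cert covering these domains.

    There is deliberately no CA argument and no expiry date: as discussed above,
    the cap follows the domain to every CA and has no stated end date.
    """
    if any(d.lower() in delayed_revocation_list for d in domains):
        return CAPPED_MAX
    return DEFAULT_MAX
```

Because the function takes only the domains and the shared list, switching CAs or waiting a year changes nothing about the answer, which is the point being made in the discussion.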
Jason Soroko: Tim, here's a final thought then on everything you just said, and I think this is the call to arms, even to you and I personally on this topic, and that is between mass revocation due to misissuance being inevitable, we're going to see this through time, 90 day coming for everyone, not just what you just mentioned, and everything else that you just talked about, which probably has not been talked about enough: critical infrastructure that's not agile.
I would say, I think when you and I bring up the term certificate lifecycle management, when you've brought up the term certificate agility, and go back to Root Causes Episode 117, on that topic - -
Tim Callan: Way back. Way back.
Jason Soroko: That's nearly a five year old podcast, folks.
Tim Callan: 300 episodes ago.
Jason Soroko: So I'm gonna say it this way, Tim. I think that when people hear you and I talk about that topic, total certificate agility, they might be making the mistake that we're talking about some sort of esoteric inside baseball for CAs and now, my friends, you've just heard from Tim that it's for all of us.
Tim Callan: I agree, Jay.
Jason Soroko: Therefore, Tim, we need to double down on the message of total certificate agility, what it means, how you can achieve it. And everybody, including you people who are procuring and subscribing to certificates, you need to understand what that is, because your future without certificate agility is at an end.
Tim Callan: And if I can build on what you said there, Jay, I think what's interesting is that when you look at this proposed policy, it almost goes down to two branches. Branch number one is when a mandatory revocation event comes along, your organization displays certificate agility and deals with it within the mandated time frame. Branch number two is that your organization does not display certificate agility, and as a consequence, a forcing change occurs so that you must implement certificate agility. So what this proposal does is it allows institutions that are incapable of certificate agility to identify themselves as such and then be put on the program where they have no choice.
Jason Soroko: Wow. Tim, as you were saying that a thought occurred, and I'm thinking now that 90 day for everyone might just be that much more important now.
Tim Callan: So Ben doesn't specifically say who he means by "we", but he says "we" several times in this: "we have discussed." He talks about things being run by the CCADB committee. I do believe that any policy like this that gets put in place will be materially matched by the other major root programs, and that this will become a general policy that you really need to use if you're in the WebPKI, and just the fact that Ben is suggesting that CCADB might maintain this list, the active members of CCADB are Mozilla, Chrome and Apple, so at a bare minimum, Mozilla, Chrome and Apple would all have to be essentially on the same page here, and at that point, that's the whole industry.
Jason Soroko: I mean, you brought up the term WebPKI, and it's all of that.
It’s going to be all the trust programs. And I think if you're a CA, and also not playing ball with this, everybody's in on this. Everybody will have to be in on this.
Tim Callan: This will be absolutely forcing. If you want to remain a public CA, and remain a commercially viable public CA, you will have to comply. That will be that black and white, which again, I think is good.
Jason Soroko: I think the era of CAs that want it to still be 1998, those CAs aren't going to exist.
Tim Callan: And, part of the reason we've had the problem we have is like vagueness in the rules and CAs, let's say, exploiting that vagueness. And I'll just say it - oftentimes, I think, in complete bad faith. And this is eliminating that vagueness.
Jason Soroko: It's doing it in a smart way. That missing component - and you've said it a few times, that missing component of putting the responsibility back onto the subscriber, so putting the pressure on the CA isn't working. Basically.
Tim Callan: Who knows better than the subscriber? And this is one of the things I've been maintaining through this whole debate. CAs can't judge whether subscribers are capable or not of swapping out certs. CAs also can't judge the true consequences of that. The subscriber is the only organization that can make those judgments. So now the subscriber makes those judgments. And the subscriber lives with the consequences of those judgments, and so I think that’s great.
Jason Soroko: Now, Tim, I'm just repeating myself now. I can't help but think this is so important. Subscribers, you are now part of the system and therefore total certificate agility has to be what you're doing, and otherwise this isn't going to work, and it's not going to work for you.
Tim Callan: It’ll work fine for Mozilla. Who it's not going to work for is you, yes.
Jason Soroko: So I think the role we play, Tim, as the CA, and especially on this podcast, I think we got to double and triple down on making people understand what this means.
Tim Callan: I agree. I think, yes, yes, education is extremely important. First of all, listeners, we will keep you informed as this develops. Right now, again, this is a proposal. Though it's a very concrete, explicit and specific proposal from Mozilla, Ben emphasizes that at this point, this is just a proposal for feedback. So don't view any of this as a hard commit. That said, I will be deeply surprised if we don't wind up with a policy from one or more major root programs that looks a lot like this, this year.
Jason Soroko: I think you have to plan towards it. For sure. And it's the right thing.
Tim Callan: And so as it becomes real and it solidifies, we'll tell you what's going on. But you know this is a big deal. As soon as I read this, I was like, that's a big deal.
Jason Soroko: Well, thank you for reporting it to us, Tim, word for word, and it was great to contemplate it. Audience, you're going to hear a lot more about this down the road.
Tim Callan: I think so. Thank you, Jason.
Jason Soroko: Thank you.
Tim Callan: This has been Root Causes.
