Podcast May 26, 2020

Root Causes 94: Revocation Checking Through OCSP and CRL

One essential portion of the certificate lifecycle is the ability to revoke certificates. Public SSL certificates use a pair of mechanisms to communicate this revocation status to client machines, CRL and OCSP. In this episode we explain how these mechanisms work and some of their strengths and challenges.

  • Original Broadcast Date: May 26, 2020

Episode Transcript

Lightly edited for flow and brevity.

  • Tim Callan

    Today, we are going to talk about revocation checking mechanisms in the world of public SSL, which is to say CRL and OCSP.

  • Jason Soroko

    Yeah, it's a big topic. It's important when you're issuing SSL certificates.

  • Tim Callan

    Yeah.

  • Jason Soroko

    What do you do in the case when a certificate is revoked, right?

  • Tim Callan

    So, let's go back. Revocation, right. There are good reasons why we need to have revocation. Those can be owner-initiated revocation, which is to say, I had this cert, but I'm not sure, I think maybe the private key was compromised so I just want to revoke it and get a new one, or I had the cert, but I know I'm not going to use it anymore and just in order to keep things clean, I want to revoke it. There are good reasons why the certificate owner needs to be able to do that. There are also reasons why the CA needs to be able to do that. We think the certificate is being used incorrectly, the certificate was issued in error, the certificate was, et cetera, and then you need to be able to go revoke. So, the mechanism, therefore, is that there is a list of certificates that are known to be revoked by the CA, right? The CA is this single hierarchical authority that sits at the top; thus, the name CA, and the CA says, here is the list of certificates that are revoked. Now you need a way that the client machine that's out there in the world can compare to that list to say, is this certificate revoked, or is this certificate not revoked, and the first thing that the world had was CRL, which stands for certificate revocation list, and CRL is just basically a big list of revoked certificates. And there are things that CAs can do to make that system a little more efficient. Like, once the certificates would have expired, you take them off the list. Like, they can break the list up into shards. So, instead of one list, there are multiple lists and there's a way to get the right list, so those are things that they can do. But at the end of the day, as you might imagine, it's a heavy system. Right? It's a heavy process and it's an especially heavy process as the number of certificates in the world continues to increase by leaps and bounds.

    And so, a long time ago, well, more than a decade ago, the industry had created what we call OCSP, which stands for online certificate security protocol. And what OCSP does - - status protocol. Online certificate status protocol, excuse me. And what OCSP does is it's a one-time check of the one cert in question. So, there's a SaaS system that takes a query on a cert and returns a status. This is revoked. This is not revoked. And OCSP is the more modern and the more current version. CRL is still available, but I think - - I don't think CRL is - - we couldn't use CRL as a cornerstone for our security. Let's put it that way. So go ahead, Jay.
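
Illustration (not part of the recording): the two lookup models described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any CA or browser implements revocation. The `ISSUED` table, the serial numbers, and the function names are all hypothetical, and the shard rule (serial number modulo shard count) is just one simple way to "get the right list."

```python
from datetime import date

# Hypothetical CA-side records: serial number -> (revoked?, expiry date).
ISSUED = {
    1001: (True, date(2020, 3, 1)),    # revoked, but already expired
    1002: (True, date(2021, 1, 15)),   # revoked, still within its validity period
    1003: (False, date(2021, 6, 30)),  # not revoked
}


def build_crl(today, shard_count=1):
    """CRL model: publish the revoked serials as one or more lists.

    Expired certificates are dropped, and the list is split into shards so a
    client only has to fetch the shard that could contain its serial number.
    """
    shards = [set() for _ in range(shard_count)]
    for serial, (revoked, expires) in ISSUED.items():
        if revoked and expires >= today:
            shards[serial % shard_count].add(serial)
    return shards


def crl_is_revoked(shards, serial):
    """Client side of the CRL model: look the serial up in the matching shard."""
    return serial in shards[serial % len(shards)]


def ocsp_status(serial):
    """OCSP model: one query about one certificate, one status back."""
    if serial not in ISSUED:
        return "unknown"
    return "revoked" if ISSUED[serial][0] else "good"


shards = build_crl(date(2020, 5, 26), shard_count=2)
print(shards)                        # the expired serial 1001 is not listed
print(crl_is_revoked(shards, 1002))  # list lookup
print(ocsp_status(1002))             # single-certificate query
```

The difference in weight is visible even here: the CRL client has to obtain a whole list (or shard) before it can answer anything, while the OCSP client asks about exactly one certificate.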

  • Jason Soroko

    No, you're right, Tim. In fact, by the time we get to the end of this story and this podcast, we'll end up finding that CRLs are actually used very commonly; it's just that it's a subset. It's a small subset for a specific purpose.

  • Tim Callan

    Right and some of that comes to some of the baggage with OCSP. So, you might say, okay, great, well, that sounds lovely. Right? That sounds perfect. I think that's a great system and in principle, on paper, it looks like it would be a great system. Unfortunately, OCSP is not perfect, either. The problem with OCSP has to do with what happens if that individual query doesn't return. Right? So, you've got lots and lots of computers all over the world, hitting OCSP responders that themselves might be all over the world and they're asking for the status of a cert and they want an answer back right darn away because if you don't get your answer back right away what are you going to do? Are you going to stop loading the page or are you going to keep loading the page, right? And if you stop loading the page, or stop that process, whatever it is, then that's a problem because people have slow internet and it sucks. And if you keep going with that process or keep loading that page, then you potentially have the problem that the certificate validity status might not be obtained and a potentially revoked cert might not be known to be revoked. If that makes sense.

  • Jason Soroko

    Yeah. And how browsers have had to deal with that is, a lot of times the policy was to fail soft. Right?

  • Tim Callan

    Right.

  • Jason Soroko

    Right, which I think Adam Langley had coined the term for, a seatbelt that breaks, right. So, in other words, that was really quite problematic.

  • Tim Callan

    Yeah, so let's walk through that because it's not necessarily intuitively obvious until someone explains it. But I think it's a very, very important point. So, you go, okay, we will have a fail-soft system. So, what will happen is if the certificate is revoked, then what we will do is - - I'm sorry. If we don't get an answer, then what we will do is we will fail soft. So, the problem with that goes as follows.

    Let's look at the scenario where I do steal a key. So, I say I'm a sophisticated attacker. I steal a key. I hijack your DNS. I send you off to my fake website, that looks like the real website, including the URL. I stick your certificate on there. Right? Now, the problem with that at this point is that if that certificate - - so, the owner of the certificate finds out that this has happened. The owner of the certificate revokes the cert. Now, if my captive audience, whose DNS I've stolen, comes to my site and they go off and they check for certificate revocation, well, once I own your machine anyway, I can arrange for that OCSP response not to come back. And under those circumstances, if the software fails soft, then it says, okay, let's proceed, which means that the exact scenario that the system is built to defend against is the scenario where the system opts not to work. So, it's equivalent to a seatbelt that breaks when you get in an accident. Why bother? I wear my seatbelt and my seatbelt's there across my chest and everything's fine all the time while I'm driving around, but the instant that I get into an accident, the seatbelt breaks and offers me no protection, on the one occasion that I need it. And that's the scenario we just described with OCSP.
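
Illustration (not part of the recording): the fail-soft problem in the scenario above comes down to one branch of client logic. The sketch below is a simplified model, assuming a hypothetical `accept_certificate` function and a `None` value standing in for an OCSP response that never arrives. Real browsers have considerably more nuanced policies.

```python
def accept_certificate(ocsp_answer, fail_soft):
    """Decide whether to proceed with a connection.

    ocsp_answer is "good", "revoked", or None when no response came back
    (a timeout, or an attacker on the path dropping the query).
    """
    if ocsp_answer == "revoked":
        return False
    if ocsp_answer == "good":
        return True
    # No answer: a fail-soft client proceeds, a fail-hard client stops.
    return fail_soft


# The certificate really is revoked, but the attacker blocks the OCSP response,
# so the client sees no answer at all.
blocked_answer = None

print(accept_certificate(blocked_answer, fail_soft=True))   # client proceeds
print(accept_certificate(blocked_answer, fail_soft=False))  # client refuses
print(accept_certificate("revoked", fail_soft=True))        # only helps if the answer arrives
```

The fail-hard setting closes the hole, but at the cost described earlier in the episode: every slow or unreachable responder now breaks page loads for legitimate users.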

  • Jason Soroko

    Exactly right, Tim. So really, we're talking about a resource balancing act.

  • Tim Callan

    Yeah.

  • Jason Soroko

    Between the browser, the web server and the CA. And, as you said earlier, the CRLs can be too long, the CAs would have to serve that, they'd have to upload the whole thing and perhaps make the client do all the work. That just wasn't going to work. Yeah, it was there for a while - -

  • Tim Callan

    It's a scaling problem. It was fine in 1996. But not anymore. We're just so many orders of magnitude bigger in how we use these things.

  • Jason Soroko

    That's right and OCSP in its most basic form made a lot of sense, except for the fact that there were also privacy concerns with it. Meaning that the CAs could then potentially collect information on where you are browsing to.

  • Tim Callan

    Who's going where? Absolutely. Right.

  • Jason Soroko

    And so therefore, OCSP stapling was the next generation of that thinking, which was to put a - - to sign a timestamp directly on to the OCSP response from the web server. So, the web server itself would do the job of collecting the validity information and then assigning the timestamp of the actual validity rate to the OCSP response itself. So that is, it's all good, except, you know, Tim, they're still, you know, trying to answer the question today and looking up, you know, the latest information about OCSP support within various web servers and within various browsers. I think basic OCSP support has been there for a long time. OCSP stapling support has also been around for quite a long time and is fairly generally accepted, but some of the other concepts such as must-staple, right, because we've had whole podcasts where we've talked about CAA, we've talked about CT, CT logs, and the various ways that you can protect yourself as a domain owner from having certificates issued outside of your purview. This is a whole other set of problems with respect to revocation in that, you know, how do you even know that the browser that's browsing to your website is handling your OCSP responses and your stapling and your must-staples correctly? There are a lot of choices to be made here and there's a whole soup around this that we could probably go through in every single detail, and it would take several podcasts about all the intricacies and the problems related - -

  • Tim Callan

    Knowing us, we probably will. But not today.

  • Jason Soroko

    Yeah, I guess the conclusion we can come to on this, Tim, is it's not clean. It's really not clean. One of the major issues isn't just browsers, you know, the typical problem of, hey, do all browsers support newer functionalities, etc.? It also has to do with things like, do Nginx and Apache servers, which are the majority of servers out there - - are they doing - - are they supporting this correctly? Are the implementations really good? And the evidence that I'm seeing is, I think they're definitely trying, but I don't think we're there yet.

  • Tim Callan

    And even if you were like, even if we were, let's say we could snap our fingers and know that all of the software was working correctly and in the same way everywhere in the world, that wouldn't solve all of the problems with OCSP stapling, because one of the big concerns that people have about OCSP stapling is crypto agility. So, I have something, I have an emergency situation and I have to do something to my certs right now.

  • Jason Soroko

    That's right.

  • Tim Callan

    Well, guess what, right, because, you know, turns out someone stands up at a hacker conference and says, guess what, I blew all this crypto up and everybody goes, oh my God, I've got to stop using RSA today, right? Or somebody says, I don't trust the CA anymore, right? We've seen this with CAs. DigiNotar, Certinomis, Symantec, right, stopped being trusted CAs and people who had those certs had to move off those certs. So, these situations happen in the real world. And by the way, you know, if you've stapled - - now if you've used OCSP stapling in systems, you don't necessarily - - you can't just swap those certs out, they're not all interchangeable anymore. To further complicate things, what happens if I write something and it sits around and it's a legacy system and I've got a certain cert and I keep renewing that certain cert, everything's fine, and then one day, I decide to switch certificate vendors because the other guy's got better tools, or he's given me a better price, or the first one went out of business, and I don't even realize that that decision was made until my systems are failing and I don't know why my systems are failing, because I installed the cert, it's right there. And so, there are a lot of concerns about OCSP stapling, that it's really not agile, that it's really not crypto agile, and that it's really not robust.

  • Jason Soroko

    That's exactly it. So, Tim, the whole concept of revocation anyway, you know, we're talking now quite a lot about the SSL use case. In other forms of PKI, other use cases, such as in DevOps with the very short life cycles, or in IoT, where, you know, the way the world went with IoT was in the early, early days, there were a lot of fire-and-forget certificates that went out there.

  • Tim Callan

    Yeah.

  • Jason Soroko

    And, you know, the risks were known, but the ability and the technology to be able to swap out the certs simply just wasn't there X number of years ago. Even now that we have a lot more certificate automation put into place, especially in IoT, etc., for certain kinds of devices that are capable of it, revocation in IoT is really a problematic concept. I mean, it could definitely be put in there for certain use cases but from my experience, and what I've seen, what was just a better option all around was to shorten the lifecycle of the certificates themselves and we've definitely seen that in SSL, right?

  • Tim Callan

    Yes.

  • Jason Soroko

    We've now gone down to one year. We have some CAs that are down to, you know, 90 days as either a policy or as an option and I think that that is going to be a continued trend, as automation technologies continue to evolve and get to the point where they are now, where we're not afraid of one-year lifespans. In fact, we're not afraid of even shorter lifespans if you employ the right certificate management technology. I think that revocation is tough, and we may not be able to solve it, and maybe we shouldn't even solve it. You know, that's a whole other question. The question then comes down to, what are other solutions? And I think that until one solution really comes up, certificate lifespans going down, I think, is going to be the trend.

  • Tim Callan

    It is. It is. And I think that's definitely a net safety improvement. I'm not sure that that gets us all the way to home plate, though, right? Because, okay, you say, if I take what used to be six-year certs and I turn them into one-year certs, I've reduced the attack time to 1/6 of what it was. Okay, that's good. If I take one-year certs, and I turn them into 90-day certs, I reduce the attack time again. But gee, man, if I steal your private key, I don't need very much time to do something bad. 90 days is plenty. So, a 90-day cert doesn't get us there. Maybe a two-hour DevOps cert gets us there. Maybe. But that doesn't work in the world of static web servers. That doesn't work for my cert that's sitting on my public-facing web page, or even for my cert that's inside my own owned hardware clients - - you know, in-house systems, that doesn't solve it for them. So, we still aren't at the point where we can live without revocation. Even with the kind of shorter lifespans that we're talking about, right? If there were a truly automated system - - if I had a truly automated system, where all of my systems internally could get a one-day certificate updated every 12 hours, continually, right, 365 days a year, then, okay, now we're getting there. But I think that's the kind of thing we need to be talking about.
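
Illustration (not part of the recording): the "attack time" arithmetic above can be checked directly. The sketch assumes the worst case, where a key is stolen right after issuance and revocation does not reach the client, so the stolen key stays usable for the full certificate lifetime. The lifetimes in days are round-number assumptions (365-day years), and the function name is hypothetical.

```python
from fractions import Fraction

# Certificate lifetimes in days (assumed round numbers).
LIFETIMES = {
    "six-year": 6 * 365,
    "one-year": 365,
    "90-day": 90,
    "one-day": 1,
}


def exposure_ratio(old, new):
    """Worst-case exposure window of the new lifetime as a fraction of the old."""
    return Fraction(LIFETIMES[new], LIFETIMES[old])


print(exposure_ratio("six-year", "one-year"))      # the 1/6 figure from the discussion
print(float(exposure_ratio("one-year", "90-day")))
print(LIFETIMES["90-day"] * 24)                    # worst-case hours an attacker still has
```

Shortening lifetimes shrinks the window proportionally, but as the discussion points out, even the 90-day case still leaves an attacker thousands of hours in the worst case, which is why shorter lifespans alone do not replace revocation.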

  • Jason Soroko

    I agree, Tim. That's a really good thought. And I think it's a good time to really be having these conversations. I mean, these problems that we're referring to - - I remember that they were on the tip of our tongue for quite a long time, since at least, you know, 2012 onwards, and even though all these different flavors and forms of OCSP stapling, and the various kinds of X.509 extensions to help to do certain kinds of things, have been put in place, and a lot of good thinking has been done, I think still, at the end of the day, no matter how much effort we put into this, there's still going to be some kind of an issue.

  • Tim Callan

    Yeah, this has been one of the thornier and trickier issues. Like, all the way back in the very beginning, when we first started to say, we're going to use, you know, PKI-based certificates in a public set of circumstances, like the internet, people all the way back then thought about this problem. This revocation problem. And there aren't a lot of sort of fundamental ideas of computer science that somebody posed in the early 1990s that still, in the year 2020, aren't really worked through. That's kind of a rare animal. Right? And it's a thorny one, it's a tough one, and the amount of intellectual horsepower that has gone into this is very high. So, it's not unresolved through lack of attention, or lack of focus, or lack of trying, it's unresolved because it's genuinely really hard.

  • Jason Soroko

    Yes, it is hard and especially when you are dealing with multiple parties.

  • Tim Callan

    Yes.

  • Jason Soroko

    There are so many moving parts here: the CA, the web servers, and the browsers themselves.

  • Tim Callan

    Right. Right. And there's that, right, like you were saying, and we talk about this a lot, right. One of the things about our standard PKI systems, our X.509, and our certificate types that we all use and we all know, starting with TLS, one of the big, big strengths of it is this amazing, ubiquitous support. That I can reliably go buy any hardware or software application from any vendor in the world, or subscribe to any SaaS service from any vendor in the world and have damn near 100% confidence that my off-the-shelf SSL certificate is going to work, right? That is hugely important. But what's the downside? The downside is agility, right? The downside is we have a situation like this and it becomes very difficult to change these basic foundational elements of the system.

  • Jason Soroko

    Well, Tim, we live in a world, right, where the internet, the way that it was built, unfortunately, things like DDoS are entirely possible. And so, anytime, you know, a system that needs to be queried is publicly available, it's a target for downtime, unintended downtime because of malicious circumstances or otherwise. And, you know, if we go all the way back to that CRL idea, I think that was it - - I think it is Chrome, and I think it is Firefox that support basically a subset form, each of them have their own names, but a subset form of the CRL list, which targets not necessarily, you know, all DV certificates, which would be a very, very long list, a very long CRL list. The list that they actually generate and create is, I believe, intermediate CAs that perhaps have been revoked, which is important in case there are large-scale problems, as well as potentially very high value EV certificates as well.

  • Tim Callan

    Right and you can imagine something like that makes more sense, right? Where especially the intermediate, because the number of intermediates you have in the world is I don't know how many orders of magnitude lower than the number of end leaf certs you have in the world, but a lot. And that's where if you could see that somebody malicious was doing something, you know, in an organized systematic fashion and this is not unlike what we talked about with Kazakhstan last year, and, you know, you start to imagine state actors and high value criminal organizations and things like that, at that point, you can say, okay, maintaining a rigorously checked CRL for those intermediates, is scalable, can be fast, and can really matter.

  • Jason Soroko

    Yes, it - - until we get to a solution, it's definitely, you know, the things are in place to make that happen. I will tell you, though, as a CA, from our point of view, the amount of OCSP traffic that we support is really quite remarkable. I don't think people realize just how much traffic comes through and it's actually handled fairly well, for the most part. And, you know, it shows that even though the system is not perfect, I guess maybe, Tim, the way we can end this podcast is, you know, it does work to some degree, but there are definitely problems with it and we have to acknowledge the fact that we need to think differently at some point in the future.

  • Tim Callan

    Yeah, it works. I think it would be fair to say it works most of the time. But not the damn near 100% of the time that I said earlier on. Right? And that's, of course, where we all want to be better. Like, in the world of security, most of the time isn't all that good. Like, you want to do better than that. So, this is definitely an area where I think everybody feels some pain.

  • Jason Soroko

    Exactly. Tim, this is good.

  • Tim Callan

    All right. Well, thank you for listening, listeners. Thank you for talking to me, Jay. I always enjoy it.

  • Jason Soroko

    Thank you.

  • Tim Callan

    And this has been Root Causes.