
Root Causes 429: ServiceNow Outage Due to Expired Root Certificate

Hosted by: Tim Callan, Chief Compliance Officer
Original broadcast date: October 8, 2024

A ServiceNow private CA root expired, creating outages across hundreds of enterprises. We explain what appears to have gone on.

Podcast Transcript

Tim Callan: So just a couple of episodes ago, we talked about a major outage caused by an expired certificate that wasn't renewed. That was Bank of England. And we're going to talk about another one today.
Jason Soroko: Just another small company nobody's ever heard of. ServiceNow.
Tim Callan: Service Someone. Someone Now. Who is that? ServiceNow. I mean, what a giant, ubiquitous IT platform that’s all over the globe. They had an outage. I read an article that said that something like 600 enterprise customers were affected. And how did this happen, Jay? What was the problem?
Jason Soroko: Tim, it seems that around September 22, ServiceNow identified an expired TLS cross chain certificate. So we're talking about a root certificate that looks to have come from a private CA system within ServiceNow's infrastructure that helped to manage their internal applications. This is me reading into the journalism, because it's not 100% clear.
Tim Callan: So it was their root that they had set up in their PKI sometime in the past, and they were using this PKI to put certs on internal apps. And roots are certificates. They're root certificates, and every certificate expires, as we've discussed. So the expiration date for this root rolled around, and I guess nobody knew?
Jason Soroko: I don't know if it was some sort of botched update procedure, which suggests to me this is something that was done by hand, or if it was something that was forgotten about. It's one or the other. But I lean towards the former. From some of the stories I've read, where the journalists are trying to piece this together in their own way, it sounds like some sort of botched update procedure. So I think they knew they had to update the root cert. But the problem is that in doing it manually, somebody made a mistake.
Tim Callan: So this is a point you make a lot, Jason, which is that PKI is hard, and PKI experts are experts, and not every competent IT professional is a PKI expert. And if you get it wrong, things like this happen. The stakes are very high.
Jason Soroko: Tim, you and I have a sort of inside joke between us: computers are hard, and PKI is the hardest part of the hard.
Tim Callan: It really is. And it takes a long time and a lot of work to become a PKI expert. It's not a thing you just wake up tomorrow and decide you're going to be.
Jason Soroko: That's absolutely correct. For those of you out there in organizations who look at these outages and say, geez, I wonder if that could happen to me: I think the risks are much higher, Tim, for organizations that are running their own private CAs that don't come from a vendor that's doing a lot of the hard work for you. In other words, to borrow that phrase you've used a lot, don't roll your own crypto. Don't run your own CAs, your own private CAs. Don't try to take some open source thing from somewhere and try to pass it off as a well-governed private CA. It's just really hard. And I think here is a perfect example of what happens.
Tim Callan: This is an organization that definitely has a whole lot of demonstrated technical competence. It’s ServiceNow. And yet this happens to them. And as we've mentioned in the past, part of the reason we point out when these high-profile, certificate-error-based outages happen to these companies is that they are the best of the best. And if this can happen to ServiceNow, or Bank of England, or, going back to some of the other people we've talked about recently, Microsoft, then surely it can happen to any of us.
Jason Soroko: It can, and it does. And I think, Tim, interestingly, a lot of the outages we've been hearing about recently are actually in private CA systems within organizations, where a botched upgrade or a forgotten upgrade is part of the story. Again, there are probably a lot of these outages that are also just the classic SSL certificate that expired and was forgotten about, or that nobody ever knew about. So there are a lot of scenarios, but I think the private CA outages are rearing their heads more often than not.
Tim Callan: It's an interesting angle here. Because you and I talk about this, and when we talk about this, I think there is kind of an assumption that we're talking about a leaf cert that's sitting on a server somewhere or a device somewhere, and that leaf cert is supposed to get renewed, whether it's on a public root or a private root, and that doesn't occur, and stuff starts to crash. That's the scenario we all kind of have in our mind’s eye. But this is interesting in that it's a little bit of an object lesson in the perils of running your own internal CA and not operating it correctly. And of course, the problem there is you don't lose one leaf certificate in one system. You lose all the certificates that are on this root and all the systems associated therewith. And that could be a massive blackout very easily.
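A minimal sketch of the blast-radius point above, assuming a deliberately simplified model of a private PKI. The `Cert` class, the certificate names, and the dates are all hypothetical and are not taken from the ServiceNow incident. It treats a chain as valid only while every certificate on the path to the root is unexpired, which is how many validators behave, though real implementations differ in how they handle trust-anchor expiry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Cert:
    """Hypothetical, simplified stand-in for an X.509 certificate."""
    name: str
    not_after: datetime
    issuer: Optional["Cert"] = None  # None marks a self-signed root


def chain_expiry(cert: Cert) -> datetime:
    """Return the earliest not_after on the path from cert up to its root."""
    earliest = cert.not_after
    while cert.issuer is not None:
        cert = cert.issuer
        earliest = min(earliest, cert.not_after)
    return earliest


def broken_at(certs, when):
    """Names of certs whose chain contains an expired link at time `when`."""
    return [c.name for c in certs if chain_expiry(c) <= when]


# Illustrative hierarchy: the root expires long before any leaf does.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
root = Cert("internal-root", now + timedelta(days=1))
issuing = Cert("issuing-ca", now + timedelta(days=700), root)
leaves = [Cert(f"app-{i}", now + timedelta(days=90 + i), issuing) for i in range(3)]

print(broken_at(leaves, now))                      # no link has expired yet
print(broken_at(leaves, now + timedelta(days=2)))  # root expired: every leaf is affected
```

In this model, renewing the individual leaf certificates does nothing once the root lapses, which is why an inventory that tracks only leaf expiry dates can miss the larger risk.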
Jason Soroko: I hate to be flip about it, but we always end these outage podcasts the exact same way, which is to point to certificate lifecycle management. The technologies for it exist. Don't do it yourself. Go to a vendor that is doing it properly. And guys, Tim and I have done over 400 episodes, and this has been just a constant drumbeat of repetition, to let you guys know that things are different in the year 2024 going into 2025. The technology to solve these problems is out there.
Tim Callan: And CLM is obviously the one for your leaf certificate problem. For your PKI, for your internal CA problem, like we see here, getting an expert partner in to help you with that and deal with those details for you is also a best practice. And you really want to look at both of those technology categories moving forward if you want to protect yourself from problems like this.
Jason Soroko: That’s it, Tim.
Tim Callan: All right. Thank you, Jay.
Jason Soroko: Thank you.
Tim Callan: This has been Root Causes.
