Podcast Jan 21, 2021

Root Causes 143: The Four Pillars of Certificate Automation

In this episode our hosts explain the Four Pillars of Certificate Automation: deploy, discover, revoke/replace, and renew. They detail what these pillars entail and why they're important. They also discuss the umbrella capability of visibility, which affects all four pillars.

  • Original Broadcast Date: January 21, 2021

Episode Transcript

Lightly edited for flow and brevity.

  • Tim Callan

    So, today, we want to talk about what I’m calling the four pillars of certificate automation and this is a model that I think has evolved for me over time as I have talked to a lot of people about certificate automation and really tried to compartmentalize and codify what it means and what it does and why it matters and I think it’s worth our while just to kind of walk through this perspective that, certainly that I have, and I think a lot of other people at Sectigo have as well and explain how we think about certificate automation in terms of its capabilities and its mission and ultimately what it does and what it’s for.

  • Jason Soroko

    I love that, Tim. I always find breaking things down to their principal components, their principal patterns, is a lot more useful than, a lot of times, getting too far into the weeds. This is my favorite kind of subject.

  • Tim Callan

    Yeah. And I’m a big fan of compartmentalization. I think it helps your clarity of thought a lot. So, this is also a compartmentalization exercise in a way.

    Ok. So, it’s four pillars and then there is one kind of penumbra capability that I think is very important, but it isn’t associated with any of the pillars rather it enhances and improves all of them. So, we’ll hit that at the end.

    So, why don’t I list them to begin with and then we can dig into them.

  • Jason Soroko

    Sounds good.

  • Tim Callan

    Ok. So, the first pillar is deployment. The second is discovery. The third is revocation and replacement, which sometimes we refer to as life cycle management, but what that really gets down to is revocation and replacement and the fourth is renewal. And then the penumbra capability that affects all of those is visibility.

  • Jason Soroko

    Yeah. You know what? I’m already liking this, Tim.

  • Tim Callan

    Good. Yay.

    So, let’s get into it. First of all is deployment. And this may seem obvious in terms of what we mean by that, but there actually is a little bit of nuance here. Deployment is being able to start from the idea of understanding that you need a certain certificate in a certain place for a certain purpose and take that as far as possible in an automated way to the certificate actually in production on the machine in question.

  • Jason Soroko

    Yeah. I’m thinking about important concepts here. Provisioning, registration. There are various words that have been used over the years. Those are the important ones today. But yeah. Deployment I think is a good word for sort of generalizing that initial conceptualization and that initial step of getting the material where it needs to go.

  • Tim Callan

    And there are subtasks inside of deployment. For example, there is creating or ordering the cert. Right? If you are the CA, it’s creating the cert. If it’s a public CA, it’s ordering the cert, which is separate from installing the cert. Right? And automation - - in an ideal world, automation is doing all these things. It is ordering or creating, installing, and confirming. And, if that can all be done in an automated fashion that’s your perfect automation solution. But let’s not lose track of the idea that a partial solution is better than no solution. So, you know, for instance, when you get down to brass tacks, depending on your server OS, installation can be various levels of difficulty. Right? If your automation solution is ACME-based then you care a lot about whether or not you can use ACME to get all the way to installation, but even if you couldn’t, if you could still automatically obtain the certificates, that is a benefit to the company in terms of reliability, consistency, and human work. So, you know, ideally, we want it to go rail to rail. But, even in the event that it cannot go rail to rail, there is still non-trivial benefit for an automation solution to handle as much of the deployment task as it can.
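
The order, install, and confirm subtasks described here, and the point that partial automation still has value, can be sketched in Python. This is a minimal illustration under stated assumptions, not any vendor's or protocol's actual workflow: the stage names, `DeploymentResult`, `deploy`, and the handler callables are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stage names taken from the discussion: order/create, install, confirm.
STAGES = ["order", "install", "confirm"]


@dataclass
class DeploymentResult:
    automated: List[str] = field(default_factory=list)  # stages automation completed
    manual: List[str] = field(default_factory=list)     # stages left for a human

    @property
    def end_to_end(self) -> bool:
        """True only when no stage had to fall back to manual work."""
        return not self.manual


def deploy(host: str, handlers: Dict[str, Callable[[str], bool]]) -> DeploymentResult:
    """Run the stages in order; once one cannot be automated, the rest are manual.

    `handlers` maps a stage name to a callable that returns True on success.
    A missing handler models a platform where that stage is not automatable.
    """
    result = DeploymentResult()
    blocked = False
    for stage in STAGES:
        handler = handlers.get(stage)
        if not blocked and handler is not None and handler(host):
            result.automated.append(stage)
        else:
            blocked = True
            result.manual.append(stage)
    return result
```

With only an `order` handler supplied, the result records one automated stage and two manual ones, which is the "partial solution" case: the certificate is obtained automatically even though installation is still done by hand.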

  • Jason Soroko

    Yeah, Tim. And I got to tell you, to really break this down we could probably have a month of podcasts on this topic alone because there is so much here and so much depending even on what use case you are talking about. Whether it’s a web server or an IoT device. I think what’s common here is the sheer number of chicken and egg problems that have to be solved to make this happen, which is I think a big chunk of what you are referring to. If you look at even what ACME is doing for you, it’s probably one of the best protocols available for solving the chicken and egg problem of having to have an initial key to then go get another key. Right? It’s just that whole setup, you know, causing the user to have best practices in terms of setting up a cryptographic chain of communication properly to where the key material is coming from. And, of course, in other use cases that are not web server based, those kinds of problems have to be solved in different ways. All of these could come under the topic of deployment, but absolutely there are a ton of chicken and egg problems that must be solved by certificate management.

  • Tim Callan

    And then probably one other point to make while we are on this point is also if you are in a heterogeneous environment, which most of us probably are, then deployment options may not be identical for all machines, or even all machines of a like kind. Right? It might be that my Windows desktops and my Mac desktops and my Linux desktops are different or that my different servers are different. And so, that’s part of the ultimate automation picture as well: deployment may not look the same or may not even be equally thoroughly accomplished based on OS or machine type, but nonetheless at a high level you want everything to get as close to the last mile as it can. Right?

  • Jason Soroko

    And, Tim, if you really want to complicate it, right, remember - -

  • Tim Callan

    Sure. Why not?

  • Jason Soroko

    Remember we are referring to asymmetric technology here. PKI. In the symmetric world, which is perfectly legitimate for certain use cases, they also have to solve all of these same problems in a different way because of the shared secret issues.

  • Tim Callan

    Yeah. Yeah. Sure.

  • Jason Soroko

    And, so, there’s so much here to think about. We referred to ACME. I think that’s a good example. I think Enrollment over Secure Transport (EST) is doing a lot here. SCEP, right? We could probably rattle off a whole number of open-source protocols on this as well as, you know, the world is also filled with proprietary protocols also trying to solve this. So, when you are talking about heterogeneous, it’s almost an understatement.

  • Tim Callan

    Right. Ok. So, perhaps - - we just touched a bunch of things that might be their own podcast later. In particular, I love the idea of getting into ACME, SCEP and you know, EST and talking about what they are but we can’t do that today or that’ll eat up our whole podcast.

    So, number two is discovery. So, deployment, you know, deployment is where I’m starting with a cert. I’m creating the cert. Automation is already in place and I’m gonna take it all the way through and I’m gonna get it live. But there are a number of scenarios where that full life cycle isn’t being run inside the automation environment. So, the easiest example is the first time you install automation. When we first start using this automation environment there is a bunch of certificates that are already in the world, right? So now I have two choices with those certs. I either manage them manually until they all cycle out or I somehow poke them into the automation system. Right? And so, this is where discovery comes in. Discovery is the idea that you go out and you find these certs and again, the more automated the better. You go out and you find these certs and you bring them into the system where at a minimum you can identify them and watch them and know when they are gonna expire and even better when you can take over the remainder of the life cycle for those certs and run them directly from the automation platform.

  • Jason Soroko

    It’s probably one of the initial steps in what you were calling a, you know, a horizontal concept of the visibility. You really have to take inventory of your world first. Assuming that you have a clean environment ahead of time is often a mistake, especially in the web server world. But, believe it or not, it’s also true in the IoT world where discovery of transport keys and bootstrapping certificates that might come from the manufacturing process in order to subsequently personalize the device for a specific task, you know, a lot of people won’t call that discovery. That’s just not the word that’s used but it is a discovery process and so this is a very important vertical topic, Tim, and I think it definitely deserves to be one of the big four.

  • Tim Callan

    Yeah. I think discovery, whenever there is a cert that ultimately you depend on, that matters to you, that you didn’t control the entire life span of inside your automation platform, this is where discovery is of value. So, you know, we talked about the scenario where they existed prior to the automation platform, but even if you set that aside, you just identified one. There was somebody else who had control of the cert when I got it. So, there’s a lot of ways that could happen. Think about M&A. Right? I acquire a company. Well, guess what? They’ve got stuff running. Right? And that means they’ve got certs. Or skunkworks projects. This happens all the time. What we often call rogue certs, which is folks are out there and they don’t roll up to the IT department. They were hired by the marketing department to put something together and it involves certs and so, you know, these things are little ticking time bombs. You know, every cert will expire and in principle, when the cert expires, something is gonna stop working correctly. Otherwise, why are you using the cert? And so, any scenario where you don’t have full complete control over that certificate from the day it’s born until the day it dies is a scenario where discovery is a boon to you because you can find these things and again, ideally bring them under management but at a bare minimum know what they are, know what their status is and know when they expire.
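
The "bare minimum" outcome of discovery described here (know what the certs are, their status, and when they expire) can be sketched as a triage over an inventory. This is a hedged illustration only: `DiscoveredCert`, `triage`, and the 30-day warning window are assumptions made for the example, and how the inventory is gathered (network scan, CA import, agent) is out of scope.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List


@dataclass
class DiscoveredCert:
    subject: str
    not_after: datetime  # expiry timestamp (timezone-aware)
    managed: bool        # True if the automation platform already controls it


def triage(certs: List[DiscoveredCert], now: datetime,
           warn_days: int = 30) -> Dict[str, List[str]]:
    """Bucket discovered certificates by expiry status and management status.

    `warn_days` is an assumed policy value, not a recommendation.
    """
    report: Dict[str, List[str]] = {"expired": [], "expiring_soon": [], "unmanaged": []}
    for cert in certs:
        if cert.not_after <= now:
            report["expired"].append(cert.subject)
        elif cert.not_after - now <= timedelta(days=warn_days):
            report["expiring_soon"].append(cert.subject)
        # A cert can be both unmanaged and expiring: the "rogue cert" case.
        if not cert.managed:
            report["unmanaged"].append(cert.subject)
    return report
```

The `unmanaged` bucket is the list of candidates to bring under the automation platform; the other two buckets are the ticking time bombs to watch in the meantime.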

  • Jason Soroko

    Yeah. For those of you who may not work in the industry, it may come as a surprise to you but there are a lot of large, mature organizations who go off and acquire publicly trusted certificates for their purposes and are not keeping really great track of those certificates and so at the most basic level, discovery is an important initial process to start to manage those certificates. To start to have automation. To start to have visibility. This is an important bootstrapping sequence of events in order to get to visibility.

  • Tim Callan

    That’s a great one, Jay, is human error. Right?

  • Jason Soroko

    Yeah.

  • Tim Callan

    That’s another one that discovery can help with is human error. Where you say, oh my, somehow this one fell through the cracks but we found it before we had a problem.

  • Jason Soroko

    And it surprisingly happens way more often than anybody ever thought. It’s just another pandemic we are dealing with.

  • Tim Callan

    Still. Right. Still, as late as 2020, I haven’t heard of a major episode in 2021 so I have to say as late as 2020, but I’m sure one will happen in 2021 as well.

    So, number three then is revocation and replacement. So, part of certs, part of the reason we have certs and not just keys, is because we want to be able to revoke them.

  • Jason Soroko

    Yeah. You take a public key, wrap it in a certificate, which is essentially an envelope and what’s written on the envelope is a set of policies and usually one of the most important policies is the expiry date and that expiry date necessarily means that you will have to replace that at a certain point in time.

  • Tim Callan

    Yep. And so, let’s hold that. That’s what I’m calling renewal. Before the cert reaches the expiration date, I may need to revoke that. I may need to blow that up and if I do blow that up, I may need another cert operating in the same place doing the same thing that co-terminates with the original cert and has all the same details and so that’s what we call replacement. Right? Replacement isn’t, you know, officially a PKI thing, but that’s what everybody means when they say replacement. And, so, revocation and replacement, again, this is where your automation platform can be incredibly beneficial. If you have already determined that you can provision that cert from its original source, if you have already determined that you can install it and have it running on the end machine and all of that can happen in automation then why not have replacement be a one-click process. Right? That’s incredibly good. You know the cert in question or the set of certs in question that you want to handle and you can just literally make the order, new certs are provisioned, they are automatically switched out and everything runs just great.

  • Jason Soroko

    It’s funny you mentioned that. That is also related in IoT use cases with manufacturing certificates. So where a valid certificate is replaced and it’s exactly the same concept as you just mentioned it just happens to be for that particular purpose.

  • Tim Callan

    Right. Right. And in that scenario, that’s because I ultimately want to be the person who controls this certificate once it goes out into the world right?

  • Jason Soroko

    Yes. And additionally, if you want to add some sort of personalization or extra policies, you know, because if you think about IoT devices, right, the initial chipsets are coming out of a particular factory goodness knows where. These chipsets don’t even know what they are gonna be doing yet later in their life. Once they are then installed within the finalized device that initial certificate that was implemented before the device even knew what it was going to be will be replaced once they do know what it’s going to be, right. So, it’s a very interesting bill of materials problem to solve there.

  • Tim Callan

    Perfect example. Yes. And an interesting supply chain, right? We are all living in the world of a major supply chain attack. So, it’s an interesting supply chain need for these things to be done right because we don’t want to have exposure for, you know, we don’t want the supply chain to become an area where either our security is exposed or our functionality is limited and so replacing certs is a key part of not having those scenarios come about.

    And then lastly of the four pillars is renewal. So, this is what you were referencing earlier, Jay, which is not only do I need to be able to revoke these certs but fundamentally part of the point of a certificate is that the key doesn’t work forever. So, the certificate expires, the whole thing stops working, it time bounds it and forces you to have a fresh cert at that point. Well, as these are happening and they are expiring out in the world, guess what? This is vulnerability. This is the time bomb we talked about before and so, automated renewal is hugely important and hugely important for me to be able to know that those renewals are coming, but also for the automation system, the machinery if you will to take care of the renewal for me rather than forcing individuals to become involved at that point because of labor, because of risk of error, the things we’ve already talked about.

  • Jason Soroko

    And this is the perfect example where human labor was typically the mechanism with which these certificates were essentially renewed. Certificates can’t renew themselves.

  • Tim Callan

    Right.

  • Jason Soroko

    That’s a - -

  • Tim Callan

    By design. Yes.

  • Jason Soroko

    By design and I think it’s a good design. The problem is there has to be something that is, you know, aware of the state of the certificate. Essentially, we are talking about a stateless design and so, therefore, you’ll have to make decisions. How do I handle this? Do I poll to check, hey is this certificate expired? Is it expired? Is it expired? Or, do you have a system that simply runs down the clock, knows an expiry is gonna happen and chooses an appropriate time beforehand to then swap the certificate. Right?
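
The second option described here, running down the clock rather than repeatedly asking "is it expired?", can be sketched as a single calculation from the certificate's expiry date. This is a minimal sketch, not a recommendation: the function names and the 30-day default lead time are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def renewal_due_at(not_after: datetime,
                   lead_time: timedelta = timedelta(days=30)) -> datetime:
    """Moment at which renewal should start: a fixed lead time before expiry.

    The 30-day default is an assumed value; a real policy would depend on
    certificate lifetime and how long issuance and installation take.
    """
    return not_after - lead_time


def seconds_until_renewal(now: datetime, not_after: datetime,
                          lead_time: timedelta = timedelta(days=30)) -> float:
    """How long a scheduler could wait before acting; 0.0 means renew now."""
    remaining = (renewal_due_at(not_after, lead_time) - now).total_seconds()
    return max(0.0, remaining)
```

A scheduler would compute this once per certificate and set a timer, instead of checking expiry status in a loop. A result of `0.0` covers both "the renewal window has opened" and "the certificate has already expired".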

  • Tim Callan

    Right. Yeah.

  • Jason Soroko

    This is absolutely – and I’m glad we landed on this, Tim, because this is the absolute argument that you really just cannot anymore, at any level of scale, depend on human labor to be able to deal with this. Really tough problems like provisioning, we know that people are not good at it. So, there was at least some form of electronic automation, you know, even if the protocol was very old, at least there was some mechanism but what we are landing on here, number four, this is where human beings were used for a very long time, even until today, and that really absolutely, the whole mindset of how we think about renewal needs to change. It has to be automated.

  • Tim Callan

    Yeah. Absolutely. And once again, there’s some nuance here. So, you know, we talked about earlier when you get back to provisioning what is renewal really? Right? Renewal is taking a cert that’s known about, ordering a new cert that matches the qualities of the old cert and deploying it. Right? So, it’s almost like a variation on deployment and similarly revoke replace is like a variation on deployment. So, a lot of these depend on the same capabilities and therefore what you are gonna be able to do is gonna be similar across these different pillars and so part of the reason that’s important is as I’m making architectural decisions, if I’m thinking about how I can make my certificate automation effective within my architecture you are not just affecting your original deployment of the certs, you are also affecting your renewal, which is hugely important because that’s when the outages happen.
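
The idea that renewal and revoke/replace are both variations on deployment can be sketched by deriving each new certificate request from the old certificate's details. This is an illustration under stated assumptions: `CertSpec`, `renew`, and `replace_cert` are hypothetical names, and a real request would carry many more fields (key, SANs, extensions) and would also trigger revocation of the old cert in the replacement case.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CertSpec:
    subject: str
    not_before: datetime
    not_after: datetime


def renew(old: CertSpec, now: datetime, lifetime: timedelta) -> CertSpec:
    """Renewal: same qualities as the old cert, with a fresh validity period."""
    return CertSpec(old.subject, now, now + lifetime)


def replace_cert(old: CertSpec, now: datetime) -> CertSpec:
    """Replacement: same details, and it co-terminates with the original cert."""
    return CertSpec(old.subject, now, old.not_after)
```

Both functions produce a spec that would then go through the same order, install, and confirm path as a first-time deployment, which is why an architectural choice that limits deployment also limits renewal and replacement.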

  • Jason Soroko

    Yeah. Well said, Tim. This is an important point folks.

  • Tim Callan

    Yeah.

  • Jason Soroko

    I really think if you are dealing with digital identities at all in your enterprise, and I don’t know anybody who isn’t right now. I mean regardless of even the size of enterprise you are - -

  • Tim Callan

    There can’t be one. Right.

  • Jason Soroko

    You are probably dealing with multiple, even use cases of multiple types of digital identities not even thinking they are even the same thing. Cause it’s so disparate, so heterogeneous. But this is why I think it’s so important to break it down into these four categories as Tim is saying because it helps you to think about this thing a lot more holistically.

  • Tim Callan

    And then what’s the penumbra that sits over all this? It’s visibility. And we’ve already touched on visibility in a few ways just in describing these things, right? When we talked about knowing what certs you have and when they are gonna expire, that’s visibility. When we talked about discovery in terms of finding out that certs existed that you didn’t know exist, that’s visibility. Knowing what they are, right? So, it’s one thing to say these certificates exist, but it’s also important to understand what’s the PKI behind those certificates. Are the key lengths long enough? Are the hashing algorithms secure or are they deprecated? Right? Are there SHA-1 certs out there? These are the kinds of things that you would like to know and visibility in principle could take lots and lots of forms, right? We had Nick France on as a guest to talk about Heartbleed, right? And, you know, in principle in the day it would have been good to know when were these certs provisioned and how did that compare to when we updated our software OS. And then you could have a marker on everything that was highly vulnerable and a different marker on everything that wasn’t. So, you know, visibility could take lots and lots of forms but it’s useful and it’s also useful as part of visibility to say, how are the humans going to manage this? So now when you start thinking about certificate automation, I’m thinking about things like alerting and reporting and dashboards and now these things become important because these are the ways that the visibility actually becomes usable by the people who are ultimately managing this whole environment.
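
The visibility questions raised here (are the key lengths long enough, are any SHA-1 certs out there) can be sketched as a simple policy check over inventory records. This is a hedged example: `CertRecord`, `audit`, and the policy constants are assumptions for illustration, and the thresholds should come from an organization's own policy rather than from this sketch.

```python
from dataclasses import dataclass
from typing import List

# Assumed policy values, for illustration only.
MIN_RSA_BITS = 2048
DEPRECATED_HASHES = {"md5", "sha1"}


@dataclass
class CertRecord:
    subject: str
    key_type: str        # e.g. "rsa" or "ec"
    key_bits: int
    signature_hash: str  # e.g. "sha256"


def audit(cert: CertRecord) -> List[str]:
    """Return a list of policy findings for one certificate; empty means compliant."""
    findings: List[str] = []
    if cert.key_type.lower() == "rsa" and cert.key_bits < MIN_RSA_BITS:
        findings.append(f"RSA key too short: {cert.key_bits} bits")
    if cert.signature_hash.lower() in DEPRECATED_HASHES:
        findings.append(f"deprecated hash: {cert.signature_hash}")
    return findings
```

Findings like these are what would feed the alerting, reporting, and dashboards mentioned above, so that the people managing the environment can act on them.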

  • Jason Soroko

    Tim, my favorite example – and you rhymed off several really good examples of where visibility is important. One of my favorites lately, and you have done webinars and podcasts on this, has to do with DevOps, which is a lot of people don’t realize but when you are running an orchestration engine for containers, Kubernetes being a really good example, you are, by definition, gonna be running a CA, a Certificate Authority, and that CA has all the things that you would have even if you were issuing publicly trusted certificates. You have to make a choice of the certificate definition, you know, the encryption algorithms that you are using, the key lengths, and where are those certificates being issued to? When are they being issued?

  • Tim Callan

    Right.

  • Jason Soroko

    What is the policy around the expiry timing? Almost every bullet point I just said, we’ve almost had a whole podcast on each of those bullet points.

  • Tim Callan

    Yeah.

  • Jason Soroko

    And right now, I would say that that is probably, this is homework for anybody listening to this podcast. If you are doing DevOps at all, you are probably using some containerizing technology and that containerizing technology if you have more than one or two containers, you are probably orchestrating that with something like Kubernetes, which means you probably have a CA down in there somewhere.

  • Tim Callan

    There’s PKI.

  • Jason Soroko

    Which means, do you have visibility into it?

  • Tim Callan

    Yep.

  • Jason Soroko

    The answer at this point in 2021 is probably no. And that’s scary.

  • Tim Callan

    Yeah. And now you are getting near another one of the points of visibility, which is compliance reporting or compliance awareness. Right? So, as you have policies how do you know that your certificates match your policies? Well, gee, that’s especially hard if you don’t know that the certificates exist.

  • Jason Soroko

    I bet right now there are very large mature organizations who have entire teams of people obsessing over their PKI governance for say a 20-year-old Microsoft CA implementation that’s been ticking for years for very important authentication purposes and yet, probably have rogue DevOps CAs underneath people’s desks. And not even knowing or understanding that those things are being created.

  • Tim Callan

    Yeah. Or sitting in the public cloud somewhere.

  • Jason Soroko

    Well, that’s - - the public cloud is the new underneath your desk.

  • Tim Callan

    Fair enough. So, yes, you’re right. It’s not actually under your desk anymore.

  • Jason Soroko

    Forgive me, Tim. I’m old.

  • Tim Callan

    You’re right. Me too. Me too. We both are.

    So, then the last point probably to be made, and this applies to the pillars and the visibility penumbra, is the broader the coverage footprint, the better. There are many kinds of certificates in the world and as you’ve alluded to, we could take this basic idea and even apply it to non-certificate-based keys and all these ideas would hold up. I guess except revocation. But, the more your coverage works, the better. So, if, you know, if you are any kind of sophisticated in a present environment, you have public and private certs. You have different kinds of certs. You have client certs. You have server certs. You have email certs, etc. And, the more all of this can be handled by your automation platform, the better off you are. Right? You reduce your risk of these outages that are due to unexpected expirations. You reduce the burden. The time and labor involved in provisioning and the error involved in provisioning. You ease your response to sudden events that require certificate agility. And you know what’s going on. So, the more certificate types and the more environments and the more of your digital footprint you can get automated – and I dare say, get under a single automation platform, the better off you are.

  • Jason Soroko

    This is a great podcast. Especially, to start the year. Let’s start fresh. Let’s start with really breaking down the problem sets into what they are. You know, the words that you’ve chosen I think are good general terms for the four categories. For me, because I exist in that world a little differently and some of the problems that I have to solve and the problems that I talk about with customers, sometimes we use slightly different terminologies, different words, but it all means the same thing and I think by choosing a more simple and more generic term, I think you’ve really come up with something here.

  • Tim Callan

    Cool. So, I think this is good. I’m glad we covered this because these concepts are things we’ve been talking about and we are gonna keep talking about and for the regular listeners if we bust out with one of these ideas later on, it’s nice that we’ve defined it and you know what we mean when we are talking about it.

  • Jason Soroko

    Exactly right, Tim.

  • Tim Callan

    All right. Jay, good talk, as always.

  • Jason Soroko

    Always.

  • Tim Callan

    Thank you, Listeners. This has been Root Causes.