Podcast Feb 28, 2020

Root Causes 69: Fundamentals of DevOps and PKI

In our ongoing series on DevOps and PKI, DevOps practitioner David Colon joins us to help describe the intersection of DevOps security and PKI. We explore how PKI fits in with orchestration engines like Kubernetes and some of the practical considerations in securely using keys in such environments.

  • Original Broadcast Date: February 28, 2020

Episode Transcript

Lightly edited for flow and brevity.

  • Tim Callan

    How are you today, Jay?

  • Jason Soroko

    Doing fantastic, Tim, and really glad to have a guest on with us today.

  • Tim Callan

    Yeah. Very excited. We do have a guest and we have a guest we haven’t talked to before. So, we are joined by David Colon. Dave is Senior DevOps Engineer at Sectigo. How are you doing today, Dave?

  • David Colon

    Everything is great over here.

  • Tim Callan

    Thank you so much for joining us. We love having guests, especially expert guests and in this case, not surprisingly, we would like to talk about DevOps. This is an area you put a lot of focus in and I think you have some real viewpoints on this topic as a practitioner, right?

  • David Colon

    I like to think so. I wouldn’t necessarily call myself an expert. I probably like Jason’s phrasing that he used in a previous podcast, I’m more of a student of the subject.

  • Tim Callan

    Right. Well, I think everybody is, right? It’s so new and it’s changing so fast. Even if you knew everything about the subject it would be out of date in a couple of months. So, we are all learning all the time. So, maybe we can start at a high level. Let’s just talk about, when we say DevOps, it can mean so many things. In your mind, what would be your pithy definition of DevOps, Dave?

  • David Colon

    I would like to take the culture aspect. I know DevOps is found in job titles and it may be used in certain organizations correctly. The culture would be more about taking a big siloed organizational structure and breaking down those walls and allowing your developers and operations to be in the same room and architect solutions for your business together as opposed to independently.

  • Tim Callan

    And the reasons, like the benefits and, you know, we’ve talked about this a lot. But, again, as you see it, what do we get out of that? Why does it make it better?

  • David Colon

    Well, with isolated teams, you get a lot of finger pointing. It also adds to the time. So, if one team is located eight hours away, you most likely put in a request at the end of your day, and it may not get back to you for two days. It also doesn’t allow good communication, meaning if the request was ambiguous, the other team may want some feedback immediately to process that request and can’t get it. Therefore, sometimes the request is processed incorrectly, which actually adds more time to delivering something.

  • Tim Callan

    And so, the DevOps process you are saying helps us with all of that?

  • David Colon

    Correct. It brings forth better communication, culture through code basically.

  • Tim Callan

    So, then there’s this word we also hear – DevSecOps. So, we’ve got a Sec in the middle, obviously, and tell us about that. What does DevSecOps mean to you?

  • David Colon

    So, I look at it from two approaches. The first one is something that should resonate with most security professionals. Whenever I speak to a security professional, I notice that they always chirp that security should be top of mind while you’re doing something, and that’s how DevSecOps resonates with me – meaning security shouldn’t be an afterthought. It should be part of the process.

  • Tim Callan

    Right. So, it’s the design in security, build in security philosophy?

  • David Colon

    Correct. And that’s done through code.

    Now, the second approach to the whole DevSecOps, which is kind of what I just alluded to, is the code aspect. So, one thing about DevOps culture that enables better communication and automation is using toolsets that have been available to developers, like Git, pull requests and these workflows, and giving that same toolset to operations teams and security teams. That way, everyone can do peer reviews, can learn from the process and have guardrails, which is very friendly towards compliance departments.

  • Jason Soroko

    Dave, thank you for that. So, there seem to be a lot of toolsets associated with DevOps. I remember back in the development days there were all kinds of preferred IDEs and various kinds of environments that people would work in, and with DevOps there seems to be another layer of this, a lot of it associated with automation. Do you find that security is top of mind as part of these toolsets at this point in time? Have they gotten to that level of maturity yet?

  • David Colon

    To a certain degree, yes. Security is baked into the architecture, especially because these toolsets happen to be modern. So, a good example is Go, which seems to be the language for operations, DevOps and developers, and Go has a pretty robust SSL library. Because of that, you kind of benefit from a more modern approach to security, but there are some other aspects that have been lost from the traditional legacy approach, such as PKI, where PKI may not be used everywhere or certain teams don’t have the same single-pane-of-glass view in the modern DevOps approach that they had in the traditional one.

  • Jason Soroko

    So, Dave, I want to stay at a high level before we go off and talk maybe more about container orchestration engines and what they are doing specifically with respect to PKI. That would probably be the next set of questions, but in terms of your daily work life, with the toolsets that you have to use and learn and just the fractionation of tools that you need to use, does that make your life a little bit challenging with respect to reporting upwards to people who are monitoring risk, people who are perhaps auditing, people who are trying to put governance into place? Is that a challenge for you or has that been solved?

  • David Colon

    Yes. It’s definitely a challenge and it’s also one that comes to mind when we are in the architecting phase and design phase of certain projects. If I had a Security Officer ask me what level of encryption we are using for all our private keys, it becomes a bit harder with the various services that I use in my DevOps toolbox. So, you can’t give them that report fairly quickly.

  • Jason Soroko

    That is what I expected to hear, unfortunately, and I know that Tim and I have podcasted on that at a higher level previously, but it seems to mean a lot more to me to hear it from somebody who actually does this on a daily basis. So, when we are talking about containers, when we are talking about container orchestration engines – I’m thinking of Kubernetes clusters that require a CA to distribute certificates. I know Tim and I in the past have talked about all the different places where certificates are used, which could surprise some people. The usage of certificates for TLS mutual authentication and various other reasons, how does it affect you just trying to get DevOps stuff working, for lack of a more precise way of saying it?

  • David Colon

    It’s definitely a different way of looking at it. So, being part of Sectigo, I get to use one of our enterprise products where I get a single pane view of all certificates that are issued to our organization, but as soon as we started playing around with Kubernetes, we don’t have an ACME provider yet to showcase that same single pane view in our enterprise product as of now. So, it becomes challenging to see what certificates are inside my Kubernetes cluster and I have to imagine that most other DevOps engineers or cloud engineers that are using Kubernetes have a similar problem as well.

  • Jason Soroko

    Of course, I do believe that’s on the roadmap but not to get too much into that product, it is though, Tim, I think interesting to hear that single pane of glass concept being important. That’s why I’m trying to tie risk and governance to DevOps and I’m glad that Dave is echoing that back to us.

  • Tim Callan

    Yeah, and I was just going to ask, Dave, when you say you have this pain point with seeing what is going on inside of Kubernetes, how do you deal with that? You are using these tools day-to-day today and it’s great that it’s on the roadmap but how are you getting around that right now?

  • David Colon

    Basically, using a bunch of scripts. We are on the cusp of really seeing a production Kubernetes cluster with production services on it and right now we are starting off with a script. I’ve played around with some CAs and tried to plug them in. One that would be really cool would be Let’s Encrypt’s Boulder, but that would be way too monumental of a task to put in for my specific project that I’ve been working on. Another aspect of using PKI is for mutual TLS authentication. So, the microservices paradigm mandates that different components talk to each other and naturally, you have a client-server model, but you may start using certificates on both ends to mutually authenticate one another.

  • Tim Callan

    And so that is each - - each enveloped, each task has its own certificate. Right?

  • David Colon

    Correct. So, to use a really good analogy it would be as if Jason had a state-issued government ID and he meets you, Tim, who he has never met before, and you also have a state-issued government ID and you both have this implicit trust with that government entity that issued those state IDs.

  • Jason Soroko

    You know, I’m gonna back it up just a second here because what you said is actually incredibly important to understand well. So, for those of you who are not steeped in things like TLS mutual authentication, one of the things that you probably do on a daily basis, you browse to websites, if you browse to a website that happens to have an SSL certificate you of course know that the browser and the webserver itself establish an encrypted connection between each other and that’s great. Mutual authentication is typically between two devices or two pieces of logic that actually need to verify the identity of each other and not just have that encrypted session. So, for those of you who come from an enterprise IT background, most of you of course have used VPN services and if you really think about what Dave just said and what Tim was talking about, all these different services, all these different things, these discrete pieces of logic that are reaching out to touch each other across sometimes very hostile environments, you know, different public clouds. It could be partner or APIs, who knows what’s going on. Every one of these things essentially needs to have an analogous VPN connection to each other. The technology that Dave is referring to is not VPN but it is analogous in the sense that both systems know who each other are and essentially there is an encrypted tunnel of communication between each other. You could almost think of this like a mesh for a lack of a better term of encrypted tunnels between discrete pieces of logic that need to speak to each other. That’s a way of describing it.

  • Tim Callan

    Yeah. And it’s for both reasons, right, Jay? And Dave. Number one is that we do need that encryption because, as you said, you know, it is a realistic scenario that someone could be able to come and spy on that traffic and among other things these microservices might be in different locations, right? They might be in different clouds.

    But then number two is that the identity piece is also really important because if I could manage to inject a false microservice into the cloud I could harvest information, give bad commands, give access, all of those doorways to breaches and attack and so, we need this to protect from really both of those scenarios. Right?

  • David Colon

    Correct.
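
The mutual authentication described in this exchange can be sketched with Python's standard `ssl` module. This is a minimal illustration under stated assumptions, not the setup the speakers use: the function names and the certificate file paths are hypothetical, and both sides are assumed to trust the same private CA (the "government" in Dave's ID analogy).

```python
import ssl


def require_client_certs(ctx: ssl.SSLContext) -> ssl.SSLContext:
    """Turn a server-side TLS context into a mutual-TLS one."""
    # With CERT_REQUIRED on a server context, the handshake fails unless
    # the client presents a certificate that chains to a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def build_mtls_server_context(ca_file: str, cert_file: str, key_file: str) -> ssl.SSLContext:
    """Server side: present our own certificate and demand one from the client."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # our identity
    ctx.load_verify_locations(cafile=ca_file)  # the CA we accept clients from
    return require_client_certs(ctx)


def build_mtls_client_context(ca_file: str, cert_file: str, key_file: str) -> ssl.SSLContext:
    """Client side: verify the server and present our own certificate."""
    # create_default_context already requires and verifies the server certificate.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # our identity
    return ctx
```

Each microservice would wrap its sockets with one of these contexts, so a service that cannot present a certificate from the shared CA is rejected during the handshake, before any application data is exchanged.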

  • Jason Soroko

    So, Dave, I’m going to have you expand on this. The analogy that I gave with VPN, the analogy that I gave with the SSL use case with a web browser, where the analogy breaks down is in the lifespan of the certificates, the lifespan of the connections and the proliferation of the certificates which makes that whole risk profile sort of jump. Would you agree with that?

  • David Colon

    Oh yeah. Most definitely. And this is why when you have a CA you most likely have a CRL service. I’m not aware of any CAs that have an out-of-the-box OCSP service. Therefore, short-lived certificates are highly valuable especially when sometimes some of these workloads are ephemeral enough that they only spin up for seconds at best and then spin back down.

  • Tim Callan

    So, what would be the duration of a certificate? Are they all gonna have the same duration? Are we setting them at a certain time period, you know 24 hours or are they being - - is it variable based on the specific task or how does that work?

  • David Colon

    So, depending on what solutions and products that you choose, some of them mandate four hours. That’s all you can set. So, you can’t be as variable as you want but for those who are building on top of let’s say Cloudflare’s SSL library, you can create some sort of scripts. I’ve talked to other colleagues in the space and I’ve seen some pretty clever solutions to handle this but, once again, thinking from a management perspective, this is a bit scary because there’s no visibility into what that engineer has done really.

  • Tim Callan

    Right. Yes. So, you say four hours, for instance, might be the cap. Is it possible that’s too short?

  • David Colon

    Yeah. And that’s where some clever solutions come about. So, in Kubernetes, there’s this concept of a sidecar. So, Kubernetes’ smallest component is something called a pod and inside that pod may be one or many containers. So, usually you have the main container serving up something and you have a sidecar and, in this model, what you see is the sidecar basically on behalf of the pod renews that certificate.
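
The decision a renewing sidecar has to make can be sketched as follows. This is an illustrative sketch only: the function names and the idea of renewing after a fixed fraction of the certificate's lifetime are assumptions, not something described in the episode, and the step that actually requests a new certificate from the CA is left out.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before: datetime, not_after: datetime, fraction: float = 0.5) -> datetime:
    """Return the moment at which the sidecar should request a new certificate.

    `fraction` is the share of the validity period that may elapse before
    renewal (an assumed policy knob, not a standard).
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    lifetime = not_after - not_before
    return not_before + lifetime * fraction


def should_renew(not_before: datetime, not_after: datetime, now: datetime,
                 fraction: float = 0.5) -> bool:
    """True once the renewal point has been reached."""
    return now >= renewal_time(not_before, not_after, fraction)


if __name__ == "__main__":
    issued = datetime.now(timezone.utc)
    expires = issued + timedelta(hours=4)  # the four-hour cap mentioned above
    print("renew at:", renewal_time(issued, expires))
```

With a four-hour certificate and a fraction of 0.5, renewal falls two hours after issuance, which leaves the sidecar the second half of the lifetime to retry if the CA is briefly unreachable.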

  • Tim Callan

    Ok. That is clever. So, it’s interesting that you said that it’s hard to have visibility on what that engineer did. Is there another aspect? Are you worried about having certs being too long as well?

  • David Colon

    Yeah. Once again, if you have certs that let’s say were two years long or let’s even say five years because it’s your own CA that you spun up so you are not bound to certain rules. If you are spinning up tens of thousands or hundreds of thousands of these pods or containers that each need their own SSL certificate, that CRL is gonna be quite large and what ends up happening is anything that’s checking that certificate may potentially be downloading a two-megabyte CRL hundreds of thousands of times within your network at a pretty exponential scary rate.

  • Tim Callan

    And instead, you are saying, and once the certs are expired, you just remove them from the CRL because they are expired, right? So, that allows us to keep it much, much smaller. Is that correct?

  • David Colon

    Yep, and that tends to be a trend for short-lived SSL certificates and systems like this.
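
The CRL arithmetic in this exchange can be made concrete with a small sketch. The record type and function names are hypothetical, and the figure of 100,000 downloads is an assumed stand-in for "hundreds of thousands"; only the two-megabyte CRL size comes from the conversation.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple


class RevokedCert(NamedTuple):
    serial: int
    not_after: datetime  # expiry of the revoked certificate


def prune_expired(revoked: List[RevokedCert], now: datetime) -> List[RevokedCert]:
    """Keep only revoked entries whose certificates have not yet expired.

    An expired certificate is rejected on its expiry date anyway, so it no
    longer needs to be listed as revoked.
    """
    return [entry for entry in revoked if entry.not_after > now]


def crl_transfer_bytes(crl_size_bytes: int, downloads: int) -> int:
    """Total network traffic caused by clients fetching the full CRL."""
    return crl_size_bytes * downloads


if __name__ == "__main__":
    # 2 MB CRL fetched 100,000 times (assumed count) inside the network.
    total = crl_transfer_bytes(2_000_000, 100_000)
    print(f"{total / 1e9:.0f} GB of CRL traffic")
```

Short-lived certificates keep the list returned by `prune_expired` small, because revoked entries drop out within hours instead of lingering for years.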

  • Jason Soroko

    Yeah. Thanks, Dave. This is really good stuff.

    I’m gonna put you on the spot for maybe my last question for today, but I’ve heard of some real bad stuff. In fact, I’ve even seen it in some areas where people have in their automation script put things like their hard-coded credentials and perhaps even things related to certificates. Is that something that you are wary of and I think that the bigger question I’m asking here is I always like to sometimes leave the audience with either some homework or a best practice to really think about. What’s top of mind for you, Dave?

  • David Colon

    That’s definitely one aspect for sure. I’m kind of spoiled by working in a Certificate Authority business where I have access to HSMs. Some DevOps engineers in some Fortune 500 companies, you know, they never needed an HSM, so they can’t really store their private keys, or the sensitive information that, like you said, ends up in a script, in an HSM. There is an open-source product by a company called HashiCorp, called Vault, which allows you to store these secrets in an encrypted manner, so that’s one avenue that they’d be able to use. For those in the cloud, they’re a bit luckier because the big cloud providers provide different concepts for storing your private keys, basically sharing an HSM, so they get to have a lot more tooling available to them. So, for those that are using scripts, there are definitely products that are coming out day-to-day that they should be looking to. But for those that have been in the DevOps space for five years and created their own tooling, it may be one of those things that they need to go back and revisit. For me personally, if I had my management hat on, it would be the visibility. Seeing what RSA key size they are using. Are they using 2048, 4096, or are they using 256 because they were trying to optimize the time it takes for the container or pod or thing to exist? Meaning, if having 4096 bits took an extra 10 seconds and it wasn’t meeting product management’s requirement that this thing must be ready instantly. You know, that’s kind of scary.
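
One way to keep credentials out of automation scripts, whatever secret store sits behind them, is to have the script read the secret from its environment and fail loudly when it is missing. The sketch below assumes a secrets manager or the orchestrator injects the value at runtime; the helper name and the `DEPLOY_TOKEN` variable are hypothetical, and no vendor-specific API is shown.

```python
import os
from typing import Mapping


def load_secret(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Fetch a secret injected at runtime instead of hard-coding it."""
    value = env.get(name, "")
    if not value:
        # Failing here is deliberate: a silent fallback to a default value
        # is how hard-coded credentials creep back into scripts.
        raise RuntimeError(f"secret {name!r} is not set")
    return value


if __name__ == "__main__":
    try:
        token = load_secret("DEPLOY_TOKEN")  # hypothetical variable name
        print("token loaded, length", len(token))
    except RuntimeError as err:
        print("refusing to continue:", err)
```

The script itself then contains nothing sensitive, so it can be committed, peer reviewed and audited like any other code.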

  • Tim Callan

    Right. And then in a few years it’s also going to be about: am I still using RSA and ECC, or am I using the newer algorithms that are going to be the secure ones moving forward in a post-quantum world? So, suddenly that’s going to become an issue for you as well.

  • David Colon

    Yep. That’s correct.

    And another thing, too, to think about is if you have hundreds of thousands of these existing and let’s say they exist for four hours, how do you store and track that from a historical perspective? There is a lot of gray area when it comes to compliance, depending on the industry, about whether or not you have to show something that dates back two years, because that’s a lot of stored logs and auditing, which can be quite costly as well.

  • Tim Callan

    Yeah. So, just to play that back to you and make sure I’m getting it right, you are saying I may have a compliance requirement to demonstrate that I have had let’s say, secure key lengths and so those need to be logged and I need to be able to access them and share that information in a practical way?

  • David Colon

    Exactly. Because this is a whole new concept, I had a colleague describe how their CSO was trying to just vet out exactly what can and can’t be audited for security reasons, and one of the aspects – and this goes back to code and guardrails in the DevOps philosophy – is that if you show that the system can’t generate anything under 2048 bits, it may take away the need to log everything that the system does, which may save you those bytes.
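
The guardrail idea can be sketched as a single check that every key-generation path must pass through. This is a hedged illustration: the function name is hypothetical, and the 2048-bit floor is simply the figure used in the conversation, to be replaced by whatever the organization's policy requires.

```python
MIN_RSA_BITS = 2048  # floor taken from the conversation; adjust to policy


def check_rsa_key_size(bits: int, minimum: int = MIN_RSA_BITS) -> int:
    """Reject any requested RSA key size below the policy floor."""
    if bits < minimum:
        raise ValueError(
            f"RSA key size {bits} is below the required minimum of {minimum} bits"
        )
    return bits


if __name__ == "__main__":
    for requested in (4096, 2048, 256):
        try:
            print(requested, "accepted:", check_rsa_key_size(requested))
        except ValueError as err:
            print(requested, "rejected:", err)
```

If the only code that can create keys calls this check first, the code review and tests for that path become the audit evidence, in place of a log entry for every short-lived certificate.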

  • Tim Callan

    Yeah. That makes great sense. You demonstrate it’s not possible and therefore you can meet your audit requirement that way without having to go and have 100,000 logged events. Got it.

    So, gee, I think that this is a deep, rich and interesting topic and we have barely scratched the surface and what I would like to suggest to both of you gentlemen is that we need to ask Dave to come back and we need to dive deep on some of the things we’ve discussed today and other aspects of DevSecOps as well. Does that sound like a good idea?

  • David Colon

    Yeah. Definitely. I could, you know, spend hours and hours talking about this and how I can slice and dice requirements and also, the expectations of meeting those requirements.

  • Tim Callan

    I think that would be really good for our listeners and I would really like to do that. So, why don’t we let that be for now and we will definitely ask you to come back and join us again and, Dave, I want to thank you so much for being an excellent guest and, Jay, of course, it’s always a pleasure to talk to you.

  • Jason Soroko

    Thank you, Tim.

  • Tim Callan

    Thank you gentlemen very much. This has been Root Causes.