Podcast Apr 27, 2020

Root Causes 86: SSH Keys

SSH keys are essential for controlling access to production infrastructure. Our hosts are joined by repeat guest David Colon to discuss how SSH keys are used in contemporary computing environments, what risks they carry with them, and tips for IT professionals to use SSH keys easily and securely.

Original Broadcast Date: April 27, 2020

Episode Transcript

Lightly edited for flow and brevity.

Tim Callan

Once again, we have our repeat guest, David Colon. David, thank you for joining us.
David Colon

It's great to be back.
Tim Callan

David is a Senior DevOps Engineer here at Sectigo and an on-the-ground practitioner in the worlds of DevOps and containers and all of that fun stuff. This is something we like to talk about because the PKI implications, I think, are very specific and very important. So, we really enjoy having Dave's practical expertise with us. And to that effect, I think today, we wanted to talk about the use and management of SSH keys.
Jason Soroko

Yeah, so Dave, you know, on this PKI podcast, we talk about certificates of a lot of kinds, mostly in the X.509 world, from the SSL use case to various kinds of authentication use cases, etc. You know, and when I was thinking about SSH and bringing this up as a podcast, I was thinking about you specifically as a DevOps practitioner and what you do to mitigate risk. And first, what is the risk in relation to SSH key management, and perhaps thinking even further down the road about SSH certificates, which is a newer concept that allows you to do things like expiry? So, Dave, if you could start with just, you know, what is the risk that we're talking about here? What's the typical risk, especially because when I'm going around talking to customers and talking to practitioners, I still see a lot of SSH key material kind of just lying around?
David Colon

Yeah. So, there's quite a bit of risk when it comes to SSH in the world of automation. Developers traditionally, myself included, tend to just want to get the work done when working in a development environment. So, you tend to overlook security aspects at first, and some of that can creep into production infrastructure. So, you can have SSH servers that don't even use keys and actually use usernames and passwords, and there are a lot of DevOps tools out there that allow you to connect to an SSH endpoint through various methods, whether it's a username and password combination, an SSH key or client certificates. So, one very big risk is not knowing exactly what those tools are doing, how they're authenticating and what services are actually talking to other services.
Jason Soroko

So, in terms of how you're mitigating that, I mean, obviously the principle of least privilege is being broken left, right and center, and from what I can tell it has been for a long time. This is not a new thing. So, you know, I remember back in the days with my full Microsoft stack, we were dealing with things such as, you know, hash management, being able to swap the hash out of an Active Directory user quite often and, for Active Directory domain controller administrators, you know, making sure that there were very few of them and that their credentials were rolled over fairly often. You know, those ideas still hold in the Microsoft stack. What are you doing from an SSH standpoint?
David Colon

So, I start by identifying the problem and what is typically lacking in most infrastructure. The number one thing that comes to mind is identity, authorization, and access, and SSH by default doesn't include a lot of those. So, we need to start looking at tools that enable those concepts and help provide central governance and policy control. That way, if the company has a security team or department, they can help manage and govern those individuals and what they need access to.
Jason Soroko

So, in terms of toolsets that you're using, I think you and I talked a little bit before this podcast - there are some open-source ideas. There are definitely some big, expensive privileged access management software packages out there. Are you using a blend of tools or one in particular? You don't have to name names. I'm more interested in the complexity of the toolset that you're working with.
David Colon

Yeah, so we started simple at first, and by simple, I mean years ago, we used configuration management to put employees’ SSH keys on servers, and there was an arduous documentation process to meet our compliance requirements, and that took a long time. So, if I hired a new employee and I wanted that employee to start working right away and troubleshooting a production server, it usually took quite a bit of time. We eventually matured that into a PKI solution. SSH allows you to use PKI to manage that, and at the beginning, there weren't any products out there that could do this, so we had a bunch of Bash scripts, and it wasn't everywhere in our environment, because no one really wants to touch those legacy environments: you're afraid that if you touch that 10-year-old box that's on an older version of OpenSSL, things are just gonna break. So that also wasn't very great. But recently, quite a few tools have popped up that allow you to hook up your SSH authentication to a PKI infrastructure, as well as to some sort of directory so that users can identify themselves against that directory.
Jason Soroko

That's really interesting. So, I'm - - I think, for people listening to this podcast who might not have been, you know, actively mitigating risk in SSH in the earlier days, I think you've just shown how it has been done up until now. I think there are better tools now, and more definitely coming out. But I'm curious about this connection between SSH keys and PKI management. And now I'm assuming you're talking about wrapping the SSH key pair, essentially taking what was a key pair generation and connecting it to the concept of a certificate, which essentially acts as an envelope with a policy such as an expiry date and an identity. I'm wondering if you could expand on that?
David Colon

Yeah. So, generating a key pair literally just generates a public key and a private key, and that private key may or may not be protected by a password. That is literally the only information that comes out of generating a key pair. In contrast, an X.509 certificate has a lot more information. It has when the certificate is valid from and valid to, the latter also known as the expiration date. It also has a subject name and common name, which carry information such as an email address or who the person is, and you also get this chain of trust. And all this information is what is useful for anyone that wants to interrogate the client certificate and then determine what to do with it. Obviously, if the certificate is no longer valid, you know, that's it, exit, don't give them access. Whereas if the certificate is valid, you can then interrogate the common name and see that, oh yeah, Bob Smith over here definitely has access to the server, let's let him in. So that's the difference between X.509 and generating a key pair.
So, the concept with SSH and PKI is having a place to obtain a client certificate. Client certificates have an expiration date. Therefore, you can control how long someone has access to certain things. And with your servers, they can have a whole chain of trust. So, your development servers could have an intermediary certificate that authenticates all developers and above. Usually, your infrastructure team has “root access” to all your infrastructure, but your development team should only have access to development resources. So, one way of handling this is using different intermediary certificates and configuring each infrastructure component to match.
Another aspect that comes into the PKI world is identifying against something. So usually, you have a directory, let's say an Active Directory, and you identify against that directory, and the directory will say, oh, you're a developer, here's the development client certificate.
And now that developer can log into any development server. If they try to log into a server that they don't have access to, they normally will get denied.
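
As an illustration of the decision Dave describes (reject anything outside its validity window, then check which intermediary issued it), here is a minimal Python sketch. It is not taken from the episode: the ClientCert class, the issuer names and the may_log_in function are hypothetical, and the sketch assumes the certificate's signature chain has already been verified by a real PKI library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Set


@dataclass(frozen=True)
class ClientCert:
    """Hypothetical, simplified stand-in for an already-verified client certificate."""
    common_name: str
    issuer: str            # name of the intermediary that signed the certificate
    valid_from: datetime
    valid_to: datetime


def may_log_in(cert: ClientCert, trusted_issuers: Set[str], now: datetime) -> bool:
    """Allow access only inside the validity window and only for trusted issuers."""
    if not (cert.valid_from <= now <= cert.valid_to):
        return False  # expired or not yet valid: deny without looking further
    return cert.issuer in trusted_issuers


now = datetime(2020, 4, 27, 12, 0, tzinfo=timezone.utc)
bob = ClientCert(
    common_name="Bob Smith",
    issuer="dev-intermediate",          # assumed name for the developers' intermediary
    valid_from=now - timedelta(minutes=5),
    valid_to=now + timedelta(hours=2),
)

# Each server is configured with the intermediaries it trusts.
dev_server_trusts = {"dev-intermediate", "infra-intermediate"}
prod_server_trusts = {"infra-intermediate"}

print(may_log_in(bob, dev_server_trusts, now))    # development server
print(may_log_in(bob, prod_server_trusts, now))   # production server
```

In this sketch the developer is admitted to the development server and refused by the production one, and the same certificate is refused everywhere once its validity window has passed.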
Tim Callan

And Dave, am I correct in my understanding that this idea of wrapping SSH keys in a certificate is relatively new and is different from the old practice, which is just to use the keys as keys?
David Colon

Yes. So, the practice is relatively new, but the technology has actually existed in the OpenSSH server for quite some time. Part of the problem is that managing your own PKI infrastructure isn't easy at all, not until you have some sort of tools that help you automate the process.
Tim Callan

And so is that the reason - - I guess where I'm going with that is kind of a dual pair of questions. Number one, which is, why are we talking about SSH key certificates now in particular, as opposed to in the past? What's changing? What's making us have that conversation? And then connected to that how come it took so long? Certificates have been around a long time and why was it only now that we are having that conversation?
David Colon

So, I believe the first one is just risk, really. SSH certificates just make it a lot easier to manage. Where you were using keys before, you had to rely on something like a configuration management tool. And as we spoke about before, there is this concept of configuration drift. So, there's a lead time during which someone can still have access to a box even though that access has been revoked. The PKI model actually allows you to put an expiration date on it. So, there's an inherent security benefit there. And the reason why it's coming up recently is that there have been a couple of companies whose offerings, whether open-source products or paid products, give security teams and infrastructure teams control over granting access to infrastructure.
Tim Callan

So, it's about management platforms making this practical?
David Colon

Exactly. And a huge business need that isn't really mentioned here is speed and velocity. The faster that you can get your employees to the exact resources that they need right away, the faster that they can produce whatever feature that they need to.
Tim Callan

And are we seeing the similar trend we've seen in other areas of PKI, which is that the number of keys that the typical enterprise needs to have, the number of SSH keys is growing fast and that's making all of this just harder than it was in the past?
David Colon

It depends. With traditional infrastructure, think virtual machines and physical servers, the number of SSH keys tends to remain static. The only way this number usually grows is if the company is growing or hiring outside help. On the other hand, cloud native applications have this concept known as microservices. So, you find yourself in the scenario where microservice A needs to talk to microservice B in a secure fashion, and in order to do this, they use something known as mutual TLS. And with mutual TLS, you use the concept of a root CA and intermediaries in order to create this trust. Now if you're thinking in terms of cloud native applications, yes, SSH keys will be growing because of how ephemeral and scalable Docker containers are. So, the contrast between the two is that with traditional infrastructure you usually have a human being, an operations person or a developer, needing access to a server, whereas with cloud native applications, you can have anywhere from one to an effectively unlimited number of containers that need to talk to one another, and each one of those containers needs its own client certificate.
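
For readers who have not set up mutual TLS, the following is a minimal sketch of the server side using Python's standard ssl module. It is an illustration under assumptions rather than anything described in the episode: the function name build_mtls_server_context is hypothetical, and the file paths are optional only so that the sketch runs without certificate files on disk.

```python
import ssl
from typing import Optional


def build_mtls_server_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that refuses clients without a certificate."""
    # Purpose.CLIENT_AUTH yields a context meant for the server end of a connection.
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # The "mutual" part: the handshake fails unless the client presents a
    # certificate that chains to a CA this context trusts.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file is not None:
        # Root/intermediary bundle used to validate client certificates.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file is not None:
        # The service's own certificate and private key, shown to clients.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


ctx = build_mtls_server_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)
```

In a real deployment each microservice would pass its own certificate, key and CA bundle, and the calling service would present its client certificate in the same way through load_cert_chain on its own context.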
Jason Soroko

So, Dave in talking about certificates - - certificates have expiry dates and in a previous podcast, anybody who's interested should probably go and listen to some of Dave's previous stuff, but you brought up a real-world example of how you came up with an expiry timing of around two hours or so for a short-lived SSH certificate. Can you describe that process again, for us?
David Colon

So, our current process is that a user needs to identify themselves, and if they identify themselves correctly, they get a client certificate that expires in two hours. The reason we came up with two hours is that we give that person two hours to connect to whatever resources they need. The expiry check only happens during the authentication process on SSH, meaning if you have an open session and your client certificate expires, you're still logged on to that session. And the reason two hours was a good fit for us is that a lot of us log into the same machine during troubleshooting scenarios through a couple of different terminals. And the reason we have these terminals open is that one of them might be tailing a log file while another one is looking at a configuration file. Therefore, we don't know how many terminal sessions we'll need when we first start troubleshooting a server. So, two hours seems to be a good timeframe for us. We could make it as short as five minutes and force every user to use something like Screen or tmux so that they can open up different sessions within a single login, but we found that with our team that wasn't an ideal workflow.
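
One way a two-hour certificate can be issued with stock OpenSSH is through ssh-keygen's signing mode. OpenSSH's built-in certificates use their own format rather than X.509, and the episode does not say which tooling Dave's team uses, so treat the following Python sketch as an assumption-laden illustration: it only builds the command line, and the file names, identity and principal are hypothetical.

```python
from typing import List


def signing_command(
    ca_key: str,
    user_pubkey: str,
    identity: str,
    principals: List[str],
    lifetime: str = "+2h",
) -> List[str]:
    """Build (but do not run) an ssh-keygen call that signs a user public key."""
    return [
        "ssh-keygen",
        "-s", ca_key,                # CA private key used to sign
        "-I", identity,              # key identity, recorded in server logs
        "-n", ",".join(principals),  # principals the certificate is valid for
        "-V", lifetime,              # validity interval: from now until now + 2 hours
        user_pubkey,                 # public key to be certified
    ]


# Hypothetical file names and identity, for illustration only.
cmd = signing_command("ca_key", "id_ed25519.pub", "bob.smith", ["developer"])
print(" ".join(cmd))
```

On the server side, sshd would need to trust the signing CA, for example through the TrustedUserCAKeys directive in sshd_config. Because validity is checked at authentication time, shortening the lifetime argument limits how long new logins are possible without cutting off sessions that are already open.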
Jason Soroko

Yeah, I'm sure some people would kind of complain if it got short and they had to do extra work like that. Makes a lot of sense, Dave, appreciate it. So, you know, Dave, I think where we're at right now is really to ask you about any final thoughts for people who are not necessarily, you know, the practitioners, but people who are the risk officers, the CSOs, the CIOs, about SSH key management within their enterprise. I think, you know, listening to this podcast, they've probably learned a few things from you, but, you know, I think the risks of not managing those things are real. I think that we are discovering that SSH key management, embedding keys within certificates, and the management tools to do that are starting to mature. I think the number of people accessing cloud resources has meant that SSH is proliferating even more than it has in the past, where it was proliferating already. So, we're dealing with a pretty exciting and dynamic world that I know you're right in the center of, Dave. So, any final thoughts you have for us on this would be great.
David Colon

Yeah, if you find that your organization isn't using client certificates for SSH, you should definitely find out what they're using and see what steps you can take to get there. Another approach that we haven't really talked about is using a directory and having the user's SSH keys stored in the directory. However, you won't get that expiration date that you get through the PKI model. So, I would definitely strive towards the client certificate, as it gives you way more control in managing access to your infrastructure.
Jason Soroko

That's great, Dave. So, you know, it's funny, and Tim, if I think there's a common theme here, what Dave is teaching us seems to have a lot to do with just taking inventory of what you've got. Having all this key material just lying around is the kiss of death, and it's a golden age to be a bad guy collecting all this key material off of file systems.
Tim Callan

And this ties into the greater trend, right? What we've seen in general is just having visibility and control over your certs and what they're identifying is a big need and a big theme throughout the enterprise. Right? Not exclusively in a DevOps world. So, in that regard, it seems to tie in with, you know, with what a lot of professionals are seeing.
Jason Soroko

I would say so. So that's it. Thank you so much, Dave, for your insights on this topic. I don't think it can be talked about enough. I think this is one area where we need to beat the drum pretty hard because I think people need to be doing a better job.
Tim Callan

Alright, well, that's a great place to leave it. As always very insightful. Thank you, Jason.
Jason Soroko

Thank you, Tim.
Tim Callan

Thank you, Dave.
David Colon

Thanks for having me.
Tim Callan

Dave, I think we'd like to have you on again. I think there's more to be said here, but that'll do for today. So, thank you, audience. This has been Root Causes.

Contributors

Jason Soroko

Fellow

Tim Callan

Chief Compliance Officer

About Root Causes

Tim Callan and Jason Soroko explore the issues surrounding digital identity, PKI, and cryptographic connections in today's dynamic and evolving computing world.
