Root Causes 346: Private Credentials in Public Code

Hosted by

Tim Callan

Chief Compliance Officer

Jason Soroko

Fellow

Original broadcast date

December 8, 2023

We uncover the epidemic of private credentials in public-facing code repositories, including why it occurs and what to do about it.

Podcast Transcript

Lightly edited for flow and brevity.

Tim Callan: We have a news article. This is by the excellent security journalist, Dan Goodin. We quote him a lot. It’s an Ars Technica article from November 15, 2023. The headline reads, “Developers Can’t Seem to Stop Exposing Credentials in Publicly Accessible Code.” I love that headline. Jason, we’ve talked about these kinds of stories in the past, but what do we see in this article here?

Jason Soroko: Well, it looks like Dan himself is quoting from research by GitGuardian, who reported – and I’m gonna quote it directly here – “finding almost 4,000 unique secrets stashed inside a total of 450,000 projects submitted to PyPI” – which is the Python programming language repository – “Nearly 3,000 projects contained at least one unique secret. Many secrets were leaked more than once, bringing the total number of exposed secrets to almost 57,000.”

Tim Callan: Now, are these all in publicly available repositories? Like PyPI, I presume, is a publicly available repository where I could just go find this? So these are just sitting there in the world for anyone to see?

Jason Soroko: Yes.

Tim Callan: So, yeah. Wow. And then I see we have a list, right? Here’s a list. Go ahead.

Jason Soroko: Yeah. So, Tim, you might think, what are these things? These are secrets that really help to log into nothing. Right? No, Tim. No. Active Directory API keys. Dropbox keys. Auth0. Auth Zero keys. SSH credentials. Coinbase credentials.

Tim Callan: Right. Yes. Oh my God. Database credentials for providers such as MongoDB, MySQL, and PostgreSQL. Like, oh my God! This is the real stuff. Those are the real keys to let you into the real secrets. They give you the real access that you really don’t want people to have. And these are just hard-coded. So, these are just hard-coded into scripts, pure and simple?

Jason Soroko: Yes, Tim. I’m gonna get right into what that is. I’m gonna tell you why it happens as well. We’ve covered this before, but let’s break it down for people who really don’t understand the scope and scale of this problem.

When you say key, Tim, this isn’t some sort of cryptographic key, you know, esoteric thing. These literally open the door. Right? These keys, which can be cryptographic in nature or can be essentially shared secrets, are, regardless of the form factor, literally keys to open digital doors. They allow you to log into systems. They allow you to remotely administer these systems. And we haven’t even talked about the associated privileges and entitlements of these credentials. The credential opens the door. When I see something like an Auth0 key or an SSH credential or an Active Directory API key, we are talking about things that essentially open up databases and open up servers to, if not root-level access, then very, very privileged access. Because the typical person who is using an SSH credential is typically some sort of administrator or developer with a lot of credential. Right? A lot of privilege is what I mean.

Tim Callan: Absolutely, or you wouldn’t have an SSH credential at all.

Jason Soroko: Yes, there definitely are people who will cut down what can be done with the associated login but, unfortunately, the reality is that that’s not often done. It typically is very, very privileged credentials. So let’s now talk about why these things are happening.

Well, take developers who are creating scripts, developers who need their computer system, whether it’s an API or an application, talking to an API. It could be any number of computing functions. Well, in a modern application, Tim, it’s extremely common for a piece of code itself to need to log into a database.

Tim Callan: Right. Of course.

Jason Soroko: It’s just fundamental to what applications and APIs do. Ok. Well, what are the other hundred million places that applications need to log into? Well, it’s pretty much everything. Right? They might have to log into other APIs. They might themselves log into other databases. This is why there is a proliferation of these secrets, and why we get a number like the 57,000 I just quoted. And that’s nothing. Actually, I would consider that to even be low. So good on ya, guys, for only having 57,000 that were exposed.

So, the problem is this. It’s just so easy to take a shared secret credential and just stick it in the code so that when the code runs, the code has the credential it needs to log into a database.

Tim Callan: It just works. It’s awesome. Absolutely. Sure.

Jason Soroko: But there’s a problem, Tim. First of all, you should never code that way anyway. That’s just bad.

Tim Callan: Like there’s a problem even if you don’t post it in a public code repository, for starters.

Jason Soroko: That’s right. That’s right. That’s the point I’m trying to make. You got it, Tim. So, there’s a couple problems here. Putting your keys to the kingdom in a public repository – not good.

Because it’s so easy for people to go in and find it, but also, you should never, ever be doing this anyway. So, here’s what is interesting. I want to get to the heart of the matter. There are ways to solve this, Tim. There are ways to solve this. And, in fact, there’s the entire secrets vault industry – and we could rhyme off all the usual suspects, right – HashiCorp Vault, Akeyless, Doppler – there’s a bunch out there. And in fact, heck, your privileged access management systems, right? CyberArk and Delinea. Those guys have vaults. And Sectigo has vaults even for things like S/MIME certificates. Right? So you can go and generate those things and escrow those things. There are so many reasons to keep your secrets in a repository that can be accessed by code. So, where your hardcoded credential would have gone, instead of hardcoding the credential, you code a routine which then says, hey, I’m gonna reach out to my - -

Tim Callan: Something that calls the credential?

Jason Soroko: You got it. It retrieves it from the vault and uses it, and it’s never exposed publicly. It stays in its secret location. And the code that needs to go off and log into the database can do so without harming anything and without exposing the underlying secrets needed to get into that sensitive database.
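As a rough illustration of the pattern described here, the sketch below shows code asking for a credential at runtime rather than carrying it in the source. It is a minimal sketch under stated assumptions, not how any particular vault product works: the names `APP_DB_PASSWORD`, `get_secret`, `build_dsn`, and the `lookup` callable are all hypothetical, and the default lookup simply reads the process environment as a stand-in for a real vault client.

```python
import os
from typing import Callable, Optional
from urllib.parse import quote

# Hypothetical names throughout: APP_DB_PASSWORD, build_dsn, and the
# lookup callable are illustrative and are not taken from the episode.
SecretLookup = Callable[[str], Optional[str]]


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not available at runtime."""


def env_lookup(name: str) -> Optional[str]:
    """Default lookup: read the secret from the process environment."""
    return os.environ.get(name)


def get_secret(name: str, lookup: SecretLookup = env_lookup) -> str:
    """Fetch a secret at runtime instead of embedding it in the source."""
    value = lookup(name)
    if not value:
        raise MissingSecretError(f"secret {name!r} is not configured")
    return value


def build_dsn(host: str, user: str, lookup: SecretLookup = env_lookup) -> str:
    """Assemble a database URL; the password never appears in the code."""
    # Percent-encode the password so special characters survive in a URL.
    password = quote(get_secret("APP_DB_PASSWORD", lookup), safe="")
    return f"postgresql://{user}:{password}@{host}/appdb"
```

In a real deployment the `lookup` argument would presumably wrap whichever vault client the team uses; in a test, a plain dictionary's `.get` method can stand in for it, so nothing secret needs to be committed alongside the code.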

Tim Callan: Sure. So, Jason, I know that I haven’t walked a mile in these people’s shoes, but that seems to me like a fairly straightforward thing to do. Why doesn’t that just happen?

Jason Soroko: That’s a really good question. And I think, Tim, it comes down to laziness.

Tim Callan: Ok. Or ignorance.

Jason Soroko: If you were to put up a pie chart of which is which, I would say laziness might be number one. Ignorance is number two, and number three, it could just be, hey, I don’t have budget for this. Which is a bad argument because there are open source ways of doing this.

Tim Callan: Right. And also, do you have budget for a breach? Do you have budget for being owned? Right? Because that might be the outcome.

Jason Soroko: You know, Tim, I think that when I say laziness, I include developers that are under the gun to get code out extremely quickly, and if it just runs, I can get onto my next project and get paid.

And I think that that’s a form of laziness, and that’s why I say that, because ignorance is a whole other thing. This is where it could be done better but you’re cutting corners.

Tim Callan: Right. And so perhaps there’s a lack-of-accountability aspect to that, too, where I can cut this corner and I don’t think something bad is gonna happen, and even if it does, it’s not gonna happen to me.

Jason Soroko: I would say, Tim, we come from the world of PKI, and I would say that we run into governance and compliance a heck of a lot more often than general IT does. And, Tim, you live in that world of compliance, so you know just how important having full visibility of your risks is.

And I would say, Tim, this is one of the most unspoken-about problems out there. One of the reasons we call this out: yes, some of these credentials are shared-secret based. They are alphanumeric tokens, which are not the equivalent of an asymmetric key pair, such as what would be associated with a digital certificate, which is bound to a public key. But yes, those things can also be hardcoded, and that’s the reason we are bringing it up on this podcast. Hey, all you folks out there that are using some sort of certificate-based authentication, this is a problem for you too. And so, I’m gonna call out everybody here. Everybody who is in the governance business. For those of you who are risk officers, all the way up to CIOs and CISOs, I think we gotta start demanding code reviews and looking very specifically for these credentials that are floating around inside of code, whether it’s publicly posted or not, and just eradicate the practice. And I think those of us who come from the PKI world and the world of governance and compliance should be the leadership in pounding our fist on the table for everybody else and saying, guys, that’s just not acceptable. And an article like this – it’s a call to arms to get that going.

Tim Callan: Yeah. I want to quote from the actual research, and Dan Goodin pulled this out for us: “In the course of outreach for this project, we discovered at least 15 incidents where the publisher was unaware they had made their project public. Without naming any names, we did want to mention some of these were from very large companies that have robust security teams.”

So, that kind of shows the trouble. Right? The trouble is I don’t think a lot of people sit and deliberately say I’m gonna include my secret in my code and then publish my code. I think what you get is somebody decides to put their secret in their code and then either a different individual or that same individual when that original decision was lost in the sands of time, makes the decision to publish that code. Even if I felt like my code was completely secure and owned and I could put a secret in there because it didn’t matter, it was never gonna get out – if I really want to believe that – if there is some possibility now or in the future, that my company or my organization might decide to publish this without my knowledge or without my consent or without any visibility from me or enough time has passed that I don’t remember there’s a secret in there, then that’s how this kind of thing can happen.

Jason Soroko: Tim, that’s quite common. So, there’s the disjointedness between the initial developer and then the integrator of the code. I think, Tim, what’s interesting is as we get into distributed architectures, as we get into microservices, as we get into non-monolithic forms of coding – which is pretty much the way things are written now, and if you’re not, my goodness, what are you doing? – I think that it’s quite common for certain kinds of developers to be disjointed from one another and from operations. The call to arms is this. Right from the source, code reviews need to happen to eradicate this problem where, by the time it gets into the second and third and fourth person’s hands, we are like, hey, this code just works, don’t rock it, right?

Tim Callan: Right. Exactly.

Jason Soroko: Well, the problem is, you might be causing more harm than good by having that kind of thinking. It is going to be worth doing a code review. And let me put it this way. We are not trying to find a needle in the haystack. It is not that hard to find hardcoded credentials. There are patterns to be able to detect this.

And, in fact, of course, I’m not saying anything new. There are vendors out there that this is their bread and butter. They look for these things. They have tools to be able to do that. What I’m saying though is the holisticness of creating an application or an API, some sort of functional code, the holisticness of this has gone away in the sense that who cares if there’s a problem upstream. This thing works so I’m just gonna leave it be.
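To make the point about patterns concrete, here is a small sketch of a line-by-line scanner for a few common shapes of hardcoded credential. The three rules are illustrative assumptions chosen to match the kinds of secrets listed earlier (private key blocks, database URLs with embedded passwords, quoted literals assigned to secret-sounding names); they are not the rules used by GitGuardian or any other vendor, and a production scanner would need many more rules plus handling for false positives.

```python
import re
from typing import Iterator, NamedTuple


class Finding(NamedTuple):
    line_no: int  # 1-based line number within the scanned text
    rule: str     # name of the rule that matched


# Illustrative rules only; real scanners ship far larger rule sets.
RULES = {
    # PEM header of a private key, e.g. "-----BEGIN RSA PRIVATE KEY-----"
    "private_key_block": re.compile(
        r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"
    ),
    # Database URL carrying user:password before the host
    "db_url_with_password": re.compile(
        r"(?:mongodb(?:\+srv)?|mysql|postgres(?:ql)?)://[^\s:/@]+:[^\s@]+@"
    ),
    # Quoted literal of 8+ characters assigned to a secret-sounding name
    "assigned_literal": re.compile(
        r"""(?:password|passwd|secret|api_?key|token)\w*\s*[:=]\s*['"][^'"]{8,}['"]""",
        re.IGNORECASE,
    ),
}


def scan(text: str) -> Iterator[Finding]:
    """Yield one Finding per (line, rule) pair that matches."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                yield Finding(line_no, rule)
```

Calling `list(scan(source_text))` on the contents of a file returns the line numbers worth a second look. A line that reads its password from the environment or a vault does not match the `assigned_literal` rule, because no quoted literal follows the assignment.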

Tim Callan: And I’m not even necessarily looking at it, right? Like I’m not reviewing it. I don’t take the actions that might detect that this credential is in there.

Jason Soroko: This is why I call out compliance people and governance people, risk officers. You should be the ones who kind of put it all together and go, I don’t care what kind of crazy practices you guys all have for loosely coupling your pieces of functional logic. What we’ve gotta do here is look for these kinds of basic mistakes. These are not advanced mistakes. These are very basic mistakes, and I would say that the solution to these things is not so difficult that you cannot put some kind of solution underneath it and shore it up. I would love for this problem to go away. Tim, as you said at the top of the podcast – we’ve seen this before. We’ve talked about this on this podcast before, and now it seems to be even getting worse. What will finally break the habit and bring us around to being like, alright, everybody is using a vault now. You are not using a vault? That’s crazy. I wish developers were talking like that.

Tim Callan: Well, let’s keep talking like that and maybe we can be part of that dialogue.

Jason Soroko: That’s the whole point. Dan Goodin – great article, as usual.
