Root Causes 68: Why SHA-1 Is No Longer Secure

Tim Callan

Great. So, what are we talking about today? Today we want to discuss SHA-1.

Jason Soroko

Yeah, SHA-1 is a popular hash function. It has been important, and at one point in time it was fairly ubiquitous.

Tim Callan

Yeah.

Jason Soroko

You know, a hash function is something that's pretty fundamental to a PKI podcast like this one. I bet a lot of our listeners already know what it is.

A hash function is any function that can be used to map data of an arbitrary size to a fixed-size value. So, Tim, pick any sentence or stream of text of any size, run it through a hash function, and it gets reduced to a value of a fixed length. No matter how long the key value you feed into the hash function is, the hash value will always be the same length.
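To make the fixed-length property concrete, here is a minimal Python sketch using the standard library's hashlib. It hashes inputs of very different sizes with SHA-1 and reports the digest length; the sample inputs are arbitrary illustrative values.

```python
import hashlib

def sha1_digest_bits(data: bytes) -> int:
    """Return the length, in bits, of the SHA-1 digest of `data`."""
    return len(hashlib.sha1(data).digest()) * 8

# Inputs of very different sizes (arbitrary illustrative values).
samples = [b"", b"Tim", b"Any sentence or stream of text. " * 10, b"x" * 1_000_000]

for s in samples:
    print(f"{len(s):>9} bytes in -> {sha1_digest_bits(s)} bits out")
```

Every line reports 160 bits out, whether the input is empty or a megabyte long.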

Tim Callan

And the given length is important. Well, in part, I guess, it's important because it doesn't give away information about the key, right? It doesn't grow and shrink with the size of the key. But what's the importance of the given length?

Jason Soroko

The given length is really... well, for the most part, let's go back to one of the original uses of a hash function, which was to take key values of some kind and put them into a table so that they could be retrieved easily. If you think about the way a database works, if you can index a series of keys in a very easy-to-search manner, at a fixed bit length, then you can speed up a database engine by quite a bit. That was one of the original uses, and in fact it's still used for that to this day. However, because of some of the properties that cryptographic hash functions have, they're useful for cryptography, and we'll get into that a little bit more in this podcast.
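The table-lookup use Jason describes can be sketched as bucket selection: hash the key to a fixed-size value, then reduce that value to a bucket index so only one bucket has to be searched. This is a toy illustration, not how any particular database engine works; `bucket_for`, `lookup`, and the bucket count of 8 are hypothetical choices. SHA-1 from hashlib is used here only because it gives a stable index across runs (Python's built-in `hash()` is randomized per process for strings); a real table would use a faster non-cryptographic hash.

```python
import hashlib

NUM_BUCKETS = 8  # hypothetical table size

def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a key of any length to a bucket index in [0, num_buckets)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the fixed-size digest as an integer.
    return int.from_bytes(digest[:8], "big") % num_buckets

table = [[] for _ in range(NUM_BUCKETS)]
for name in ["alice", "bob", "carol", "dave"]:
    table[bucket_for(name)].append(name)

def lookup(key: str) -> bool:
    # Only the one bucket the key hashes to is searched, not the whole table.
    return key in table[bucket_for(key)]

print(lookup("carol"), lookup("mallory"))
```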

Tim Callan

Yeah, and of course, the use of cryptography, in terms of the number of systems, transactions, and connections using it, began the explosive growth it still enjoys to this day with the widespread adoption of the World Wide Web, right? With the Netscape browser, back in 1995. And if we go back to SHA-1, SHA-1 was published in, what do you know, 1995. So, SHA-1 really was a cornerstone of the early secure internet.

Jason Soroko

Yeah, and anything that comes first is usually extremely widely adopted. So, we saw it everywhere, right? So, before we get into the timeline, let's quickly talk about the properties that I think make up a good cryptographic hash function.

Tim Callan

Okay.

Jason Soroko

The first two are pretty simple, in the sense that you could probably rattle them off the top of your head, even without knowing a lot about it. The first property is that it has to be deterministic. In other words, given a key value, the resulting hash should always be the same, right? A deterministic function is always going to be consistent, and that's important. Second, it needs to be fairly quick. If you're waiting on a hash function to spit out a result from a very, very complex set of mathematics, that's going to be problematic. The third property, I think, is one of the most important here, and it's why hash functions are used for cryptographic purposes: the idea of preimage resistance. Once you have the hash, it should be very difficult, or let's say extremely improbable, to reverse it back to what the actual value of the key was.

Tim Callan

Right. So, the value of the key and its properties become obscured during hashing, so that you can't reverse engineer it and use the hash to get back to the original key, and that has the advantage that it keeps that secret secret.

Jason Soroko

Absolutely, Tim. So, the idea is that even a very small change, just one bit flipped within the key value, should produce a very different hash value. It should be different enough that if you tried to brute force it by making small changes and looking at the resulting hash values, you still could not determine what the original key value was.
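The determinism property and the "one bit flip changes everything" property (often called the avalanche effect) can both be checked directly. This is a minimal sketch with hashlib; the message and the helper names are made up for illustration. For a well-behaved hash, roughly half of the 160 output bits should change when a single input bit flips, though the exact count varies from input to input.

```python
import hashlib

def sha1_int(data: bytes) -> int:
    """SHA-1 digest of `data` as a 160-bit integer."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of `data` with one bit flipped (bit 0 = lowest bit of byte 0)."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)

def bits_changed(a: bytes, b: bytes) -> int:
    """Count how many of the 160 digest bits differ between two inputs."""
    return bin(sha1_int(a) ^ sha1_int(b)).count("1")

message = b"Pay Tim $100"
tweaked = flip_bit(message, 0)  # one input bit differs

# Deterministic: hashing the same input twice gives the same digest.
assert sha1_int(message) == sha1_int(message)

print("input bits changed: 1")
print("digest bits changed:", bits_changed(message, tweaked), "of 160")
```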

Tim Callan

Right. And so historically, SHA-1 was considered - if we go back in time - to meet these three requirements. Yes?

Jason Soroko

Yes, absolutely. There was actually a SHA-0, Tim, that came out in 1993, and it had some flaws that are a little bit beyond the scope of this particular podcast. That's why SHA-1 was published in 1995 by NIST. I think it was developed by the NSA and published by NIST, and that standard was a reaction to some of the flaws found in SHA-0, or what was originally just called SHA, the Secure Hash Algorithm.

Tim Callan

Right. There weren't numbers when there was only the first one; it was just SHA. And, of course, that was an example of NIST doing what it still does to this day. In a lot of ways, NIST is an unofficial but widely followed guide for technology standards, especially around open systems like the internet, and really the whole world pays a lot of attention to what NIST has to say on these topics. We've discussed NIST in other podcasts, and this is a great early example of NIST doing that.

Jason Soroko

Exactly, Tim. So, let's talk about this extremely simplistically. Say you ran a hash function that simply turned any key into a very small integer, just a couple of bits, for example. That's going to be easy to reverse, right? And you're also going to end up with a lot of collisions, especially if the range of possible outputs is only, say, 15 or 20 integers.

Tim Callan

Right.

Jason Soroko

That's too simplistic a hash digest, too small a range of hash values. So, what you need is a large enough hash value size, in bits, for it to be very, very difficult to reverse and to retain the properties we talked about earlier. What was chosen for both the original SHA-0 and SHA-1 is a 160-bit digest. What we're talking about today, Tim, is that this digest size was known to be problematic, in the sense that there were theoretical attacks against it that were really brute force based, right? In other words, there was no fundamental flaw; it was just a question of how fast computing was getting, so that you could brute force the number of combinations necessary to collide a hash with a different key value, which should be extremely improbable. The problem is that the digest size simply isn't big enough for today's computing power. Today's computing power can do it in a reasonable amount of time.
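To see why digest size matters, here is a sketch of a generic birthday search against SHA-1 truncated to a deliberately tiny 24 bits. The truncation and the `truncated_sha1` and `find_collision` helpers are illustrative assumptions, not part of any real attack. For an n-bit digest a collision is expected after roughly 2^(n/2) hashes, so 24 bits falls in a few thousand tries, while the same generic approach against the full 160-bit digest would need on the order of 2^80. The published SHAttered and Shambles attacks relied on cryptanalytic shortcuts to get well below that generic bound; this snippet only illustrates the brute-force baseline.

```python
import hashlib
from itertools import count

TRUNC_BITS = 24  # deliberately tiny digest, for demonstration only

def truncated_sha1(data: bytes, bits: int = TRUNC_BITS) -> int:
    """Keep only the top `bits` bits of the 160-bit SHA-1 digest."""
    full = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return full >> (160 - bits)

def find_collision(bits: int = TRUNC_BITS):
    """Birthday search: hash distinct inputs until two share a truncated digest."""
    seen = {}  # truncated digest -> the input that produced it
    for i in count():
        msg = f"message-{i}".encode()
        h = truncated_sha1(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg

a, b, tries = find_collision()
print(a, b, "collide after", tries, "hashes")
```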

Tim Callan

Right. So, in 1995, we built something that was 1995-sized, right? If the digest had been too big back then, it would have been impractical for the use cases and applications the world had in mind. But of course, it's now 25 years later, and if you went back and asked those people at NIST in 1995, "Hey, is this going to be deprecated in 25 years?" they probably would have said, "Yes, it is, and we'll cross that bridge when we come to it." Right? Well, we've come to that bridge, and so it's time to worry about it.

Jason Soroko

Yeah, and I think NIST saw this early on, because around 2001 is when SHA-2 was published, which lets you pick from a menu of digest sizes, typically ranging from 256 to 512 bits. There are other sizes as well, like 224 and 384, but the popular one at the moment is 256. It's still in use today, and it's still considered strong enough. It absolutely has increased collision resistance. But my own understanding of SHA-2 is that, while there are some techniques in the algorithm to increase collision resistance, the collision resistance really comes from the larger bit size of the hash digest.
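The SHA-2 digest menu is visible in Python's hashlib. This sketch prints each digest size next to the generic birthday bound (half the digest length, in bits), which is the brute-force collision cost when no cryptanalytic shortcut is available. It says nothing about the algorithms' internal strength; it only makes the "bigger digest" arithmetic concrete.

```python
import hashlib

# Digest sizes, in bits, for SHA-1 and the SHA-2 family members in hashlib.
DIGEST_BITS = {
    name: hashlib.new(name).digest_size * 8
    for name in ["sha1", "sha224", "sha256", "sha384", "sha512"]
}

for name, bits in DIGEST_BITS.items():
    print(f"{name:>7}: {bits} bits, generic birthday bound about 2^{bits // 2} hashes")
```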

Tim Callan

And it's widely considered that, for all the reasonable applications we have today, those larger bit sizes are just fine, and we can process them in the timeframe we need for real-time production systems to operate successfully.

Jason Soroko

Right. Exactly. So as time went on, Tim, an attack was proposed in 2004, and then, I think, it was published in 2005. And in 2006, NIST actually called for the end of SHA-1 usage by 2010.

Tim Callan

By 2010? Ten years ago?

Jason Soroko

Exactly right. And yet, we still see SHA-1 being used today in a number of use cases. Right.

Tim Callan

So why is that?

Jason Soroko

I think it's what you said at the beginning, Tim. SHA-1 was the first one really widely implemented, and because it was first, it became foundational in a lot of people's infrastructure. We see it in Microsoft CA implementations from back in those early days, and those things are still being used.

Tim Callan

So, these are legacy implementations. These are things that were created long ago and are still in the system somewhere. Perhaps there's some trepidation about dealing with them because people aren't sure the code is fully understood. The people who wrote it might not be with the company anymore, and even if they are, heck, they don't remember why they made the decisions they made way back then. And there's generally a sense of, if I don't have to monkey with this, I would rather not. Is that right?

Jason Soroko

If NIST said this was a standard, you were stuck with it; you implemented it and that was that. You never really planned for, "Geez, is my system going to last 20 years?" Well, yes, it did, and it's going to go even further than 20 years, but the cryptographic algorithms it uses are going to be deprecated, and you need to have a plan to move forward.

Tim Callan

Yeah, and there was considered to be almost an event horizon, right? I remember back when the World Wide Web started exploding in 1995, you would ask people what the internet was going to look like 10 years from now, and they would shrug and say, "How could anybody know that?" Right? And so, you got into the habit of saying, I can only plan so far ahead, because so much is going to change that my plans are guaranteed to be of no value. So, if you said to somebody, "Look, I'm going to give you this hashing algorithm, and one day we're not going to be able to use it, but that day is 15 years from now," they were not worried about that.

Jason Soroko

Well, Tim, absolutely. To the point where, I mean, I was working in this field back in that timeframe, and so were you.

Tim Callan

Yeah.

Jason Soroko

And it was not even in our imaginations that you or I, within five minutes from now, could have at our fingertips server farms' worth of GPUs that were far more powerful than the most powerful government supercomputer.

Tim Callan

That's right, exactly correct. The whole idea of public cloud just wasn't even an idea, right? Or a rentable public cloud wasn't a thing that anybody was thinking about. So yeah, okay.

So, SHA-1 is still being used primarily because people have had implementations in place for a long time, and there is some amount of concern or trepidation about messing with them. Or they just have busy schedules and lots of stuff on the roadmap, and it's hard for this to bubble to the top, or it's simply out of sight, out of mind. Does that sound right?

Jason Soroko

Absolutely, Tim. So, let's describe where some of these use cases are. One of them is good old-fashioned early CAs, especially Microsoft CA. We see SHA-1 being used there. Additionally, PGP, right? Identities that are based on PGP use SHA-1 because those were early implementations that are still in use. And here's a scarier one, Tim. Well, perhaps not scarier, but scary to me. Obviously, you're going to use some sort of checksum algorithm, like SHA-1, to answer the question, "Is the code in my Git repository the code that I intended?" So, if you run a SHA-1 algorithm against a code base to get a digest, and it comes back the same as what you expect, you should have some level of confidence that it's the code that you intended. Well, that's not true anymore, Tim.
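For context on the Git example: classic Git names each file's contents (a "blob") by the SHA-1 of a short header followed by the raw bytes. Here is a minimal sketch of that calculation; it should match what `git hash-object` reports for the same content in a SHA-1 repository, but it is an illustration rather than Git's own code, and `git_blob_id` is a hypothetical helper name. Newer Git releases also offer an experimental SHA-256 object format, which this sketch does not cover.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute a Git blob object ID: SHA-1 over b'blob <size>' + NUL + content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"hello world\n"))
```

Because the object ID is the only thing tying a name to its contents, two different blobs with the same SHA-1 would be indistinguishable by ID, which is exactly the concern Jason raises.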

Tim Callan

Right.

Jason Soroko

That's not true anymore because a bad guy, a really sharp bad guy who can rent one of these public server farms, can put some rogue code into your code base, flip a few bits here and there to compute the necessary hash collision, and then all of a sudden, you're going to trust that code base.

Tim Callan

That's really bad. So maybe this is the time for a pithy definition of a collision attack for the listeners.

Jason Soroko

Here's a good one that I think describes the simpler kind, which is just a true collision. Let's say I send a PDF document to you. Somebody in between us, in a man-in-the-middle attack, gets hold of my document and makes a change that is harmful to me but beneficial to the bad guy in the middle.

Tim Callan

Right.

Jason Soroko

And then you receive that document. What I tell you is, "Hey, Tim, in order to trust that document, here's the hash digest. If you run your own SHA-1 against it and get the same hash, then you should be able to trust my document and know that it came from me."
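The check Jason describes (recompute the digest and compare it with the one the sender published) takes only a few lines. This sketch uses SHA-256 rather than SHA-1, in line with the advice in this episode; `digest_hex` and `verify_document` are hypothetical helper names, and the document contents are made up. A bare hash only detects tampering if the expected digest itself reaches you over a trusted channel; in PKI, that job is done by a signature over the hash.

```python
import hashlib
import hmac

def digest_hex(document: bytes) -> str:
    """SHA-256 digest of a document, as lowercase hex."""
    return hashlib.sha256(document).hexdigest()

def verify_document(document: bytes, expected_hex: str) -> bool:
    """Recompute the digest and compare it with the one the sender published."""
    # compare_digest gives a constant-time comparison; lower() tolerates uppercase hex.
    return hmac.compare_digest(digest_hex(document), expected_hex.lower())

original = b"Contract: pay Tim $100"
published = digest_hex(original)

print(verify_document(original, published))                  # True
print(verify_document(b"Contract: pay Tim $900", published))  # False
```

The attacks discussed here defeat exactly this check when the digest is SHA-1: the forged document is crafted so that its digest equals the published one.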

Tim Callan

Then it passes muster. Yeah. And it's considered to be not only truly from you and authentic, but also untampered with.

Jason Soroko

Well, the bad guy, with a certain amount of computing power, can modify the document to suit their malicious needs and then flip a certain number of bits, in a way you might not see or that would be difficult to detect, and then, in order for...

Tim Callan

Make it match the expected hash, the original hash.

Jason Soroko

In order to make the final document match the expected hash. You would never know the difference, right?

Tim Callan

Right.

Jason Soroko

Unless you and I physically compared the documents themselves...

Tim Callan

Right, yeah. Short of that. And then, of course, you'd run into all kinds of problems depending on the use case. Maybe someone isn't doing that comparison, or maybe there's a question about which one is right and which one isn't, and all of that. Yeah.

Jason Soroko

Yeah. So, as of not that long ago, we've now seen that attack. It's not just theoretical, it's not just at the university level; researchers have actually carried out this attack successfully against SHA-1 with a reasonable amount of computing power.

Tim Callan

Right. That's the last point: a reasonable amount of computing power, right? How much are we talking about?

Jason Soroko

I actually have some numbers here that I'll reveal to you in a moment, for an even scarier attack. That PDF attack is what we could call a classic, or classical, collision attack, right?

Tim Callan

Yeah.

Jason Soroko

And in fact, for those of you who are interested, one of the best-known attacks of this kind was called SHAttered, and if you go to shattered.io, the authors of that attack have a lot of great information. Please check that out if you're curious. That's a classical collision attack. But let's talk about what's especially interesting for this podcast, Tim, which is PKI. Let's talk about the creation of a rogue certificate: not just a rogue PDF, which is more or less a static document, but something that requires chosen prefixes. We've now seen a chosen-prefix collision against SHA-1.

Tim Callan

And again, for the listeners: chosen prefix, what's the significance of that?

Jason Soroko

Here's the chosen prefix concept. Think about a classical attack against a PDF document. That PDF document you can copy directly; it's identical. In other words, I can create an identical copy with just a few changes, perhaps in the wording, and create a collision. With something like a certificate, there's a lot more to it, because there's a structure to the document, an X.509 certificate structure, right? You have the serial number, the validity period, the X.509 extensions, the signature, etc. The X.509 extensions and the signature are probably the only parts of the document that are going to be copied directly over to a rogue certificate, right?

Tim Callan

Right.

Jason Soroko

Things like the serial number and the validity period are values you'd have to predict at the time the real X.509 certificate is issued. And then what ends up being the rogue value is something like the domain name, right? Say I wanted to take a valid certificate that was issued for a specific domain and create a rogue certificate that was valid for a rogue domain, if you want to call it that.

Tim Callan

Sure.

Jason Soroko

So then, the public key is where I'm going to try to compute the bits that are needed for the collision. So, that's a lot I have to do. Only a minority of the document can be directly copied over, compared to a PDF, where the majority is copied over.

Tim Callan

Right, and that's why a classic collision was broken first, if "broken" is the right word, or delivered first: because it's a considerably simpler problem.

Jason Soroko

That's exactly right. It's a much simpler problem. So, think about it: now it's not just about computing the collision bits, it's also about doing this prediction and all the other things needed to forge this document, which is supposed to be very, very difficult. When we use the term improbable, we really mean it. Improbable.

Tim Callan

Way improbable, yeah.

Jason Soroko

Yeah.

Tim Callan

Like you do the math on this stuff and it's sort of like, you know, the sun should go supernova before you find the right key. Right? It's that kind of improbable.

Jason Soroko

Yeah, and that's why I think a lot of people still to this day say, "Yeah, yeah, SHA-1. People have done collisions, but I'm still going to use it for my PKI, right? I'm still going to use it as my certificate hashing algorithm." But here's the problem, Tim. In January of this year, 2020, there was the Shambles attack. Check it out: if you put "Shambles chosen prefix collision" into your search engine of choice, you're going to come up with a lot of great information. They sped up the original SHA-1 attack by a factor of 10, Tim.

Tim Callan

Okay.

Jason Soroko

And they've now brought it to the point where all attacks that are practical against MD5, which is another algorithm, are now practical against SHA-1 as well.

Tim Callan

Yeah. That's a deprecated algorithm.

Jason Soroko

So, the full impact of this, Tim, is that X.509 certificates, PGP keys, all these things are now at full risk if the underlying hashing algorithm is SHA-1. And now I'm going to go back to a question you asked me earlier: how much computing does it take?

Tim Callan

Right. How much compute does this require? I think that was the question I asked, more or less.

Jason Soroko

Exactly right. Well, the cost of this attack was under $50,000. But for an identical-prefix attack, the authors surmised that they could do it for about $11,000. And by 2025, just using a bit of Moore's Law, they said they could get it below $10,000.

Tim Callan

Right. So yeah, perhaps there are some secrets that aren't worth $10,000 to the bad guy, but we routinely encrypt or use certificates for things that are worth much, much more than that to our companies, in terms of the value of our secrets, the cost of a breach, the cost of a disruption, etc.

Jason Soroko

Yes. And, Tim, I think there's one more point to be made here, which is that we've seen so many aha moments in this field that I would be afraid to say that's all it would take. As people's math gets better and attack algorithms get better, this is going to get cheaper and cheaper. And I'll tell you, I keep my finger on the pulse of what's going on in the cloud world, and my ability to rent computing power is going up daily. So I would say, if you're using SHA-1 and you feel confident, the only rational conclusion I can come up with is that the thing you're protecting isn't really worth much.

Tim Callan

Right. Exactly. One could contend that if the thing I'm protecting isn't really worth getting, maybe not completely without value, but a criminal might get a few hundred dollars for it, then they're not going to bother and I'm probably all right. But short of that, this is not an algorithm you want to be using.

Jason Soroko

Anybody who has ever used the cybersecurity defense of "Oh, they'll never come after me, I'm not interesting" has found out really quickly that that's a really bad approach to security. So, Tim, one last point here, because anybody really sharp who's listening to this podcast might point this out to us.

Tim Callan

Okay.

Jason Soroko

Now that we know this, and now that we know that even SHA-2, as we described earlier, isn't somehow massively, intrinsically better than SHA-1 in its algorithm for collision resistance, it just has a larger digest with some extra tricks up its sleeve, I think it has become very important for X.509 certificates to randomize serial numbers. This is already known in the browser world, but randomizing your serial numbers is a really, really important idea now.

Tim Callan

Yeah, and we talked about this in an earlier podcast: the 63-bit versus 64-bit serial number problem. Just as a quick reminder for the listeners, or in case you didn't hear that podcast: around the summer of last year, maybe late spring, it was discovered that, through a rather esoteric flaw in a popular tool, many public CAs had been issuing certs where one of the bits of the serial number was entirely predictable. I think it was always zero, maybe it was always one, but there was a bit in there that was always the same, and you knew what it was. What that did is essentially halve the number of possible serial numbers, because one of the bits was predictable. That took the entropy from 64 bits to 63 bits, and the requirement according to the CA/Browser Forum Baseline Requirements was 64 bits, so all of these certificates were out of compliance. And the reason any of that matters is what you were just talking about here: collision attacks.
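The serial-number point can be sketched in a few lines. RFC 5280 requires certificate serial numbers to be positive integers no longer than 20 octets, and the CA/Browser Forum Baseline Requirements call for at least 64 bits of CSPRNG output in the serial. The 63-bit problem arose from drawing exactly 64 random bits and then forcing the value positive, which pins the top bit. The remedy shown here, drawing well over the minimum (127 bits by default) and setting one extra leading bit so positivity costs no randomness, is an illustrative choice, not any particular CA's code; `flawed_serial`, `random_serial`, and `der_integer_octets` are hypothetical names.

```python
import secrets

def flawed_serial() -> int:
    """64 random bits with the top bit forced to 0: only 63 bits of entropy."""
    return secrets.randbits(64) & ~(1 << 63)

def random_serial(entropy_bits: int = 127) -> int:
    """Positive serial carrying `entropy_bits` random bits.

    A 1 bit is placed just above the random bits, so the value is always
    positive and nonzero without pinning any of the random bits.
    """
    if not 64 <= entropy_bits <= 158:
        raise ValueError("need >= 64 random bits and must fit in 20 DER octets")
    return (1 << entropy_bits) | secrets.randbits(entropy_bits)

def der_integer_octets(n: int) -> int:
    """Octets needed to DER-encode positive integer n (includes the sign bit)."""
    return n.bit_length() // 8 + 1

serial = random_serial()
print(hex(serial), der_integer_octets(serial), "octets")
```

With the default of 127 random bits, each serial encodes in 17 octets, comfortably inside the 20-octet limit and roughly double the 64-bit minimum.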

Jason Soroko

It's funny how it always comes full circle, Tim.

Tim Callan

Yeah.

Jason Soroko

It does. And I just want to say, Tim, that 2011 was when NIST officially declared SHA-1 deprecated. 2017 is when, I believe, either the CA/Browser Forum or the browsers themselves officially distrusted SSL certificates that were using SHA-1, even though we still see some of them today.

Tim Callan

Right. Yep.

Jason Soroko

And then a big move by Microsoft was in 2019, when they said they would stop supporting legacy Windows operating systems that were using SHA-1. All updates they provided would be SHA-2-based only.

Tim Callan

Gotcha. So, if I'm choosing SHA-1, let's say on my Microsoft CA, does that mean I'm using a version of Windows that is no longer supported?

Jason Soroko

I wish we had somebody from Microsoft on with us, but I'm going to guess that what it means is that that particular code set will not be supported any longer. The rest of the operating system probably will be, even though we do know, of course, that things like XP and 7 have limited or no support except in extreme circumstances. But no, your Microsoft CA code that is hashing with SHA-1 will probably not be in live support from Microsoft as of 2019.

Tim Callan

So that's scary, because what if there's another security flaw?

Jason Soroko

And we've already seen it, Tim. We've already podcasted on flaws in Microsoft's cryptographic code. I don't want to say they're coming fast and furious, but we do see them.

Tim Callan

Yeah, like just earlier this year, the Windows CryptoAPI spoofing vulnerability was giant, massive news, and Microsoft issued patches for all affected supported operating systems. Meaning if you had an unsupported operating system, you didn't necessarily get a patch.

Jason Soroko

It's time to move off SHA-1. It's not just the theoretical realm anymore. We've now seen classical collisions that are totally doable, and now we've seen the scariest kind, a true chosen-prefix attack, which is what I think some people might have been waiting for: once that shoe dropped, that was it. Well, the shoe has dropped, folks. That's it.

Tim Callan

All right. So that's a very thorough and compelling case that SHA-1 is not secure. If you're using SHA-1, whatever you're protecting with SHA-1 is not protected, and this is the time to bite the bullet and do the thing you knew you were going to have to do eventually.

Jason Soroko

Exactly, Tim. I think the upside, though, is this: moving to a new CA, or perhaps a new infrastructure, is never easy, but the implementation you did 15 or 20 years ago was probably painful, and thankfully, I can tell you things have changed.

Tim Callan

It’s a lot less painful now.

And that's a valid point as well. There's crypto agility and all kinds of other things that are out of scope for this discussion. Jay, very important message. I'm glad we recorded this. Hopefully, people listening to this are thinking hard about whether they're using SHA-1, and finding out if they are, because you might not know it. Those are very valuable things to do, because you'd hate to find out the hard way.

Jason Soroko

Thanks, Tim. I wondered whether we really needed to do this podcast, but in talking to customers and to our product management group, we are seeing people still using SHA-1, and that's why we decided to do this podcast.

Tim Callan

Yeah, and I think it was a good decision and I think the timing is right on this. So, thank you very much, Jay. As always, very insightful, very informed and very informative and this has been Root Causes.