Podcast
Root Causes 25: Entropy and Random Numbers


Hosted by
Tim Callan
Chief Compliance Officer
Jason Soroko
Fellow
Original broadcast date
July 2, 2019
One cornerstone of successful cryptography is entropy, or the ability to create genuinely unpredictable values. But it turns out that generating truly random numbers is harder than you might think. Join our hosts as they discuss the need for randomness, the lengths companies go to to generate random numbers, and the bad things that can happen when they fail.
Podcast Transcript
Lightly edited for flow and brevity.
Let’s say your numbers are between 1 and 50 and for four or five or six straight weeks your numbers are skewed towards 50 more than 1. Wow. That gives an enormous advantage to the people playing your lottery.
And so therefore you need to take advantage of randomness. Typically that’s done just by having, you know, ping pong balls that are rolling around seemingly in a completely random way. The problem is of course if the balls aren’t completely weighted identically, that brings in bias and this bias is the opposite of randomness.
In terms of crypto, I think what you would do is you would pick the numbers or configurations that are more likely to be right and you would reduce the necessary processing time to brute force attack a key.
And if you happen to know that a certain class or a certain segment of those permutations are not necessary to scan, you’ve reduced the safety factor of your encryption in a massive way. Even though you haven’t given away the answer, the ability to brute force that encryption rises exponentially.
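To put rough numbers on that point, here is a minimal sketch of the arithmetic. The 128-bit key size and the counts of predictable bits are illustrative assumptions, not figures from the conversation.

```python
# Illustrative arithmetic only: the key size and known-bit counts are assumed.

def remaining_keyspace(key_bits: int, known_bits: int) -> int:
    """Number of candidate keys left once `known_bits` bits are predictable."""
    if not 0 <= known_bits <= key_bits:
        raise ValueError("known_bits must be between 0 and key_bits")
    return 2 ** (key_bits - known_bits)


def speedup(key_bits: int, known_bits: int) -> int:
    """How many times smaller the brute-force search becomes."""
    return 2 ** key_bits // remaining_keyspace(key_bits, known_bits)


if __name__ == "__main__":
    for known in (0, 1, 16, 64):
        print(f"{known:>2} predictable bits -> search is {speedup(128, known):,}x smaller")
```

Each predictable bit halves the search, so the attacker's advantage doubles with every bit of lost entropy.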
Claude Shannon’s job back at Bell Labs was basically to deal with signal-to-noise ratio issues. You’re trying to put a signal modulating information across a wire and there’s a noise floor below which you can’t read that information any more. He was an expert in really maximizing the amount of information going across a wire and pulling it out of the noise.
His work is the basis of a lot of very important concepts in computing. We wouldn’t have the computers we have today running the way they do. We wouldn’t have the PKI industry probably. All these important concepts of randomness that we just talked about and how to measure entropy, how to measure randomness, Claude’s work is the basis of a lot of it. So I want to reference that. And anybody who’s interested in that really should check it out.
The precise meaning of information for Claude Shannon: Here’s an analogy to help explain it. You and I right now, Tim, we’re talking in the English language and we’re conveying information to each other and to the listeners.
English came out very high, and there’s a reason for that. Take a look at, say, French or Russian, with which I have some familiarity. (Not a lot.) But French and Russian have the concept of genders for nouns. Genders for nouns, that’s a cultural aspect. It’s not an informational aspect. In other words, if you use a verb tense for a masculine or feminine noun, does that convey information? It actually doesn’t.
So therefore it’s an extremely highly informational concept. Now something that would be low information would be, say, something like just a drumbeat. Can you convey a sentence to me just with a simple downbeat, upbeat, downbeat, upbeat? There’s only so much you can say in that. Morse code, on the other hand, has information.
For one of the highest forms of entropy, think about something like white noise, and this is where we’ll circle back to bias. White noise is kind of perfect in a way because if you start looking into the white noise, it’s very, very difficult to predict what any of the given elements of white noise are going to be. If it’s just a time series, the amount of sound there is or where it happens to be in frequency is very, very difficult to predict at any given moment. However, audio engineers play with this stuff to create pink noise and concepts like that, and that’s when bias is introduced into the white noise.
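The effect of bias can be measured with Shannon's entropy formula, H = -sum(p * log2(p)). The sketch below is only an illustration of the analogy: it compares an unbiased 16-symbol source with a deliberately skewed one. The symbol count, the amount of skew, and the sample size are all assumptions made for the demo.

```python
import math
import random
from collections import Counter


def shannon_entropy_bits(samples) -> float:
    """Empirical Shannon entropy in bits per symbol: -sum(p * log2(p))."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


rng = random.Random(0)  # fixed seed so the demo is repeatable (not for crypto)

# Unbiased source: each of 16 symbols is equally likely.
uniform = [rng.randrange(16) for _ in range(100_000)]

# Biased source: symbol 0 is forced roughly half the time.
biased = [0 if rng.random() < 0.5 else rng.randrange(16) for _ in range(100_000)]

print(f"uniform: {shannon_entropy_bits(uniform):.3f} bits/symbol (maximum is 4.0)")
print(f"biased:  {shannon_entropy_bits(biased):.3f} bits/symbol")
```

The biased source scores noticeably lower, which is another way of saying its next symbol is easier to guess.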
So I think, to get back to computing now, Tim, we’ve seen examples in the Linux world and other places where it wasn’t truly random. It was pseudo random, and there was some kind of bias that was conceptually sort of the same as pink noise, a bias within the random number generator in the operating system.
A lot of people might not realize it, but Linux, for example, uses movements of the mouse. It will also use other forms of input that are happening, those seemingly random or chaotic features, to derive randomness.
Where we’re actually finding the least randomness, and where we’re running into a lot of trouble, is in IoT. Some people are trying to generate certificates with some algorithm based off inputs that are going on in a device. The problem is that a lot of these devices don’t have any sort of dedicated hardware random number generator, so where’s the entropy coming from? Where’s the information? Where’s the chaotic randomness coming from?
It turns out it’s not coming from anywhere. It’s an extremely closed-off and homogeneous environment where each device was probably created exactly the same way except for a serial number, and that’s not enough randomness.
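A small sketch of why a serial number alone is not enough. The function names and the serial range below are hypothetical, and the "better" version assumes the platform's operating system generator is itself properly seeded, for example by a hardware RNG.

```python
import random
import secrets


def weak_device_key(serial_number: int) -> bytes:
    """Hypothetical device firmware: the serial number is the only seed."""
    rng = random.Random(serial_number)  # deterministic, NOT suitable for keys
    return rng.getrandbits(128).to_bytes(16, "big")


def better_device_key() -> bytes:
    """Draws 16 bytes from the operating system's CSPRNG instead."""
    return secrets.token_bytes(16)


# The same serial number always produces the same "secret" key.
assert weak_device_key(1001) == weak_device_key(1001)

# Anyone who knows the (assumed) serial range can list every possible key.
candidate_keys = {weak_device_key(s) for s in range(1000, 2000)}
print(weak_device_key(1500) in candidate_keys)
```

With only a thousand possible serials there are only a thousand possible keys, no matter how long each key looks.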
The first thing that comes to mind of course is 2008’s Debian OpenSSL flaw. The OpenSSL function in the Debian flavor of Linux somewhere along the line shipped with a problem where basically the seed value was predictable. As soon as people realized that that seed value was predictable, they immediately calculated, like, the first 100,000 values. You then had a list and so you could probably crack anything that was created with OpenSSL.
If it was done in the first 100,000 values (and by the way it’s most likely to have been done in the first thousand values), then you could just go down this list. So it basically meant all the certificates where the CSR had been created in OpenSSL had predictable keys. Plus, it was easy to find them because you just looked for certificates that had keys on this list.
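The "go down the list" attack can be sketched with a toy model. This is not OpenSSL's key generation: `toy_key_from_seed` is a hypothetical stand-in that simply makes a key fully determined by its seed, and the seed space of 100,000 is an assumption chosen to mirror the list size mentioned above.

```python
import hashlib

SEED_SPACE = 100_000  # assumed size, mirroring the list described above


def toy_key_from_seed(seed: int) -> str:
    """Stand-in for key generation: the key is fully determined by the seed."""
    return hashlib.sha256(f"toy-keygen:{seed}".encode()).hexdigest()


# The attacker precomputes every key the flawed generator could ever emit.
known_weak_keys = {toy_key_from_seed(s): s for s in range(SEED_SPACE)}


def find_seed(public_key: str):
    """Return the seed if the key is on the precomputed list, else None."""
    return known_weak_keys.get(public_key)


victim_key = toy_key_from_seed(4242)  # victim unknowingly used a weak seed
print(find_seed(victim_key))
```

Once the list exists, checking any given certificate is a single dictionary lookup, which is why affected keys were so easy to find.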
And it was big. There were tens of thousands of them.
That sort of thing can happen where people don’t realize. It looks like it’s all working correctly and you think it’s fine and then it turns out that it’s not actually random.
Another example was also in 2008. Some security researchers managed to do a collision attack on a VeriSign off-brand certificate. It wasn’t VeriSign branded but it was from VeriSign, and part of the way that was possible is they detected a pattern in the serial numbers which allowed them to predict the next serial number. As it happened, it was a pretty easy pattern. It was increment by one, which was pretty easy to figure out.
Once they detected that pattern, it made it possible for them to predict that next serial number, and therefore no randomness. No entropy. Entropy is collapsed down to a single value. Those were real-world events with real-world consequences because the ability to have genuine unpredictability was absent.
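As a minimal illustration of the difference, the sketch below contrasts a sequential serial number, which an observer can predict from two samples, with one drawn from the operating system's CSPRNG. The function names, the sample serials, and the 64-bit length are assumptions for the example.

```python
import secrets


def predict_next_serial(observed: list[int]) -> int:
    """Guess the next serial, assuming the issuer adds a fixed step each time."""
    step = observed[-1] - observed[-2]
    return observed[-1] + step


def random_serial(bits: int = 64) -> int:
    """Serial number drawn from the OS CSPRNG (bit length is an assumption)."""
    return secrets.randbits(bits)


observed_serials = [1000, 1001, 1002]
print(predict_next_serial(observed_serials))
print(random_serial())
```

The predictor is right every time against the counter and has no better than a blind guess against the random serial.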
Something in the news very recently, Tim, I just kind of bring it up because it’s interesting. I think it was Cloudflare that formed the League of Entropy.
I also like that U Chile has something they call Seismic Girl, and Seismic Girl collects randomness from five sources. They are as follows: seismic measurements in Chile (and we know that’s a very seismically active place), a stream from a local radio station, a selection of Twitter posts, data from the Ethereum blockchain, and their own off-the-shelf RNG card.
Now, we have touched on this topic before and I don’t want to go over old material, but I do want to reference it. In our Episode 9 we talked about the 63- versus 64-bit serial number concern that occurred earlier this year in the world of public CAs, and that comes back to an entropy concern. The problem was that one of those bits turned out to be predictable, which cut the number of possible values in half: 63 bits of genuine entropy rather than 64. It surprised people and nobody was expecting it, and it got discovered incidentally as people were investigating other things.
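The arithmetic behind the 63- versus 64-bit concern can be sketched as follows. The generator below is hypothetical: it assumes the predictable bit is the top bit and that it is always zero, which is one simple way a 64-bit value ends up with only 63 unpredictable bits.

```python
import secrets


def serial_with_fixed_top_bit() -> int:
    """64 random bits with the top bit forced to 0 (hypothetical generator)."""
    return secrets.randbits(64) & ~(1 << 63)


full_space = 2 ** 64     # possible values with 64 unpredictable bits
reduced_space = 2 ** 63  # possible values once one bit is fixed

print(full_space // reduced_space)  # how many times smaller the space became
print(all(serial_with_fixed_top_bit() < 2 ** 63 for _ in range(1000)))
```

The output still looks like a random 64-bit number, which is why a problem of this kind is easy to miss.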
That’s a perfect example of what we were talking about. You might think I’m hitting the button and getting what looks like a random number, but it’s not always actually a random number.
It’s just smart math, and the need for absolutely real randomness, the importance of it, just grows over time.

