Root Causes 481: What Is Protocol Ossification?
Protocol ossification is the phenomenon whereby ecosystems fail to work correctly with the full range of options included in a protocol. This occurs when individual software components only partially support the capabilities that should be available. We define protocol ossification, explain how and why it occurs, give real world examples, and talk about potential remedies.
- Original Broadcast Date: March 31, 2025
Episode Transcript
Lightly edited for flow and brevity.
-
Tim Callan
We're here at Toronto session, season three.
So Jay, I just recently got back from the PKI Consortium PQC Conference that occurred in January of 2025, and there was a term that I heard used several times by presenters that, first of all, I love and second of all, I thought it would be really nice to bring to the attention of the audience, which is protocol ossification.
-
Jason Soroko
Whenever I hear the term ossification used with respect to computer systems, it's typically because there are dependencies built on top of a thing, and assumptions made about the way it has to work, which then cause it to lose its flexibility.
-
Tim Callan
So ossification, in a general sense, is taking something that should be pliable and flexible and basically turning it to stone, making it rigid and locked in. In the case of protocol ossification, what these folks are talking about is that a protocol will have a certain range of allowed use cases, a range of motion, and what can happen is, though the protocol allows that full range, you find that there are portions of it that, if you actually try to use them, break things.
So think about it this way. Let's suppose that I've got an interoperable system, and there are seven ways you're allowed to do a certain thing, but 99% of the people out there are using ways one through three. It is very possible for someone to write software that actually can't handle ways four through seven correctly, and they could write that software because they're not aware of them, or because they just don't care. Maybe they're trying to save some space. And so if you look at this protocol and you say, aha, number five, this is exactly what I want, we're gonna go do that, and just to give people a little flexibility, we'll let them choose six or seven also, you could write something that fits the protocol, but when you try to implement it in the real world, things break. That is protocol ossification.
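To make that example concrete, here is a minimal, hypothetical sketch in Python. The option numbers, class name, and return strings are all invented for illustration; the point is only that a spec-legal request can still fail against a peer that was only ever exercised with the popular options.

```python
# Hypothetical illustration of protocol ossification: a spec that allows
# seven option codes, and a deployed implementation that only ever
# handled the three popular ones.

SPEC_ALLOWED_OPTIONS = {1, 2, 3, 4, 5, 6, 7}   # what the protocol permits

class OssifiedEndpoint:
    """A peer that was only ever tested against options 1-3."""
    IMPLEMENTED = {1, 2, 3}

    def negotiate(self, requested_option: int) -> str:
        if requested_option not in SPEC_ALLOWED_OPTIONS:
            return "error: not in spec"
        if requested_option not in self.IMPLEMENTED:
            # Spec-legal, but this peer breaks anyway: that gap between
            # what the spec allows and what deployments accept is ossification.
            return "error: unsupported"
        return f"ok: using option {requested_option}"

peer = OssifiedEndpoint()
print(peer.negotiate(2))  # ok: using option 2  (a popular option)
print(peer.negotiate(5))  # error: unsupported  (spec-legal, fails in the field)
```

This is also why some real protocols deliberately send unused values to keep the full range exercised, an approach TLS formalized as GREASE.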
-
Jason Soroko
That's great, Tim. How about you give us a real-world example?
-
Tim Callan
So it has come up in the context of PQC because when you look at the protocols, there are certain things that are laid out, like timeout periods. Someone looks at the protocol and says, oh, I can fit this all inside the timeout period. We're fine. Yes, it's bigger than it was before, but it's still inside the timeout period. Great. They move forward, and then they put it in the real world, and they find out that stuff is timing out, connections are being lost, and we can't get our data packets through, and it turns out we're running into timeout limits. You can say, well, those are the wrong timeout limits, dummy. That's not what the protocol says. But that doesn't really matter, because when it's deployed on thousands of software packages that are being used by hundreds of thousands of organizations around the world, you standing on top of a hill shouting, you're all dummies, doesn't fix the problem. So this came up at the conference. One of the people I heard use the term, while I was presenting on stage, was somebody from Cloudflare who was talking about exactly this: we took it, we rolled it out, and all kinds of stuff broke, and the basic reason is protocol ossification.
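The timeout problem above can be sketched in a few lines of Python. Every number here is an invented, illustrative assumption, not a measurement: a hard-coded client timeout tuned years ago for classical handshake sizes, a slow link, and a larger post-quantum handshake payload that blows the budget even though nothing violates the spec.

```python
# Hypothetical sketch: a deployed client with a timeout tuned for
# classical handshake sizes. All numbers are illustrative assumptions.

HARDCODED_TIMEOUT_S = 0.5          # baked into deployed software years ago
LINK_BYTES_PER_SECOND = 20_000     # assumed slow network link

def handshake_time(total_bytes: int) -> float:
    """Rough transfer time for the handshake payload on the assumed link."""
    return total_bytes / LINK_BYTES_PER_SECOND

def connects(total_bytes: int) -> bool:
    """Does the handshake finish before the deployed timeout fires?"""
    return handshake_time(total_bytes) <= HARDCODED_TIMEOUT_S

classical = 4_000    # e.g. small classical certs and key shares (assumed)
pqc = 15_000         # e.g. larger PQC signatures and key material (assumed)

print(connects(classical))  # True: fits inside the old timeout
print(connects(pqc))        # False: spec-legal, but deployed timeouts drop it
```

The protocol never said 0.5 seconds was the limit; thousands of deployed implementations did, and that is what the rollout actually runs into.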
-
Jason Soroko
A loss of flexibility.
-
Tim Callan
This is being uncovered as a very real, very meaningful challenge to real-world PQC rollout, because stuff that is supposed to work according to the protocols isn't necessarily working. It goes back to an earlier episode where we talked about known unknowns and unknown unknowns. That's an unknown unknown, and then you finally discover it, and it turns into a known unknown. It also goes back to earlier episodes where we talked to both Bas Westerbaan and Michele Mosca about the importance and the value of getting real-world feedback on this stuff, because we can all look at the protocols and say this will all work. Then you start using it, and suddenly you're finding it isn't working because of things you just weren't aware of, such as protocol ossification. So what do we do about this?
It is a good question. I think part of what we do about this is exactly that, which is we start using the stuff. In another earlier episode, you and I talked about the Apple step-down process, and it's very similar. Because it steps down in phases, it's supposed to discover the problems that happen when you go to six-month certs, or the problems that happen when you go to three-month certs, before everybody just rushes to one-month certs. You get a similar thing here, I think, where we start using these PQC algorithms. This could be any big change; it doesn't have to be PQC, it's just the one people were talking about at the conference. You take these PQC algorithms and you start using them, because what we're discovering here, ultimately, is deviations from published standards that are commonplace, or commonplace enough that they're problematic. Then, to some degree, market forces fix it. If my router won't do what it's supposed to do according to the rules, and that thing is now necessary, and my competitor's router will, then my customers will vote with their wallets, and that'll get my attention in an awful hurry, and suddenly I will deossify my use of that protocol.
So, the trouble with this stuff is it takes time. It's hard to hurry up. It requires the participation of many, many different parties who don't necessarily even talk to each other, who aren't always even aware of where these protocols are written down. They just do what they've always done, they just do what everybody else does, and it's a major herding-cats kind of exercise.
-
Jason Soroko
Technology is like that.
-
Tim Callan
Especially interoperable technology.
-
Jason Soroko
I didn't think I'd have this thought at the beginning when you first brought up this topic, but now it's clear to me. We've been talking about this for six years now. One of the things you and I quite commonly say about PQC is that it has been time to get your hands dirty for a long time now. We've been saying that for six years. I'm really pleased to say that the company we work for is in the middle of releasing a PQC sandbox, where you can actually see and smell and touch and hear a post-quantum certificate.
That's the first step of getting your hands dirty: seeing what it looks like and how it operates. Even just the latency of creating it tells you a lot. There's a second thought to this, Tim. One of the themes in the objections to moving forward with PQC, and one of the objections to moving forward with automation or certificate lifecycle management, is: oh, well, there are going to be problems, so we can't do that. We've now had multiple podcasts where we've talked about Apple's intentions and Google's intentions. You're going to have to move forward and deal with the problems that come up, including the known and the unknown. We've said this now on multiple podcasts: you've just got to go and do the thing in order to discover where the problems are and then figure out how to deal with them. Yet so many people who sit at the command line for a living are of the opinion, well, there's gonna be problems.
-
Tim Callan
There's an adage in the US Marines, which is that no plan survives the initial encounter with the enemy, and in this case, the enemy is software errors. So we can sit and plan and plan and plan some more. That's what we've been doing. But when we get in there and encounter the enemy, when we start putting this stuff in the field, that's where we're going to discover the shortcomings of our plans.
-
Jason Soroko
There's gonna be plenty.
-
Tim Callan
But there's just no way to get it otherwise. I think that's a big thing. I think a protocol can be deossified if the need is sufficient, and it occurs the way we just talked about. In this case, some of it will happen because the need for PQC is not going away. Sometimes, maybe, there's range in a protocol that we don't need and don't care about, and there's no forcing function to deossify it. Maybe in that case, that's fine. If it's honestly redundant, if it's an appendix and we'd be better off without it, then maybe we just shouldn't have it. That's okay. But when we're seeing real-world problems really occurring, like we are with implementations of PQC, that's where we've got to go back and say, okay, listen, this corner you cut, whether you knew it or not, Mr. Software Developer, I know you got away with it for the last 15 years, and good for you, but it's time to put it on the roadmap and get it fixed properly.
-
Jason Soroko
I never thought I'd be saying it again, but it is time to get your hands dirty, and these are the reasons why. Thanks, Tim.