Podcast
Root Causes 348: What Is a Merkle Tree?


Hosted by
Tim Callan
Chief Compliance Officer
Jason Soroko
Fellow
Original broadcast date
December 15, 2023
One foundational element of modern cryptographic systems is the Merkle tree. The Merkle tree is an enabler of blockchain and Certificate Transparency (CT) logs, among other things. We explain this data structure, its properties, and its use cases.
Podcast Transcript
Lightly edited for flow and brevity.
It’s not used for the same thing. It is essentially a one-way tree, a one-way data structure: you only really ever add to it. The same is true of blockchains, but a Merkle Tree is used for somewhat different purposes, and it’s ideal when you have what will eventually become a very large dataset. Here is the problem a Merkle Tree solves. You have a large, complex dataset that will grow through time. You need to guarantee that every block of data stored in the structure is received undamaged and unaltered. And you need to make sure that the people adding data to the Merkle Tree (when I say people, I mean systems) cannot lie and send fake blocks of information into the data structure. So how do you achieve that? The way to achieve that, Tim, is to go back to a good ol’ cryptographic primitive: hashing.
Hashing is really at the heart of the Merkle Tree. In fact, a Merkle Tree is, in a data structure sense, a hash tree. Right? What that means is that every single block of data within this data structure has been hashed, and what goes into that hash depends on what kind of block we are talking about. Because when you are talking about a tree structure, there’s obviously a root to the data structure. Every block beneath the root will typically have two blocks assigned to it as children, and on and on like that. So this is essentially a way of labeling every single one of these blocks of information. Each data block has an associated label, which is a hash of the block itself, and each parent’s label is a hash built from its children’s labels. Right? So, in other words, every single hash is tied to its parent within the tree.
And that’s why you call it a hash tree: essentially you are hashing through a set of parents and children all the way up to a root.
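The parent-and-child hashing described above can be sketched in a few lines of Python. This is a minimal illustration, not the construction used by any particular blockchain or CT log: the function names, the choice of SHA-256, the 0x00/0x01 prefix bytes that keep leaf hashes distinct from interior hashes, and the rule of carrying an unpaired node up unchanged are all assumptions made for the example.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(blocks: list[bytes]) -> bytes:
    """Return a single root hash that commits to every block, in order."""
    if not blocks:
        # Assumption for this sketch: an empty tree hashes the empty string.
        return sha256(b"")

    # Leaf level: each data block gets a label, the hash of the block itself.
    level = [sha256(b"\x00" + block) for block in blocks]

    # Each parent's label is a hash of its two children's labels.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                parents.append(sha256(b"\x01" + level[i] + level[i + 1]))
            else:
                # Unpaired node: carry it up to the next level unchanged.
                parents.append(level[i])
        level = parents
    return level[0]


if __name__ == "__main__":
    original = merkle_root([b"block-1", b"block-2", b"block-3"])
    tampered = merkle_root([b"block-1", b"block-X", b"block-3"])
    print(original.hex())
    print(original != tampered)  # changing any block changes the root
```

Because every label depends on everything beneath it, altering a single block anywhere in the tree changes the root, which is what lets one short hash stand in for the whole dataset.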
But, let’s just talk about the use cases for a moment because I think those are what are most interesting.
One of the other important aspects of a Merkle Tree is that it can be made public. Because it is unalterable, it can be made public so other people can read it, and the ability to glean information off of that non-stop stream of truth can get you a lot of good things. For example: hey, were certificates issued for any domains that I own in the past year or two? Well, you can look up the Merkle Tree CT log and see whether that was true or not. And if something does pop up that you don’t expect, you can raise the alarm and do what you need to do to help protect yourself, whether it’s revoking a certificate or just investigating what the heck is going on.
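One reason a public hash tree is practical to check is that a reader does not need the whole tree to confirm that one entry is in it. The sketch below shows the general idea of such an inclusion proof: given an entry, a short list of sibling hashes, and the published root, the reader recomputes the path up to the root. It is not the API or wire format of any real CT log. The helper names, the `cert-` placeholder entries, SHA-256, and the 0x00/0x01 prefix bytes are all assumptions made for illustration.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree, leaves first, root level last."""
    if not blocks:
        raise ValueError("this sketch needs at least one block")
    level = [_h(b"\x00" + b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                parents.append(_h(b"\x01" + level[i] + level[i + 1]))
            else:
                parents.append(level[i])  # unpaired node is carried up
        level = parents
        levels.append(level)
    return levels


def audit_path(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Collect (sibling_hash, sibling_is_left) pairs from leaf up to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1  # the other child of the same parent
        if sibling < len(level):
            path.append((level[sibling], sibling < index))
        index //= 2
    return path


def verify(block: bytes, path: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one block plus its sibling hashes."""
    node = _h(b"\x00" + block)
    for sibling, sibling_is_left in path:
        if sibling_is_left:
            node = _h(b"\x01" + sibling + node)
        else:
            node = _h(b"\x01" + node + sibling)
    return node == root


if __name__ == "__main__":
    entries = [f"cert-{i}".encode() for i in range(5)]  # placeholder entries
    levels = build_levels(entries)
    root = levels[-1][0]
    proof = audit_path(levels, 2)
    print(verify(b"cert-2", proof, root))      # genuine entry
    print(verify(b"cert-fake", proof, root))   # altered entry
```

The proof grows with the height of the tree rather than with the number of entries, which is part of why this structure suits a dataset that keeps growing.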
I would imagine, Tim, that for other PKI issuance topics in the future, we haven’t seen the end of the usage of Merkle Trees.
If you are going to put out a trusted stream of data, and I’m talking about a data structure that is going to grow through time and that needs to be publicly available and trusted amongst peers, this kind of data structure is just kind of ideal.

