Hash pointers and data structures
In this section, we discuss hash pointers and their applications. A hash pointer is a data structure that turns out to be useful in many of the systems that we consider. A hash pointer is simply a pointer to where some information is stored together with a cryptographic hash of the information. Whereas a regular pointer gives you a way to retrieve the information, a hash pointer also allows you to verify that the information hasn’t been changed.
We can use hash pointers to build all kinds of data structures. Intuitively, we can take a familiar data structure that uses pointers, such as a linked list or a binary search tree, and implement it with hash pointers instead of ordinary pointers, as we normally would.
We call this data structure a blockchain. In a regular linked list where you have a series of blocks, each block has data as well as a pointer to the previous block in the list. But in a blockchain, the previous-block pointer will be replaced with a hash pointer. So each block not only tells us where the value of the previous block was, but it also contains a digest of that value, which allows us to verify that the value hasn’t been changed. We store the head of the list, which is just a regular hash-pointer that points to the most recent data block.
To understand why a blockchain achieves this tamper-evident property, let’s ask what happens if an adversary wants to tamper with data in the middle of the chain. Specifically, the adversary’s goal is to do it in such a way that someone who remembers only the hash pointer at the head of the blockchain won’t be able to detect tampering. To achieve this goal, the adversary changes the data of some block k. Since the data has been changed, the hash in block k + 1, which is a hash of the entire block k, is not going to match up. Remember that we are statistically guaranteed that the new hash will not match the altered content, since the hash function is collision-resistant. And so we will detect the inconsistency between the new data in block k and the hash pointer in block k + 1.
The upshot is that if the adversary wants to tamper with data anywhere in this entire chain, to keep the story consistent, she’s going to have to tamper with the hash pointers all the way to the end. And she’s ultimately going to run into a roadblock because she won’t be able to tamper with the head of the list. Thus, by remembering just this single hash pointer, we’ve essentially determined a tamper-evident hash of the entire list. So we can build a blockchain like this containing as many blocks as we want, going back to some special block at the beginning of the list, which we will call the genesis block.
Last word
As before, we remember just one hash pointer: in this case, the one at the root of the tree. We now have the ability to traverse through the hash pointers to any point in the list. This allows us to make sure that the data has not been tampered with because, just as we saw for the blockchain, if an adversary tampers with some data block at the bottom of the tree, his change will cause the hash pointer one level up to not match, and even if he continues to tamper with other blocks farther up the tree, the change will eventually propagate to the top, where he won’t be able to tamper with the hash pointer that we’ve stored. So again, any attempt to tamper with any piece of data will be detected by just remembering the hash pointer at the top.