Hash trees are f-in incredible and here's why. Say you make a protocol that blocks any single transfer that's above 65K (0xFFFF bytes), and blocks anything whose hash you can't verify. So nobody can send you anything bigger than 65K, or anything you haven't already got a hash for. Obvious limitation: there are files bigger than 65K. With a hash tree you can accept that limitation and still receive a 4 gigabyte file, with under 0.05% overhead. You could use a hash *chain* (aka a blockchain) instead, but that's inferior for a few reasons...

With a hash tree, I break the file into 65K blocks, and get the hash of each block. Great. Now how do I get those hashes, in order, to you? Well, I make a list of the hashes in order, but what if that list is bigger than 65K? That's the "tree" part of it. I break that list up into 65K blocks, and make a list of their hashes, which is drastically shorter. Repeat, and soon I'll have it down to one <65K block with one root hash, which I send to you, and you request the rest.
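Here's a minimal sketch of that construction in Python, assuming SHA-256 for the 32-byte hashes; the names and the exact packing of hashes into blocks are my own illustration, not any real protocol:

```python
import hashlib

BLOCK = 0xFFFF               # 65,535: max bytes per transfer
HASH = 32                    # SHA-256 digest size
PER_BLOCK = BLOCK // HASH    # 2,047 whole hashes fit in one block

def chunk(data: bytes, size: int) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]

def build_tree(file_bytes: bytes) -> list[list[bytes]]:
    """Return the tree as levels of hash-list blocks, leaves first."""
    # Hash every 65,535-byte block of the file.
    hashes = [hashlib.sha256(b).digest() for b in chunk(file_bytes, BLOCK)]
    levels = []
    while True:
        # Pack the hash list into blocks, keeping each hash whole.
        blocks = chunk(b"".join(hashes), PER_BLOCK * HASH)
        levels.append(blocks)
        if len(blocks) <= 1:
            return levels    # one block left: hash it to get the root hash
        # Hash each hash-list block and build the next level up.
        hashes = [hashlib.sha256(b).digest() for b in blocks]

levels = build_tree(b"x" * 1_000_000)
root_hash = hashlib.sha256(levels[-1][0]).digest()  # the one hash I send you
```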

With a hash chain, I break the file into 65,503-byte blocks, and then append 32 empty (zeroed) bytes to the first block. Then I take the hash of that... thing, and append those 32 hash bytes to the second 65,503-byte block. Then the hash of that goes onto the third, etc. You can reconstruct the file if I send you the hash of the last thing: you request that thing, then the previous one, and so on until you hit a thing whose embedded hash is all zeroes.
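Under the same assumptions (SHA-256, 65,535-byte transfer limit, and the hash appended at the end of each thing, which is my choice since the layout isn't pinned down above), the chain build might look like:

```python
import hashlib

BLOCK = 0xFFFF
HASH = 32
CHAIN_DATA = BLOCK - HASH   # 65,503 data bytes per "thing"

def build_chain(file_bytes: bytes) -> tuple[list[bytes], bytes]:
    """Return the list of 65,535-byte 'things' and the tail hash I send you."""
    things = []
    prev_hash = b"\x00" * HASH          # the first thing gets 32 zeroed bytes
    for i in range(0, len(file_bytes), CHAIN_DATA):
        thing = file_bytes[i:i + CHAIN_DATA] + prev_hash
        things.append(thing)
        prev_hash = hashlib.sha256(thing).digest()  # goes into the next thing
    return things, prev_hash            # prev_hash is now the last thing's hash

things, tail_hash = build_chain(b"x" * 1_000_000)
```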

Blockchains take for f-ing ever to request one thing at a time: every request has to wait for the previous thing to arrive, a full round trip per 65K. I could spam the whole chain at you instead, and you'd accept it as fast as you could verify each "thing," one at a time. You can't rebuild the file in-place, because the 32-byte hashes are stuck in the middle of it, which is annoying to deal with. But the real damning flaw is that you can only verify the chain in reverse sequential order: you can't verify any thing until you've received all the things after it.
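Continuing the chain sketch above, a tail-first acceptance loop makes the reverse-order constraint visible: each received thing has to hash to the pointer carried inside the thing verified just before it.

```python
import hashlib

HASH = 32

def verify_chain(tail_hash: bytes, things_tail_first) -> bytes:
    """Accept things in reverse order; return the file at the zeroed pointer."""
    expected = tail_hash
    parts = []
    for thing in things_tail_first:
        if hashlib.sha256(thing).digest() != expected:
            raise ValueError("corrupt thing: reject it and everything before it")
        data, expected = thing[:-HASH], thing[-HASH:]
        parts.append(data)
        if expected == b"\x00" * HASH:   # reached the first block's zeroed pointer
            return b"".join(reversed(parts))
    raise ValueError("chain ended before the zeroed pointer")

file_again = verify_chain(tail_hash, reversed(things))
```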

So if one of the things gets corrupted, you have to reject everything before it that I'm spamming at you, since an adversary could have replaced all of it with garbage. And if the network is flaky and drops a thing halfway through, the 2 gigabytes I spam after the gap get rejected, you request the new tail of the chain, and I have to spam those 2 gigabytes at you again: 6 gigabytes on the wire to deliver a 4 gigabyte file, assuming no further issues.

Hash trees do have some overhead: you transfer the file plus the whole tree. But blockchains have overhead too, just intermingled with the file's data, and their blocks must be smaller (65,503 data bytes instead of 65,535), so there are more of them: more hashes, more overhead. I calculated for a 4G file, 65K block size, 32B hashes: blockchains have 1,954,112 bytes of overhead, while hash trees have 1,954,176, a difference of 64 bytes, or 0.003% more overhead for the tree. Anything smaller has even less extra overhead.
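Those figures check out if "4G" means 4,000,000,000 bytes. A quick script to reproduce them:

```python
# Reproduces the overhead numbers above for a 4,000,000,000-byte file,
# 32-byte hashes, 65,535-byte transfer limit.
FILE = 4_000_000_000
BLOCK, HASH = 0xFFFF, 32
CHAIN_DATA = BLOCK - HASH

def ceil_div(a: int, b: int) -> int:
    return -(-a // b)

# Chain: one 32-byte hash embedded in every single thing.
chain_things = ceil_div(FILE, CHAIN_DATA)        # 61,066 things
chain_overhead = chain_things * HASH             # 1,954,112 bytes

# Tree: hash every file block, then hash the hash-list blocks, up to a root.
tree_overhead, count = 0, ceil_div(FILE, BLOCK)  # 61,037 leaf hashes
while True:
    tree_overhead += count * HASH                # this level's hash list
    if count * HASH <= BLOCK:                    # fits in one block: add the root hash
        tree_overhead += HASH
        break
    count = ceil_div(count * HASH, BLOCK)        # blocks needed for the list

print(chain_overhead, tree_overhead, tree_overhead - chain_overhead)
# -> 1954112 1954176 64
```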

Hash trees are random access. They can be swarmed. Once you have the tree, you can validate any block in the levels below, in any order. That's why BitTorrent can be swarmed: it uses hash trees, not blockchains. You can get the blocks from anyone, in any order, and still reconstruct the file in-place. If a block gets lost or corrupted, you lose only a tiny number of blocks, and can re-request exactly what you need. Plus the tree itself takes up orders of magnitude fewer blocks than the chain's embedded hashes...
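The random access falls out directly: once the leaf-hash blocks are verified, any file block can be checked in isolation. A sketch matching the packing used in the build_tree example above:

```python
import hashlib

BLOCK, HASH = 0xFFFF, 32
PER_BLOCK = BLOCK // HASH   # whole hashes packed per hash-list block

def verify_file_block(leaf_blocks: list[bytes], index: int, block: bytes) -> bool:
    """Check file block `index` against already-verified leaf-hash blocks."""
    hash_list = leaf_blocks[index // PER_BLOCK]
    offset = (index % PER_BLOCK) * HASH
    return hashlib.sha256(block).digest() == hash_list[offset:offset + HASH]

# e.g., with `levels` and `file_bytes` from the build_tree sketch:
# verify_file_block(levels[0], 7, file_bytes[7 * BLOCK : 8 * BLOCK])  # -> True
```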

In the 4G example, the hash tree fully fits in 31 blocks: 30 blocks of leaf hashes plus one tiny root block. Those blocks can be given higher priority, more replication, more redundancy. In contrast, blockchains have 1 hash in every single block...thing, so unless you can apply the same replication to all 61,066 things, you're at far greater risk of catastrophic data loss.
