Paper review. Blockchains from a distributed computing perspective
Our distributed systems seminar met the first time this Monday. I went through the syllabus, explained how we will run the seminar, and introduced the papers we will discuss. In the second hour, to give students an overview of the blockchains technology, I went through this paper: "Blockchains from a distributed computing perspective".
I liked this paper a lot. It is from a respected distributed systems expert, Maurice Herlihy. It is written to be accessible and expository. The paper gives several examples (with increasing sophistication) to explain the blockchain concepts concretely. Finally, the paper is a perfect fit with our seminar's theme of looking at blockchains from a distributed systems perspective. Herlihy states that the paper is "colored by the perspective that much of the blockchain world is a disguised, sometimes distorted, mirror-image of the distributed computing world."
For a data processing perspective on blockchains, see this other paper.
Is this a too simple, too constrained way of implementing the system? The paper says that the log-based system architecture has two compelling advantages:
Indeed this log/ledger based architecture is very popular for modern cloud-native distributed systems. This writeup, by Jay Kreps, tells you how important is the log/ledger abstraction for building distributed systems.
Kafka and BookKeeper are very popular platforms that enables compartmentalizing the concurrency and fault-tolerance in the framework's consensus protocol---realized by ZooKeeper. (As a side note, the private blockchain, Hyperledger v1.0.0-rc1, adopts a no-Byzantine consensus protocol based on Kafka.)
Finally, the Tango paper (SOSP'13) showed a good example of universality of logs: it showed how to build replicated in-memory data structures, and reliable and available distributed transactions and applications, on top of a shared log architecture. I believe the ideas explored in Tango paper can be used for efficiently maintaining multiple streams in the same log so that reading the whole chain is not needed for materializing the updates at the nodes.
This is somewhat like using Kickstarter for the restaurant. (Centralized is not de facto bad. Why would you not trust KickStarter? The market and the laws can straighten Kickstarter up, if it plays crooked.) However, the initial coin offering (ICO) adds more functionality on top of Kickstarter: you can sell/trade fractions of your token. In other words, the ICO creates a whole market around kickstarting.
The paper introduces the Proof-of-Work (PoW) idea using this example. Most miners are probably honest, content to collect their fees, but there is still a threat that even a small number of dishonest miners might collude with one another to cheat Alice’s investors. Alice’s first idea is to have miners, identified by their IP addresses, vote via a Byzantine fault-tolerant consensus algorithm. Alice quickly realizes this is a bad idea. Alice has a nemesis, Sybil, who is skilled in the art of manufacturing fake IP addresses. So Alice employs PoW for the blockchain. Sybil’s talent for impersonation is useless to her if each of her sock puppet miners must buy an expensive, long-shot lottery ticket.
PoW is a form of costly signaling: it is expensive in terms of time wasted and electricity bills. As a famous example, Bitcoin uses PoW for consensus: only a miner which has successfully solved a computationally hard puzzle (finding the right nonce for the block header) can append to the blockchain.
This is a good point for a concrete example. Luckily, this demonstration of PoW-based blockchain is a perfect way of doing that. If you haven't watched/tried this, take 15 minutes now to do it. Someone should give an award to Anders.
PoW has several shortcomings, of course. It is computationally very expensive and extremely wasteful for resources around the globe. Its throughput is miserable and its transactions are slow. It is nondeterministic: "It is still possible that two blocks are appended at the same time, creating a fork in the blockchain. Which block should subsequent miners build on? The usual answer is to build on the block whose chain is longest." So Bitcoin consolidates this by only considering a block as confirmed after it is followed by a number of blocks (typically six blocks).
I am acutely aware that technology is not the cure for everything. But, for the usecases where you can go all-digital, I think smartcontracts will be a great tool. It is an executable contract, so you can avoid notaries, lawyers, courts, and cops, because the failure clauses will be executed automatically. I think the smartcontracts idea is here to stay even when the fully-decentralized PoW-based blockchain approach dies.
On the other hand, the smartcontracts are not devoid of challenges either. The paper talks about the DAO attack, and makes a great point about the importance of concurrency control for smartcontracts: "We have seen that the notion that smart contracts do not need a concurrency model because execution is single-threaded is a dangerous illusion." In a post last week, I had presented a TLA+/PlusCal modeling of the DAO attack.
Now, I am not saying distributed systems have consensus solved at scale. There is no solution that scales to the big number of participants that may be desired as in public blockchains. I wrote about it this earlier: "I love that Blockchains brings new constraints and requirements to the consensus problem. In blockchains, the participants can now be Byzantine, motivated by financial gains. And it is not sufficient to limit the consensus participants to be 3 nodes or 5 nodes, which was enough for tolerating crashes and ensuring persistency of the data. In blockchains, for reasons of attestability and tolerating colluding groups of Byzantine participants, it is preferred to keep the participants at 100s. Thus the problem becomes: How do you design a byzantine tolerant consensus algorithm that scales to 100s or 1000s of participants?"
On the other hand, the research on consensus in distributed systems literature has established a rigorous perimeter around the safety and liveness/progress guarantees that can be provided. The FLP impossibility result shows that it is impossible guarantee progress under a full asynchronous model with a crash failure---even with reliable channels. Paxos approach to solving consensus compartmentalizes the safety and progress properties nicely. Even under a fully asynchronous model (where all reasonable timing expectations break), Paxos preserves safety thanks to its balloting and anchoring system. Paxos also provides progress as soon as the partial synchrony kicks in and weak-complete & eventually-weak-accurate failure detectors are implementable (i.e., when we are out of the realm of the FLP result).
Because of this compartmentalization, we are guaranteed that Paxos's safety guarantees is not vulnerable to a timing attack where malicious parties break all reasonable timing assumptions about the system. This is a hard thing to get right, and several protocols have been shown to be vulnerable to timing attacks because they depended on some perfectly reasonable time synchronization assumptions.
It seems like the Bitcoin protocol makes several reasonable timing assumptions, but when an attacker manages to violate those timing assumptions, the safety of the protocol can be violated as well.
I still have a lot to learn about blockchains and cryptocurrencies, and in particular Bitcoin. Something I don't yet understand is whether it is possible for the ecosystem to erode in a slippery slope, tragedy of the commons fashion, to a majority Byzantine behavior. Say if a slightly different version of the protocol starts cutting some corners, and since that turns out to be advantageous financially more miners start adopting it, and soon that becomes the majority behavior and give rise to a hack (via a trojan) or slaying of the goose that lays the golden eggs.
While I don't know much about the dirty implementation details of Bitcoin, this incident doesn't give me much confidence that Bitcoin is decentralized, immutable, unforgeable, and unhackable.
What about applying model checking to smartcontracts? Could that help reduce the risks?
I liked this paper a lot. It is from a respected distributed systems expert, Maurice Herlihy. It is written to be accessible and expository. The paper gives several examples (with increasing sophistication) to explain the blockchain concepts concretely. Finally, the paper is a perfect fit with our seminar's theme of looking at blockchains from a distributed systems perspective. Herlihy states that the paper is "colored by the perspective that much of the blockchain world is a disguised, sometimes distorted, mirror-image of the distributed computing world."
For a data processing perspective on blockchains, see this other paper.
Simple ledger example
The paper first introduces a simple ledger system using Alice's online news service as an example. To record articles as they arrive, Alice created a ledger that is implemented as a simple linked list. When an article arrives, it is placed in a shared pool, and a set of dedicated threads, called miners, collectively run a repeated protocol, called consensus, to select which item to append to the ledger. And to query for an article, a thread scans the linked-list ledger.Is this a too simple, too constrained way of implementing the system? The paper says that the log-based system architecture has two compelling advantages:
- it is universal; it can implement any type of data structure, no matter how complex
- all questions of concurrency and fault-tolerance are compartmentalized in the consensus protocol
Indeed this log/ledger based architecture is very popular for modern cloud-native distributed systems. This writeup, by Jay Kreps, tells you how important is the log/ledger abstraction for building distributed systems.
Kafka and BookKeeper are very popular platforms that enables compartmentalizing the concurrency and fault-tolerance in the framework's consensus protocol---realized by ZooKeeper. (As a side note, the private blockchain, Hyperledger v1.0.0-rc1, adopts a no-Byzantine consensus protocol based on Kafka.)
Finally, the Tango paper (SOSP'13) showed a good example of universality of logs: it showed how to build replicated in-memory data structures, and reliable and available distributed transactions and applications, on top of a shared log architecture. I believe the ideas explored in Tango paper can be used for efficiently maintaining multiple streams in the same log so that reading the whole chain is not needed for materializing the updates at the nodes.
Blockchain
Going from a centralized log implementation to a fully-decentralized public blockchain implementation needs some motivation. After her online news business, Alice wants to go to restaurant business. Like any trendy social influencer, she does an initial certificate sale (redeemable for meals) for her restaurant for raising capital.This is somewhat like using Kickstarter for the restaurant. (Centralized is not de facto bad. Why would you not trust KickStarter? The market and the laws can straighten Kickstarter up, if it plays crooked.) However, the initial coin offering (ICO) adds more functionality on top of Kickstarter: you can sell/trade fractions of your token. In other words, the ICO creates a whole market around kickstarting.
The paper introduces the Proof-of-Work (PoW) idea using this example. Most miners are probably honest, content to collect their fees, but there is still a threat that even a small number of dishonest miners might collude with one another to cheat Alice’s investors. Alice’s first idea is to have miners, identified by their IP addresses, vote via a Byzantine fault-tolerant consensus algorithm. Alice quickly realizes this is a bad idea. Alice has a nemesis, Sybil, who is skilled in the art of manufacturing fake IP addresses. So Alice employs PoW for the blockchain. Sybil’s talent for impersonation is useless to her if each of her sock puppet miners must buy an expensive, long-shot lottery ticket.
PoW is a form of costly signaling: it is expensive in terms of time wasted and electricity bills. As a famous example, Bitcoin uses PoW for consensus: only a miner which has successfully solved a computationally hard puzzle (finding the right nonce for the block header) can append to the blockchain.
This is a good point for a concrete example. Luckily, this demonstration of PoW-based blockchain is a perfect way of doing that. If you haven't watched/tried this, take 15 minutes now to do it. Someone should give an award to Anders.
PoW has several shortcomings, of course. It is computationally very expensive and extremely wasteful for resources around the globe. Its throughput is miserable and its transactions are slow. It is nondeterministic: "It is still possible that two blocks are appended at the same time, creating a fork in the blockchain. Which block should subsequent miners build on? The usual answer is to build on the block whose chain is longest." So Bitcoin consolidates this by only considering a block as confirmed after it is followed by a number of blocks (typically six blocks).
Smartcontracts
The paper introduces smartcontracts using cross-chain swap as an example. Suppose Alice wants to trade some of her coupons to Bob in return for some bitcoins. Alice’s coupons live on one blockchain, and Bob’s bitcoin live on another, so they need to devise an atomic cross-chain swap protocol to consummate their deal. Naturally, neither one trusts the other.I am acutely aware that technology is not the cure for everything. But, for the usecases where you can go all-digital, I think smartcontracts will be a great tool. It is an executable contract, so you can avoid notaries, lawyers, courts, and cops, because the failure clauses will be executed automatically. I think the smartcontracts idea is here to stay even when the fully-decentralized PoW-based blockchain approach dies.
On the other hand, the smartcontracts are not devoid of challenges either. The paper talks about the DAO attack, and makes a great point about the importance of concurrency control for smartcontracts: "We have seen that the notion that smart contracts do not need a concurrency model because execution is single-threaded is a dangerous illusion." In a post last week, I had presented a TLA+/PlusCal modeling of the DAO attack.
MAD questions
Blockchain PoW Consensus vulnerabilities
The consensus problem, which is also at the heart of the blockchain, has been studied for more than 40 years in distributed systems. The PoW based blockchain takes a radical approach to implement it in an approximated manner.Now, I am not saying distributed systems have consensus solved at scale. There is no solution that scales to the big number of participants that may be desired as in public blockchains. I wrote about it this earlier: "I love that Blockchains brings new constraints and requirements to the consensus problem. In blockchains, the participants can now be Byzantine, motivated by financial gains. And it is not sufficient to limit the consensus participants to be 3 nodes or 5 nodes, which was enough for tolerating crashes and ensuring persistency of the data. In blockchains, for reasons of attestability and tolerating colluding groups of Byzantine participants, it is preferred to keep the participants at 100s. Thus the problem becomes: How do you design a byzantine tolerant consensus algorithm that scales to 100s or 1000s of participants?"
On the other hand, the research on consensus in distributed systems literature has established a rigorous perimeter around the safety and liveness/progress guarantees that can be provided. The FLP impossibility result shows that it is impossible guarantee progress under a full asynchronous model with a crash failure---even with reliable channels. Paxos approach to solving consensus compartmentalizes the safety and progress properties nicely. Even under a fully asynchronous model (where all reasonable timing expectations break), Paxos preserves safety thanks to its balloting and anchoring system. Paxos also provides progress as soon as the partial synchrony kicks in and weak-complete & eventually-weak-accurate failure detectors are implementable (i.e., when we are out of the realm of the FLP result).
Because of this compartmentalization, we are guaranteed that Paxos's safety guarantees is not vulnerable to a timing attack where malicious parties break all reasonable timing assumptions about the system. This is a hard thing to get right, and several protocols have been shown to be vulnerable to timing attacks because they depended on some perfectly reasonable time synchronization assumptions.
It seems like the Bitcoin protocol makes several reasonable timing assumptions, but when an attacker manages to violate those timing assumptions, the safety of the protocol can be violated as well.
How will the Bitcoin bubble end?
I had done a Twitter poll on this a while ago. What other options do you think are plausible? (Of course I am not claiming that this poll means anything.)I still have a lot to learn about blockchains and cryptocurrencies, and in particular Bitcoin. Something I don't yet understand is whether it is possible for the ecosystem to erode in a slippery slope, tragedy of the commons fashion, to a majority Byzantine behavior. Say if a slightly different version of the protocol starts cutting some corners, and since that turns out to be advantageous financially more miners start adopting it, and soon that becomes the majority behavior and give rise to a hack (via a trojan) or slaying of the goose that lays the golden eggs.
While I don't know much about the dirty implementation details of Bitcoin, this incident doesn't give me much confidence that Bitcoin is decentralized, immutable, unforgeable, and unhackable.
What is the best medium for smartcontracts?
Is a scripting language the best medium? It is easy to screw that up as the DAO attack and ERC20 standard examples in the paper show. Maybe using a more declarative approach, and employing formal specifications and invariant-based reasoning to writing contracts could prove to be more resistant to errors.What about applying model checking to smartcontracts? Could that help reduce the risks?
Comments
So a "lockless" implementation follows where the channels put their entries into a shared pool and miners select entries from the pool and do compare-and-swap instructions to put in link-list ledger. But do the channels and miners need to lock the shared pool to insert/delete entries?