Anatomical similarities and differences between Paxos and blockchain consensus protocols

Now that we read through some blockchain papers in our seminar, I started to develop some understanding of the "blockchain way". Since I already know about the "Paxos way", I thought it would be instructive and fun to compare & contrast Paxos and blockchain consensus protocols. If you know about either of these protocols well, this post can help you get a headstart on learning the other one.

I will be referring to the below Paxos figure for the protocol phases, so let's pin it here.


Leader election: silent vs loud


A couple days ago I tweeted that. That is how leader election is done in blockchain consensus protocol. The first miner to solve the puzzle first becomes the leader for the current round. For blockchain it is par for the course that for each round another leader (the one who solves the next puzzle first) serves. In other words, instead of Phase1a "propose leadership" and Phase1b "promise to follow" back-and-forth communication among the nodes, blockchain uses a proof-of-work puzzle to elect a leader. What if multiple miners solve the puzzle at about the same time? Well, see the below section for that comparison.

For Paxos, the leader election is done via Phase 1. A leader candidate sends a phase 1a message with its ballot number (essentially a counter or a logical clock if you like) and try to get OKs from a majority number of participants in the quorum. Receiving one "Not OK" message is a deal breaker, because it indicates there is another leader candidate with a higher ballot number reaching to the participants.

In Paxos applications, typically we like to continue with the same leader for many rounds, actually as long as we can. So to avoid paying the cost of phase 1 repeatedly, MultiPaxos skips phase 1 and continues with iterating over phase 2 messages for the consecutive rounds. If an incumbent leader arises, the phase 1 leader election may need to happen again. An orthogonal mechanism, such as a failure detection service, is used to ensure that there are not many candidates running to become a leader at a given time, as that may interfere with the progress of the protocol. That situation is called the "dueling leaders" situation. But worry not. Even when there are multiple leader candidates, Paxos is safe as we discuss below.

Multiple leaders case: fork vs no-fork

In blockchain, the difficulty of the puzzle makes sure that the probability of having two leaders at the same time is low. When there are multiple leaders, you will have a fork. The fork is resolved eventually, within a couple more block additions because most likely one fork will grow faster (the chances of both forks growing at the same speed becomes negligible after a couple of blocks) and the participants prefer to mine on the longest chain for having a chance to receive incentive: the mining fee if they become the leader for that round. In other words blockchain provides a eventually-consistent/stabilizing consensus protocol.

In Paxos, a failure detector service (which relies on heartbeats and often has some randomized backoff implementation) often helps with converging the number of leaders/candidates to 1. However, Paxos has the built-in mechanism (with the ballot numbers and majority acknowledgment via Phase 1 and Phase 2) that ensures safety of agreement even when there are multiple active leaders/candidates. If a participant is aware of an alternative candidate/leader with a higher ballot number, it sends a "Not OK" message to the incumbent leader. The incumbent leader waits for receiving an acknowledgment from a majority of participants to commit a value. If it receives a "Not OK" message, it needs to go back to Phase 1 to duel in the leader election again. On the other hand, it is safe for the incumbent leader to commit a value after a majority acknowledgment is received because even when the incumbent leader is dethroned, the newcomer is bound by Paxos protocol rules to repropose and accept the same value as the consensus decision. Therefore, even when there are multiple leaders, you won't have forks in Paxos, because only one of them--if at all-- will be able to complete Phase 2 and commit a value.

Communication in Accept phase: 1-way vs 2-way 

The blockchain protocol employs communication only once and only in one way, from the leader to the participants, and this corresponds to the phase 2a of Paxos: the Accept message Phase.

Since the blockchain consensus protocol skips Phase 2b and Phase 3, it provides unavoidably a probabilistic consensus.(I had talked about why this is the case for leader based distributed consensus such as Paxos in this previous post.) Since blockchain consensus is inherently probabilistic, you need to wait 6 more blocks to see if that block sticks. At that point the work required to rewrite history is so huge that you can conclude the block is finally committed.

For phase 2 in Paxos, the leader has a two-way acknowledged communication. The two-way communication works OK for the leader because the number of the participants is typically less than half a dozen (for the reasons explained below).

On the other hand, if 1000s nodes participated in a Paxos protocol, the leader would not be able to survive receiving 1000s of ack messages for each round. (Well, I guess it could have, but the phases would take several seconds than the typical 10s of milliseconds.) To avoid this problem, with 1000s of participants, the Blockchain has only one way broadcast communication, which is propagated from the leader to the participants via a gradual gossip protocol.

So to recap, and to refer back to the figure, here is what we have. Blockchain does a silent Phase 1 (via proof-of-work) and follows that with only Phase 2a.

The environment: public vs private

The typical use case of Paxos is to replicate state to make it consistently and safely survive crash failures in a private environment. So keeping the number of participants small (around 5 nodes) is OK for this use case.

The blockchain applications bring new constraints and requirements to the consensus problem. In blockchains, the participants can now be Byzantine, motivated by financial gains. So it is not sufficient to limit the consensus participants to be 3-5 nodes, because the rest of the network does not necessarily trust these nodes. In Blockchains, for reasons of attestability and tolerating colluding groups of Byzantine participants, it is preferred to keep the participants at 1000s to 10,000s. Thus the problem becomes: How do you design byzantine tolerant consensus algorithm that scales to 1000s of participants?

Moreover, the consensus problems solved are also slightly different. The Bitcoin-NG paper provided a nice formulation of the blockchain consensus problem in its model section.

Comparing the Nakamoto consensus problem with the classic Byzantine consensus problem is very instructional. Nakamoto consensus has probabilistic guarantees for Termination, Agreement, and Validity, whereas the classic Byzantine Consensus (which a Byzantized version of Paxos solves) has deterministic guarantees for them. (To keep the discussion simple in this post, I didn't get into a byzantine tolerant Paxos protocol, which is more complicated. By utilizing randomization/probabilities, blockchain has a really simple protocol --I have to give it to it.)

MAD questions

(This section is here due to my New Year's resolution.)
1. Well, this entire post started as a result of a MAD question: "What if I try to map Blockchain protocol components to that of Paxos protocol components?" So I chalk that up as my first MAD question :-)

2. Now that we have this mapping, is there a way to leverage on this to synthesize a new insight? What is the next step? Sometimes providing such a mapping can help give someone a novel insight, which can lead to a new protocol.

Comments

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Making database systems usable

Looming Liability Machines (LLMs)

Foundational distributed systems papers

Advice to the young

Linearizability: A Correctness Condition for Concurrent Objects

Scalable OLTP in the Cloud: What’s the BIG DEAL?

Understanding the Performance Implications of Storage-Disaggregated Databases

Designing Data Intensive Applications (DDIA) Book