State Machine Replication in Facebook's Libra Blockchain

This paper presents LibraBFT, the state machine replication system designed for Facebook's Libra Blockchain. LibraBFT is based on HotStuff, which I had summarized recently. LibraBFT adds modules, such as a pacemaker, to adopt and operationalize the HotStuff protocol within a real-world blockchain system.

The plan with the Libra project is that, initially, the participating validators will be permitted into the consensus network by "Founding Members". Later on, membership eligibility will gradually become open/permissionless while preserving decentralization with careful governance. (To facilitate dynamic membership, Libra BFT can reconfigure itself by embedding configuration-change commands in the sequence.)

In my summary below I mostly use prose lifted from the LibraBFT whitepaper. It is a 41 page paper, so I focus only on the important parts to provide the summary. After the summary, I add a brief discussion about how LibraBFT compares with other blockchain consensus protocols and future directions for research in the Libra project.

Features

  • Safety: LibraBFT maintains consistency among honest validators even if up to one-third of the validators are corrupt/byzantine.
  • Asynchrony: Consistency is guaranteed even in cases of network asynchrony.
  • Finality: LibraBFT supports a notion of finality (an explicit commit certificate) for transactions. 
  • Linearity and Responsiveness: Linearity guarantees that driving transaction commits incurs only linear communication (this is optimal) even when leaders rotate; responsiveness means that the leader has no built-in delay steps and advances as soon as it collects responses from validators.
  • Sustainability: In contrast to the computationally expensive Proof-of-Work (PoW), LibraBFT is designed as a Proof-of-Stake (PoS) system, where participation privileges are granted to known members based on their financial involvement.

Leaders, Votes, Quorum Certificates

LibraBFT belongs to the family of leader-based consensus protocols. In leader-based protocols, validators make progress in rounds, where each round has a designated leader.

Leaders are responsible for proposing new blocks and obtaining signed votes from the validators on their proposals. During a round, the leader proposes a block that extends the longest chain it knows. If the proposal is valid and timely, each honest node will sign it and send a vote back to the leader.

After the leader has received enough votes to reach a quorum, it aggregates the votes into a Quorum Certificate (QC) that extends the chain. The QC is broadcast to every node. If the leader fails to assemble a QC, participants will timeout and move to the next round.

Eventually, enough blocks and QCs will extend the chain in a timely manner, and a block will match the commit rule of the protocol. When this happens, the chain of uncommitted blocks up to the matching block become committed.

Rounds and Phases

Each round has three-phases. The first and second phases of a round are similar to PBFT, but the result of the second phase is a certified certificate, or a QC-of-QC, rather than a commit decision. A commit decision is reached upon getting a quorum of votes on the QC-of-QC (a QC-of-QC-of-QC). An honest leader can prove the safety of a proposal by referencing a single QC (from the highest round).

Chaining 

LBFT uses a chaining approach, where the three phases for commitment are spread across rounds. More specifically, every phase is carried in a round and contains a new proposal. The leader of round k drives only a single phase of certification of its proposal. In the next round, k+1, a leader again drives a single phase of certification: it sends its own k+1 proposal, but it also piggybacks the QC for the k proposal. In this way, certifying at round k+1 generates a QC for k+1, and a QC-of-QC for k. In the third round, k+2, the k proposal can become committed, the (k+1) proposal can obtain a QC-of-QC, and the (k+2) can obtain a QC. (See my summary of HotStuff for a more detailed explanation.)

Commit-rule

Rounds must be strictly increasing along a chain: $round(B_i) < round(B_{i+1})$. When rounds increase exactly by one, that is $round(B_i) + 1 = round(B_{i+1})$, we say that the chain has contiguous rounds. The commit logic is simple: It requires a 3-chain with contiguous round numbers whose last descendent has been certified.

(Safety) New commits always extend a chain containing all the previous commits.
(Liveness) If the network is synchronous for a sufficiently long time, eventually a new commit is produced.

Occasionally, LibraBFT may contain chains that have gaps in round numbers. For example, a dishonest leader may propose an invalid block, or a leader may fail to gather a quorum of votes in a timely manner because of network issues. LBFT guarantees that only one fork becomes committed through a simple voting rule that consists of two ingredients:  First, validators vote in strictly increasing rounds. Second, whenever validators receive a block, they maintain a preferred round, defined as the highest known grandparent round. The rule is that validators vote for a block if its parent round is at least the preferred round. In Figure 2, validators that contributed to the formation of a QC for round k+2 remember k as their preferred round.


Round synchronization

LibraBFT assumes a partial synchrony model: after some unknown global stabilization time (GST), the network delivers all messages between honest nodes under some (unknown) time delay $\delta_M > 0$.

The advancement of rounds is governed by a module called Pacemaker, which keeps track of votes and of time. In a happy path, the Pacemaker module at each validator observes progress,  a leader proposal for the current round, and advances to the next round. In a recovery path, the Pacemaker observes lack of progress in a round. Upon a local round timeout, the Pacemaker broadcasts a TimeoutMsg notification.

Comparison with other protocols

LibraBFT is based on PBFT, so it is in the same leader-based protocols family with Tendermint and Casper. I discussed this a little in the previous post on HotStuff.

Nakamoto consensus is also a leader based protocol, but the leader election is silent, and is determined by proof-of-work. Also in Nakamoto consensus there is no explicit commit/finality. Here is a comparison of Nakamoto and Paxos protocols from an earlier post.

Threshold Logical Clocks and QSC provides a leaderless consensus protocol for blockchains. Finality/commit is not explicit in that protocol either.

Avalanche (a descendant of Texel consensus) also provides a leaderless consensus protocol for blockchains. It is based on polling/sampling and metastability. Again, finality/commit is not explicit.

Future work: Leader election

The paper does not provide much detail about how the leaders are designated per round in LibraBFT. It looks like the leader election strategy in the Libra source-code is round-robin. However, the paper notes that this strategy makes round leaders predictable, and hence facilitates denial-of-service attacks. To address this problem, the paper proposes to work on a verifiable random function (VRF) to randomize leaders using some source of randomness that cannot be predicted in advance.

The paper also mentions that, going forward, the Libra team will investigate new proposer election strategies.
There are several ways to enhance the proposer election mechanism in order to avoid performance hick-ups when a bad leader is elected, or worse, when a succession of bad leaders is elected. 
An alternate leader strategy elects an ordered pair of leaders per round. The lower leader delays a certain duration in order to yield to the higher leader. This approach has the benefit of unlocking rounds with a crashed leader quickly, but the risk of creating contention among the leaders. The success of the approach largely hinges on the ability to stagger leaders effectively. 
A different approach optimizes for a stable leader, and provides fairness and load balancing via rotating input generators. In each round, there are two distinct roles, a leader and an input generators. The leader is kept stable so long as progress and good performance are observed. The input generators are designated among the validators based on past performance and availability. In each round, the stable leader has the authority to promote or dismiss an input generator, but if it dismissed generators too frequently, the leader itself will get demoted eventually.
In any case, it looks like this will be an issue that will require more research in LibraBFT.

Future work: Scaling to a large number of nodes

There is no evaluation result in the paper about scalability. In a previous paper, the HotStuff protocol was evaluated on up to 128 nodes. The incast storm at a leader node constitutes a bottleneck for the system, and it would not be possible to scale the system to large number of nodes if every node reports back to a leader.

Despite many bad things associated with Nakamoto consensus, it avoids the incast storm problem and is able to scale to very large number of nodes. Moreover, sampling-based/metastability approaches as in Avalanche also avoids the incast storm problem and has a chance to scale better.

But I am still hopeful about the leader-based consensus approach. It is simple, assured, and gives a quick proof-of-finality. There could be ways to make the leader-based approach scale. For example, one way to resolve the incast storm problem in LibraBFT could be to use incast-aggregator nodes before a leader. Another way to alleviate the problem could be to use a federations-based consensus model as in Stellar.

Comments

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Making database systems usable

Looming Liability Machines (LLMs)

Advice to the young

Foundational distributed systems papers

Distributed Transactions at Scale in Amazon DynamoDB

Linearizability: A Correctness Condition for Concurrent Objects

Understanding the Performance Implications of Storage-Disaggregated Databases

Designing Data Intensive Applications (DDIA) Book