Paper review. Threshold Logical Clocks for Asynchronous Distributed Coordination and Consensus

This is a recent arxiv paper by Bryan Ford, EPFL. The figures I use are from Bryan's presentation.

The paper introduces a threshold logical clock (TLC) abstraction and uses it to implement decentralized asynchronous consensus on top. In contrast to Ben-Or which implements decentralized asynchronous binary consensus, TLC based Que-Sera-Consensus (QSC) achieves consensus for arbitrary values proposed.

After I summarize the paper, I will compare/contrast QSC with Paxos, Texel/Avalanche, and Ben-Or.

Threshold Logical Clocks

TLC ensures that a number of nodes progress through logical time in a lock-step fashion. On reaching logical time-step s, each node waits for a threshold tm of broadcasts received from s before it can proceed to step s+1.

Different nodes may see different subsets of the tm messages. The adversarial network schedule ultimately determines this, but can we at least measure a-posteriori the success/failure of a given message's propagation to other nodes? For this purpose, TLC supports witnessing.

Each time-step occurs in two logical phases. Each node broadcasts unwitnessed message and collects responses from at least t witnesses. Each node re-broadcasts witnessed message and collects at least t witnessed messages

A particular protocol instance TLC(tm, tw, n) is parameterized by message threshold tm, witness threshold tw, and number of nodes n. To get from step s to s+1, each node must collect not just tm messages but tm threshold witnessed messages from step s. Each threshold message must have been witnessed by at least tw participants.

A problem still exists. Different nodes may see different subsets, and different messages may have been witnessed by different subsets of nodes. TLC observes that causal ordered message delivery adds some convergence properties to the protocol and simplifies reasoning about it. One way to ensure causal message propagation is to piggyback and gossip every message send so far. Each node i simply includes in every message it sends a record of i's entire causal history. This makes time advancement events propagate virally.

When configured with majority thresholds, TLC offers a useful property. Every witnessed message m broadcast in step s is seen by all n nodes by step s+2:
  • By majority nodes by s+1 by definition of witnessing 
  • Each node collects majority step s+1 msgs by s+2 
  • Since any two majorities intersect, there is at least a 1-node overlap at s+1

Que Sera Consensus (QSC)

QSC provides an asynchronous consensus built on top of TLC. Each round takes three TLC logical time-steps, in which
  • Node broadcasts proposal w/ random priority p 
  • Waits and observes for three TLC time-steps 
  • Decides if any proposal is undisputable winner
This description is of course too high-level. Since there is a lot of subtlety involved due to network asynchrony, the QSC protocol is reached by way of explaining several strawman protocols leading to it.

Here is the baseline and the first strawman (genetic fitness lottery) towards deriving QSC.

Strawman 3 makes the nodes pick celebrity proposals, which a majority of nodes have heard of by the next time-step. This rule still leaves uncertainty, however, since different participants might have seen different subsets of confirmed proposals from step s, and not all of them might have seen the eligible proposal with the globally winning ticket.

In order to seek a universal celebrity, Strawman 5 says we should watch the paparazzi.

The paparazzi condition guarantees that, everyone knows the proposal's existence by s+2, and everyone knows of its celebrity by s+3. But not everyone might have seen paparazzi message!

This brings us to Que Sera, whatever happens happens, Consensus (QSC).  Nodes prefer highest-priority celebrity proposal p and build on it in proposals for future rounds, but don't assume everyone agrees on p! This says that only some consensus rounds may succeed, and that only some nodes may even realize that a round succeeded. And it also guarantees that when this holds, all nodes will build on that value and eventually decide.

Nodes conservatively determine agreement when they are unaware of existence of higher-priority proposal and aware of at least one paparazzi node for p.

Each node i will observe successful consensus with a probability of at least 1/2 in each round, independently of other rounds. Thus, the probability i has not yet finalized a unique proposal for round r by a later round r+k is at most $1/2^k$. What I really like is that by tying the consensus with a blockchain structure/formation, the delay in committing is made somewhat compensated/pipelined. When a commit is finally achieved, any round that i sees as successful will permanently commit both proposal p and any prior uncommitted blocks that p built on in the blockchain structure.

QSC implementations are available in Promela/Spin & Go.

How does this compare with other consensus algorithms?

I am a distributed systems guy. I will be looking at this from a distributed algorithms perspective. It looks like Bryan is coming from security background, and he may be trying to emphasize other properties (like network adversary tolerance) of the protocol. So I may be missing some security related properties of the protocol.

Before I go on to compare and at some points criticize the algorithm, I want to mention that overall this is a nice algorithm. I want to model this in TLA+ when I find some time so that I will be able to get a better understanding of what is going on in TLC and QSC.

The paper claims that QSC is simple, and even simpler than Paxos, but I disagree with that strongly. It took 9 subsections with many strawman algorithms to describe the QSC algorithm in Section 4.10 after TLC is described in Section 2. That said, the protocol is nice and interesting, and it has some advantages not present in other decentralized consensus algorithms like Texel and Ben-Or.

QSC is not binary consensus. This makes it more powerful than Texel and Ben-Or. However, if the domain is blockchain, a binary consensus works with no problems because you are deciding whether to include this proposed transaction in the chain or not. If only one value (one transaction) is proposed for a given UTXO, a binary consensus algorithm can guarantee termination for it. Well-formed clients propose only a single transaction for a given UTXO because otherwise it would be a double-spend attempt, and then the binary consensus does not need to guarantee termination for this instance.

TLC enforces a single frame of reference for the rounds as it aligns rounds across a threshold of participants. In this sense, TLC is reminiscent of the Ben-Or algorithm which also aligns rounds across threshold number of participants. In Ben-Or, N-F participants proceed in lock-step for the two phases of each round. In Ben-Or the termination is again probabilistic with probability 1, as the non termination chance after k rounds reduce to  $1/2^k$ probability. If one node decides at round k, it is guaranteed that all the other nodes will decide in the next round. As we mentioned above, while Ben-Or is restricted to binary consensus, QSC allows arbitrary proposals as input.

It may be possible to argue that the single frame of reference for the rounds, simplifies reasoning. But as a distributed algorithms person, I think this is just unnecessary extra synchronization and not a good idea. In Paxos, the rounds not aligned across participants. Paxos uses concurrent/separate frames of reference for rounds with no problem at all. This is achieved by Paxos's use of totally ordered ballot numbers to make rounds client-restricted. This allows multiple frame of reference rounds to exist concurrently without violating the safety/agreement property.

This brings me to do a more in depth comparison with Paxos, and a speculation.

QSC vs Paxos (and SDPaxos?)

QSC is still randomized consensus. So is Paxos in an asynchronous environment. In an asynchronous environment, dueling leaders problem may be probabilistically avoided (using probabilistically backing off). However, if there is some little partial synchrony which allows the failure detectors can stabilize to $\diamond S$, stable leaders can emerge, and progress will be guaranteed. In comparison, progress in QSC will always be probabilistic.

By using a leader based solution Paxos gets other benefits as well. Paxos uses much less communication to get the consensus done. Paxos communication is 1-to-all-to-1  (with 1 being the leader). In contrast communication in QSC is all-to-all-to-all across the three steps in the rounds.

Here is the speculation part. In effect a leader also emerges in QSC. It is the higher-priority celebrity proposal. So QSC is closer to leader-based Paxos protocols rather than to the leaderless Ben-Or and Texel protocols. What is more the random priority assignment and the use of paparazzi seem to implement a dynamic version of the sequencer node in the SDPaxos protocol. Let me explain.

Let's recall the decision procedure in QSC. A node decides as consensus on a proposal p, when it is (1) unaware of existence of higher-priority proposal than p, and (2) aware of at least one paparazzi node for p.

SDPaxos separates the leader into two roles of ordering/value-choosing done by the sequencer and the replication-checking done by any node. And it seems like QSC divides the sequencer role in to two, via the celebrity and paparazzi roles. QSC confines everything in single-frame of reference rounds by its use of TLC. On the other hand, SDPaxos allows multiple rounds occurring concurrently, via ballotnumber use for sequencer.

Extensions for scalability

QSC uses a lot of communication (all-to-all steps) and wouldn't scale. Finding alternate ways for scaling QSC would be beneficial.

Avalanche has extended Texel via sampling to large scale decentralized consensus.

Is it possible to come up with a sampling based extension to QSC? Texel had two advantages going for it. It is pull-based solution, and it only considers binary consensus. In contrast QSC is push-based, considers arbitrary consensus, and uses a single-reference frame for rounds. These make it hard to apply sampling to QSC.

It would be interesting to think about scalability extensions for QSC. Of course committee-selection based approaches should be applicable at least.


Popular posts from this blog

I have seen things

SOSP19 File Systems Unfit as Distributed Storage Backends: Lessons from 10 Years of Ceph Evolution

PigPaxos: Devouring the communication bottlenecks in distributed consensus

Frugal computing

Learning about distributed systems: where to start?

Fine-Grained Replicated State Machines for a Cluster Storage System

My Distributed Systems Seminar's reading list for Spring 2020

Cross-chain Deals and Adversarial Commerce

Book review. Tiny Habits (2020)

Zoom Distributed Systems Reading Group