Modeling an atomic version of Texel: an asynchronous consensus algorithm without rounds

I had written about my preliminary impressions about Texel, an asynchronous consensus algorithm without rounds.

Over the last couple of nights, I have modeled a shared memory version of the algorithm in TLA+, so that I can understand the algorithm better and learn more about the approach. I started with a very rough atomic version of the algorithm, where a node can atomically read the state of all other nodes and update its own state. This is not practical, but it is good for highlighting the essence of the Texel algorithm. In this post, I will talk about this atomic Texel model.

After I got this model down, it was easy for me to refine the atomicity. In the refined model, a process can atomically read from one other process and update its state. That refined model is just one step removed from the message passing Texel algorithm presented in the paper, and demonstrates the tricky issues that arise when multiple nodes are concurrently trying to update their states. In that read-write atomicity model, we see the need for reading states from a consistent-cut, and why some concurrent experiments should be aborted to satisfy that condition. But that read-write atomicity Texel specification is the topic of my next post. Today we just focus on the atomic Texel model.

The atomic Texel model

N stands for number of nodes, and F denotes the number of nodes that can crash. At model checking time, TLA+ toolkit asks you to enter values for N and F. I tried for N=4, and F=0 and F=1. I also tried for N=7, and F=0,1,2. My Texel model satisfies both agreement and progress always for N>=3F+1. Progress is always satisfied, because Choose is deterministic. A node will choose "a" when both "a" and "b" meets SupDecision criteria, which is a supporting value that a node can adopt based on its querying of other nodes. A supporting value is one that is shared by at least F+1 nodes. Note that, for N>=3F+1, it is always possible to find a supporting value for binary consensus, even when up to F nodes fail.

I use f to keep track of actual number of nodes that crash. The model ensures that f=<F. The variable decision tracks of the decision of each node. I hardwire it for N=4 in my run. When I try for N=7, I change this initial assignment.


In my specification, each node has three actions.

Line 23 gives the first action. A node reads the state of other nodes to find a supporting value and adopts it as its own decision value. But the decision is not finalized until, the node sets its finality flag t for the decision to TRUE.

Line 24 starts the second action. If f, the actual number of crashes is still less than F, the allowed number of crashes, then the node can crash by setting its decision to crash permanently.

Line 28 starts the third action. If a node finds that its current decision is shared by at least N-F processes, then that decision is "anchored", and the node can finalize its decision by setting its t=TRUE.



Here are the Agreement and Progress properties I check. Agreement says that if two nodes j and k finalized their decisions, then their decisions cannot differ. Progress says that eventually any non-crashed node will finalize its decision.

For N>=3F+1, both Agreement and Progress are satisfied. Since the atomicity is too rough (a node can read the states of all other nodes atomically), Agreement holds without installing extra mechanisms for reading states from a consistent-cut and aborting other nodes' concurrent experiments, because each experimentation is done atomically and hence in an interleaving manner. Progress holds because the CHOOSE in SupDecision is deterministic, and helps the nodes to converge to one the binary consensus values.

This is a very simple model, but it helped me to come up with the refined read-write atomicity Texel specification quickly, and in that refined model it becomes easy to see what could go wrong when we don't have additional mechanisms in place to enable nodes read from concurrent states.

MAD questions

1. How does Texel compare with Paxos and Ben-Or and how does failure-detectors fit in this picture?

In Paxos, there are rounds, and the rounds are client-restricted. This means that a higher round preempts the lower rounds. A leader leads the rest of the nodes through a round, and this means that the nodes are held hostage by a leader which may be dead. Hence, failure-detectors need to be utilized so that the nodes do not wait forever for a dead leader in an asynchronous model. However, if the failure detectors at nodes are trigger happy, the nodes will suspect whoever is the leader currently for no reason, and will start their own rounds, which preempts the leader's round. This leads to the dueling leaders problem, and violation of liveness even when we have a bound on F, (i.e., F< N/2).

In Texel, there is no need for a failure detector if we have a bound on F (i.e., F<N/3). This is because Texel is a decentralized consensus algorithm, and the nodes do not need to rely/wait on a leader to lead a round; instead all nodes do their own polling and deciding. But as we will discuss in the read-write Texel model (wait for the next post), if the nodes are very snoopy and keep interfering with each others’ experiments, then liveness is still violated. This is where having a leader to lead a round (as in Paxos) provides advantage: the leader by definition reads from a consistent state, as for that round the other nodes are passive.

What if we had non-client restricted rounds as in Fast-Paxos? That is an opportunistic-leader-based solution, and progress is not guaranteed if multiple opportunistic-leaders clash. Then we need to default to Paxos.... which is subject to the failure-detectors result as above for progress! Back to square one.

In the Ben-Or algorithm, there is no need for failure detectors if we have a bound on F (i.e., F<N/2), because that is also a decentralized algorithm. Ben-Or has rounds but the rounds are not client-restricted and do not preempt each other. Also it seems like a node does not interfere/cancel other nodes' progress in querying/experimenting. Ben-Or does not have the disadvantages of Paxos or Texel. So what gives? Ben-Or is a probabilistic algorithm. By using randomization, the system eventually and probabilistically converges to a consensus decision.

(While writing the read-write model of Texel algorithm, I found several parallels between Ben-Or and Texel. Those will also be interesting to investigate more closely.)

Comments

Popular posts from this blog

Learning about distributed systems: where to start?

Hints for Distributed Systems Design

Foundational distributed systems papers

Metastable failures in the wild

The demise of coding is greatly exaggerated

Scalable OLTP in the Cloud: What’s the BIG DEAL?

The end of a myth: Distributed transactions can scale

SIGMOD panel: Future of Database System Architectures

Why I blog

There is plenty of room at the bottom