Modeling Paxos and Flexible Paxos in PlusCal and TLA+

The first part of this post describes modeling Paxos in PlusCal. The second part shows how to modify that model to obtain a flexible quorum Paxos. The idea of flexible quorums was introduced in a paper in the summer of 2016. This simple and surprising result says "majority agreement is not required in Paxos and the sets of acceptors required to participate in agreement (known as quorums) do not even need to intersect with each other".

Modeling Paxos in PlusCal

While there are many examples of Paxos modeling in TLA+ available, I haven't found any PlusCal modeling of Paxos, except this one from Lamport, which helped me come up with my PlusCal model below. The problem with TLA+ is that it is too low-level (i.e., too declarative and math-like) for writing, and reading, distributed algorithms. The PlusCal language provides a higher-level pseudocode, which is easier to follow.

However, as you go through my PlusCal model below, you will find that it doesn't match what you would expect from an implementation in your favorite imperative language. This is OK: PlusCal is meant only to model the algorithm, so that we can model check it for correctness against concurrency bugs. I had written more about modeling at a higher abstraction level earlier in this post and this post.

Leader denotes the range used for the ids of the leader processes, and Acceptor denotes the range used for the ids of the acceptor processes. Slot is the range of slots, and Ballots is the range of ballots.


Acceptors are simple: they just react to the leaders' Phase1, Phase2, and Phase3 messages sent with various ballot numbers. To this end, the acceptor body calls macros, which are inlined when the PlusCal code is translated to TLA+ for model checking.

An acceptor keeps a variable maxBal for remembering the maximum ballot number it has promised. It also remembers all values it accepted at Phase2a using hVal, a set of <slot, ballot, value> tuples. Finally, an acceptor stores the decided proposals at each slot as a set. Of course, if the agreement property of Paxos holds, the decided set for a slot has cardinality <= 1.
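To make the acceptor state concrete, here is a rough Python sketch (not the PlusCal model itself). The field names follow the prose above; the method names and the exact comparison rules are my assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Acceptor:
    """Illustrative acceptor state; methods are hypothetical, not the model's macros."""
    max_bal: int = -1                             # highest ballot promised so far
    h_val: set = field(default_factory=set)       # {(slot, ballot, value)} accepted at Phase2a
    decided: dict = field(default_factory=dict)   # slot -> set of decided values

    def on_phase1a(self, ballot):
        """Promise only if the ballot is higher than anything promised before."""
        if ballot > self.max_bal:
            self.max_bal = ballot
            return True
        return False

    def on_phase2a(self, ballot, slot, value):
        """Accept the proposal unless a higher ballot was already promised."""
        if ballot >= self.max_bal:
            self.max_bal = ballot
            self.h_val.add((slot, ballot, value))
            return True
        return False

    def on_phase3(self, slot, value):
        """Record a decided value for the slot."""
        self.decided.setdefault(slot, set()).add(value)


def agreement_holds(acceptors):
    """Every decided set must have cardinality <= 1."""
    return all(len(vals) <= 1 for a in acceptors for vals in a.decided.values())
```

Keeping decided as a set per slot, rather than a single value, is what lets the model checker observe an agreement violation instead of silently overwriting it.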

The leader loops through the 3 phases of a round for each slot. It tries to dominate in Phase1, so it can go to Phase2. After the leader is elected, Phase1 can be skipped in subsequent slots, if the leader is not preempted by another leader. An elected leader can get preempted any time and CollectP2 can fail, so the leader checks this before it can decide a value at SendP3.

The ballot number of the leader b is incremented modulo M (the number of leaders) so it remains unique across leaders. The variable pVal is a set to store the values accepted in earlier slots, so a suitable value can be re-proposed in Phase2a. (See CollectP1 and SendP2 macros.)
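One common way to keep ballots unique is sketched below in Python. I am assuming here that a leader starts from its own id and advances its ballot in steps of M, so that each leader stays in its own residue class modulo M; the function names are hypothetical.

```python
def next_ballot(b, num_leaders):
    """Advance a ballot by M so that b % M never changes (assumed scheme)."""
    return b + num_leaders


def ballots_for(leader_id, num_leaders, count):
    """First `count` ballots a leader would use, starting from its own id."""
    b = leader_id
    out = []
    for _ in range(count):
        b = next_ballot(b, num_leaders)
        out.append(b)
    return out
```

With M = 3, leader 0 would use 3, 6, 9 and leader 1 would use 4, 7, 10, so two leaders can never pick the same ballot.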

AccMsg denotes the set of acceptor messages sent, and LMsg denotes the set of leader messages sent. Instead of actually sending messages over channels to each acceptor, sending a message is modeled as adding the message to a messageboard, where other processes can nondestructively read it. (This idea resembles the Linda tuplespaces idea.)
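The messageboard idea can be sketched in a few lines of Python. The Msg fields and the kind labels ("p1a" and so on) are assumptions for illustration, not the model's actual message format.

```python
from typing import NamedTuple


class Msg(NamedTuple):
    kind: str     # e.g. "p1a" or "p1b"; labels are hypothetical
    sender: int
    ballot: int


def post(board, msg):
    """'Send' a message by adding it to the board; nothing is ever removed."""
    board.add(msg)


def read(board, kind, ballot):
    """Nondestructive read: return the matching messages, leave the board as is."""
    return {m for m in board if m.kind == kind and m.ballot == ballot}
```

Because the board is a set, posting the same message twice has no effect, and any number of processes can read the same message.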

The SentXX macros return the set of messages in the messageboard that match a specific filter. SuitVal is a macro for identifying the proposal with the highest ballot id accepted for a given slot.
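Here is what SuitVal computes, as a small Python sketch. I am assuming proposals are <slot, ballot, value> tuples as described for hVal, and that "no suitable value" is reported as None; the model itself may encode this differently.

```python
def suit_val(p_val, slot):
    """Value of the highest-ballot proposal accepted for `slot`, or None if there is none."""
    candidates = [(b, v) for (s, b, v) in p_val if s == slot]
    if not candidates:
        return None
    # Pick the candidate with the largest ballot and return its value.
    return max(candidates, key=lambda bv: bv[0])[1]
```

For example, with p_val = {(1, 3, "a"), (1, 7, "b"), (2, 5, "c")}, the suitable value for slot 1 is "b", since ballot 7 is the highest ballot accepted for that slot.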


SendP1(b) lets a leader put a Phase1a message with ballot number b on the AccMsg messageboard. ReplyP1(b) lets acceptors react to a Phase1a message with ballot number b by writing a reply back to the LMsg messageboard. CollectP1(b) lets a leader proceed as elected from Phase1 if a majority of acceptors said OK. The await statement serves as a guard: if the await predicate is not satisfied, the rest of the macro is not executed.
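The guard behavior of CollectP1 can be mimicked in Python as follows. This is only a sketch: the (acceptor_id, ballot, ok) reply layout is my assumption, and returning None stands in for an await whose predicate does not hold.

```python
def collect_p1(replies, ballot, num_acceptors):
    """Return True (elected) once a majority of acceptors said OK for `ballot`.

    `replies` is a set of (acceptor_id, ballot, ok) tuples (assumed layout).
    Returning None plays the role of an unsatisfied await: the step is not taken.
    """
    ok_acceptors = {a for (a, b, ok) in replies if b == ballot and ok}
    if 2 * len(ok_acceptors) <= num_acceptors:
        return None   # guard not satisfied, nothing happens
    return True
```

In the model checker, a disabled step like this simply means the process waits while other processes take their steps.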

SendP2(b,s) lets a leader put a Phase2a message with ballot number b and slot number s on the AccMsg messageboard. The message proposes self as the value, or SuitVal as the value if applicable. ReplyP2(b) lets acceptors react to a Phase2a message with ballot number b by writing a reply back to the LMsg messageboard.

Using CollectP2(b,s), a leader can learn that its proposal was accepted by a majority of acceptors, or else it can learn that it has been preempted by a higher ballot number. The SendP3 and ReceiveP3 macros implement Phase3 of Paxos.
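The possible outcomes of CollectP2 can be sketched in Python like this. I am assuming that each acceptor reply carries the highest ballot that acceptor has seen, which is how the leader would notice preemption; the reply layout and the outcome labels are hypothetical.

```python
def collect_p2(replies, ballot, slot, num_acceptors):
    """Classify the Phase2 outcome for a leader using `ballot` at `slot`.

    `replies` is a set of (acceptor_id, ballot, slot) tuples, where the ballot
    is the acceptor's current maximum ballot (assumed layout).
    """
    relevant = [(a, b) for (a, b, s) in replies if s == slot]
    if any(b > ballot for (_, b) in relevant):
        return "preempted"    # some acceptor has promised a higher ballot
    accepted = {a for (a, b) in relevant if b == ballot}
    if 2 * len(accepted) > num_acceptors:
        return "accepted"     # a majority accepted the proposal
    return "waiting"          # neither condition holds yet
```

Only the "accepted" outcome would let the leader move on to SendP3; on "preempted" it has to go back and try to get elected again with a higher ballot.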

The model checking took 7 minutes with the parameters I mentioned in the comments. Not bad.

Flexible Paxos

The flexible quorums idea was introduced in a paper in the summer of 2016. It says that we can weaken the Paxos requirement that "all quorums intersect" to require that "only quorums from different phases intersect". That is, majority quorums are not necessary, provided that Phase1 quorums intersect with Phase2 quorums.

For example, in a system of 10 acceptors, we can safely allow any set of only 3 acceptors to participate in Phase2, provided that we require 8 acceptors to participate in Phase1. This shrinking of Phase2 quorums at the cost of growing Phase1 quorums is called simple quorums.
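The 10-acceptor example is easy to check by brute force. The Python sketch below (function names are mine) enumerates every Phase1 quorum of size 8 and every Phase2 quorum of size 3 and confirms that each pair shares an acceptor, which follows from 8 + 3 > 10.

```python
from itertools import combinations


def simple_quorums(n, size):
    """All subsets of `size` acceptors out of acceptors 0..n-1."""
    return list(combinations(range(n), size))


def all_pairs_intersect(quorums1, quorums2):
    """True if every Phase1 quorum shares at least one acceptor with every Phase2 quorum."""
    return all(set(q1) & set(q2) for q1 in quorums1 for q2 in quorums2)
```

Dropping the Phase1 size to 7 breaks the property, because a set of 7 acceptors and a set of 3 acceptors can be disjoint when there are 10 acceptors in total.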

Alternatively, we can use grid quorums, where the acceptors are arranged in a grid, every column forms a Phase1 quorum, and every row forms a Phase2 quorum. In grid quorums, the quorums within either phase do not intersect with each other.
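Grid quorums can be sketched the same way. The Python below numbers the acceptors row by row (the numbering is my assumption) and builds the column and row quorums.

```python
def grid_quorums(rows, cols):
    """Return (phase1_quorums, phase2_quorums) for a rows x cols grid of acceptors.

    Acceptors are numbered row by row; columns are Phase1 quorums and rows are
    Phase2 quorums, as in the text above.
    """
    grid = [[r * cols + c for c in range(cols)] for r in range(rows)]
    row_quorums = [frozenset(row) for row in grid]
    col_quorums = [frozenset(grid[r][c] for r in range(rows)) for c in range(cols)]
    return col_quorums, row_quorums
```

Every column crosses every row in exactly one acceptor, so quorums from different phases always intersect, while two distinct columns (or two distinct rows) share no acceptor at all.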

The flexible Paxos paper gave a TLA+ model, but didn't give a PlusCal model. However, as I show next, we can implement the flexible Paxos quorums with a simple modification of our Paxos PlusCal model.

Quorum1 denotes Phase1 quorums, and Quorum2 denotes Phase2 quorums. Quorum1 quorums must intersect with Quorum2 quorums, but neither needs to be a majority quorum. Also, Quorum1 quorums need not intersect with each other, and Quorum2 quorums need not intersect with each other. In the comments, I provide example parameters for model-checking the flexible Paxos extension.




Before the leader can get elected in Phase1, it checks to see if acceptors from a Quorum1 quorum said OK. SatQ1(b) and SatQ2(b) are macros for checking whether acceptors from a Quorum1 and Quorum2 quorum responded back for a given ballot number b.
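The check behind SatQ1 and SatQ2 boils down to "did every member of at least one quorum respond?". A minimal Python sketch, assuming quorums are given as sets of acceptor ids and responders is the set of acceptors that replied for the ballot in question:

```python
def sat_quorum(quorums, responders):
    """True if every member of at least one quorum is among the responders."""
    return any(q <= responders for q in quorums)
```

For instance, with Quorum1 = [{1, 2}, {3, 4}], responders {1, 2, 5} satisfy the check, but responders {1, 3} do not, since neither quorum is fully covered.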

Related links

The Paxos and FPaxos PlusCal programs are available on GitHub.

Using TLA+ for teaching distributed systems

My experience with using TLA+ in distributed systems class

There is a vibrant Google Groups forum for TLA+: https://groups.google.com/forum/#!forum/tlaplus

By clicking on label "tla" at the end of the post you can reach all my posts about TLA+

Comments

Anonymous said…
Dear professor: pluscal/tla+ code links to dropbox are broken in most posts.
