Teaching Paxi

Paxos family of protocols (which I refer to as Paxi) is immensely useful for building distributed systems due to their excellent fault-tolerance properties. Many cloud computing services and distributed databases employ Paxi for state machine replication (SMR). Paxi preserve the safety of consensus problem (no two nodes commit different values for the same slot) even to the face of a fully asynchronous execution, crash faults, message losses, and network partitions. Paxi satisfy liveness of consensus problem (some value is eventually committed for the slot) when the system moves outside the realm of the coordinated attack and FLP impossibility results.

Paxi are perennially misunderstood and their sophistication underrated. While there has been a lot of work on Paxi, we have been able to explore only a fraction of the algorithm design space. A striking evidence of this arrived in 2016, where we had a flexible quorum breakthrough after 30 years, which no one had anticipated.

There is a need to unpack Paxi and explain them better for both students and practitioners alike. Paxi provide a great opportunity for teaching the principles of distributed systems and we should seize on this opportunity.

Problems with teaching Paxi

Most coverage of Paxos in courses is dry and superficial: the Paxos protocol is described and the students memorize the protocol. While the Paxos protocol looks simple, it has a lot depth and subtleties. It is not possible to appreciate these and truly understand distributed consensus by just memorizing the Paxos protocol. To understand Paxos, you should not only understand how it works, but also why it works, and what cornercases it prevents, and how else it could be realized.

Raft has been proposed as a simple explanation of consensus and Paxos. While many developers love the operationalized explanation style of Raft and the implementation that accompany it, tying the explanation to a constrained implementation is unnecessarily restrictive. The generality of Paxos family of protocols are lost, and the context and principles of distributed consensus is not communicated satisfactorily.

We need better explanations of not just Paxi but the context and derivation of these protocols, explaining why each action is needed and why this is a hard problem. However, explaining the protocol solely in a declarative way using derivation is also hard to follow for students, and some intuition should be provided as well. The students should also be provided with ample opportunity to get a lot of hands-on exercise and experience with the protocols, their implementation, and their integration into practical applications/systems.

Every year my favorite part of the distributed systems class is when I get to teach Paxos for two weeks. Every year, I am able to do a better job of it by gradual/gradient-descent improvement. But these days, I am planning an overhaul of how I teach Paxos, and put some real effort behind this to realize my ideal setup for teaching Paxi. Here is my explanation of this setup in terms of the content of the module and supporting tools for it.

Course module on Paxi

The Paxi course module will provide context and teach the principles and derivation of the protocols.

To teach about the context, the module will cover questions such as: What makes the distributed consensus problem so hard?  How has our understanding of distributed consensus change after "Attacking Generals" and FLP impossibility results? What are the cornercases that haunt correctness and liveness? Covering these will help the students appreciate the depth and ingenuity of Paxi.

To teach about derivation and principles of the protocol, the module will employ a stepwise refinement from the high-level consensus specification to an intermediate round-based abstraction (for which Heidi's generalized consensus framework is a good candidate), and then to the Paxos algorithm.  The module will explore both leader-based (as in Paxi) and non-leader-based (as in Ben-Or and Texel) refinements of this round-base intermediate specification, and will discuss the advantages and disadvantages of each approach.

The module will also relate distributed consensus to decentralized protocols in the blockchain domain. By showing a Texel to Avalanche transition, the module will tie consensus (where Texel shows an alternative solution to leader based consensus as in Paxos) to blockchains (where Avalanche shows how to scale and operationalize Texel to large-scale decentralized environments).

Supporting tools

To allow the students to experiment with the protocols at the design level, we will provide TLA+/Pluscal modeling of Paxos and variants. With these, the students will be able to model-check Paxi protocols and experiment with modifications to see which safety and progress properties are satisfied under different environments.

To enable the students to experiment at the implementation level, we will use our Paxi framework implemented in Go (available as opensource at https://github.com/ailidani/paxi). Our Go Paxi framework provides a leveled playground for protocol evaluation and comparison. The protocols are implemented using common building blocks for networking, message handling, quorums, etc., and the developer needs to only fill in two modules for describing the distributed coordination protocol. Paxi includes benchmarking support to evaluate the protocols in terms of their performance, availability, scalability, and a linearizability checker to check the protocol for consistency. We have a dozen Paxos variants already implemented in Paxi. We will invite students to implement more, especially Byzantine versions of Paxos protocols, and consider tie-ins to permissioned blockchain protocols. In order to link the high-level design to the implementation, we will provide a mapping from TLA+ model to the Paxi implementation of distributed consensus.

In order to showcase the integration of Paxi protocols to distributed systems applications, we will use the globally distributed database FleetDB (https://github.com/acharapko/fleetdb) as the hands-on application. We will first extend FleetDB to be instantiable with different Paxi protocols as plugins, and make the Paxi protocols exchangeable based on workload and environment. FleetDB can lead the way and help distributed databases for integrating Paxi protocols in their replication operation. Currently only a handful databases (including Spanner and CockroachDB) use Paxos as part of their replication. Although Paxos provides excellent fault-tolerance properties and prevent any loss of consistency, the vanilla Paxos protocol is not a good fit for WAN deployments, and performs poorly under certain topologies and workloads.

Couple other tools are worth mentioning. One is the DSLabs tool from UW for model checking distributed systems projects implementations. Another is the DistAlgo tool from SUNY Stonybrook.

MAD questions

1. What is your favorite teaching technique for Paxi?
While teaching Paxos, I bring 5 students to the board to perform live reenactments the Paxos consensus algorithm (each one simulating a Paxos node) under several fault scenarios. This part is the most fun one and most beneficial in my distributed systems course. I do this twice in two different classes. Through this exercises the students get to learn how the protocol works, and see how it deals with the cornercases.

Another favorite moment for me is to watch the students think they understand the protocol, and then forget about it and get confused again. This is inevitable, and a big part of how learning works. Learning needs self evaluation and self correction. Without doing work yourself, you can't truly learn anything. Easy come, easy go. I also like watching the students learn the application of the single-instance Paxos in the MultiPaxos protocol, and see them learn about some Paxos variants. The students then realize that what they learned was only the tip of the iceberg.

2. Are there other algorithms/systems you suggest can serve as capstone projects in a distributed systems class?


Ygal Bellaiche said…
Great post.
Very interesting and with a lot of useful links

Popular posts from this blog

Graviton2 and Graviton3

Foundational distributed systems papers

Learning a technical subject

SQLite: Past, Present, and Future

Learning about distributed systems: where to start?

Strict-serializability, but at what cost, for what purpose?

CockroachDB: The Resilient Geo-Distributed SQL Database

The Seattle Report on Database Research (2022)

Amazon Aurora: Design Considerations + On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes

Warp: Lightweight Multi-Key Transactions for Key-Value Stores