Modeling the Dining Philosophers Algorithm in TLA+

In this post I will provide a modeling of the dining philosophers algorithm in TLA+. If you have not heard about TLA+ before, take a look at my earlier posts on TLA+ to get some context.  Post1 and Post2

The dining philosophers problem is an instance/refinement of the mutual exclusion problem. The specification of the mutual exclusion problem dictates that when a node is in critical section, no other node in the system can be in the critical section. The dining philosopher relaxes this by taking into account the topology: When a node is in critical section, none of the neighbors of that node can be in critical section. Note that, this does not rule out two nodes to be in critical section simultaneously provided that they are not neighbors.

In the original formulation of the dining philosophers problem by Dijkstra, there are 5 philosophers sitting around a table. Between each philosopher is a fork and, in order to eat a philosopher must hold both of the forks. (The spaghetti is really messy.) The philosophers cycle between 3 states: thinking, hungry, and eating. The dining philosopher protocol should satisfy 1. (safety) if a process is eating neither of its left neighbor or right neighbor can be eating, and 2. (progress) if a process is hungry, eventually it gets to eat.

Dijkstra's formulation is an example of a token-based approach to mutual exclusion. Unlike token-based solutions to mutual exclusion where there is only one token in the system (as only one node is allowed to be in the critical section at any given time), the dining philosophers formulation relaxes the requirement and introduces one token (i.e., fork) per edge. A node that is eating needs to possess all the forks on its side, hence the safety condition is easily satisfied: if a process is eating, neither/none of its neighbors can be eating. The progress condition on the other hand is more trickier to satisfy and we will devote most of our attention on satisfying the progress condition in the below algorithms.

PlusCal modeling of naive dining philosophers

Here is a PlusCal algorithm modeling of the dining philosophers problem.

There are N processes from 0..N-1. (You can provide N at model-checking time.) Each process is symmetric, they execute the same code instantiated with their ID. Each process has a variable called state, and loops through selecting 3 actions. From "state=thinking", the process can transition to "hungry". A hungry process first waits till the fork on its right is available and then grabs it. The hungry process then waits till the fork on its left is available and grabs it, and transitions into eating state. An "eating" process can transition to "thinking" by releasing its left and right forks.

OK, let's have the birds and bees talk now. How do we model a system at a suitable abstraction layer? When we are trying to get the basics of the algorithm right, we don't need to model the protocol at a message passing level. Let's get it right at the shared memory model first. Modeling the system at a suitable abstraction layer is an art. Novice users often make the mistake of writing Java-like iterative programs with PlusCal. This makes the code long and also very slow for model checking. After seeing better ways to model systems looking at other PlusCal code examples, novice users gradually learn to write more declarative math-like specifications. I believe this latter mode of thinking also helps improve their system design skills as well (which is what makes me most excited about using/teaching TLA+). I find that once you get the "define" block right, the rest of the PlusCal algorithm practically writes itself. But you may need to make a couple unsuccessful takes before you figure out the most suitable "define" block that leads to a succinct and elegant model.

In my modeling I use a shortcut to model the forks between two process. There are N forks, so I model the forks using an array of size N. The left fork of a philosopher is the fork that resides at the array index given by the ID of that philosopher. The right fork of a philosopher is the fork at index=ID+1 (modulo N). I say that a fork is available for grabs if the fork's value is set to N. When a process grabs its left fork or right fork, the fork's value is assigned to be equal to the ID of that process, so we denote that the fork is not available anymore. (Note that "self" refers to the ID of the process.)

When we model-check our algorithm, TLA+ complains and alerts us to a deadlock. Below is the deadlock trace provided by TLA+. You can see that all process are blocked at label "E:" where each one of them is waiting to grab its left fork. You can click on each step in the trace to see the program state at that step, and see how the final deadlock state is reached as a result. The problem turns out to be this: all the processes first grabbed their right fork, and when they try to grab their left fork, they can't because those forks have been grabbed by their respective right neighbors as a right fork.

Modeling dining philosophers with a distinguished philosopher

As a general principle in conflict resolution, it is necessary to break the symmetry. As long as each philosopher looks exactly like the others, the problem is hopeless, the processes run into a deadly embrace. So let's differentiate process 0 as being lefty: while all philosophers grab their right fork first, process 0 tries to grab its left fork first and its right fork later. This bit of asymmetry breaks the cyclic dependency for the availability of the forks. We changed only the second action of the program as below. When transitioning from hungry to eating, we introduced an if branch. If the process ID is 0, the process grabs left fork first instead of grabbing its right fork first.

When we model check this code, we find that since the symmetry is broken, the deadlock is avoided. However, TLA+ still complains about starvation. The "NoStarvation" predicate asserts that it is always the case that if a process is hungry, eventually that process gets to eat. In other words, no process starves. When we ask TLA+ to check NoStarvation in the temporal properties, TLA+ produces the following long counterexample. Inspecting the loop from state 19 to state 28, you see that process 2 can get to eat again and again while process 1 and 0 remains at the "hungry" state and starve. This counterexample is possible with a weak-fair scheduler. The scheduler chooses process 1 to execute every so often, but only when its left fork is unavailable so its left-fork grab action is disabled. (A strongly-fair scheduler would solve this deadlock, because process 1 will get chosen to execute the left-fork-grab action when the guard of the left-fork-grab is enabled. You can tell TLA+ to use strong-fairness by typing "fair+ process" in the PlusCal code. Also you would need to replace the either or construct with an "if else if else if else" construct, so TLA+ scheduler cannot get away with choosing to execute an inactive "or branch".)

The way to solve this problem under weak fairness is to introduce a flag to reduce the priority of a process once it eats, so that a hungry neighbor will have higher priority and can get to eat. We are going to look at such a solution in the next blog post on TLA+: the hygienic philosophers algorithm.


Naive Dining Philosophers model in TLA+
Dining Philosophers with a distinguished process model in TLA+ 

The PlusCal code for the two programs are accessible through the above links. When you want to model check the program, start a new model, and enter 3 for N, so the model checker is ready to start. In the model checker tab, click on Invariant pane and enter ME, so that the ME predicate is checked against any violation, and in the Properties pane enter NoStarvation so that the progress property is checked for satisfaction. If you make a modification to the PlusCal code, make sure you click on File, Translate Algorithm, so the PlusCal code gets translated to the TLA+ code, which is then used for the model-checking. See the TLA+ posts below about how this works.

Related Links

Using TLA+ for teaching distributed systems

My experience with using TLA+ in distributed systems class

There is a vibrant Google Groups forum for TLA+:!forum/tlaplus

By clicking on label "tla" at the end of the post you can reach all my posts about TLA+


Popular posts from this blog

Foundational distributed systems papers

Your attitude determines your success

Progress beats perfect

Cores that don't count

Silent data corruptions at scale

Learning about distributed systems: where to start?

Read papers, Not too much, Mostly foundational ones

Sundial: Fault-tolerant Clock Synchronization for Datacenters


Using Lightweight Formal Methods to Validate a Key-Value Storage Node in Amazon S3 (SOSP21)