A snapshot isolated database modeling in TLA+

While idly browsing the tlaplus/Examples repository, I noticed a TLA+ spec for a snapshot isolated key-value store written by my friend Andrew Helwer. This models a centralized database. Each transaction makes a copy of the store, and optimistic concurrency control (OCC) merges that copy back into the store upon commit. I immediately wanted to write a PlusCal version of it and contribute it back.

My motivations were:

  • Let's see how quickly I can write the PlusCal version. This way I can get practice converting TLA+ specs to PlusCal. (The PlusCal translator automatically converts PlusCal to a TLA+ spec, but going the other way requires some judgment calls.)
  • The PlusCal version would make this spec easier to understand for modeling newbies.
  • I can also add the client-centric isolation checking to this to provide a more concrete example of that immensely useful module.
  • It would be cool to see the Serializability counterexample the model would produce. I was curious whether the model checker would surface the classical write-skew counterexample.
  • This would be a cool self-contained example I can show in my TLA+ seminar inside Amazon/AWS to walk people from basic to intermediate/advanced concepts in TLA+ modeling.

And, I did that. I had fun doing it and contributing it to the tlaplus/Examples repo. And of course, I had to write this blog post to explain the model, because that is how I am wired.

What could be your motivations to keep reading on?

  • A modeling walk-through! There aren't many examples of that. This is the way to learn the skill of abstraction.
  • You can learn how to do transaction isolation checking for TLA+ models
  • You can find out if the Serializability counterexample is the classical write-skew counterexample
  • This beats mindless browsing on the Internet. Seriously man, you can do better.


Global variables


You can see the PlusCal keyword algorithm on line 22. PlusCal code is just a block comment that the PlusCal translator parses and translates into a TLA+ specification. You should still be able to read TLA+ to debug problems that arise while you are modeling.

Ok, then we have the variables keyword to start us off with global variables of the algorithm.

store is the datastore we are operating on. It is simply a mapping from keys to values, hence a key-value store. In math-land, it is a function. As we declare it, we also initialize it for the initial state of the system, mapping each key to NoVal. This is abstraction at work: I am interested in isolation checking, not in adding or removing keys, so I can pretend the keyspace is fixed and initially all keys map to NoVal.

tx is the set of active transactions at a given time. In the initial state there are no active transactions, so tx is the empty set.

missed maps each transaction to the set of updates that transaction failed to see while working on its local snapshot of the database. We will talk about missed in the COMMIT part of the transaction process.
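In PlusCal, these declarations look roughly like this (a sketch, with my own comments):

    variables
      \* the key-value data store: a function from keys to values
      store = [k \in Key |-> NoVal],
      \* the set of open (active) snapshot transactions
      tx = {},
      \* for each transaction, the set of keys whose updates it missed
      missed = [t \in TxId |-> {}];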


The transaction process

I realized I only need one type of process: a transaction process that acts on the store. I can instantiate this process 3 times if I'd like to test with 3 concurrent transactions.

As you read the spec below, notice the placement of the labels in the code. Label to label is the unit of atomic execution in PlusCal. There are rules for where labels are required, such as at the start of a while statement, and the PlusCal translator will inform you if you miss a required label placement. You can check the PlusCal cheatsheet for quick information on these rules. In addition, you can add other labels to serve as yield points from this process to the other processes executing concurrently with it.

It is very instructive to read the generated TLA+ and check how it corresponds to your PlusCal code. I recommend you do this for this model until you see how the TLA+ translation works. Pay special attention to how the process is instantiated with self, and how it is invoked in Next later.
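For instance, the translation turns each label into an action parameterized by self, and the generated next-state relation looks roughly like this (a sketch; the names follow the standard PlusCal translator output):

    \* each label becomes an action; the process is their disjunction
    t(self) == START(self) \/ READ(self) \/ UPDATE(self) \/ COMMIT(self)

    \* the next-state relation picks some process to take a step
    Next == (\E self \in TxId : t(self))
               \/ Terminating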


Transaction start


On line 38, the process is instantiated for each element t in the TxId set. This set is supplied by the user at model checking time via the .cfg file, say as {t1,t2,t3}, which instantiates and runs this process 3 times concurrently.

We have 4 process-local variables. snapshotStore has the same signature and initial mapping as the global variable store. It denotes the process's local copy/snapshot of the store.

read_keys (respectively write_keys) denotes the set of keys this transaction process selects for reading (respectively writing).

Finally, ops is an operation log that records the reads & writes this transaction executes, and is used for interfacing with the client-centric isolation checking component.
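Put together, the process header looks roughly like this (a sketch):

    process (t \in TxId)
    variables
      snapshotStore = [k \in Key |-> NoVal], \* local snapshot of the store
      read_keys = {},   \* keys this transaction reads
      write_keys = {},  \* keys this transaction writes
      ops = <<>>;       \* log of reads & writes, fed to the CC isolation check
    {
      \* labels START, READ, UPDATE, COMMIT follow here
    }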

Then on line 44 we have the start of the process code, and on line 45 the first label, titled START. This is one of those label-required situations: the code for each process must begin with a label. If you scroll down and check the corresponding TLA+ translation, you will understand why this is the case.

Within this START label, we do the transaction initialization work. Line 46 adds this transaction to the set of existing transactions tx (remember this from the global variables?). The process refers to its id as self because this code is instantiated for each t in TxId.

Line 47 sets snapshotStore to a copy of the current value of store. It may be that another transaction process has already been scheduled and run to completion before we execute this line, so we may be snapshotting an already-modified version of the store when starting to execute our transaction.

The with block on line 49 sets read_keys and write_keys to nondeterministically chosen nonempty subsets of the Key set (which may be given in the .cfg file as, say, {k1, k2, k3}); that is, to elements of the powerset SUBSET Key, excluding the empty set. So read_keys may be assigned {k1,k2}, and write_keys {k2}.
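In PlusCal, the START label looks roughly like this (a sketch):

    START: \* transaction initialization
      tx := tx \union {self};
      snapshotStore := store; \* take a private snapshot of the store
      with (rk \in SUBSET Key \ { {} }; wk \in SUBSET Key \ { {} }) {
        read_keys := rk;
        write_keys := wk;
      };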

A heads up here about the difference between CHOOSE and with. CHOOSE cannot be used to introduce nondeterminism: it deterministically picks one fixed element satisfying its predicate. The with statement, in contrast, provides nondeterministic choice that the model checker explores exhaustively.
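A minimal illustration of the difference (x, y, and the set 1..3 are hypothetical):

    \* CHOOSE is an operator: TLC evaluates this to the same element every time
    x == CHOOSE n \in 1..3 : TRUE

    \* with is a PlusCal statement: TLC forks a separate branch for each n
    with (n \in 1..3) { y := n; }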


Transaction read update/commit


The next label is READ. This means that after executing the lines between START and READ, the scheduler may yield control to other processes to execute other labels for some time, before picking this process back up to execute from the READ label to the UPDATE label atomically, and then yield again. This way the model checker explores all possible interleavings of the process code.

We cheat in the read operation. We pretend to read things, but we don't actually need to read them, because the value we write in the UPDATE label (our process id, self) is independent of the values we read. But we do record in ops the contents of our read_keys set, to pass this information to the client-centric isolation check module, which relies on this log to validate transaction isolation. When passing this we use a SetToSeq operator to flatten the set into the sequence the ops log expects. We could have appended to the log in a while loop one key at a time, but we avoid this because a while loop requires a label, which would mean control may be yielded to other processes at each iteration. That blows up the number of possible executions the model checker needs to check, without surfacing any interesting behavior for transaction isolation checking purposes.
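The READ label then reduces to a one-liner, roughly as follows (a sketch; rOp is assumed to be a local alias for the CC module's read-operation constructor, and SetToSeq a helper from the accompanying Util module):

    READ: \* log the reads on my snapshot, for the CC isolation check
      ops := ops \o SetToSeq({rOp(k, snapshotStore[k]) : k \in read_keys});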

We have the UPDATE label on line 59. This is optimistic concurrency control (OCC): we apply the updates locally first, and later, at commit time, if there is no conflict, we get to commit them permanently. So here the process updates its own copy of the store, snapshotStore, assigning its id, self, as the value for the keys in write_keys. On line 60, we do the update as a one-liner, with a new function assignment to snapshotStore: for each key in the Key space, we update only the keys in our write_keys to the value self, and leave the other keys alone.

We again avoid a while-loop update. This is a process-local variable rather than the global store, so assigning to our local snapshotStore in different orders doesn't change the bottom line for isolation checking. For isolation checking, we just care about the contents of the ops log: the read values and the committed write values.
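Roughly, the UPDATE label looks like this (a sketch):

    UPDATE: \* write self as the value for each of my write_keys, locally
      snapshotStore := [k \in Key |->
                          IF k \in write_keys THEN self ELSE snapshotStore[k]];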

Finally, on line 62, we have the COMMIT label. Line 63 is the no-conflict validation check. If there is no conflict, we get to commit the transaction and merge our local updates into the global store. Otherwise, the process terminates and our local updates are lost, which means the transaction is trivially and cleanly aborted without any side effects.

Line 63 checks that this transaction did not miss an update on a key in its write_keys, in order to rule out the write-write conflicts snapshot isolation forbids. In other words, the intersection of missed[self] and write_keys must be the empty set. But what is this missed set? You will remember it was defined as a global variable, but we haven't seen it updated yet. The update comes on line 68, if this transaction passes the check and gets to commit.

For commit, the process takes its id self out of the active transactions set tx, and updates the missed sets of the transactions still active in tx with the updates it is about to install in the global store. This is again a one-liner full-function assignment: for each transaction, if the transaction is active (i.e., a member of tx), we add our write_keys to its missed set (via set union); otherwise we leave its missed set unchanged.

You should be able to follow the global store update on line 71 without any problem now. Finally, on line 74, we log these writes to the ops log so the client-centric isolation check module can use them for checking isolation.
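Putting the COMMIT label together, it looks roughly like this (a sketch; wOp is assumed to be a local alias for the CC module's write-operation constructor):

    COMMIT: \* commit if there is no write-write conflict; otherwise abort cleanly
      if (missed[self] \intersect write_keys = {}) {
        tx := tx \ {self}; \* leave the active transaction set

        \* record my writes as missed by the transactions still active
        missed := [o \in TxId |-> IF o \in tx THEN missed[o] \union write_keys
                                  ELSE missed[o]];

        \* merge my local updates into the global store
        store := [k \in Key |-> IF k \in write_keys THEN snapshotStore[k]
                                ELSE store[k]];

        \* log the writes for the CC isolation check
        ops := ops \o SetToSeq({wOp(k, self) : k \in write_keys});
      }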

Here is a follow-up exercise for you. Try to break the protocol by making COMMIT finer grained. Can you pull some of the statements in COMMIT into a PRECOMMIT label to add more fine-grained concurrency? Would that work, or would it violate the isolation invariants? If it violates them, try to understand how by examining the counterexample the TLA+ model checker produces.


Instantiating the CC

I am not going to go through the explanation of what the CC module does. For this, please read my blog post from last July.

In the CC module, Tim Soethout translates the ideas from the 2017 PODC paper on client-centric specification of database isolation to TLA+. That PODC'17 paper is a big improvement to be celebrated. Instead of building a graph of operations and checking for conflicts on that graph, it reduces isolation checking to state formulas about complete read states, and performs validation through these formulas. This accounts for the difference between a complicated 1600-line TLA+ module for checking isolation through graphs, and the 160-line TLA+ module that does the checking on states, as the CC module implements.

Here is how we instantiate the client-centric isolation checking (CC) TLA+ module. First, copy ClientCentric.tla and Util.tla into the same folder as your model for the CC instantiation to work. Then use the INSTANCE keyword to instantiate it and bind it to a handle, in this case CC, to refer to it. You can write CC!formula to refer to a formula under the CC module. Or you can define local aliases for these formulas.


The figure shows the module constants and the instantiation of CC. The CC module expects Keys and Values constants to be passed to it. This is done on line 13 through INSTANCE and a WITH mapping. We map Keys to the Key set in our database model, and we map Values to TxId \union {NoVal}, because that is what we use as values in our database model.
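The instantiation looks roughly like this (a sketch; wOp and rOp are convenience aliases I assume for the CC module's write/read operation constructors):

    CONSTANTS Key, TxId, NoVal

    CC == INSTANCE ClientCentric WITH Keys <- Key,
                                      Values <- TxId \union {NoVal}

    \* convenience aliases for CC's operation constructors
    wOp(k,v) == CC!w(k,v)
    rOp(k,v) == CC!r(k,v)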

The transactions in our database model are interfaced to the ClientCentric checking module only through the read and write operations they invoke, which are captured in the operations log. We pass this ops log when invoking the isolation invariants to check, as we show next.


Invariants


Line 151 shows how to check SnapshotIsolation using CC. That simple! We give the operator what it needs to check: the initial state the KV store starts with, and the ops log for each transaction recording the reads (keys and values) and writes (keys and values) it performs. The module takes care of the rest.
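The invariant definitions look roughly like this (a sketch; Range is assumed to be a Util-module helper that collects the per-process ops logs into a set of transaction logs):

    InitialState == [k \in Key |-> NoVal]

    \* the isolation guarantee our design provides; this invariant holds
    SnapshotIsolation == CC!SnapshotIsolation(InitialState, Range(ops))

    \* snapshot isolation admits write-skew, so this invariant fails
    Serializability == CC!Serializability(InitialState, Range(ops))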

If we instead invoke the serializability check, we see that serializability does not hold. Let's review the counterexample.

You can try this yourself by opening this folder in VS Code and using the TLA+ extension to model check MCKVsnap.cfg with Serializability uncommented.


Serializability counterexample

It is indeed the classic write-skew example we get back. Our snapshot isolated database allows write-skew by design, and this violates serializability. (This is not a problem; you can deal with it if you know what you are getting. The strictest isolation is not always better.)

  • Step 2 shows t1 starting with read_keys {k1} and write_keys {k2}.
  • Step 5 shows t2 starting with read_keys {k2} and write_keys {k1}.
  • In step 6, t1 commits.
  • In step 9, t2 commits, because there is no write-write conflict between t1 and t2, and the snapshot isolated database is happy to commit both.

The read-write conflict between t2 and t1 is not accounted for, which leads to the write-skew and violation of serializability.


The config file


Above is the config file for completeness (a sketch is reproduced after the steps below). If you'd like to model check it, install the VS Code TLA+ extension and follow these steps.

  • Open the .cfg file you are interested in checking in VSCode
  • Open the VS Code command palette (Option+X on Mac) and choose "TLA+: Check Model with TLC"
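For reference, the config file is along these lines (a sketch; the exact constant sets are my assumption):

    SPECIFICATION Spec
    CONSTANTS
        Key = {k1, k2}
        TxId = {t1, t2, t3}
        NoVal = NoVal
    INVARIANTS
        SnapshotIsolation
        \* Serializability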


It takes 16 seconds on my 2019 MacBook Pro to exhaustively model check this configuration. But if I add k3 to the Key set, it takes 44 minutes! There's a demonstration of state space explosion for you: a 165x increase in model checking time from adding just one more key.

Still, 44 minutes is not so bad! For even larger models, to reduce checking times, you can invest in a large AWS machine, or use distributed model checking via TLC.


Call to action


Abstraction is the hardest part to teach when teaching modeling with TLA+. There is an art to abstraction, and you learn it through an apprenticeship process. By observing good examples of abstraction, you learn the art by osmosis over time. And abstraction is a very important thing to learn.

"Abstraction, abstraction, abstraction! That's how you win a Turing Award." -- Leslie Lamport

TLA+ modeling examples like this one help with learning abstraction. I encourage you to contribute more TLA+ specification examples to https://github.com/tlaplus/Examples/, so you can help people pick up and master the art of modeling with TLA+. We need examples from different use cases and subfields.
