Automated Validation of State-Based Client-Centric Isolation with TLA+ (2021)

This work provides a TLA+ formalization of the state-based isolation model introduced in the PODC 2017 paper.

The TLA+ models provided here will be very useful for you if you are a distributed database developer. Using these, you can show your database protocols satisfy the desired isolation levels, and as you extend the design, you can keep checking that the guarantees still hold.

I have been suffering from this problem recently. I wrote a TLA+ model for a distributed database design. I thought I would be able to write serializability predicates using TLA+ formulas relatively easily. When I tried doing so, I realized what a mess I am in. I made a couple attempts and found that operation-centric specification of database isolation is hard to define and check. I decided approach the problem sideways and wrote some BaitInvariants to show what behaviors are out-ruled, only to find that what I thought would always be rejected was actually valid serialization behavior in some scenarios. Serializability already accepts a lot of behavior, so it was hard to specify. After this experience with serializability, I questioned what chance I had in specifying even more relaxed isolation levels, like snapshot isolation.

What does it mean for a transaction to come after the other? What does it mean to read a value written by a committed transaction? I need a transaction object abstraction, which would include a start-time, end-time, read-set, write-set. I can create a record for transaction, and write it to a log (sequence) and then I can do processing on this log to check for violations. But see, this processing the log becomes a TLA+ model of its own (comparing pairs of transactions with each other) rather than a simple TLA+ formula I can write at the end of our database protocol model.

I had found a TLA+ model of Snapshot Isolation using graph-based isolation checking from Cahill's thesis. But I was unable to make progress in using that, as it is not easy to follow and adopt.

Then I came across this work that followed up on the client-centric specification of database isolation paper we reviewed last week. This work was just what I needed. I instantiated the Client-Centric Isolation model provided here in my distributed database model and I was able to check the design for different isolation levels. This was an easy process as I explain below.

How to use the model

For using the model, you have two options: extending the model or instantiating a copy of the model.

The PaperExamples model
*extends* the ClientCentric model and uses ASSUME statements to model-checks the Figure-2 example and Bank example in the PODC17 paper, as well as a buggy trace from 2-phase-locking(2PL)/2-phase-commit (2PC) transactions. This is an OK way to debug a problem when you already have the trace. It is also a quick and cheap way to check an example execution to see which isolation levels it complies with. But if you like to model-check a new design you wrote, you should use the general approach below.

The ClientCentric2PL model *instantiates* the ClientCentric model and uses it to check a PlusCal model of the 2PL/2PC transaction protocol it specifies. This provides an easy and modular way to check isolation levels using single line invocations of the instantiated ClientCentric model. The transactions in this 2PL/2PC model is interfaced to ClientCentric checking model only through the read and write operations it invokes, which is captured through an operations log. (The operations log should satisfy the condition for tying read-value to a transaction, as stated in the paper: "For simplicity we assume that each value is uniquely identifiable. There can thus be no ambiguity, when reading an object, which transaction wrote its content.")

\* Client Centric instance for checking consistency levels
CC == INSTANCE ClientCentric WITH Keys <- resources, Values <- 0..10
w(k,v) == CC!w(k,v)
r(k,v) == CC!r(k,v)

operations[tId] := operations[tId] \o << r(self, state), w(self, state+1) >>;

The w(k,v), r(k,v), and operations are defined in the ClientCentric model:
\* Helpers representing Reads and Writes
r(k,v) == [op |-> "read",  key |-> k, value |-> v]
w(k,v) == [op |-> "write", key |-> k, value |-> v] 

\*  reads and writes per transaction id
operations = [ tId \in transactions |-> <<>> ]

This process went pretty smoothly. I just copied ClientCentric.tla and Util.tla in the same folder as my database model for the CC instantiation to work. Then, I followed the example in ClientCentric2PL.tla model to interface my model with the CC instantiation.

In my model, I was writing multiple readKeys/writeKeys at once, and a tip I got from Hillel helped me to write this as a single-liner and avoid writing an ugly while loop for adding these one by one. The SetToSeq macro was the key for doing this as defined in TLA+ community modules.

IsInjective(f) == \A a,b \in DOMAIN f : f[a] = f[b] => a = b  SetToSeq(S) == CHOOSE f \in [1..Cardinality(S) -> S] : IsInjective(f)

 

After finishing these steps to check for isolation levels for your database model, just insert these lines at the end of your model, and check the isolation level predicate as predicate when model-checking.

ccTransactions == Range(operations)

Serializable == CC!Serializability(InitialState, ccTransactions)
    
SnapshotIsolation == CC!SnapshotIsolation(InitialState, ccTransactions)

ReadCommitted == CC!ReadCommitted(InitialState, ccTransactions)

ReadUncommitted == CC!ReadUncommitted(InitialState, ccTransactions)


That is it! You are done.

If you are interested in how ClientCentric.tla formalized and operationalized the theoretical foundations of the state-based isolation model introduced in the PODC 2017 paper, read on.

 

Formalizing Client-Centric Isolation

Below I include snippets from the ClientCentric.tla model which formalizes the state-based isolation concept introduced in the PODC17 paper. These are fairly straightforward to follow after having understood the corresponding concepts described in the PODC17 paper. The model is short, amounting to only 180 lines.

First some basics.
State == [Keys -> Values]
Operation == [op: {"read", "write"}, key: Keys, value: Values]
Transaction == Seq(Operation)
ExecutionElem == [parentState: State, transaction: Transaction]
Execution == Seq(ExecutionElem)


This is how *Read States* concept is formalized in TLA+.
ReadStates(execution, operation, transaction) ==
  LET Se == SeqToSet(executionStates(execution))
      sp == parentState(execution, transaction)
  IN { s \in Se:
        /\ beforeOrEqualInExecution(execution, s, sp)
        /\ \/ s[operation.key] = operation.value
           \/ \E write \in SeqToSet(transaction):
              /\ write.op = "write" /\ write.key = operation.key /\ write.value = operation.value
              /\ earlierInTransaction(transaction, write, operation)
           \/ operation.op = "write"
     }


beforeOrEqualInExecution(execution, state1, state2) ==
  LET states == executionStates(execution)
  IN  Index(states, state1) <= Index(states, state2)


Now comes the famous commit tests from Table 1.


\* Serializability commit test
CT_SER(transaction, execution) ==
  Complete(execution, transaction, parentState(execution, transaction))
Serializability(initialState, transactions) == satisfyIsolationLevel(initialState, transactions, CT_SER)

\* Snapshot Isolation
CT_SI(transaction, execution) == \E state \in SeqToSet(executionStates(execution)):
  Complete(execution, transaction, state) /\ NoConf(execution, transaction, state)
SnapshotIsolation(initialState, transactions) == satisfyIsolationLevel(initialState, transactions, CT_SI)

\* Strict Serializability:
CT_SSER(timestamps, transaction, execution) ==
  LET Sp == parentState(execution, transaction)
  IN /\ Complete(execution, transaction, Sp)
     /\ \A otherTransaction \in executionTransactions(execution):
        ComesStrictBefore(otherTransaction, transaction, timestamps) =>
          beforeOrEqualInExecution(execution, parentState(execution, otherTransaction), Sp)

StrictSerializability(initialState, transactions, timestamps) ==
  \E execution \in executions(initialState, transactions): \A transaction \in transactions: CT_SSER(timestamps, transaction, execution)

\* Read Committed
CT_RC(transaction, execution) == Preread(execution, transaction)
ReadCommitted(initialState, transactions) == satisfyIsolationLevel(initialState, transactions, CT_RC)

\* Read Uncommitted
CT_RU(transaction, execution) == TRUE
ReadUncommitted(initialState, transactions) == satisfyIsolationLevel(initialState, transactions, CT_RU)


\* tests there exists an execution for `transactions`, that satisfies the isolation level given by `commitTest`
satisfyIsolationLevel(initialState, transactions, commitTest(_,_)) ==
  \E execution \in executions(initialState, transactions): \A transaction \in transactions:
    \* PrintT(<<"try execution:",execution>>) =>
    commitTest(transaction, execution)

Preread(execution, transaction) ==
  \A operation \in SeqToSet(transaction): ReadStates(execution, operation, transaction) /= {}

NoConf(execution, transaction, state) ==
  LET Sp == parentState(execution, transaction)
      delta == { key \in DOMAIN Sp : Sp[key] /= state[key] }
  IN delta \intersect WriteSet(transaction) = {}


Let's look at snapshot isolation; these transactions read from same state "state" and are restricted to write to different variables to avoid overwriting each other as enforced in NoConf.



Comments

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Making database systems usable

Looming Liability Machines (LLMs)

Advice to the young

Foundational distributed systems papers

Distributed Transactions at Scale in Amazon DynamoDB

Linearizability: A Correctness Condition for Concurrent Objects

Understanding the Performance Implications of Storage-Disaggregated Databases

Designing Data Intensive Applications (DDIA) Book