Checking Causal Consistency of MongoDB
This paper argues that the Jepsen testing of MongoDB for causal consistency was a bit lacking in rigor, and goes on to test MongoDB against three variants of causal consistency (CC, CCv, and CM) under node failures, data movement, and network partitions. The authors also provide TLA+ specifications of causal consistency and their checking algorithms, verify them using the TLC model checker, and discuss how the TLA+ specifications can be related to Jepsen testing.
This is a journal paper, so it is long. The writing could have been better. But it is apparent that a lot of effort went into the paper.
One thing I didn't like in the paper was the wall of formalism around defining causal consistency in Section 2: Preliminaries. This was hard to follow. And I was upset about the use of operational definitions such as "bad patterns" for testing. Why couldn't they define this in a state-centric manner, as in the client-centric database isolation paper?
It turns out that Section 2 did not introduce any new content; rather it was just reviewing the model established by the "On verifying causal consistency (POPL'17)" paper by Bouajjani et al. The reason the Preliminaries section fell flat was that it was trying to compress that 22-page Bouajjani paper into a couple of pages without much context. By quickly skimming that POPL'17 paper, I got sufficient context about the area. And boy, did I learn some interesting things about causal consistency.
So we will first look at that POPL'17 paper, and then we will come back to this paper for TLA+ specs for causal consistency and Jepsen testing of MongoDB.
On Verifying Causal Consistency (POPL'17)
This is a good old distributed systems theory paper without any experiments, and it is an immensely useful and practical paper as well. I learned a lot by just skimming this in an hour --yeah, I call that skimming, not reading.
The paper opens with a stunner: Verifying whether all the executions of an implementation are causally consistent is undecidable! They explain this neatly in one paragraph: "This undecidability result might be surprising, since it is known that linearizability (stronger than CC) and eventual consistency (weaker than CC) are decidable to verify in that same setting. This result reveals an interesting aspect in the definition of causal consistency. Intuitively, two key properties of causal consistency are that (1) it requires that the order between operations issued by the same site to be preserved globally at all the sites, and that (2) it allows an operation o1 which happened arbitrarily sooner than an operation o2 to be executed after o2 (if o1 and o2 are not causally related). Those are the essential ingredients that are used in the undecidability proofs (that are based on encodings of the Post Correspondence Problem). In comparison, linearizability does not satisfy (2) because for a fixed number of sites/threads, the reordering between operations is bounded (since only operations which overlap in time can be reordered), while eventual consistency does not satisfy (1)."
But fear not, they also show that in practice this is not a problem. "We prove that reasoning about causal consistency w.r.t. the RWM abstraction becomes tractable under the natural assumption of data independence (i.e., read and write instructions is insensitive to the actual data values that are read or written)."
Data independence implies that it is sufficient to consider executions where each value is written at most once, i.e., differentiated histories. In a differentiated history we can determine, just by looking at the operations, which write each read reads from. There is no ambiguity, since each value can be written at most once to each variable. In practice this comes from a timestamp or versionstamp attached to the data by the write operation.
I have seen this differentiated histories trick simplify and improve the checking of linearizability in practice. And the paper leverages this for checking of causal consistency, and explains the importance of differentiated histories as follows: "In this characterization, the fact that we consider only differentiated executions is crucial. The reason is that all relations used to express bad patterns include the read-from relation that associates with each read operation the write operation that provides its value. This relation is uniquely defined for differentiated executions, while for arbitrary executions where writes are not unique, reads can take their values from an arbitrarily large number of writes. This is actually the source of complexity and undecidability in the non-data independent case."
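To make the differentiated-histories point concrete, here is a toy sketch (my own encoding, not the paper's): because each (variable, value) pair is written at most once, the read-from relation is recovered by a simple lookup.

```python
# Sketch: in a differentiated history, each (var, val) pair is written at most
# once, so the read-from relation is uniquely determined by a dictionary lookup.
# The operation encoding below is illustrative, not the paper's notation.

def read_from(ops):
    """Map each read op to the unique write op it reads from.

    ops: list of (op_id, kind, var, val) tuples, kind in {"w", "r"}.
    """
    writers = {}
    for op_id, kind, var, val in ops:
        if kind == "w":
            assert (var, val) not in writers, "history is not differentiated"
            writers[(var, val)] = op_id
    rf = {}
    for op_id, kind, var, val in ops:
        if kind == "r":
            # None means the read saw the initial value (no matching write)
            rf[op_id] = writers.get((var, val))
    return rf

history = [
    ("w1", "w", "x", 1),
    ("r1", "r", "x", 1),
    ("w2", "w", "x", 2),
    ("r2", "r", "x", 2),
]
print(read_from(history))  # {'r1': 'w1', 'r2': 'w2'}
```

With arbitrary (non-differentiated) histories, the same read could match many candidate writes, which is exactly the ambiguity the POPL'17 paper identifies as the source of complexity and undecidability.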
A bad pattern is a set of operations occurring within an execution in some particular order corresponding to a causal consistency violation. They show that for a given execution, checking that it contains a bad pattern can be done in polynomial time. They also show that for each bad pattern, it is possible to construct effectively an observer (which is a state-machine of some kind) that is able, when running in parallel with an implementation, to detect all the executions containing the bad pattern. The efficiency insight here is that proving causal consistency for any given implementation with differentiated histories reduces to proving its causal consistency for a bounded data domain.
The paper then defines different flavors of causal consistency and relates them to each other formally.
- CC: allows non-causally dependent operations to be executed in different orders by different sites, and decisions about these orders to be revised by each site. This models mechanisms for solving the conflict between non-causally dependent operations where each site speculates on an order between such operations and possibly roll-backs some of them if needed later in the execution, e.g. Bayou and COPS.
- CCv: assumes that there is a total order between non-causally dependent operations and each site can execute operations only in that order (when it sees them). Therefore, a site is not allowed to revise its ordering of non-causally dependent operations, and all sites execute in the same order the operations that are visible to them, e.g., Gentle-Rain, Bolt-On Causal Consistency.
- CM: a site is allowed to diverge from another site on the ordering of non-causally dependent operations, but is not allowed to revise its ordering later on.
Both CCv and CM strengthen CC in independent and incomparable ways. And here are the bad patterns for checking whether a trace satisfies CC, CCv, or CM. Below, RF is the read-from relation, and PO is the program order, e.g., from operations executing in the same thread/session. The relation CO, defined as (PO \union RF)^+, represents the smallest causality order possible.
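As a small illustration of one bad pattern, here is a sketch (my own encoding and naming, not the paper's) of detecting a cycle in CO = (PO ∪ RF)^+, i.e., an execution where some operation is causally ordered before itself:

```python
# Sketch: compute CO = (PO ∪ RF)^+ via a naive transitive-closure fixpoint,
# then flag the bad pattern of a cyclic causal order. Encoding is illustrative.

def transitive_closure(edges, nodes):
    """Naive fixpoint: reach[a] = set of nodes reachable from a."""
    reach = {n: set() for n in nodes}
    for a, b in edges:
        reach[a].add(b)
    changed = True
    while changed:
        changed = False
        for a in nodes:
            new = set().union(*(reach[b] for b in reach[a]))
            if not new <= reach[a]:
                reach[a] |= new
                changed = True
    return reach

def has_cyclic_co(nodes, po, rf):
    co = transitive_closure(po + rf, nodes)
    return any(n in co[n] for n in nodes)  # some op causally before itself

# Session A: r(y)=1; w(x,1)   Session B: r(x)=1; w(y,1)
# Each session reads a value whose write causally depends on that very read:
nodes = ["a1", "a2", "b1", "b2"]
po = [("a1", "a2"), ("b1", "b2")]
rf = [("b2", "a1"), ("a2", "b1")]  # b2 -> a1: a1 reads y=1 written by b2
print(has_cyclic_co(nodes, po, rf))  # True: a1 -> a2 -> b1 -> b2 -> a1
```

This matches the paper's claim that, for a given execution, checking for a bad pattern is polynomial time: transitive closure plus a scan over the relations.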
TLA+ specifications
Ok, back to the paper at hand. The paper presents TLA+ specifications for CC, CCv, and CM. The authors mention that model checking histories against CC, CCv, or CM as defined in Fig 7 is prohibitively inefficient. So they go through a series of optimizations to improve the checking time, culminating in an efficient partial order enumeration algorithm implemented in Python, which TLC calls when necessary.
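To get a feel for why the naive approach is prohibitively inefficient, here is a brute-force sketch (my own toy encoding; the real CCv definition uses an arbitration order and is subtler): it searches for some total order extending the causal order under which every read returns the latest preceding write. The search is factorial in history size, which is the blow-up the paper's partial order enumeration algorithm is designed to avoid.

```python
# Naive sketch: does SOME total order extending the causal order explain every
# read? Factorial-time in the number of operations -- illustration only.
from itertools import permutations

def explains(order, ops):
    """Replaying ops in this order, does every read see the latest write
    to its variable (0 if none)?"""
    state = {}
    for op_id in order:
        kind, var, val = ops[op_id]
        if kind == "w":
            state[var] = val
        elif state.get(var, 0) != val:
            return False
    return True

def respects(order, co_edges):
    """Is the total order consistent with the causal-order edges?"""
    pos = {op: i for i, op in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in co_edges)

def brute_force_check(ops, co_edges):
    return any(respects(p, co_edges) and explains(p, ops)
               for p in permutations(ops))

ops = {"w1": ("w", "x", 1), "w2": ("w", "x", 2), "r1": ("r", "x", 1)}
co = [("w1", "r1")]
print(brute_force_check(ops, co))  # True: e.g., the order w2, w1, r1 works
```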
They check very small traces with TLA+ model checking. But they also mention that it is possible to apply this checking to traces gathered by running Jepsen against MongoDB.
Extended Jepsen testing
They use Jepsen to collect histories from MongoDB, and apply Java-based implementations of the CC, CCv, and CM checking algorithms to these histories.
The experimental results confirm the claim in MongoDB's documentation that, in the presence of node failures or network partitions, causally consistent sessions guarantee causal consistency only for reads with majority readConcern and writes with majority writeConcern. In the absence of node failures or network partitions, causal consistency is guaranteed by all configurations, including read concern local and write concern w1.
Testing all the CC variants gave identical results: there was no configuration where the CC, CCv, and CM checks disagreed with each other. So it was not clear whether studying these variants explicitly bought us anything.
Discussion
Ok, let me return to the question I had at the beginning. Is it possible to define causal consistency in a state-centric manner, rather than operationally via bad patterns?
I checked the client-centric database isolation paper, and it does not include anything close to causal consistency (which is fair, as this is a consistency property and not an isolation property). The paper presents SER, SI, and ReadCommitted. And of course ReadCommitted could be arbitrarily stale and that is not good. But, I still don't see why it could not be possible to have state-centric definition for causal consistency. Maybe the problem is about how much metadata that would involve, and if there would be a convenient way to represent/refer to that metadata.
Ok, moving on to the next discussion point. Causal consistency is interesting. It provides a nice tradeoff point in the consistency space, similar to what SnapshotIsolation provides in the transactional isolation space. Causal consistency is the strongest consistency level under which you can still make progress in the presence of partitions, as discussed in the "Consistency, Availability, and Convergence" paper. And causal consistency provides "read your own writes", which is very helpful and sought after for developing applications. That sounds like a good tradeoff point, no?