Paper Summary: Coordination Avoidance in Database Systems

Serializing transactions is sufficient for correctness, but it is not necessary for all operations of all applications. The downside of serialization is that it kills scalability and is overkill in many cases.

This paper (which will appear in VLDB'15) has the following insight: Given knowledge of application transactions and correctness criteria (i.e., invariants), it is possible to avoid this over-coordination of serializability and execute some transactions without coordination while still preserving those correctness criteria (invariants).

In particular the authors propose the concept of "invariant confluence" to relax the use of serialization for some operations of a coordination-requiring application. By operating on application-level invariants over database states (e.g., integrity constraints), the invariant confluence analysis provides a necessary and sufficient condition for safe, coordination-free execution. When programmers specify application invariants, this analysis allows databases to coordinate only when concurrency may violate those application invariants.

So how do they get the application invariants? "Many production databases today already support invariants in the form of primary key, uniqueness, foreign key, and row-level check constraints. We analyze this and show that many are invariant-confluent, including forms of foreign key constraints unique value generation, and check constraints, while others, like primary key constraints are, in general, not."

They claim that many common integrity constraints found in SQL and standardized benchmarks are invariant confluent, allowing order-of-magnitude performance gains over coordinated execution. To substantiate this claim, they apply invariant confluence analysis to a database prototype and show 25-fold improvement over prior TPC-C New-Order performance on a 200 server cluster. They find that 10 out of 12 of TPC-C's invariants are invariant-confluent, under the workload transaction.

The invariant-confluence model

Invariant-confluence captures a simple (informal) rule: coordination can be avoided if and only if all local commit decisions are globally valid. (In other words, the commit decisions are composable.)

They model transactions to operate over independent logical snapshots of database state. Transaction writes are applied at one or more snapshots initially when the transaction commits and are then integrated into other snapshots asynchronously via a merge operator that incorporates those changes into the snapshot's state. "Merge" is simply the set union of versions, and is used to capture the process of reconciling divergent states.

In effect, this model states that each transaction can modify its replica state without modifying any other concurrently executing transactions' replica state. Replicas therefore provide transactions with partial snapshot views of global state. They define local validity/consistency as a safety property, but global replica consistency is not defined as a safety property. Instead it is defined as a liveness property under the name "convergence".

(Formal definition of invariant-confluent:)
A set of transactions T is I-confluent with respect to invariant I if, for all I-T-reachable states Di, Dj with a common ancestor state, Di union Dj is I-valid.

Applying the invariant-confluence concept

As the definition implies, I-confluence holds for specific combinations of invariants and transactions. Removing a user from the database is I-confluent with respect to the invariant that the user IDs are unique. However, two transactions that remove two different users from the database are not I-confluent with respect to the invariant that there exists at least one user in the database at all times. As another example, uniqueness is not I-confluent for inserts of unique values. However, reads and deletions are both I-confluent under uniqueness invariants: reading and removing items cannot introduce duplicates.

Table 3 summarizes the 12 invariants found in TPC-C benchmark as well as their I-confluence analysis results as determined by Table 2. They classify the invariants into 3 broad categories: materialized view maintenance, foreign key constraint maintenance, and unique ID assignment.

Figure 5 shows the concurrency/throughput improvement made possible by applying the invariant-confluence analysis to the TPC-C workload.

Related work

This paper is an extension of the "CALM" approach that uses monotonicity and convergence concepts to relax the coordination needs of applications.

The "Scalable commutatitivity rule" paper, which was one of the best papers in SOSP'13, is a closely related work. In order to relax serializability and boost concurrency, that work prescribes exploiting the commutativity of operations. Another related work that exploits commutativity to relax serializability is the "Making Geo-Replicated Systems Fast as Possible, Consistent when Necessary" paper. The invariant confluence analysis concept is more general (but probably harder to apply) than the commutativity rule approach, because while commutativity is sufficient for correctness it is not always necessary.

In our previous work, the slow-fast paper (2010), we had also used the concept of "invariant-relaxed serializability" in distributed systems domain, particularly in application to wireless sensor/actor network concurrency control. (Maybe somewhat of a misnomer we called a slow action as one which can be executed in a concurrent/uncoordinated/nonatomic manner, and a fast action as one which needs to be executed in an atomic/coordinated/no-conflicts manner.)

I suspect our slow-fast approach used a less aggressive optimization than invariant-confluence: Slow-fast did not require/inspect program invariants explicitly, it only required access to the program actions (i.e., transactions). The slow-fast approach inferred that the invariant holds when program actions (transactions) execute atomically. (Invariant-confluence may potentially use even a weaker invariant than what slow-fast used.)

Our slow-fast analysis inspects the program actions, which are precondition guarded assignment statements, and determined for which actions the atomicity can be relaxed so that they can execute in an uncoordinated manner. Our finding was that if the precondition of a program action is "locally-stable" (i.e., this precondition predicate cannot get falsified by execution of other program actions), then it is safe to execute this program action in an uncoordinated/nonatomic manner. (This check probably implies "mergeability" of the state.) Our analysis also prescribed ways to break a coordination requiring action into two smaller actions to make it coordination free.

The "coordination avoidance in databases" paper applies the "invariant-relaxed serializability" idea in a more restricted and more useful domain, database transactions, and demonstrates the idea in a very practical way.


Popular posts from this blog

Graviton2 and Graviton3

Foundational distributed systems papers

Learning a technical subject

Your attitude determines your success

Learning about distributed systems: where to start?

Progress beats perfect

CockroachDB: The Resilient Geo-Distributed SQL Database

Warp: Lightweight Multi-Key Transactions for Key-Value Stores

Amazon Aurora: Design Considerations + On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes

Anna: A Key-Value Store For Any Scale