Extracting more concurrency from Distributed Transactions

This paper appeared in OSDI'14. The authors are: Shuai Mu, Yang Cui, Yang Zhang, Wyatt Lloyd, Jinyang Li. The paper, presentation slides and video are accessible here.

The paper proposes a concurrency control protocol for distributed transactions, and evaluates its performance comparing with two-phase locking (2PL) and optimistic concurrency control (OCC).

The protocol

The protocol introduced, ROCOCO (ReOrdering COnflicts for COncurrency) is targeted for extracting more concurrency under heavily contended workload than 2PL and OCC can handle. In fact, ROCOCO's improvements over OCC and 2PL comes after the peak throughput point of even ROCOCO. One of the questions after the OSDI presentation was about this. "That region where the system is thrashing is not a region you want to be. Why would you not employ admission control to refuse extra workload that pushes the system past peak performance?"
ROCOCO assumes that a distributed transaction consists of a set of stored procedures called pieces. Each piece accesses one or more data items stored on a single server using user-defined logic. Thus, each piece can be executed atomically with respect to other concurrent pieces by employing proper local concurrency control.

ROCOCO achieves safe interleavings without aborting or blocking transactions using two key techniques: 1) deferred and reordered execution using dependency tracking; and 2) offline safety checking based on the theory of transaction chopping.

ROCOCO's transaction reordering idea is adopted from the ePaxos protocol introduced a couple years ago. This is a neat idea. The first phase is sort of like a dry run for the transaction. Dependencies with other concurrently executing transactions are learned. In the second phase, the dependent transactions are synchronized. They are forced to wait for each other and executed that way.


This approach is basically pipelining the transactions. This is also similar to what the Calvin does with its log-based approach. Pipelining helps for throughput of course, but it also introduces a drawback.

Unlike 2PL and OCC which executes a depended-upon transaction to completion before allowing its dependent/conflicting transactions to execute, ROCOCO is deciding on an order and pipelining the execution of these conflicting transactions in some determined order. However, if the first transaction of these pipeline-executed transactions does not complete for some reason or due to a fault and needs to be rolled back, this also requires rolling-back the remaining transactions in the pipeline that depended on this transaction. This is a problem 2PL and OCC did not have.

This basic reordering protocol is when some of the transaction pieces/fragments are deferrable. For transaction fragments that are immediate (whose outputs are inputs to other pieces in a transaction), the reordering protocol is inapplicable, and the paper uses an offline checker to avoid conflicts in such situations.  The Offline checker works in following steps:
1. It constructs an S-C graph based on transaction chopping. Each edge in the graph is either a Sibling edge (an edge formed for pieces of same transaction instance) or a Conflict edge (an edge formed by pieces which access the same database table and any one of the piece issues a write).
2. Each vertex in the graph is either tagged as immediate (I) or deferrable (D). A conflict edge  can be an I-I edge or a D-D edge.
3. The checker observes all the S-C cycles formed by the graph. SC-cycles represent potential non-serializable interleavings. However, if an SC-cycle contains at least one D-D edge, ROCOCO can reorder the execution of the D-D edge's pieces to break the cycle and ensure serializability. For an unreorderable SC-cycle with all I-I C-edges, the checker proposes to merge those pieces in the cycle belonging to the same transaction into a larger atomic piece. ROCOCO relies on traditional distributed concurrency control methods such as 2PL or OCC to execute merged pieces.

Evaluation

ROCOCO is  implemented as an in-memory key-value store with 20K C++ code  and evaluated using a scaled TPC-C benchmark in comparison to OCC and 2PL. Given that Calvin  is a closely related work (because it also orders transactions and pipelines their execution), it would be good to see a comparison to Calvin, but the paper does not include that.

Some miscellaneous thoughts

In the introduction, the following paragraph is provided as a motivation for ROCOCO.
Unfortunately, contention is not rare in large-scale OLTP applications. For example, consider a transaction where a customer purchases a few items from a shopping website. Concurrent purchases by different customers on the same item create conflicts. Moreover, as the system scales—i.e., the site becomes more popular and has more customers, but maintains a relatively stable set of items—concurrent purchases to the same item are more likely to happen, leading to a greater contention rate.
The OSDI presentation also includes the same example as motivation. I don't think this is the right/best way to argue that contentions will happen, because this is a faux contention, and not an inherent contention. When the number of a sale item is very high (which is almost always the case), why do we care to carefully check the number of items remaining? Conflict-Free Replicated Data Types (CRDT) approach helps here to avoid conflicts easily. (For some nice papers on this see: CRDT1, CRDT2, CRDT3) The coordination avoidance in distributed databases paper also argues for  avoiding coordination when all local commit decisions are globally valid (in other words, when the commit decisions are composable).

Comments

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Making database systems usable

Looming Liability Machines (LLMs)

Advice to the young

Foundational distributed systems papers

Distributed Transactions at Scale in Amazon DynamoDB

Linearizability: A Correctness Condition for Concurrent Objects

Understanding the Performance Implications of Storage-Disaggregated Databases

Designing Data Intensive Applications (DDIA) Book