Paper Summary: Calvin, Distributed transactions for database systems

- November 14, 2014

Calvin is a transaction scheduling and replication management layer for distributed storage systems. By first writing transaction requests to a durable, replicated log, and then using a concurrency control mechanism that emulates a deterministic serial execution of the log's transaction requests, Calvin supports strongly consistent replication and fully ACID distributed transactions, and manages to incur lower inter-partition transaction coordination costs than traditional distributed database systems.

Calvin emphasizes modularity. The holy trinity in Calvin is: log, scheduler, executor. When a client submits a transaction request to Calvin, this is immediately appended to a durable log, before any actual execution begins. Calvin's scheduling mechanism then processes this request log, deciding when each transaction should be executed in a way that maintains an invariant slightly stronger than serializable isolation: Transaction execution may be parallelized but must be equivalent to a deterministic serial execution in log-specified order. As a result, the log of transaction requests itself serve as an ultimate "source of truth" about the state of a database, which makes the recovery very easy.

I thought Calvin resembles the Tango approach a lot. (I had discussed Tango here recently.) It is almost as if Calvin is Tango's cousin in databases domain. As such, Calvin has similar strengths and disadvantages like Tango. For the advantages: Calvin provides good throughput, but will not get stars for low-latency. Calvin provides scalable replication, and strongly-consistent replication. (After you have one authoritative log, this is not hard to provide anyways.)

The centralized log is the source of all disadvantages in Calvin as well. The transactions need to always go through the centralized log; so there are no truly local transactions. Thus Calvin will perform worse for workloads that have local/non-coordinating workload. So, the TPC-C workload Calvin uses for evaluation is actually best workload to show Calvin's relative performance to other systems.

The Log component

Calvin uses Paxos to achieve availability of the log by replicating it consistently. A group of front-end servers collect client requests into batches. Each batch is assigned a globally unique ID and written to an independent, asynchronously replicated block storage service such as Voldemort or Cassandra. Once a batch of transaction requests is durable on enough replicas, its GUID is appended to a Paxos “MetaLog”. Readers can then reassemble the global transaction request log by concatenating batches in the order that their GUIDs appear in the Paxos MetaLog.

Batching trades off throughput with low-latency: you cannot have transaction latency lower than the batching duration (epoch). So an epoch is a guaranteed overhead on latency for every transaction.

The scheduler component

The Scheduler component (which is a centralized component) examines a transaction before it begins executing and decides when it is safe to execute the whole transaction, then hands the transaction request off to the storage backend for execution with no additional oversight. The storage backend therefore does not need to have any knowledge of the concurrency control mechanism or implementation. Once a transaction begins running, the storage backend can be sure that it can process it to completion without worrying about concurrent transactions accessing the same data. However, each storage backend must provide enough information prior to transactional execution in order for the scheduler to make a well-informed decision about when each transaction can safely (from a concurrency control perspective) execute.

For transaction execution, the scheduler still uses locks. Deterministic locking ensures concurrent execution equivalent to the serial transaction order in the log, and also makes deadlock impossible (and the associated nondeterministic transaction aborts).

Questions

I don't know why Calvin doesn't adopt Tango style log maintenance: Using chain replication to improve throughput of the centralized log. This might actually help Calvin.

Similarly Calvin should also adopt selective/custom stream replication per replica feature in Tango. That would implement the flexibility/generality of Calvin.