Morty: Scaling Concurrency Control with Re-Execution
This EuroSys '23 paper reads like an SOSP best paper. Maybe it helped that EuroSys 2023 was in Rome. Academic conferences are more enjoyable when the venue doubles as a vacation.
The Problem
Morty tackles a fundamental question: how can we improve concurrency under serializable isolation (SER), especially without giving up on interactive transactions? Unlike deterministic databases (e.g., Calvin) that require transactions to declare read and write sets upfront, Morty supports transactions that issue dynamic reads and writes based on earlier results.
Transactional systems, particularly in geo-replicated settings, struggle under contention. High WAN latency stretches transaction durations, widening the window for conflicts. The traditional answer is blind exponential backoff, but that leads to low CPU utilization: TAPIR and Spanner replicas often sit below 17% utilization under contention, as Morty's evaluation shows.
Morty's approach is to start from first principles and ask: what limits concurrency under serializability? To answer this, Morty formalizes serialization windows, defined per transaction and per object, stretching from the commit of the value read to the commit of the value written. Serializability requires that these windows not overlap. (Note that avoiding overlaps is necessary but not sufficient for SER: you also need to prevent cycles through read-only transactions or indirect dependencies, which Morty addresses with commit-time validation checks.)
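To make the window definition concrete, here is a minimal sketch of the overlap test implied by the formalism above. The `Window` type and function names are my own illustration, not Morty's code: a window on one object spans from the commit of the value the transaction read to the commit of the value it wrote.

```python
# Hypothetical sketch of the serialization-window overlap test; the
# names and data layout are mine, not Morty's implementation.
from collections import namedtuple

# A window on one object: from the commit timestamp of the value the
# transaction read to the commit timestamp of the value it wrote.
Window = namedtuple("Window", ["txn", "read_commit_ts", "write_commit_ts"])

def windows_overlap(a: Window, b: Window) -> bool:
    """Two serialization windows on the same object overlap unless one
    closes before (or exactly when) the other opens."""
    return not (a.write_commit_ts <= b.read_commit_ts or
                b.write_commit_ts <= a.read_commit_ts)

# T1 commits its write at ts=10; T2 reads that value and commits its own
# write at ts=15. Windows [5,10] and [10,15] abut but do not overlap.
t1 = Window("T1", read_commit_ts=5, write_commit_ts=10)
t2 = Window("T2", read_commit_ts=10, write_commit_ts=15)
assert not windows_overlap(t1, t2)

# If T2 instead read an older value (committed at ts=4), its window
# [4,15] overlaps T1's [5,10]: this pair is not serializable on x.
t2_stale = Window("T2", read_commit_ts=4, write_commit_ts=15)
assert windows_overlap(t1, t2_stale)
```

Re-execution, discussed below, is what lets Morty shift a stale read forward so the second case turns into the first.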
Figures 1 and 2 illustrate these serialization windows. With re-execution, Morty commits T2 with minimal delay, instead of aborting and retrying. Figure 3 shows another case where re-execution avoids wasted work.
Morty Design
Morty's re-execution mechanism hinges on two ideas: read unrolling and a priori ordering.
Read unrolling allows Morty to selectively rewind and recompute parts of a transaction that depended on outdated reads. Rather than aborting the entire transaction, Morty re-executes just the stale portion. This is possible because transactions are written in continuation-passing style (CPS), which makes control flow and data dependencies explicit. CPS is common in asynchronous programming (JavaScript, Go, Java, Python, and libraries like NodeJS, LibEvent, and Tokio) and it maps well to networked databases like Morty.
Morty's CPS API supports re-execution directly. Get(ctx, key, cont) reads a key and resumes at cont. Put(ctx, key, val) tentatively writes. Commit(ctx, cont) initiates prepare and finalize. Abort drops the execution. Abandon re-executes the transaction (often partially) from a continuation at an earlier point: re-execution reuses the context, shifts the stale read forward, and resumes execution.
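To show why CPS makes partial re-execution natural, here is a toy read-modify-write transaction against a mock client. The `MockClient` is a hypothetical stand-in that executes continuations inline; only the Get/Put/Commit method names follow the paper's API description.

```python
# A sketch of what a transaction looks like in Morty's CPS style.
# MockClient is a toy stand-in, not Morty's client library.

class MockClient:
    """Toy client: backs reads/writes with a dict, runs continuations inline."""
    def __init__(self, store):
        self.store = store

    def Get(self, ctx, key, cont):
        # Read a (possibly uncommitted) version, then resume at `cont`.
        cont(ctx, self.store.get(key, 0))

    def Put(self, ctx, key, val):
        # Tentative write, visible to later speculative readers.
        self.store[key] = val

    def Commit(self, ctx, cont):
        # Initiate prepare/finalize; report the outcome to `cont`.
        cont(ctx, "COMMITTED")

def increment(client, ctx, key, done):
    # Because the read's continuation is explicit, the system could
    # rewind to exactly this point if the read turns out to be stale,
    # instead of restarting the whole transaction.
    def on_read(ctx, val):
        client.Put(ctx, key, val + 1)
        client.Commit(ctx, done)
    client.Get(ctx, key, on_read)

store = {"x": 41}
increment(MockClient(store), ctx={}, key="x",
          done=lambda ctx, status: print(status, store["x"]))
# prints: COMMITTED 42
```

The key point is that `on_read` packages up "everything downstream of this read," which is precisely the unit Morty rewinds and re-runs with a fresher value.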
A priori ordering assigns each transaction a speculative timestamp at arrival, defining a total order a la MVTSO. This order is not revised, even if clocks are skewed or messages are delayed. Instead, if execution violates the speculative order (e.g., a read misses a write that should've come earlier), Morty detects the conflict and re-executes the transaction to realign with the original order. The system adapts execution to fit the speculative schedule, not vice versa. The paper claims aborts are rare since re-execution usually succeeds.
I think a key idea in Morty is that, contrary to most approaches, Morty ignores read validity (the requirement that committed transactions observe only committed data) during execution, in order to expose more concurrency. Leveraging MVTSO, it exposes both committed and uncommitted writes to transactions and allows reads from uncommitted versions. These speculative reads are validated at prepare time, prior to commit. If a read depended on a write that never committed, or missed a newer write, Morty re-executes the transaction (through the Abandon call) or aborts it as a last resort.
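A toy multi-version store in this MVTSO spirit might look like the following. This is a hypothetical sketch of the idea described above, not Morty's data structures: reads may observe uncommitted versions, and the caller records the dependency to validate before commit.

```python
# Toy MVTSO-style store: reads return the latest version ordered before
# the reader's timestamp, even if it is still uncommitted. Hypothetical
# sketch; Morty's actual version store is more involved.
import bisect

class MVStore:
    def __init__(self):
        self.versions = {}  # key -> sorted list of (ts, value, committed)

    def write(self, key, ts, value):
        vs = self.versions.setdefault(key, [])
        bisect.insort(vs, (ts, value, False))  # uncommitted on arrival

    def commit(self, key, ts):
        vs = self.versions[key]
        for i, (t, v, _) in enumerate(vs):
            if t == ts:
                vs[i] = (t, v, True)

    def read(self, key, reader_ts):
        """Return (value, dep_ts, committed) for the latest version
        ordered before reader_ts; the caller must track dep_ts as a
        dependency to validate at prepare time."""
        candidates = [v for v in self.versions.get(key, []) if v[0] < reader_ts]
        if not candidates:
            return None
        ts, value, committed = candidates[-1]
        return value, ts, committed

store = MVStore()
store.write("x", ts=5, value="a")      # T5's write, still uncommitted
val, dep_ts, committed = store.read("x", reader_ts=8)
assert (val, dep_ts, committed) == ("a", 5, False)  # a speculative read
store.commit("x", ts=5)                # dependency resolves before prepare
```

If T5 had aborted instead of committing, the reader's prepare-time validation would fail and trigger Abandon rather than letting the dirty read commit.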
In addition to serialization windows, Morty defines validity windows to measure how long a transaction waits for its inputs to commit. A transaction Ti's validity window on object x starts when its dependency commits and ends when Ti commits. Like serialization windows, overlapping validity windows are disallowed. But unlike serialization windows, Morty doesn't try to align validity windows, and instead focuses on minimizing their span. Long validity windows mean low throughput. Morty shortens validity windows by avoiding unnecessary delays between reads and commits, preventing cascading speculative reads, and favoring re-execution over abort-and-retry.
Re-execution typically occurs during the commit protocol, when replicas must check commit status across a quorum. If they detect a stale read or violated serialization window, they trigger re-execution before finalizing. Validation checks include:
- Reads didn't miss writes.
- Other transactions didn't miss our writes.
- Reads match committed state (no dirty reads).
- No reads from garbage-collected transactions.
If all checks pass, replicas vote to commit. Otherwise, they vote to abandon and may supply a new value to trigger re-execution.
But why do replicas vote at all? Because Morty doesn't use Raft-style replica groups with a leader calling the shots. In contrast to the Raft-group approach, Morty has no central log or leader serializing/ordering all commands. It is closer to TAPIR, using timestamps to assign a speculative order. By integrating concurrency control with replication, Morty aims to improve throughput under contention and achieve low-latency geo-replication. Quorum-based voting then ensures consistency and fault tolerance, as in TAPIR.
Voting ensures that a commit is durable across failures, visible to a majority, and recoverable even if the coordinator crashes. Without this, there's no way to guarantee correctness in a crash or partition.
Recovery is still tricky. Morty replicates across 2f+1 nodes and tolerates f failures. Coordinators may stall, so Morty uses a Paxos-style recovery protocol with view changes: any replica can step up and finalize the commit decision for a failed coordinator. This isn't trivial as it requires care to avoid split-brain and maintain consistency.
Morty's re-execution somewhat resembles CockroachDB's read refresh. CRDB bumps a transaction's read timestamp if its read spans haven't been overwritten, but it doesn't re-execute application logic: if the refresh fails, the whole transaction restarts. Morty, in contrast, semantically rewinds only the continuation that depended on the changed key and reruns it with the new value.
Evaluation
The results are impressive. On TPC-C, Morty achieves 7.4x the throughput of Spanner-style TAPIR, 4.4x that of TAPIR, and 1.7x that of MVTSO. On the high-contention Retwis benchmark, Morty delivers 96x the throughput of TAPIR.
Morty also scales with CPU. On Zipfian Retwis, its throughput grows from 7.8k to 35.3k txn/s as cores scale to 20. Spanner and TAPIR plateau early (at 17% CPU utilization) due to frequent aborts and exponential backoff.
Conclusion
Morty is one of the most technically rich papers on serializability in recent years. It's dense and demanding. It assumes deep familiarity with concurrency control, replication, and async programming. But for those in the distributed systems and databases intersection, Morty is a very rewarding read.
One gripe: the code link is broken. https://github.com/matthelb/morty/