Posts

Consus: Taming the Paxi

Image
This is a 2016 paper by Robert Escriva and Robbert van Renesse. As far as I can see the paper did not get published in a conference or journal, but it is available on arxiv . The code (C++) is also available as opensource. Consus introduces a leaderless-consensus-based commit protocol that completes in three one way message delays in the wide area. If your data centers were equidistant from each other with a uniform round trip time of 100ms between any pair of data centers, that means Consus could commit each transaction with 150ms of wide area latency. In this setup, each data center has a full replica of the data and a transaction processing engine. The transaction processor executes the transaction against the local data center, and through the Consus-commit protocol it achieves serializable transaction execution. (The fully replicated requirement can be relaxed, I think, if the voting is on the outcome of partial transactions in each datacenter.) Commit protocol The commit protocol...

Why should this paper be published?

Here is a simple suggestion that will immensely improve the publishing/reviewing experience. Every submission should include a subsection called: "Why should this paper be published?" (WSTPBP). Here, the authors should provide an objective/falsifiable statement of why this paper deserves publication. This may include statements like: this feature of the system is novel (no previous work has it) this is a more {fault-tolerant, secure, efficient, simple} solution than previous solutions This should be a targeted minimal set. If one of these statements is falsified by the reviewers, it could be a valid reason to reject the paper. In that sense, WSTPBP is the authors' contract with the reviewers. The authors put a stake in the ground, saying that the reviewers should focus on the claims in WSTPBP, rather than giving other pretexts, diverging to unclaimed features to reject the paper. This focuses the reviewing process as the reviewers evaluate the paper based on WSTPBP. If th...

Feral Concurrency Control: An Empirical Investigation of Modern Application Integrity

Image
The rise of data-intensive “Web 2.0” Internet services has led to popular new Object-Relational Mapping (ORM) programming frameworks. Rather than adopting the use of traditional transactional programming primitives, the ORM framework developers use feral (application-level) mechanisms for maintaining database integrity. They specify declarative correctness criteria/invariants (through validations and associations) and have the ORM enforce the criteria on their behalf. This paper (Sigmod 2015) examines the implications of this impedance mismatch between databases and modern ORM frameworks for application integrity. The paper investigates Rails in depth (and also surveys six additional frameworks) to evaluate the effectiveness of these feral mechanisms in practice and to quantify data integrity violations experimentally. Implications of these for database research are discussed at the end. Ruby on Rails Rails divides application code into Model-View-Controller architecture. Building a ...

Warp: Lightweight Multi-Key Transactions for Key-Value Stores

Image
This paper introduces a simple yet powerful idea to provide efficient multi-key transactions with ACID semantics on top of a sharded NoSQL data store. The Warp protocol prevents serializability cycles forming between concurrent transactions by forcing them to serialize via a chain communication pattern rather than using a parallel 2PC fan-out/fan-in communication. This avoids hotspots associated with fan-out/fan-in communication and prevents wasted parallel work from contacting multiple other servers when traversing them in serial would surface an invalidation/abortion early on in the serialization. I love the elegance of this idea. As far as I can see, this paper did not get published in any conference. The authors published a followup paper to this in NSDI 16, called "The Design and Implementation of the Warp Transactional Filesystem." But that paper does not talk about the internals of Warp protocol like this archive report, rather talks about the Warp Transactional File...

Anna: A Key-Value Store For Any Scale

Image
This paper (ICDE'18) introduces Anna, a CALM / CRDT implementation of a distributed key-value system both at the data structure level as well as system architecture and transaction protocol levels. Anna is a partitioned, multi-mastered key-value system that achieves high performance and elasticity via wait-free execution and coordination-free consistency. Anna employs coordination-free actors that perform state update via merge of lattice-based composite data structures. I love the strongly opinionated introduction of this paper. This is what papers should be about: opinionated, challenging conventions, making bets, and doing hypothesis testing in the small. Conventional wisdom says that software designed for one scale point needs to be rewritten when scaling up by 10x. Anna sets out to disprove this by showing how a key-value storage (KVS) system can be architected to scale across many orders of magnitude. (Spoiler Anna can give you only upto causal consistency, but cannot pro...

An Empirical Evaluation of In-Memory MultiVersion Concurrency Control

Image
This is a closely related paper to the "Evaluation of Distributed Concurrency Control" paper we just covered . This paper is also from VDLB'17, and also has Andy Pavlo as an author . This time we zoom-in on the multiversion concurrency control protocols, and investigate their performances on a single node database management system (DBMS) with in-memory multicore execution. (It is worth reminding that Andy's database courses, both the intro and advanced courses , are on YouTube and are worth checking to refresh your database knowledge.) Although multiversion concurrency control (MVCC) is an old idea (going back to 1970s), it is used today by almost every major relational DBMS as shown in Table1.  In a single-version system transactions always overwrite a tuple with new information whenever they update it. In contrast, multiversioning allows read-only transactions to access older versions of tuples without preventing read-write transactions from generating newer vers...

An Evaluation of Distributed Concurrency Control

Image
This VLDB'17 paper investigates the effects of distributed concurrency control protocols on performance of distributed transactional databases. They evaluate six protocols: two-phase locking with NO_WAIT and WAIT_DIE flavors, timestamp ordering (TIMESTAMP), multi-version concurrency control (MVCC), optimistic concurrency control (OCC), and a deterministic protocol Calvin. For the evaluations, they use an in-memory distributed database evaluation framework called Deneva ( https://github.com/mitdbg/deneva ), providing an apples-to-apples comparison between each protocol. This is a delightful paper. After we covered several distributed databases in recent posts , this paper helps us consider the performance implications of concurrency control schemes they use and how they would fare under different workloads. The paper is very well written and easy to follow. In my summary I lifted a lot of the text from the paper with little editing. The six transaction protocols considered The pape...

Popular posts from this blog

Hints for Distributed Systems Design

My Time at MIT

Scalable OLTP in the Cloud: What’s the BIG DEAL?

Foundational distributed systems papers

Advice to the young

Learning about distributed systems: where to start?

Distributed Transactions at Scale in Amazon DynamoDB

Making database systems usable

Looming Liability Machines (LLMs)

Analyzing Metastable Failures in Distributed Systems