Posts

Warp: Lightweight Multi-Key Transactions for Key-Value Stores

Image
This paper introduces a simple yet powerful idea to provide efficient multi-key transactions with ACID semantics on top of a sharded NoSQL data store. The Warp protocol prevents serializability cycles forming between concurrent transactions by forcing them to serialize via a chain communication pattern rather than using a parallel 2PC fan-out/fan-in communication. This avoids hotspots associated with fan-out/fan-in communication and prevents wasted parallel work from contacting multiple other servers when traversing them in serial would surface an invalidation/abortion early on in the serialization. I love the elegance of this idea. As far as I can see, this paper did not get published in any conference. The authors published a followup paper to this in NSDI 16, called "The Design and Implementation of the Warp Transactional Filesystem." But that paper does not talk about the internals of Warp protocol like this archive report, rather talks about the Warp Transactional File

Anna: A Key-Value Store For Any Scale

Image
This paper (ICDE'18) introduces Anna, a CALM / CRDT implementation of a distributed key-value system both at the data structure level as well as system architecture and transaction protocol levels. Anna is a partitioned, multi-mastered key-value system that achieves high performance and elasticity via wait-free execution and coordination-free consistency. Anna employs coordination-free actors that perform state update via merge of lattice-based composite data structures. I love the strongly opinionated introduction of this paper. This is what papers should be about: opinionated, challenging conventions, making bets, and doing hypothesis testing in the small. Conventional wisdom says that software designed for one scale point needs to be rewritten when scaling up by 10x. Anna sets out to disprove this by showing how a key-value storage (KVS) system can be architected to scale across many orders of magnitude. (Spoiler Anna can give you only upto causal consistency, but cannot pro

An Empirical Evaluation of In-Memory MultiVersion Concurrency Control

Image
This is a closely related paper to the "Evaluation of Distributed Concurrency Control" paper we just covered . This paper is also from VDLB'17, and also has Andy Pavlo as an author . This time we zoom-in on the multiversion concurrency control protocols, and investigate their performances on a single node database management system (DBMS) with in-memory multicore execution. (It is worth reminding that Andy's database courses, both the intro and advanced courses , are on YouTube and are worth checking to refresh your database knowledge.) Although multiversion concurrency control (MVCC) is an old idea (going back to 1970s), it is used today by almost every major relational DBMS as shown in Table1.  In a single-version system transactions always overwrite a tuple with new information whenever they update it. In contrast, multiversioning allows read-only transactions to access older versions of tuples without preventing read-write transactions from generating newer vers

An Evaluation of Distributed Concurrency Control

Image
This VLDB'17 paper investigates the effects of distributed concurrency control protocols on performance of distributed transactional databases. They evaluate six protocols: two-phase locking with NO_WAIT and WAIT_DIE flavors, timestamp ordering (TIMESTAMP), multi-version concurrency control (MVCC), optimistic concurrency control (OCC), and a deterministic protocol Calvin. For the evaluations, they use an in-memory distributed database evaluation framework called Deneva ( https://github.com/mitdbg/deneva ), providing an apples-to-apples comparison between each protocol. This is a delightful paper. After we covered several distributed databases in recent posts , this paper helps us consider the performance implications of concurrency control schemes they use and how they would fare under different workloads. The paper is very well written and easy to follow. In my summary I lifted a lot of the text from the paper with little editing. The six transaction protocols considered The pape

Calvin: Fast Distributed Transactions for Partitioned Database Systems

Image
The Calvin paper appeared in Sigmod 2012. I had read this paper before, and seen it covered in a reading group, but I never got to write a post about it, and I'm remedying that today. I think the name Calvin comes as a pun on the predestination doctrine in Calvinism . Calvin is all about pre-ordering transactions in logs in batches, and then deterministically executing this log in the same order in replicas. The key technical feature that allows Calvin scalability in the face of distributed transactions is a deterministic ordering/locking mechanism that enables the elimination of distributed commit protocols. Since the schedulers get a good look of which transactions are coming next in this log, they can  perform optimizations by pulling some independent transactions locally and executing them in parallel in worker partitions with no ill effect. Calvin achieves good performance thanks to the batching and optimizations pulled by the schedulers by looking ahead to transactions in th

Popular posts from this blog

Graviton2 and Graviton3

Foundational distributed systems papers

FoundationDB: A Distributed Unbundled Transactional Key Value Store (Sigmod 2021)

Anna: A Key-Value Store For Any Scale

Your attitude determines your success

Learning a technical subject

Progress beats perfect

Learning about distributed systems: where to start?

Cores that don't count

Amazon Aurora: Design Considerations + On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes