Showing posts from April, 2024

OLTP Through the Looking Glass, and What We Found There

This paper appeared in Sigmod 2008.  The goal of the paper is to rethink the Online Transaction Processing (OLTP) database architecture (which remained largely unchanged from late 1970s) and to test out a streamlined/stripped-out in-memory database architecture. To this end, the paper dissects where time is spent inside of a single node OLTP database system, and carefully measures the performance of various stripped down variants of that system. Motivation OLTP databases were optimized for the computer technology of the 1970s-80s.  Multi-threaded execution was adopted to allow other transactions to make progress while one transaction waits for data to be fetched from the slow, rotating disk drive. Locking-based concurrency control was adopted since conflicts were inevitable in these disk-bound transactions, and since the small dataset sizes and the workloads of the time resulted in extremely "hot" keys. The computing world now is quite a different environment. Fast SSDs and a

Chardonnay: Fast and General Datacenter Transactions for On-Disk Databases (OSDI'23)

This paper appeared in OSDI'23. I haven't read the paper, and I am going by the two presentations I have seen of the paper, and by a quick skim of the paper. The paper has a cool idea, and I decided I should not procrastinate more waiting to read the paper in detail, and instead just capture and communicate this idea. Datacenter networking got very fast, much faster than SSD/disk I/O. That means, rather than two-phase commit (2PC), cold reads from SSD/disk is the dominant latency. Chardonnay banks on this trend in an interesting manner. It executes read/write transactions in "dry run" mode first, just to discover the readset and writeset and pin them to main memory. Then in "definitive" mode, it executes these transactions using two-phase-locking (2PL) and 2PC swiftly because everything (mostly) is in-memory. Having learned the readset and writeset during dry run also allows definitive execution achieve deadlock-freedom (and prevent aborts due to deadlock a

A critique of ANSI SQL Isolation layers (Transaction Processing Book followup)

This paper is from Sigmod 1995. As the title says, it critiques the ANSI SQL 92 standard (book) with respect to problems in isolation layer definitions. This is also a good follow up to t he Transaction Book chapter 7 where we reviewed the isolation layers . In fact, here Gray (with co-authors) gets a chance to amend the content in the Transaction Book with recent developments and better understanding/mapping of isolation layers to the degrees 0-4. Snapshot isolation (SI) finally makes its debut here in Gray's writing. Why so late? As we discussed earlier in the 80s the database keys were hot, and people rightly ignored optimistic concurrency control as it was not suitable for those workloads. Moreover, memory space was very scarce commodity, and multi-version (as in snapshot storage) was also off the table. With its increased adoption (with MySQL's use of SI around the corner) we see coverage of snapshot isolation in Gray's writing.  The paper is very well written, and is

Implementation of Cluster-wide Logical Clock and Causal Consistency in MongoDB

This paper (SIGMOD 2019) discusses the design of causal consistency and cluster-wide logical clock for MongoDB.  Causal consistency preserves Lamport's happened-before (transitive and partial) order of events in a distributed system. If an event A causes another event B, causal consistency ensures that every process in the system observes A before observing B. Causal consistency guarantees read-your-writes, write-follows-reads, monotonic reads, and monotonic writes. Causal consistency is the strictest consistency level where the system can still make progress (always-available one-way convergence) in the presence of partitions as discussed in  the "Consistency, Availability, and Convergence" paper . This makes causal-consistency a very attractive model for modern distributed applications. The paper provides a simple and balanced design for causal-consistency in MongoDB. The ideas are simple, yet the implementation and operationalizing (backw

Popular posts from this blog

Learning about distributed systems: where to start?

Hints for Distributed Systems Design

Foundational distributed systems papers

Metastable failures in the wild

The demise of coding is greatly exaggerated

The end of a myth: Distributed transactions can scale

Scalable OLTP in the Cloud: What’s the BIG DEAL?

SIGMOD panel: Future of Database System Architectures

Why I blog

There is plenty of room at the bottom