Showing posts from November, 2022

OceanBase: A 707 Million tpmC Distributed Relational Database System

This paper appeared in VLDB2022. It presents the design of OceanBase, a scale-out multitenant relational database deployed on tens of thousands of servers at The design of OceanBase is not impressive. It looks like OceanBase uses a standard run-off-the-mill Paxos groups architecture. Maybe there are some interesting work going on at integration with storage. Unfortunately, the paper is poorly written. It talks and boasts about many things, some repeatedly. But it doesn't give enough detail about any parts. For example, even the type of concurrency control protocol used is not discussed in the paper. Turns out it uses a "MVCC" based approach to concurrency control. ( DBDB's catalog on OceanBase gives a good overview of OceanBase.) The paper would be much more useful and enjoyable if it focused on a certain component of OceanBase and did a thorough walk through on that part. For those interested in diving deeper, OceanBase has a Github repo here . The open

Highly Available Transactions: Virtues and Limitations

This is a VLDB 2014 paper from Peter Bailis, Aaron Davidson, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. This is a great paper, because it asked an important question that helped advanced our understanding of isolation properties with respect to high-availability. So here is the context. As part of his PhD, Peter Bailis started think about the benefits of relaxed consistency guarantees, and how can we leverage them. Peter's Probabilistic Bounded Staleness work is how things started. He also investigated eventual consistency limitations, and bolt-on causal consistency on that front. This paper brings that relaxed guarantees effort from the per-key consistency to the transaction isolation domain, and investigates Highly Available Transactions (HAT).  Remember this clickable chart from Jepsen I refer to often? The origin of that chart is the Figure 2 in this paper. This is what I meant when I said this paper advanced our understanding. This is a very subtle topic

Practical uses of synchronized clocks in distributed systems

This paper is from 1991, by Barbara Liskov. To set the context, viewstamped replication paper was published at 1988, and the Paxos tech report (which failed to publish) came in 1989. This paper also comes only 3 years after the NTP RFC was written. Although this is an old paper, it is a timely (!) one. The paper discusses how clocks can be used for improving the performance of distributed systems. This paper is a visionary paper, ahead of its time. It is possible to criticize the paper by saying it was too optimistic in assuming time synchronization has arrived (it took another 2 decades or so for time synchronization to make it make it). But going off with that optimistic premise, the paper made so many remarkable contributions. The next noteworthy update on the practical use of synchronized clocks came 21 years later in 2012 with Spanner . Sanjay Ghemawat on the Spanner paper was Liskov's student that worked on the Harp project. This brings me to the next impressive thing about

Timestamp-based Algorithms for Concurrency Control in Distributed Database Systems

This paper by Bernstein and Goldman appeared in VLDB 1980 . Yes, 1980,  more than 40 years ago. This may be the oldest paper I wrote a summary for. The paper is written by a typewriter! I really didn't like reading 10+ pages in coarse grained typewritten font. I couldn't even search the pdf for keywords. Yet, the paper talked about distributed databases, and laid out a forward looking distributed database management system (DDBMS) architecture with distributed transaction managers (TMs) and data managers (DMs), which is very relevant and in use even today. I really loved the forward looking vision and clear thinking and presentation in the paper. There are two main approaches to concurrency control, locking based (two phase locking) or timestamp order based. This paper builds up on many previous work (all of which happened in 1976-1980) and provides a framework to reason about and categorize timestamp based concurrency control protocols (basic, Thomas Write Rule, Multiversion,

Popular posts from this blog

Learning about distributed systems: where to start?

Hints for Distributed Systems Design

Foundational distributed systems papers

Metastable failures in the wild

Scalable OLTP in the Cloud: What’s the BIG DEAL?

The end of a myth: Distributed transactions can scale

Always Measure One Level Deeper

Dude, where's my Emacs?

There is plenty of room at the bottom

Know Yourself