Posts

The Seattle Report on Database Research (2022)

Every 5 years, researchers from academia and industry gather to write a state-of-the-union (SOTU) report on database research. This one was released recently . It is a very readable report, and my summary consists of important paragraphs clipped from the report. Emphasis mine in bolded sentences. I use square brackets for when I paraphrase a long text with a more direct statement. TL;DR: The SOTU is strong (the relational database market alone has revenue upwards of $50B) and growing stronger thanks to boom in cloud computing and machine learning (ML). For the next 5 years, research should scale up to address emerging challenges in data science, data governance, cloud services, and database engines.   What has changed in the last 5 years Over the last decade, our research community pioneered the use of columnar storage, which is used in all commercial data analytic platforms. Database systems offered as cloud services have witnessed explosive growth. Hybrid transactional/analytical pro

Strict-serializability, but at what cost, for what purpose?

Image
Strict-serializability guarantees that transactions appear to occur in an order consistent with the "real-time" ordering of those transactions: If transaction T1 commits before transaction T2 is invoked, then the commit timestamp of T1 precedes the commit timestamp of T2. This is, in fact, the real-time constraint from linearizability , but applied across transactions not just per-key. A strict-serializability system satisfies both serializability (transactions appear to occur as if they are executed one at a time in isolation) and linearizability per key (after all single-key reads/writes are transactions over one item). Below figure is from https://jepsen.io/consistency . However, this is a one-way implication, the other direction does not hold. You can satisfy both serializability per transactions and linearizability per key, but fail to satisfy strict-serializability. (Below I give an example accompanied with a TLA+ specification to check it.) This is because, in strict-

TAOBench: An End-to-End Benchmark for Social Network Workloads

Image
TAOBench is an opensource benchmarking framework that captures the social graph workload at Meta (who am I kidding, I'll call it Facebook). This paper (which will appear at VLDB'2022) studies the production workloads of Facebook's social graph datastore TAO, and distills them to a small set of representative features.  The integrity of TAOBench's workloads are validated by testing them against their production counterparts. The paper also describes several use cases of TAOBench at Facebook. Finally, the paper uses TAOBench to evaluate five popular distributed database systems (Spanner, CockroachDB, Yugabyte, TiDB, PlanetScale). The paper is a potpourri of many subquests. It feels a bit unfocused, but maybe I shouldn't complain because the paper is full to the brim, and provides three papers for the price of one. TAO TAO is a read-optimized geographically distributed in-memory data store that provides access to Facebook's social graph for diverse products and b

Automated Validation of State-Based Client-Centric Isolation with TLA+ (2021)

Image
This work provides a TLA+ formalization of t he state-based isolation model introduced in the PODC 2017 paper. The TLA+ models provided here will be very useful for you if you are a distributed database developer. Using these, you can show your database protocols satisfy the desired isolation levels, and as you extend the design, you can keep checking that the guarantees still hold. I have been suffering from this problem recently. I wrote a TLA+ model for a distributed database design. I thought I would be able to write serializability predicates using TLA+ formulas relatively easily. When I tried doing so, I realized what a mess I am in. I made a couple attempts and found that operation-centric specification of database isolation is hard to define and check. I decided approach the problem sideways and wrote some BaitInvariants to show what behaviors are out-ruled, only to find that what I thought would always be rejected was actually valid serialization behavior in some scenarios.

Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service (USENIX ATC 2022)

Image
This paper, which appeared in USENIX ATC'22 last week, describes the evolution of the design and implementation of DynamoDB in response to experiences operating it since its launch in 2012. DynamoDB has massive scale. In 2021, during the 66-hour Amazon Prime Day shopping event, Amazon systems made trillions of API calls to DynamoDB, peaking at 89.2 million requests per second. DynamoDB powers Alexa, Amazon.com sites, and all Amazon fulfillment centers. Many AWS services such as AWS Lambda, AWS Lake Formation, and Amazon SageMaker are built on DynamoDB. Moreover, hundreds of thousands of customer applications also use DynamoDB. First some clarification is in order. DynamoDB != Dynamo. DynamoDB's architecture does not share much with that of the Dynamo system (2007) . DynamoDB uses MultiPaxos for replication, for God's sake. Dynamo was a single-tenant system and teams were responsible for managing their own Dynamo installations. The resulting operational complexity becam

High Throughput Replication with Integrated Membership Management (USENIX ATC 2022)

Image
This paper, which appeared at USENIX ATC 2022, introduces ChainPaxos.  ChainPaxos applies ideas from chain replication to the MultiPaxos protocol. ( Here is an overview of chain replication if you are unfamiliar with it.) Since ChainPaxos is a Paxos protocol, its fault-tolerance is independent of an external coordination service. This allows for continuous execution of operations during reconfigurations, and uncoupling of the system's fault-tolerance from that of an external service. More interestingly, ChainPaxos introduces a local linearizable read operation that can be executed in any replica with no communication overhead, relying only on information used to process update operations. This read can be served by any single replica at the cost of increased latency. ChainPaxos protocol ChainPaxos is MultiPaxos that is communicating using a chain topology. Leveraging the chain, ChainPaxos combines and forwards multiple Multi-Paxos messages in a single message as shown in Figure 3.

OSDI2022, continued

Image
Venue, logistics, and travel The conference was held at the Omni La Costa Resort at Carlsbad CA. Carlsbad is 30 miles from San Diego, still an easy to access place compared to earlier SOSP/OSDI venues. This didn't stop my Lyft driver to complain; he asked me why they are not holding the conference in one of the ample city venues. I told him, this is part of the tradition. The resort was nice and clean. It had splash pools, which was great for people traveling with kids. (I, of course, tried the water slides, they were fun.) The resort also had a huge golf course, around which I ran in the early mornings with my body clock still on eastern time. San Diego is a very exotic place 15 miles from the border with Tijuana Mexico. There are palm trees everywhere. Just seeing the palm trees trigger a visceral relaxing reflex, as confirmed with several other friends. We started calling palm trees, log-structured trees, compared to the regular, run of the mill binary trees we see in the north.

Popular posts from this blog

Graviton2 and Graviton3

Foundational distributed systems papers

Learning a technical subject

Learning about distributed systems: where to start?

Strict-serializability, but at what cost, for what purpose?

CockroachDB: The Resilient Geo-Distributed SQL Database

Amazon Aurora: Design Considerations + On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes

Warp: Lightweight Multi-Key Transactions for Key-Value Stores

Anna: A Key-Value Store For Any Scale

Your attitude determines your success