Showing posts from July, 2022

TAOBench: An End-to-End Benchmark for Social Network Workloads

TAOBench is an opensource benchmarking framework that captures the social graph workload at Meta (who am I kidding, I'll call it Facebook). This paper (which will appear at VLDB'2022) studies the production workloads of Facebook's social graph datastore TAO, and distills them to a small set of representative features.  The integrity of TAOBench's workloads are validated by testing them against their production counterparts. The paper also describes several use cases of TAOBench at Facebook. Finally, the paper uses TAOBench to evaluate five popular distributed database systems (Spanner, CockroachDB, Yugabyte, TiDB, PlanetScale). The paper is a potpourri of many subquests. It feels a bit unfocused, but maybe I shouldn't complain because the paper is full to the brim, and provides three papers for the price of one. TAO TAO is a read-optimized geographically distributed in-memory data store that provides access to Facebook's social graph for diverse products and b

Automated Validation of State-Based Client-Centric Isolation with TLA+ (2021)

This work provides a TLA+ formalization of t he state-based isolation model introduced in the PODC 2017 paper. The TLA+ models provided here will be very useful for you if you are a distributed database developer. Using these, you can show your database protocols satisfy the desired isolation levels, and as you extend the design, you can keep checking that the guarantees still hold. I have been suffering from this problem recently. I wrote a TLA+ model for a distributed database design. I thought I would be able to write serializability predicates using TLA+ formulas relatively easily. When I tried doing so, I realized what a mess I am in. I made a couple attempts and found that operation-centric specification of database isolation is hard to define and check. I decided approach the problem sideways and wrote some BaitInvariants to show what behaviors are out-ruled, only to find that what I thought would always be rejected was actually valid serialization behavior in some scenarios.

Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service (USENIX ATC 2022)

This paper, which appeared in USENIX ATC'22 last week, describes the evolution of the design and implementation of DynamoDB in response to experiences operating it since its launch in 2012. DynamoDB has massive scale. In 2021, during the 66-hour Amazon Prime Day shopping event, Amazon systems made trillions of API calls to DynamoDB, peaking at 89.2 million requests per second. DynamoDB powers Alexa, sites, and all Amazon fulfillment centers. Many AWS services such as AWS Lambda, AWS Lake Formation, and Amazon SageMaker are built on DynamoDB. Moreover, hundreds of thousands of customer applications also use DynamoDB. First some clarification is in order. DynamoDB != Dynamo. DynamoDB's architecture does not share much with that of the Dynamo system (2007) . DynamoDB uses MultiPaxos for replication, for God's sake. Dynamo was a single-tenant system and teams were responsible for managing their own Dynamo installations. The resulting operational complexity becam

High Throughput Replication with Integrated Membership Management (USENIX ATC 2022)

This paper, which appeared at USENIX ATC 2022, introduces ChainPaxos.  ChainPaxos applies ideas from chain replication to the MultiPaxos protocol. ( Here is an overview of chain replication if you are unfamiliar with it.) Since ChainPaxos is a Paxos protocol, its fault-tolerance is independent of an external coordination service. This allows for continuous execution of operations during reconfigurations, and uncoupling of the system's fault-tolerance from that of an external service. More interestingly, ChainPaxos introduces a local linearizable read operation that can be executed in any replica with no communication overhead, relying only on information used to process update operations. This read can be served by any single replica at the cost of increased latency. ChainPaxos protocol ChainPaxos is MultiPaxos that is communicating using a chain topology. Leveraging the chain, ChainPaxos combines and forwards multiple Multi-Paxos messages in a single message as shown in Figure 3.

OSDI2022, continued

Venue, logistics, and travel The conference was held at the Omni La Costa Resort at Carlsbad CA. Carlsbad is 30 miles from San Diego, still an easy to access place compared to earlier SOSP/OSDI venues. This didn't stop my Lyft driver to complain; he asked me why they are not holding the conference in one of the ample city venues. I told him, this is part of the tradition. The resort was nice and clean. It had splash pools, which was great for people traveling with kids. (I, of course, tried the water slides, they were fun.) The resort also had a huge golf course, around which I ran in the early mornings with my body clock still on eastern time. San Diego is a very exotic place 15 miles from the border with Tijuana Mexico. There are palm trees everywhere. Just seeing the palm trees trigger a visceral relaxing reflex, as confirmed with several other friends. We started calling palm trees, log-structured trees, compared to the regular, run of the mill binary trees we see in the north.

OSDI 2022: nature is healing

This week I attended the USENIX Symposium on Operating Systems Design and Implementation (OSDI) conference , which was held at San Diego (well, at Carlsbad, CA to be more accurate). A big appeal of OSDI this year was that it was colocated with USENIX ATC . This allowed participants to attend two great conferences for the price of one registration and with one trip. Despite this allure, OSDI22 was still far from its pre-Covid glory and buzz. 900 people attended OSDI20, whereas OSDI22 was attended by 300 people in-person and another 250 as remote. On a more subjective measure, I found the energy of the conference lacking compared to previous years. There were no lively discussions in the sessions or in the hallways. (Aleksey and I had a very lively discussion about Pat's recent paper , which caused concerned looks from nearby tables.) There wasn't any specific buzzwords ringing at the conference. But, if I were to give a buzzword, maybe disaggregated systems and far-memory system

Popular posts from this blog

Graviton2 and Graviton3

Foundational distributed systems papers

Learning a technical subject

SQLite: Past, Present, and Future

Learning about distributed systems: where to start?

Strict-serializability, but at what cost, for what purpose?

CockroachDB: The Resilient Geo-Distributed SQL Database

The Seattle Report on Database Research (2022)

Amazon Aurora: Design Considerations + On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes

Warp: Lightweight Multi-Key Transactions for Key-Value Stores