Posts

TPC-E vs. TPC-C: Characterizing the New TPC-E Benchmark via an I/O Comparison Study

Image
This paper is from Sigmod 2010, and compares the two standard TPC benchmarks for OLTP, TPC-C which came in 1992, and the TPC-E which dropped in 2007. TPC-E is designed to be a more realistic and sophisticated OLTP benchmark than TPC-C by incorporating realistic data skews and referential integrity constraints. However, because of its complexity, TPC-E is more difficult to implement, and hard to provide reproducability of the results by others. As a result adoption had been very slow and little. (I linked TPC-C to its TPC-C wikipedia page above, but poor TPC-E does not even have a Wikipedia page. Instead it has this 1990s style webpage .) A more cynical view on the immense popularity of TPC-C over TPC-E would be that TPC-C can be easily abused to show your database in the best possible light. TPC-C is very easily shardable on the  warehouse_id to hide any contention on concurrency control, so your database would be able to display high-scalability. We had recently reviewed the OceanSt

Is Scalable OLTP in the Cloud a Solved Problem? (CIDR 2023)

Image
This paper appeared in CIDR 2023 recently. It is a thought provoking big picture paper, coming from a great  team. The paper draws attention to the divide between conventional wisdom on building scalable OLTP databases (shared-nothing architecture) and how they are built and deployed on the cloud (shared storage architecture). There are shared nothing systems like CockroachDB, Yugabyte, and Spanner, but the overwhelming trend/volume on cloud OLTP is shared storage, and even with single writer, like AWS Aurora , Azure SQL HyperScale, PolarDB, and AlloyDB. The paper doesn't analyze this divide in detail, and jumps off to discuss shared storage with multiple writer as a more scalable way to build cloud OLTP databases. It would have been nice to go deep on this divide. I think the big benefit from shared storage is the disaggregation we get between compute and storage, which allows us to scale/optimize compute and storage resources independently based on demand. Shared storage also al

Reconfiguring Replicated Atomic Storage

Image
Reconfiguration is a rich source of faults in distributed systems. If you don't reconfigure, you lose availability when you lose additional nodes. But the reconfiguration algorithms are subtle, and it is easy to get them wrong, and lose consistency during reconfiguration. This is because a reconfiguration operation is run concurrently with the read/write operations and with potentially other reconfiguration operations, and they run when the system is already in some turmoil due to faults. Even the consensus papers/protocols do a bad job of addressing the reconfiguration problem; reconfiguration is always addressed as secondarily, incompletely, and many times incorrectly. This paper, which appeared in the Bulletin of the EATCS: The Distributed Computing 2010, is about reconfiguration protocols for atomic storage. The atomic storage problem is a simpler problem than consensus or state machine replication (SMR). The difference is atomic storage is hedonistic (lives in the moment), it

Best of Metadata in 2022

It is time to reflect back on the year. Here are some selected papers I covered in 2022. Production systems Amazon Aurora: Design Considerations + On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes Aurora provides a a high-throughput relational database on AWS cloud by decoupling the storage to a multi-tenant scale-out storage service. Each database instance acts as a SQL endpoint and supports query processing and transactions. Many database functions, such as redo logging, materialization of data blocks, garbage collection, and backup/restore, are offloaded to the storage nodes. Aurora performs replication among the storage nodes by pushing the redo log; this reduces networking traffic and enables fault-tolerant storage that heals without database involvement. Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service (USENIX ATC 2022) DynamoDB has massive scale; it powers Alexa, Amazon.com sites, and all Amazon fulfillment c

Noria: dynamic, partially-stateful dataflow for high-performance web applications

Image
This paper, which appeared in OSDI 2018 , considers the question of how to design a read-optimized RDBMS database with built-in support for caching and fast lookup answers to complex queries. The resulting system, Noria is implemented in Rust, and is available as opensource at https://github.com/mit-pdos/noria . There are already two good summaries written of the paper (from Adrian Colyer and Micah Lerner ), so my job today will be simpler. Artwork created by Stable Diffusion Motivation and overview Databases (mostly relational databases) are overwhelmingly used with read-heavy workloads due to all those SQL querying they serve, but they fail to take advantage of this and improve their designs to optimize for read-heavy workloads. Some crutch solutions was added to address this: memcached, read-only replicas etc., but these have shortcomings. Many deployments employ an in-memory key-value cache (like Redis, memcached, or TAO) to speed up common-case read queries. Such a cache avoids r

NoSQL: The Hangover of Dropping ACID

Image
Created by Stable Diffusion The 70s were a time of excess and rebellion, a carefree era where people embraced their wildest impulses and let their hair down. But as the years went by, people realized that the true freedom lies not in the absence of rules, but in the discipline to follow them. And so, they put away their acid-washed jeans and neon-colored hair, and embraced a more structured, disciplined way of life. As we hit 2010, the 70s spirit of rebellion spawned a new breed of databases known as "NoSQL". The rise of NoSQL databases was a rebellion against the strict rules and constraints of SQL databases. NoSQL databases promised to free users from the rigid, structured world of SQL, with their strict schemas and complex query languages. Instead, NoSQL databases offered a simple, flexible approach to data storage and retrieval, allowing users to store and access data in whatever way they saw fit. In contrast to SQL databases, which provide strong consistency guarantees t

Vertical Paxos and Primary-Backup Replication

Image
This is a 2009 paper by Leslie Lamport, Dahlia Malkhi, and Lidong Zhou. This paper was very well written and it was clear, and easy to follow (at least for me after having read so many Paxos papers). I finished reading the paper in one hour -- a record time for a slow reader like me. I knew about this work, but had not read the paper carefully before. I think I was overindexing on the reconfiguration aspect of Vertical Paxos, but by reading this paper (by going to the source) I found that the primary-backup replication application of Vertical Paxos was as much the point of emphasis as the reconfiguration aspect. The paper talks about the connection between Paxos and primary-backup replication applications, and presents vertical Paxos as a way of bridging these two. The paper delves down in to the leader handover and reconfiguration, and shows a practical application of these in the context of primary-backup replication protocols. Although it is possible to let the state machine reconf

Popular posts from this blog

Foundational distributed systems papers

Learning about distributed systems: where to start?

Strict-serializability, but at what cost, for what purpose?

CockroachDB: The Resilient Geo-Distributed SQL Database

Amazon Aurora: Design Considerations + On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes

Anna: A Key-Value Store For Any Scale

Warp: Lightweight Multi-Key Transactions for Key-Value Stores

The Seattle Report on Database Research (2022)

Learning a technical subject

Checking statistical properties of protocols using TLA+