Showing posts from February, 2024

Adapting TPC-C Benchmark to Measure Performance of Multi-Document Transactions in MongoDB

This paper appeared in VLDB 2019 . Benchmarks are a necessary evil for database evaluation. Benchmarks often focus on narrow aspects and specific workloads, creating a misleading picture for broader real-world applications/workloads. However, for a quick performance snapshot, benchmarks remain a crucial tool. Popular benchmarks like YCSB, designed for simple key-value operations, fall short in capturing MongoDB's features, including secondary indexes, flexible queries, complex aggregations, and even multi-statement multi-document ACID transactions (since version 4.0). Standard RDBMS benchmarks haven’t been a good fit for MongoDB either, since they require normalized relational schema and SQL operations. Consider TPC-C, which simulates a commerce system with five types of transactions involving customers, orders, warehouses, districts, stock, and items represented with data in nine normalized tables. TPC-C requires specific relational schema and prescribed SQL statements. Adapting T

Fault tolerance (Transaction processing book)

This is Chapter 3 from the Transaction Processing Book Gray/Reuter 1992 . Why does the fault-tolerance discussion come so early in the book? We haven't even started talking about transactional programming styles, concurrency theory, concurrency control. The reason is that the book uses dealing with failures as a motivation for adopting transaction primitives and a transactional programming style. I will highlight this argument now, and outline how the book builds to that crescendo in about 50 pages. The chapter starts with an astounding observation. I'm continuously astounded by the clarity of thinking in this book: "The presence of design faults is the ultimate limit to system availability; we have techniques that mask other kinds of faults." In the coming sections, the book introduces the concepts of faults, failures, availability, reliability, and discusses hardware fault-tolerance through redundancy. It celebrates wins in hardware reliability through several examp

Transaction Processing Book, Gray/Reuter 1992

 I started reading this book as part of Alex Petrov's book club . I am loving the book. We will do Chapter 3 soon, and I am late on reporting on this, but here are some material from the first chapters. I don't have the time to write more fulfilling reviews about the first chapters unfortunately, so I will try to give you the gist.  Foreword: Why We Wrote this Book The purpose of this book is to give you an understanding of how large, distributed, heterogeneous computer systems can be made to work reliably. An integrated (and integrating) perspective and methodology is needed to approach the distributed systems problem. Our goal is to demonstrate that transactions provide this integrative conceptual framework, and that distributed transaction-oriented operating systems are the enabling technology. In a nutshell: without transactions, distributed systems cannot be made to work for typical real-life applications. I am very much impressed by the distributed systems insight Gray p

Verifying Transactional Consistency of MongoDB

This paper presents pseudocodes for the transaction protocols for the three possible MongoDB deployments: WiredTiger, ReplicaSet, and ShardedCluster, and shows that these satisfy different variants of snapshot isolation: namely StrongSI, RealtimeSI, and SessionSI, respectively. Background MongoDB transactions have evolved in three stages (Figure 1): In version 3.2, MongoDB used the WiredTiger storage engine as the default storage engine. Utilizing the Multi-Version Concurrency Control (MVCC) architecture of WiredTiger storage engine, MongoDB was able to support single-document transactions in the standalone deployment. In version 4.0, MongoDB supported multi-document transactions in replica sets (which consists of a primary node and several secondary nodes), In version 4.2, MongoDB further introduced distributed multi-document transactions in sharded clusters (which is a group of multiple replica sets among which data is sharded). I love that the paper managed to present the transact

Tunable Consistency in MongoDB

This paper appeared in VLDB 2019. It discusses the tunable consistency models in MongoDB and how MongoDB's speculative execution model and data rollback protocol enable this spectrum of consistency levels efficiently. Motivation Applications often tolerate short or infrequent periods of inconsistency, so it may not make sense for them to pay the high cost of ensuring strong consistency at all times. These types of trade-offs have been partially codified in the PACELC theorem . The Probabilistically Bounded Staleness work (and many followup work ) explored the trade-offs between operation latency and data consistency in distributed database replication and showcased their importance.  To provide users with a set of tunable consistency options, MongoDB exposes writeConcern and readConcern levels as parameters that can be set on each database operation. writeConcern specifies what durability guarantee a write must satisfy before being acknowledged to a client. Similarly, readConc

Design and Analysis of a Logless Dynamic Reconfiguration Protocol

This paper appeared in OPODIS'21 and describes dynamic reconfiguration in MongoDB. So, what is dynamic reconfiguration? The core Raft protocol implements state machine replication (SMR) using a static set of servers. ( Please read this to learn about how MongoDB adopted Raft for a pull-based SMR .) To ensure availability in the presence of faults, SMR systems must be able to dynamically (and safely) replace failed nodes with healthy ones. This is known as dynamic reconfiguration. MongoDB logless reconfiguration Since its inception, the MongoDB replication system has provided a custom, ad hoc, legacy protocol for dynamic reconfiguration of replicas. This legacy protocol managed configurations in a logless fashion, i.e., each server only stored its latest configuration. It decoupled reconfiguration processing from the main database operation log. The legacy protocol, however, was known to be unsafe in certain cases.  Revising that legacy protocol, this paper presents a redesigned saf

Popular posts from this blog

The end of a myth: Distributed transactions can scale

Foundational distributed systems papers

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Metastable failures in the wild

Scalable OLTP in the Cloud: What’s the BIG DEAL?

SIGMOD panel: Future of Database System Architectures

Amazon Aurora: Design Considerations + On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes

Dude, where's my Emacs?

There is plenty of room at the bottom