Posts

Showing posts with the label mongodb

Modular verification of MongoDB Transactions using TLA+

Image
Joint work with Will Schultz . A transaction groups multiple operations into an all-or-nothing logical-box to reduce the surface area exposed to concurrency control and fault recovery, simplifying the application programmer's job. Transactions support ACID guarantees: atomicity, consistency, isolation, durability. Popular isolation levels include, Read Committed (RC), Snapshot Isolation (SI), and Serializability (SER), which offer increasing protection against concurrency anomalies. MongoDB Transactions MongoDB’s transaction model has evolved incrementally.   v3.2 (2015): Introduced single-document transactions using MVCC in the WiredTiger storage engine.   v4.0 (2018): Extended support to multi-document transactions within a replica set (aka shard).   v4.2 (2019): Enabled fully distributed transactions across shards. Replica Set Transactions.  All transaction operations are first performed on the primary using the WiredTiger transaction workflow/algorithm. Before co...

Implementation of Cluster-wide Logical Clock and Causal Consistency in MongoDB

Image
This paper (SIGMOD 2019) discusses the design of causal consistency and cluster-wide logical clock for MongoDB.  Causal consistency preserves Lamport's happened-before (transitive and partial) order of events in a distributed system. If an event A causes another event B, causal consistency ensures that every process in the system observes A before observing B. Causal consistency guarantees read-your-writes, write-follows-reads, monotonic reads, and monotonic writes. Causal consistency is the strictest consistency level where the system can still make progress (always-available one-way convergence) in the presence of partitions as discussed in  the "Consistency, Availability, and Convergence" paper . This makes causal-consistency a very attractive model for modern distributed applications. https://jepsen.io/consistency The paper provides a simple and balanced design for causal-consistency in MongoDB. The ideas are simple, yet the implementation and operationalizing (backw...

Checking Causal Consistency of MongoDB

Image
This paper  declares the Jepsen testing of MongoDB for causal consistency a bit lacking in rigor, and goes on to test MongoDB against three variants of causal consistency (CC, CCv, and CM) under node failures, data movement, and network partitions. They also add TLA+ specifications of causal consistency and their checking algorithms, and verify them using the TLC model checker, and discuss how TLA+ specification can be related to Jepsen testing. This is a journal paper, so it is long. The writing could have been better. But it is apparent that a lot of effort went into the paper. One thing I didn't like in the paper was the wall-of-formalism around defining causal consistency in Section 2:Preliminaries. This was hard to follow. And I was upset about the use of operational definitions such as "bad patterns" for testing. Why couldn't they define this in a state-centric manner, as in the client-centric database isolation paper ? It turns out that Section 2 did not intr...

TLA+ modeling of MongoDB logless reconfiguration

Image
Here we do a walkthrough of the TLA+ specs for the MongoDB logless reconfiguration protocol we have reviewed recently. The specs are available at the  https://github.com/will62794/logless-reconfig  repo provided by Will Schultz, Siyuan Zhou, and Ian Dardik.  This is the protocol model for managing logless reconfiguration. Let's call this the "config state machine" (CSM). This is the protocol model for static MongoDB replication protocol based on Raft. Let's call this the "oplog state machine" (OSM).  Finally this model composes the above two protocols so they work in a superimposed manner. I really admire how these specs provided a modular composition of the reconfiguration protocol and Raft-based replication protocol. I figured I would explain how this works here, since walkthroughs of advanced/intermediate TLA+ specifications, especially for composed systems, are rare. I will cover the structure of the two protocols (CSM and OSM) briefly, before diving ...

Adapting TPC-C Benchmark to Measure Performance of Multi-Document Transactions in MongoDB

Image
This paper appeared in VLDB 2019 . Benchmarks are a necessary evil for database evaluation. Benchmarks often focus on narrow aspects and specific workloads, creating a misleading picture for broader real-world applications/workloads. However, for a quick comparative performance snapshot, they still remain a crucial tool. Popular benchmarks like YCSB, designed for simple key-value operations, fall short in capturing MongoDB's features, including secondary indexes, flexible queries, complex aggregations, and even multi-statement multi-document ACID transactions (since version 4.0). Standard RDBMS benchmarks haven’t been a good fit for MongoDB either, since they require normalized relational schema and SQL operations. Consider TPC-C, which simulates a commerce system with five types of transactions involving customers, orders, warehouses, districts, stock, and items represented with data in nine normalized tables. TPC-C requires specific relational schema and prescribed SQL statements...

Verifying Transactional Consistency of MongoDB

Image
This paper presents pseudocodes for the transaction protocols for the three possible MongoDB deployments: WiredTiger, ReplicaSet, and ShardedCluster, and shows that these satisfy different variants of snapshot isolation: namely StrongSI, RealtimeSI, and SessionSI, respectively. Background MongoDB transactions have evolved in three stages (Figure 1): In version 3.2, MongoDB used the WiredTiger storage engine as the default storage engine. Utilizing the Multi-Version Concurrency Control (MVCC) architecture of WiredTiger storage engine, MongoDB was able to support single-document transactions in the standalone deployment. In version 4.0, MongoDB supported multi-document transactions in replica sets (which consists of a primary node and several secondary nodes), In version 4.2, MongoDB further introduced distributed multi-document transactions in sharded clusters (which is a group of multiple replica sets among which data is sharded). I love that the paper managed to present the transact...

Tunable Consistency in MongoDB

Image
This paper appeared in VLDB 2019. It discusses the tunable consistency models in MongoDB and how MongoDB's speculative execution model and data rollback protocol enable this spectrum of consistency levels efficiently. Motivation Applications often tolerate short or infrequent periods of inconsistency, so it may not make sense for them to pay the high cost of ensuring strong consistency at all times. These types of trade-offs have been partially codified in the PACELC theorem . The Probabilistically Bounded Staleness work (and many followup work ) explored the trade-offs between operation latency and data consistency in distributed database replication and showcased their importance.  To provide users with a set of tunable consistency options, MongoDB exposes writeConcern and readConcern levels as parameters that can be set on each database operation. writeConcern specifies what durability guarantee a write must satisfy before being acknowledged to a client. Similarly, readConc...

Popular posts from this blog

Hints for Distributed Systems Design

My Time at MIT

Scalable OLTP in the Cloud: What’s the BIG DEAL?

Foundational distributed systems papers

Advice to the young

Learning about distributed systems: where to start?

Distributed Transactions at Scale in Amazon DynamoDB

Making database systems usable

Looming Liability Machines (LLMs)

Analyzing Metastable Failures in Distributed Systems