Posts

Showing posts with the label reconfiguration

DDIA: Chp 6. Partitioning

Image
Chapter 6 of the Designing Data Intensive Applications (DDIA) book discusses partitioning, a key technique for scaling large datasets and high query throughput in distributed databases. By breaking data into smaller partitions, it can be distributed across multiple nodes in a shared-nothing cluster. This allows the storage and processing load to be spread across many disks and processors. Fun fact: Partitioned databases were pioneered in the 1980s by products such as Teradata and Tandem NonStop SQL, and in 2000s rediscovered by NoSQL databases and Hadoop-based data warehouses. Partitioning is often combined with replication , where each partition is stored on multiple nodes for fault tolerance. In a typical setup, each partition has a leader node that handles writes, with follower nodes replicating the data. Partitioning of Key-Value Data There are two main approaches to partitioning key-value data. Partitioning by key range: Assign a continuous range of keys to each partition, like v...

TLA+ modeling of MongoDB logless reconfiguration

Image
Here we do a walkthrough of the TLA+ specs for the MongoDB logless reconfiguration protocol we have reviewed recently. The specs are available at the  https://github.com/will62794/logless-reconfig  repo provided by Will Schultz, Siyuan Zhou, and Ian Dardik.  This is the protocol model for managing logless reconfiguration. Let's call this the "config state machine" (CSM). This is the protocol model for static MongoDB replication protocol based on Raft. Let's call this the "oplog state machine" (OSM).  Finally this model composes the above two protocols so they work in a superimposed manner. I really admire how these specs provided a modular composition of the reconfiguration protocol and Raft-based replication protocol. I figured I would explain how this works here, since walkthroughs of advanced/intermediate TLA+ specifications, especially for composed systems, are rare. I will cover the structure of the two protocols (CSM and OSM) briefly, before diving ...

Design and Analysis of a Logless Dynamic Reconfiguration Protocol

Image
This paper appeared in OPODIS'21 and describes dynamic reconfiguration in MongoDB. So, what is dynamic reconfiguration? The core Raft protocol implements state machine replication (SMR) using a static set of servers. ( Please read this to learn about how MongoDB adopted Raft for a pull-based SMR .) To ensure availability in the presence of faults, SMR systems must be able to dynamically (and safely) replace failed nodes with healthy ones. This is known as dynamic reconfiguration. MongoDB logless reconfiguration Since its inception, the MongoDB replication system has provided a custom, ad hoc, legacy protocol for dynamic reconfiguration of replicas. This legacy protocol managed configurations in a logless fashion, i.e., each server only stored its latest configuration. It decoupled reconfiguration processing from the main database operation log. The legacy protocol, however, was known to be unsafe in certain cases.  Revising that legacy protocol, this paper presents a redesigned saf...

Reconfiguring Replicated Atomic Storage

Image
Reconfiguration is a rich source of faults in distributed systems. If you don't reconfigure, you lose availability when you lose additional nodes. But the reconfiguration algorithms are subtle, and it is easy to get them wrong, and lose consistency during reconfiguration. This is because a reconfiguration operation is run concurrently with the read/write operations and with potentially other reconfiguration operations, and they run when the system is already in some turmoil due to faults. Even the consensus papers/protocols do a bad job of addressing the reconfiguration problem; reconfiguration is always addressed as secondarily, incompletely, and many times incorrectly. This paper, which appeared in the Bulletin of the EATCS: The Distributed Computing 2010, is about reconfiguration protocols for atomic storage. The atomic storage problem is a simpler problem than consensus or state machine replication (SMR). The difference is atomic storage is hedonistic (lives in the moment), it...

Popular posts from this blog

Hints for Distributed Systems Design

My Time at MIT

Scalable OLTP in the Cloud: What’s the BIG DEAL?

Foundational distributed systems papers

Advice to the young

Learning about distributed systems: where to start?

Distributed Transactions at Scale in Amazon DynamoDB

Making database systems usable

Looming Liability Machines (LLMs)

Analyzing Metastable Failures in Distributed Systems