Showing posts from February, 2024

Transaction models (Chapter 4. Transaction processing book)

Atomicity does not mean that something is executed as one instruction at the hardware level with some magic in the circuitry preventing it from being interrupted. Atomicity merely conveys the impression that this is the case, for it has only two outcomes: the specified result or nothing at all, which means in particular that it is free of side effects. Ensuring atomicity becomes trickier as faults and failures are involved. Consider the disk write operation, which comes in four quality levels: Single disk write: when something goes wrong the outcome of the action is neither all nor nothing, but something in between. Read-after write: This implementation of the disk write first issues a single disk write, then rereads the block from disk and compares the result with the original block. If the two are not identical, the sequence of writing and rereading is repeated until the block is successfully written. This has problems due to no abort path, no termination guarantee, and no partial ex

Recent reads (Feb 2024)

Here are the three books I have read recently. Argh, I wish I took some notes while going through these books. It feels hard to write these reviews in retrospect. The Culture Code: The Secrets of Highly Successful Groups (2018) "What do Pixar, Google and the San Antonio Spurs basketball team have in common?" That was the pitch for the culture code book, which came out. That didn't age well, for Google's case at least. Well, the Google example was not about team work, but rather Jeff Dean fixing search for adwords over a weekend, so this is neither here or there. We can forgive the book for trying to choose sensational examples. I did like the book overall. It identifies three things to get the culture right: 1) creating belonging, 2) sharing vulnerability, and 3) establishing purpose. Creating belonging is about safety/security. Maslow's hierarchy emphasizes safety and security as fundamental human needs. In a work environment where we feel judged or constantly ne

TLA+ modeling of MongoDB logless reconfiguration

Here we do a walkthrough of the TLA+ specs for the MongoDB logless reconfiguration protocol we have reviewed recently. The specs are available at the  repo provided by Will Schultz, Siyuan Zhou, and Ian Dardik.  This is the protocol model for managing logless reconfiguration. Let's call this the "config state machine" (CSM). This is the protocol model for static MongoDB replication protocol based on Raft. Let's call this the "oplog state machine" (OSM).  Finally this model composes the above two protocols so they work in a superimposed manner. I really admire how these specs provided a modular composition of the reconfiguration protocol and Raft-based replication protocol. I figured I would explain how this works here, since walkthroughs of advanced/intermediate TLA+ specifications, especially for composed systems, are rare. I will cover the structure of the two protocols (CSM and OSM) briefly, before diving

Adapting TPC-C Benchmark to Measure Performance of Multi-Document Transactions in MongoDB

This paper appeared in VLDB 2019 . Benchmarks are a necessary evil for database evaluation. Benchmarks often focus on narrow aspects and specific workloads, creating a misleading picture for broader real-world applications/workloads. However, for a quick comparative performance snapshot, they still remain a crucial tool. Popular benchmarks like YCSB, designed for simple key-value operations, fall short in capturing MongoDB's features, including secondary indexes, flexible queries, complex aggregations, and even multi-statement multi-document ACID transactions (since version 4.0). Standard RDBMS benchmarks haven’t been a good fit for MongoDB either, since they require normalized relational schema and SQL operations. Consider TPC-C, which simulates a commerce system with five types of transactions involving customers, orders, warehouses, districts, stock, and items represented with data in nine normalized tables. TPC-C requires specific relational schema and prescribed SQL statements

Fault tolerance (Transaction processing book)

This is Chapter 3 from the Transaction Processing Book Gray/Reuter 1992 . Why does the fault-tolerance discussion come so early in the book? We haven't even started talking about transactional programming styles, concurrency theory, concurrency control. The reason is that the book uses dealing with failures as a motivation for adopting transaction primitives and a transactional programming style. I will highlight this argument now, and outline how the book builds to that crescendo in about 50 pages. The chapter starts with an astounding observation. I'm continuously astounded by the clarity of thinking in this book: "The presence of design faults is the ultimate limit to system availability; we have techniques that mask other kinds of faults." In the coming sections, the book introduces the concepts of faults, failures, availability, reliability, and discusses hardware fault-tolerance through redundancy. It celebrates wins in hardware reliability through several examp

Transaction Processing Book, Gray/Reuter 1992

 I started reading this book as part of Alex Petrov's book club . I am loving the book. We will do Chapter 3 soon, and I am late on reporting on this, but here are some material from the first chapters. I don't have the time to write more fulfilling reviews about the first chapters unfortunately, so I will try to give you the gist.  Foreword: Why We Wrote this Book The purpose of this book is to give you an understanding of how large, distributed, heterogeneous computer systems can be made to work reliably. An integrated (and integrating) perspective and methodology is needed to approach the distributed systems problem. Our goal is to demonstrate that transactions provide this integrative conceptual framework, and that distributed transaction-oriented operating systems are the enabling technology. In a nutshell: without transactions, distributed systems cannot be made to work for typical real-life applications. I am very much impressed by the distributed systems insight Gray p

Verifying Transactional Consistency of MongoDB

This paper presents pseudocodes for the transaction protocols for the three possible MongoDB deployments: WiredTiger, ReplicaSet, and ShardedCluster, and shows that these satisfy different variants of snapshot isolation: namely StrongSI, RealtimeSI, and SessionSI, respectively. Background MongoDB transactions have evolved in three stages (Figure 1): In version 3.2, MongoDB used the WiredTiger storage engine as the default storage engine. Utilizing the Multi-Version Concurrency Control (MVCC) architecture of WiredTiger storage engine, MongoDB was able to support single-document transactions in the standalone deployment. In version 4.0, MongoDB supported multi-document transactions in replica sets (which consists of a primary node and several secondary nodes), In version 4.2, MongoDB further introduced distributed multi-document transactions in sharded clusters (which is a group of multiple replica sets among which data is sharded). I love that the paper managed to present the transact

Popular posts from this blog

Learning about distributed systems: where to start?

Hints for Distributed Systems Design

Foundational distributed systems papers

Metastable failures in the wild

The demise of coding is greatly exaggerated

Scalable OLTP in the Cloud: What’s the BIG DEAL?

The end of a myth: Distributed transactions can scale

SIGMOD panel: Future of Database System Architectures

Why I blog

There is plenty of room at the bottom