Posts

Modular verification of MongoDB Transactions using TLA+

Image
Joint work with Will Schultz . A transaction groups multiple operations into an all-or-nothing logical-box to reduce the surface area exposed to concurrency control and fault recovery, simplifying the application programmer's job. Transactions support ACID guarantees: atomicity, consistency, isolation, durability. Popular isolation levels include, Read Committed (RC), Snapshot Isolation (SI), and Serializability (SER), which offer increasing protection against concurrency anomalies. MongoDB Transactions MongoDB’s transaction model has evolved incrementally.   v3.2 (2015): Introduced single-document transactions using MVCC in the WiredTiger storage engine.   v4.0 (2018): Extended support to multi-document transactions within a replica set (aka shard).   v4.2 (2019): Enabled fully distributed transactions across shards. Replica Set Transactions.  All transaction operations are first performed on the primary using the WiredTiger transaction workflow/algorithm. Before co...

Concurrency Control and Recovery in Database Systems: Preface and Chapter 1

Image
I'm catching up on Phil Eaton's book club and just finished the preface and Chapter 1 of Concurrency Control and Recovery in Database Systems by Bernstein, Hadzilacos, and Goodman. This book came out in 1987: two years before the fall of Berlin wall, 5 years before Windows 3.1, and before the Gray/Reuters book on Transaction Processing .  I was first surprised about why "Recovery" was featured prominently in the book title, but then realized that in 1987 there was no solid solution for recovery. The ARIES paper on the write-ahead-log came out in 1992. Once I realized that, the structure made sense: concurrency control (Chapters 1–5), recovery (Chapters 6–7), and a forward-looking chapter on replication (Chapter 8), where they candidly admit: "No database systems that we know of support general purpose access to replicated distributed data." The Problem Serializability Theory Two Phase Locking Non-Locking Schedulers Multiversion Concurrency Control Centrali...

Notes from the TLA+ Community Event

I attended the TLA+ Community Event at Hamilton, Ontario on Sunday. Several talks pushed the boundaries of formal methods in the real world through incorporating testing, conformance, model translation, and performance estimation. The common thread was that: TLA+ isn't just for specs anymore. It's being integrated into tooling: fuzzers, trace validators, and compilers. The community is building bridges from models to reality, and it's a good time to be in the loop. Below is a summary of selected talks, followed by some miscellaneous observations. This is just a teaser; the recordings will be posted soon on the TLA+ webpage . Model-Based Fuzzing for Distributed Systems — Srinidhi Nagendra Traditional fuzzing relies on random inputs and coverage-guided mutation, and works well for single-process software. But it fails for distributed systems due to too many concurrent programs, too many interleavings, and no clear notion of global coverage. Srinidhi's work brings model-b...

TLA+ Community Event at ETAPS 2025

This Sunday, I'll be attending (and speaking at) the TLA+ Community Event , co-located with ETAPS 2025 in Hamilton, Ontario.  The setting is fitting. ETAPS (European Joint Conferences on Theory and Practice of Software) has long been a hub for research that combines theory with software engineering. It seems that, while the U.S. academia largely left software engineering to industry, European researchers remained more strongly involved in software engineering discipline. ETAPS has consistently hosted work on model checking, type systems, static analysis, and formal methods. Think of work on abstract interpretation, K frameworks, or compilers verified in Coq. I have never been to ETAPS before. It seems that they are rebranding as "International Joint Conferences On Theory and Practice of Software" and droppping the European. And this year is the first time, after 28 years, that the event moves outside of Europe . Interesting. McMaster University, the ETAPS 2025 host, is ...

Multi-Grained Specifications for Distributed System Model Checking and Verification

Image
This EuroSys 2025 paper wrestles with the messy interface between formal specification and implementation reality in distributed systems. The case study is ZooKeeper. The trouble with verifying something big like ZooKeeper is that the spec and the code don’t match. Spec wants to be succinct and abstract; code has to be performant and dirty.  For instance, a spec might say, “this happens atomically.” But the actual system says, “yeah, buddy, right.” Take the case of FollowerProcessNEWLEADER: the spec bundles updating the epoch and accepting the leader’s history into a single atomic step. But in the real system, those steps are split across threads and separated by queuing, I/O, and asynchronous execution. Modeling them as atomic would miss real, observable intermediate states, and real bugs. To bridge this model-code gap, the authors use modularization and targeted abstraction. Don’t write one spec, write multi-grained specs. Different parts of the system are modeled at different ...

What I'd do as a College Freshman in 2025

Do Computer Science Absolutely. Still would. Many are spooked by LLMs. Some, like Jensen Huang, argue that "nobody has to learn how to program."   I argue the opposite . And I double down.  Being supported by AI tools is not a substitute for mastery . You can’t borrow skills. You have to earn them.   Computer science builds vital skills : hacking, debugging, abstract thinking, and quick adaptation. These don’t go out of style.   Do STEM. It’s LLM-resistant. LLMs can retrieve and remix information, but do you know what to do with them? Like the dog chasing the car, what now? STEM teaches you that. It teaches you to think, to reason, to act. It gets you from information to wisdom. But only after you've mastered the foundations. We're heading into the age of π-shaped people: depth in two areas, and generalist across . Building depth first, and then ranging is good strategy .   So yes, I would learn the foundations of both CS and AI. And then do AI + X, where X is s...

Cedar: A New Language for Expressive, Fast, Safe, and Analyzable Authorization

Image
This paper (2024) presents Cedar, AWS's new authorization policy language. By providing a clear declarative way to manage access control policies, Cedar addresses the common limitations of embedding authorization logic directly into application code: problems with correctness, auditing, and maintainence. Cedar introduces a domain-specific language (DSL) to express policies that are separate from application code, and improves readability and manageability. In that sense, this is like aspect-oriented programming but for authorization policy. The language is designed with four main objectives: expressiveness, performance, safety, and analyzability. Cedar balances these goals by supporting role-based (RBAC), attribute-based (ABAC), and relationship-based (ReBAC) access control models. Policy Structure and Evaluation Cedar policies consist of three primary components: effect, scope, and conditions. The effect can either be "permit" or "forbid", defining whether acc...

ParallelRaft: Out-of-Order Executions in PolarFS

Image
This paper (2021) dives deeper in the Parallel Raft protocol introduced with PolarFS. PolarFS (VLDB'18) is a distributed file system with ultra-low latency and high availability, developed by Alibaba. Its variant of the Raft consensus protocol, ParallelRaft,  relaxes Raft's strict serialization constraints. ParallelRaft allows state machines to commit and execute log entries out of order. This study provides a formal specification of ParallelRaft in TLA+, proves its correctness, and identifies consistency issues caused by ghost log entries. To address these issues, the refined ParallelRaft-CE protocol limits parallelism during leader transitions. Introduction Raft is a tight-ass. It enforces sequential commitment and execution of log entries. No funny business. Multi-Paxos is a bit rebellious. It allows out-of-order commitment, but it still insists on ordered execution. In order not to leave any performance on the table, PolarFS's ParallelRaft rebels completely: it not on...

Publish and Perish: Why Ponder Stibbons Left the Ivory Tower

Image
(With apologies to Terry Pratchett) Ponder Stibbons sat in the shadowy recesses of the Uncommon Room, clutching a lukewarm cup of tea and the last vestiges of his patience. Across from him, the Dean, wrapped in his usual self-satisfaction, puffed his pipe and surveyed the room as if it were all a great and glorious joke that he alone understood. "Leaving?" the Dean spluttered, when Ponder finally managed to slide the conversation in that direction. "What do you mean, leaving? Where would you even go?" "Industry," said Ponder, trying not to make it sound like a curse. "Databases. Big ones." The Dean blinked. "But we have databases here. The Archchancellor’s hat alone contains centuries of magical indexing..." "Yes, but in the real world, we use them to actually store and retrieve information, not just argue about what information is worth storing in the first place," Ponder snapped. "And I’ll be paid properly." "...

Our K-Drama Journey

Finding a show that suits everyone in our family has always been a challenge. Intense action/stress is too much for our youngest daughter, and we prefer to keep things PG-13. We’ve finally found a great solution: Korean-dramas. We’ve been watching them back to back recently, and they’ve been a hit across all ages. Hometown Cha-Cha-Cha  (2021)  This was our gateway into K-dramas. Since it’s dubbed in English, it was easy to start. Set in a small seaside town, it’s a wholesome, family-friendly show with both comedic moments and touching scenes. It also gave us a glimpse into everyday Korean life. It's a solid feel-good watch. Queen of Tears  (2024)  We picked this next because it was also dubbed. This one was a commitment (16 episodes, each 1.5 hours long) but it was worth it. It had more romance, more drama, better cinematography, and a stellar cast. The intense drama in some scenes made our youngest bawling. We got so hooked that we ended up binge-watching the l...

Smart Casual Verification of the Confidential Consortium Framework

Image
This paper (NSDI'25) applies lightweight formal methods (hence the pun "smart casual" in contrast to formal attire) to the Confidential Consortium Framework (CCF). CCF is an open-source platform for trustworthy cloud applications, used in Microsoft's Azure Confidential Ledger service. The authors combine formal specification, model checking, and automated testing to validate CCF's distributed protocols, specifically its custom consensus protocol. Background CCF uses trusted execution environments (TEEs) and state machine replication. Its consensus protocol is based on Raft but has major modifications: signature transactions (Merkle tree root over the whole log  signed by the leader), optimistic acknowledgments, and partition leader step-down. CCF also avoids RPCs, using a uni-directional messaging layer. These changes add complexity, requiring formal verification. At the time of writing, CCF's implementation spans 63 kLoC in C++. The authors define five requi...

What makes entrepreneurs entrepreneurial?

Image
Entrepreneurs think and act differently from managers and strategists. This 2008 paper argues that entrepreneurs use effectual reasoning, the polar opposite of causal reasoning taught in business schools. Causal reasoning starts with a goal and finds the best way to achieve it. Effectual reasoning starts with available resources and lets goals emerge along the way. Entrepreneurs are explorers, not generals. Instead of following fixed plans, they experiment and adapt to seize whatever opportunities the world throws at them. Consider this example from the paper. A causal thinker starts an Indian restaurant by following a fixed plan: researching the market, choosing a prime location, targeting the right customers, securing funding, and executing a well-designed strategy. The effectual entrepreneur doesn’t start with a set goal. She starts with what she has (her skills, knowledge, and network), and she experiments. She might begin by selling homemade lunches to friends’ coworkers. If that...

My Time at MIT

Image
Twenty years ago, in 2004-2005, I spent a year at MIT’s Computer Science department as a postdoc working with Professor Nancy Lynch. It was an extraordinary experience. Life at MIT felt like paradise, and leaving felt like being cast out. MIT Culture MIT’s Stata Center was the best CS building in the world at the time. Designed by Frank Gehry, it was a striking abstract architecture masterpiece ( although like all abstractions it was a bit leaky ). Furniture from Herman Miller complemented this design. I remember seeing price tags of $400 on simple yellow chairs. The building buzzed with activity.  Every two weeks, postdocs were invited to the faculty lunch on Thursdays, and alternating weeks we had group lunches. Free food seemed to materialize somewhere in the building almost daily, and the food trucks outside were also good. MIT thrived on constant research discussions, collaborations, and talks. Research talks were advertised on posters at the urinals, as a practical touch of M...

Popular posts from this blog

Hints for Distributed Systems Design

My Time at MIT

Making database systems usable

Looming Liability Machines (LLMs)

Advice to the young

Learning about distributed systems: where to start?

Scalable OLTP in the Cloud: What’s the BIG DEAL?

Foundational distributed systems papers

What I'd do as a College Freshman in 2025

Distributed Transactions at Scale in Amazon DynamoDB