Posts

Showing posts with the label abstraction

Real Life Is Uncertain. Consensus Should Be Too!

Image
Aleksey and I sat down to read this paper on Monday night. This was an experiment which aimed to share how experts read papers in real time. We haven't read this paper before to keep things raw. As it is with research, we ended up arguing with the paper (and between each other) back and forth. It was messy, and it was also awesome. We had a lot of fun. Check our discussion video below (please listen at 1.5x, I sound less horrible at that speed, ah also this thing is 2 hours long). The paper I annotated during our discussion is also available here. This paper appeared in HotOS 2025 , so it is very recent. It's a position paper arguing that the traditional F-threshold fault model in consensus protocols is outdated and even misleading. Yes, the F-threshold fault model does feel like training wheels we never took off. In his essay " the joy of sects" , Pat Helland bring this topic to tease distributed systems folk: " Distributed systems folks. These people vacilla...

Morty: Scaling Concurrency Control with Re-Execution

Image
This EuroSys '23 paper reads like an SOSP best paper. Maybe it helped that EuroSys 2023 was in Rome. Academic conferences are more enjoyable when the venue doubles as a vacation. The Problem Morty tackles a fundamental question: how can we improve concurrency under serializable isolation (SER), especially without giving up on interactive transactions? Unlike deterministic databases (e.g., Calvin ) that require transactions to declare read and write sets upfront, Morty supports transactions that issue dynamic reads and writes based on earlier results. Transactional systems, particularly in geo-replicated settings, struggle under contention. High WAN latency stretches transaction durations, increasing the window for conflicts. The traditional answer is blind exponential backoff, but that leads to low CPU utilization. TAPIR and Spanner replicas often idle below 17% under contention as Morty's evaluation experiments show. Morty's approach to tackle the problem is to start from...

Chapter 2: Serializability Theory (Concurrency Control Book)

Image
Chapter 2 of Concurrency Control and Recovery in Database Systems (1987) by Bernstein, Hadzilacos, and Goodman is a foundational treatment of serializability theory. It is precise, formal, yet simple and elegant, a rare combination for foundational theory in a systems domain. Databases got lucky here: serializability theory is both powerful and clean. The chapter builds up the theory step by step, introducing: Histories Serializable histories The Serializability Theorem Recoverability and its variants Generalized operations beyond reads/writes View equivalence Each section motivates definitions clearly, presents tight formalism, and illustrates ideas with well-chosen examples. 2.1 Histories This section lays the groundwork. It starts slow, and doesn't do anything fancy. It first defines what it means for the operations within a transaction to form a well-founded partial order. This intra-transaction ordering extends naturally to inter-transaction operations, forming a superset rel...

Smart Casual Verification of the Confidential Consortium Framework

Image
This paper (NSDI'25) applies lightweight formal methods (hence the pun "smart casual" in contrast to formal attire) to the Confidential Consortium Framework (CCF). CCF is an open-source platform for trustworthy cloud applications, used in Microsoft's Azure Confidential Ledger service. The authors combine formal specification, model checking, and automated testing to validate CCF's distributed protocols, specifically its custom consensus protocol. Background CCF uses trusted execution environments (TEEs) and state machine replication. Its consensus protocol is based on Raft but has major modifications: signature transactions (Merkle tree root over the whole log  signed by the leader), optimistic acknowledgments, and partition leader step-down. CCF also avoids RPCs, using a uni-directional messaging layer. These changes add complexity, requiring formal verification. At the time of writing, CCF's implementation spans 63 kLoC in C++. The authors define five requi...

UB Hacking 2024

I attended the University at Buffalo Hacking event over the weekend. It was fun. There were 90+ projects, I judged 15 projects. There were some interesting talks as well. It was good to see youth energy. It feels good to teach next generation something. Another thing,  GeoGuessr played as a group game under time pressure is a lot of fun. This may be a great family activity. Why should you care about proofs, if all you want to do is coding? Atri and Andrew decided this would be a good talk to give at a hackathon. Daring! They did a good job imparting their passion about the joys and benefits of mathematical thinking. They talked about Paul Erdos 's the book of proofs concept, and the difference between a  correct proof versus a great proof from "the book". They talked about the deep insight that you can achieve through an abstract mathematical thinking. They also mentioned that if you don't have good insight to the problem or your program, you will have a hard time de...

Optimizing Distributed Protocols with Query Rewrites

Image
This paper (Sigmod 2024) formalizes scalability optimizations as rule-driven rewrites, inspired by SQL query optimizations. It focuses on two well-known and popular techniques: decoupling (distributing logic/code across nodes for introducing pipeline parallelism) and partitioning (distributing data across nodes--no new components, but instances of them-- for workload parallelism). Whittaker et al. (hi!) applied decoupling and partitioning optimizations to Paxos in the Compartmentalized Paxos work. That work deconstructed Paxos and showed how to reconstruct it using decoupling and partitioning to be more scalable by individually focusing on each component/role in Paxos. This is a simple but effective trick. Even after you learn the trick, you still keep getting surprised by how effective it is. The current paper aims to do this application via query rewrite rules more methodically rather than the traditional/ad-hoc way.  To this end, the paper utilizes Dedalus, a Datalog¬ dialect ...

The demise of coding is greatly exaggerated

Image
NVDIA CEO Jensen Huang recently made very contraversial remarks : "Over the course of the last 10 years, 15 years, almost everybody who sits on a stage like this would tell you that it is vital that your children learn computer science, and everybody should learn how to program. And in fact, it’s almost exactly the opposite. It is our job to create computing technology such that nobody has to program and that the programming language is human. Everybody in the world is now a programmer.  This is the miracle of artificial intelligence." I am not going to wise crack and say that this is power poisioning and this is what happens when your company valuation more than triples in a year and surpasses Amazon and Google. (Although I don't discount this effect completely.) Jensen is very smart and also has some great wisdom , so I think we should give this the benefit of doubt and try to respond in a thoughtful manner.  A response is warranted because this statement got a lot of p...

Cabbage, Goat, and Wolf Puzzle in TLA+

Image
Over the past couple of months, I have been harping on the value of TLA+ for teaching engineers the art of abstraction . It is important to emphasize that this is an art, not a science, and it is best learned through studying examples and practicing hands-on with modeling. TLA+ excels in providing rapid feedback on your modeling and designs, which facilitates this learning process significantly. Modeling the "cabbage, goat, and wolf" puzzle taught me that tackling real/physical-world scenarios is a great way to practice abstraction and design -- cutting out the clutter and focusing on the core challenge. How will you represent this world? What will be your actors, objects, and the relations between them? Will one approach result in a cleaner protocol model than another? Or, as I like to ask, does the protocol flow cleanly from the model? The puzzle description Here is the puzzle. A farmer with a wolf, a goat, and a cabbage must cross a river by boat. The boat can carry only t...

Popular posts from this blog

Hints for Distributed Systems Design

My Time at MIT

Scalable OLTP in the Cloud: What’s the BIG DEAL?

Foundational distributed systems papers

Advice to the young

Learning about distributed systems: where to start?

Distributed Transactions at Scale in Amazon DynamoDB

Making database systems usable

Looming Liability Machines (LLMs)

Analyzing Metastable Failures in Distributed Systems