Posts

Vive la Difference: Practical Diff Testing of Stateful Applications

This Google paper (to appear in VLDB'25) is about not blowing up your production system. That is harder than it sounds, especially with stateful applications with memories. When rolling out new versions of stateful applications, the "shared, persistent, mutable" data means bugs can easily propagate across versions. Modern rollout tricks (canaries, blue/green deployments) don't save you from this. Subtle cross-version issues often slip through pre-production testing and surface in production, sometimes days or weeks later. These bugs can be severe, and the paper categorizes them as data corruption, data incompatibility, and false data assumptions. The paper mentions real-world incidents from Google and open-source projects to emphasize these bugs' long detection and resolution times, and the production outages and revenue loss they cause. So, we need tooling that directly tests v1/v2 interactions on realistic data before a rollout. The paper delivers a prototype o...

Towards Optimal Transaction Scheduling

This paper (VLDB'2024) looks at boosting transaction throughput through better scheduling. The idea is to explore the schedule-space more systematically and pick execution orders that reduce conflicts. The paper's two main contributions are a scheduling policy called Shortest Makespan First (SMF) and an MVTSO concurrency control variant called MVSchedO. SMF uses a greedy heuristic to pick low-conflict schedules that minimize the increase in total execution time (makespan) at each step. MVSchedO enforces the chosen schedule at a fine-grained level by adapting multi-version timestamp ordering (MVTSO). The authors implement SMF and MVSchedO in RocksDB (R-SMF) and show up to 3.9x higher throughput and 3.2x lower tail latency on benchmarks as well as Meta's TAO workload. I mean, this is stellar systems work, and it makes a convincing case that search-based scheduling is a promising direction for extracting higher throughput from database systems. Motivation Reducing contention...

Neurosymbolic AI: The 3rd Wave

The paper (arXiv 2020, also AI review 2023) opens by discussing recent high-profile AI debates: the Montréal AI Debate and the AAAI 2020 fireside chat with Kahneman, Hinton, LeCun, and Bengio. A consensus seems to be emerging: for AI to be robust and trustworthy, it must combine learning with reasoning. Kahneman's "System 1 vs. System 2" dual framing of cognition maps well to deep learning and symbolic reasoning. And AI needs both. Neurosymbolic AI promises to combine data-driven learning with structured reasoning, and provide modularity, interpretability, and measurable explanations. The paper moves from philosophical context to representation, then to system design and technical challenges in neurosymbolic AI. Neurons and Symbols: Context and Current Debate This section lays out the historic divide between symbolic AI and neural AI. The symbolic approach supports logic, reasoning, and explanation. The neural approach excels at perception and learning from data. Symbolic...

Neurosymbolic AI: Why, What, and How

The paper (2023) argues for integrating two historically divergent traditions in artificial intelligence (neural networks and symbolic reasoning) into a unified paradigm called Neurosymbolic AI. It argues that the path to capable, explainable, and trustworthy artificial intelligence lies in marrying perception-driven neural systems with structure-aware symbolic models. The authors lean on Daniel Kahneman’s story of two systems in the mind (Thinking, Fast and Slow). Neural networks are the fast ones: pattern-hungry, intuitive, good with unstructured mess. Symbolic methods are the slow ones: careful, logical, good with rules and plans. Neural networks, especially in their modern incarnation as large language models (LLMs), excel at pattern recognition, but fall short in tasks demanding multi-step reasoning, abstraction, constraint satisfaction, or explanation. Conversely, symbolic systems offer interpretability, formal correctness, and composability, but tend to be brittle (not in...

Can a Client–Server Cache Tango Accelerate Disaggregated Storage?

This paper from HotStorage'25 presents OrcaCache, a design proposal for a coordinated caching framework tailored to disaggregated storage systems. In a disaggregated architecture, compute and storage resources are physically separated and connected via high-speed networks. These have become increasingly common in modern data centers as they enable flexible resource scaling and improved fault isolation. (Follow the money as they say!) But accessing remote storage introduces serious latency and efficiency challenges. The paper positions OrcaCache as a solution to mitigate these challenges by orchestrating caching logic across clients and servers. Important note: in the paper's terminology the server means the storage node, and the client means the compute node. As we did last week for another paper, Aleksey and I live-recorded our reading/discussion of this paper. We do this to teach the thought process and mechanics of how experts read papers in real time. Check our discussion vi...

Transaction Healing: Scaling Optimistic Concurrency Control on Multicores

This paper from SIGMOD 2016 proposes a transaction healing approach to improve the scalability of Optimistic Concurrency Control (OCC) in main-memory OLTP systems running on multicore architectures. Instead of discarding the entire execution when validation fails, the system repairs only the inconsistent operations to improve throughput in high-contention scenarios. If this sounds familiar, it's because we recently reviewed the Morty paper from EuroSys 2023, which applied healing ideas to interactive transactions using continuations to support re-execution. This 2016 Transaction Healing paper is scoped to static stored procedures, and focuses more on integrating healing into OCC. Key Ideas OCC works well under low contention because it separates reads from writes and keeps critical sections short (only for validation). But under high contention, especially in workloads with skewed access patterns (like Zipfian distributions), transactions are frequent...

Analysing Snapshot Isolation

This paper (PODC'2016) presents a clean and declarative treatment of Snapshot Isolation (SI) using dependency graphs. It builds on the foundation laid by prior work, including the SSI paper we reviewed recently, which had already identified that SI permits cycles with two adjacent anti-dependency (RW) edges, the so-called inConflict and outConflict edges. While the SSI work focused on algorithmic results and implementation, this paper focuses more on the theory (this is PODC after all) of defining a declarative dependency-graph-based model for SI. It strips away implementation details such as commit timestamps and lock management, and provides a purely symbolic framework. It also proves a soundness result (Theorem 10), and leverages the model for two practical static analyses: transaction chopping and robustness under isolation-level weakening. Soundness result and dependency graph model Let's begin with Theorem 10, which establishes both the soundness and completeness of the...

Recent reads (July 2025)

I know I should call this recent listens, but I am stuck with the series name. So here it goes. These are some recent "reads" this month. Billion Dollar Whale Reading Billion Dollar Whale was exhausting. I am not talking about the writing, which was well-paced and packed with a lot of detail. The problem is the subject, Jho Low, a slippery and soulless character who conned Malaysia out of billions via the 1Malaysia Development Berhad sovereign wealth fund. Jho Low is a Wharton grad. He is a big spender and party boy, dropping millions of dollars a night on gambling and partying. His party buddies included Leonardo DiCaprio, Paris Hilton, and Jamie Foxx. Jho was a showoff and pretentious ass. What does Wharton teach these people? Do they actively recruit for this type of person? Jho was aided by the complicity of Prime Minister Najib Razak and his luxury-addicted wife. We are talking entire stores shut down for private shopping and flights hauling nothin...

Real Life Is Uncertain. Consensus Should Be Too!

Aleksey and I sat down to read this paper on Monday night. This was an experiment which aimed to share how experts read papers in real time. We hadn't read the paper beforehand, to keep things raw. As it is with research, we ended up arguing with the paper (and with each other) back and forth. It was messy, and it was also awesome. We had a lot of fun. Check our discussion video below (please listen at 1.5x, I sound less horrible at that speed, ah also this thing is 2 hours long). The paper I annotated during our discussion is also available here. This paper appeared in HotOS 2025, so it is very recent. It's a position paper arguing that the traditional F-threshold fault model in consensus protocols is outdated and even misleading. Yes, the F-threshold fault model does feel like training wheels we never took off. In his essay "the joy of sects", Pat Helland brings up this topic to tease distributed systems folk: "Distributed systems folks. These people vacilla...

Morty: Scaling Concurrency Control with Re-Execution

This EuroSys '23 paper reads like an SOSP best paper. Maybe it helped that EuroSys 2023 was in Rome. Academic conferences are more enjoyable when the venue doubles as a vacation. The Problem Morty tackles a fundamental question: how can we improve concurrency under serializable isolation (SER), especially without giving up on interactive transactions? Unlike deterministic databases (e.g., Calvin) that require transactions to declare read and write sets upfront, Morty supports transactions that issue dynamic reads and writes based on earlier results. Transactional systems, particularly in geo-replicated settings, struggle under contention. High WAN latency stretches transaction durations, increasing the window for conflicts. The traditional answer is blind exponential backoff, but that leads to low CPU utilization. TAPIR and Spanner replicas often idle below 17% under contention as Morty's evaluation experiments show. Morty's approach to tackle the problem is to start from...

Popular posts from this blog

Hints for Distributed Systems Design

My Time at MIT

Advice to the young

Scalable OLTP in the Cloud: What’s the BIG DEAL?

Foundational distributed systems papers

Learning about distributed systems: where to start?

Distributed Transactions at Scale in Amazon DynamoDB

Making database systems usable

Looming Liability Machines (LLMs)

Analyzing Metastable Failures in Distributed Systems