Posts

BugBash'26 Morning of Day 1

Continuing with notes from the BugBash talks. Yes, all of this goodness, including Will Wilson's keynote, happened before lunch on the first day.

Where all the ladders start
Peter Alvaro, Associate Professor of Computer Science @ UC Santa Cruz

In this talk, Peter reflects on his 20 years of distributed systems work. The cover image is Don Quixote (which is Peter) attacking the windmill (robust distributed systems) with a spear (some singular solution, often borrowed from databases). The first attack was through the use of arcane algebras. This is a purist approach of getting it right the first time. This was during Peter's PhD at UC Berkeley, where Neil Conway was a peer and collaborator. By his own admission, this was incited by a naive framing of what makes distributed systems difficult. The target was uncertainty regarding order and timing, which causes distributed consistency problems and requires coordination. But distributed coordination comes at the ...

BugBash'26 Keynote: We won, what now?

I attended BugBash 2026 these last two days, and had a blast. Here are my notes from the first keynote. I will try to find time to publish my notes from the other talks in the coming days.

Keynote: We won, what now?
Will Wilson, Co-founder & CEO @ Antithesis

The Antithesis team opened with a great animation/teaser clip, then Will took the stage. Here is the summary of his talk. This is not a software testing conference. This is about building reliable software by any means: testing, observability, formal methods, people/culture, better languages. He shows a meme of the five Planeteers invoking Captain Planet using their rings. It is time to acknowledge the elephant in the room. A new contender emerges: AI!! We are now using a fundamentally unreliable system (AI) to make the systems we are developing reliable. And it is working somehow?! There is a vibe quality to it. When the cost of software generation goes down drastically, you can do a whole lot of it as per Jev...

OSTEP Chapter 14: Interlude -- Memory API

This is a short chapter covering the nuts and bolts of memory allocation in C: malloc(), free(), and the many ways programmers get them wrong. This is part of our series going through OSTEP book chapters. The OSTEP textbook is freely available at Remzi's website if you'd like to follow along.

Stack vs. Heap

C gives you two kinds of memory. Stack memory is automatic: the compiler allocates it when you enter a function and reclaims it when you return. Heap memory is manual: you allocate it with malloc() and free it with free(). Let's remember the layout from Chapter 13. The distinction is simple in principle: use the stack for short-lived local data, and use the heap for anything that must outlive the current function call. The heap is where the trouble lives. It forces the programmer to reason about object lifetimes at every allocation site. The compiler won't save you; a C program with memory bugs compiles and runs just fine, until it doesn't.

The API

malloc(size_t siz...

PolitePaxos: A New Consensus Variant Where the Proposer Just Asks Nicely

The Paxos consensus protocol, despite its many theoretical virtues, is fundamentally rude. One need only look at the way it behaves to see the problem. A leader seizes power. It dictates values. When two leaders happen to propose simultaneously, they do not pause, tip their hats, and work things out over tea. No, they duel, each furiously incrementing ballot numbers at the other like barbarians engaged in a perpetual pissing contest (please excuse my language). The follower nodes, meanwhile, are reduced to the role of obedient subjects, promising their allegiance to whichever proposer shouts loudest and most recently. Having spent decades studying this rude behavior (with WPaxos, PigPaxos, and through a great many posts on this very blog), I became convinced that the field had made a fundamental error. We had been asking "how do we reach agreement?" when the real question we should have been asking, all along, was "would it kill you to ask nicely?" It is therefor...

Measuring AI Ability to Complete Long Software Tasks

This paper from METR (Model Evaluation & Threat Research) introduces a new metric for tracking AI progress: the "50%-task-completion time horizon". This denotes the length of a software engineering task (measured by how long a skilled human developer takes to complete it) that an AI model can finish with a 50% success rate. The researchers evaluated 12 frontier AI models on 170 tasks across three benchmarks: HCAST (97 diverse software tasks ranging from 1 minute to 30 hours), RE-Bench (7 difficult ML research engineering tasks, each 8 hours), and SWAA (66 short software actions taking 1-30 seconds). To calibrate task difficulty, they collected over 800 human baselines from professional developers, totaling 2,529 hours of work. The headline finding is that this time horizon has been doubling every 7 months since 2019. GPT-2 could handle tasks that take humans about 2 seconds. The o3 model reached 110 minutes. Extrapolating this trend, AI reaches a one-month time horizon (167...

OSTEP Chapter 13: The Abstraction of Address Spaces

Chapter 13 of OSTEP provides a primer on how and why modern operating systems abstract physical hardware. This is part of our series going through OSTEP book chapters. The OSTEP textbook is freely available at Remzi's website if you'd like to follow along.

Multiprogramming and Time Sharing

In the early days of computing, machines didn't provide much of a memory abstraction to users. The operating system was essentially a library of routines sitting at the bottom of physical memory, while a single running process occupied the rest of the available space. I am not that old, but I got to experience this in 2002 through programming sensor nodes with TinyOS. These "motes" operated on extremely constrained hardware with just 64KB of memory, so there was no complex virtualization. The entire OS and the application were compiled together as a single process, and all memory was statically preallocated. Ah, the joy of debugging a physically distributed multi-node deplo...

SysMoBench: Evaluating AI on Formally Modeling Complex Real-World Systems

This paper presents SysMoBench, a benchmark designed to evaluate generative AI's ability to formally model complex concurrent and distributed systems. Although the paper was published in January 2026, the AI landscape moves so fast that the models evaluated (like Claude-Sonnet-4 and GPT-5) already feel dated after the release of heavy hitters like Claude 3.5 Opus or OpenAI's Codex. The paper draws a distinction between algorithms/protocols and system modeling. As the paper (somewhat circularly) defines it, "system models enable verification of system code via comprehensive testing and model checking". The paper says: Modeling these systems requires the AI to deeply understand the system design, reason about safety and liveness under unexpected faults, and abstract system behavior into an executable program. This is a great paper to follow up my post yesterday on TLA+ Mental Models. It is instructive to see how the paper corroborates my claims in that post. My ...

TLA+ mental models

In the age of LLMs, syntax is no longer the bottleneck for writing, reading, or learning TLA+. People are even getting value by generating TLA+ models and counterexamples directly from Google Docs descriptions of the algorithms. The accidental complexity of TLA+ (its syntax and tooling) is going away. But the intrinsic complexity remains: knowing where to start a model, what to ignore, and how to choose the right abstractions. This is modeling judgment, and it is the hardest skill to teach. Engineers are trained to think in code, control flow, and local state. TLA+ forces you into a different mode: mathematical, declarative, and global. You specify what must hold, not how to achieve it. Once you get comfortable with this shift, it changes how you think about systems, even away from the keyboard. In a companion post, I described TLA+ as a design accelerator based on lessons from 8 industry projects. Here I want to go deeper and articulate the mental models behind effective TLA+ use. Th...

Beat Paxos

As I mentioned in my previous blog post, I recently got my hands on Claude Code. In the morning, I used it to build a Hybrid Logical Clocks (HLC) visualizer. That evening, I couldn't pull myself away and decided to model something more ambitious. I prompted Claude Code to design a Paxos tutorial game, BeatPaxos, where the player tries to "beat" the Paxos algorithm by killing one node at a time and slowing nodes down. Spoiler alert: you cannot violate Paxos safety (agreement). The best you can do is delay the decision by inducing a series of well-timed failures, and that is how you increase your score. Thinking of the game idea and semantics was my contribution, but the screen design and animations were mostly Claude's. I do claim credit for the "red, green, blue" theme for the nodes and the colored messages; those worked well for visualization. I also specified that double-clicking a node kills it, and that the player cannot kill another node un...

Claude Code experiment: Visualizing Hybrid Logical Clocks

Yesterday morning I downloaded Claude Code and wanted to see what this bad boy can do. What better way to learn how it works than coding up a toy example with it? The first thing that occurred to me was to build a visualizer for Hybrid Logical Clocks (HLC). HLC is a simple idea we proposed in 2014: combine physical time with a logical counter to get timestamps that are close to real time but still safe under clock skew. With HLC, you get the best of both worlds: real-time affinity augmented with causality when you need it. Since then HLC has been adopted by many distributed databases, including MongoDB, CockroachDB, Amazon Aurora DSQL, YugabyteDB, etc. This felt well scoped for a first project with Claude Code. Choosing JavaScript enabled me to host this on GitHub (GitHub Pages) for free. Easy peasy way of sharing something small yet useful with people. Claude Code is a clever idea. It is essentially an agent wrapped around the Claude LLM. Chat works well for general Q/A, but it ...
