Posts

OSTEP Chapters 6,7

Image
How does your computer create the illusion of running dozens of applications simultaneously when it only has a few physical cores? Wait, I forgot the question because I am now checking my email. Ok, back to it... The answer is CPU Virtualization. Chapters 6, 7 of OSTEP explore the engine behind this illusion, and how to balance raw performance with absolute control. The OSTEP textbook is freely available at Remzi's website if you like to follow along. Chapter 6. The Mechanism: Limited Direct Execution The crux of the challenge is: How do we run programs efficiently without letting them takeover the machine?  The solution is Limited Direct Execution (LDE) --the title spoils it. "Direct Execution" means the program runs natively on the CPU for maximum speed. "Limited" means the OS retains authority to stop the process and prevent restricted access. This requires some hardware support. To prevent chaos, hardware provides two execution modes. Applications run in ...

The F word

Back in 2005, when I first joined the SUNY Buffalo CSE department, the department secretary was a wonderful lady named Joann, who was over 60. She explained that my travel reimbursement process was simple: I'd just hand her the receipts after my trip, she'd fill out the necessary forms, submit them to the university, and within a month, the reimbursement check would magically appear in my department mailbox. She handled this for every single faculty member, all while managing her regular secretarial duties. Honestly, despite the 30-day turnaround, it was the most seamless reimbursement experience I've ever had. But over time the department grew, and Joann moved on. The university partnered with Concur, as corporations do, forcing us to file our own travel reimbursements through this system. Fine, I thought, more work for me, but it can't be too bad. But, the department also appointed a staff member to audit our Concur submissions. This person's job wasn't to hel...

OSTEP Chapters 4,5

Image
I recently started reading "Operating Systems: Three Easy Pieces" (OSTEP) as part of Phil Eaton's offline reading group . We are tackling a very doable pace of 2 chapters a week. The book is structured into three major parts: Virtualization, Concurrency, and Persistence. It is openly accessible to everyone for free, which is a tremendous contribution to computer science education by the Arpaci-Dusseau couple (Remzi and Andrea). This is a very user-friendly book, sprinkled with a lot of jokes and asides that keep the mood light. The fourth wall is broken upfront, the authors talk directly to you, which is great. It makes it feel like we are learning together rather than being lectured at. Their approach is more than superficial; it inspires you, motivates the problems, and connects them to the big picture context. The book  builds scaffolding through "The Crux" of the problem and "Aside" panels. It actively teaches you the thought processes, not just t...

CockroachDB Serverless: Sub-second Scaling from Zero with Multi-region Cluster Virtualization

Image
This paper describes the architecture behind CockroachDB Serverless. At first glance, the design can feel like cheating. Rather than introducing a new physically disaggregated architecture with log and page stores, CRDB retrofits its existing implementation through logical disaggregation: It splits the binary into separate SQL and KV processes and calls it serverless. But dismissing this as fake disaggregation would miss the point. I came to appreciate this design choice as I read the paper (a well written paper!). This logical disaggregation (the paper calls it cluster virtualization) provides a pragmatic evolution of the shared-nothing model. CRDB pushes the SQL–KV boundary (as in systems like TiDB and FoundationDB) to its logical extreme to provide the basis for a multi-tenant storage layer. From here on, they solve the sub-second cold starts problem and admission control problems with good engineering rather than an architectural overhaul. System Overview If you split the stack at...

Welcome to Town Al-Gasr

Image
Al-Gasr began as an autonomous agent town, but no one remembers now who deployed it. The original design documents were very clear. There were tasks. There were agents. There was persistence. Everything else had been added later by a minister's cousin. Al-Gasr ran on nine ministries. The Ministry of Compute handled execution, except when it didn't, in which case responsibility was transferred to the Ministry of Storage Degradation. The Ministry of Truth published daily bulletins. The Ministry of Previously Accepted Truth issued corrections. The Ministry of Future Truth prepared explanations in advance. Each ministry employed agents whose sole job was to supervise agents supervising their own nephews. At the top sat the Emir. Or possibly the late Emir. Or the Emir-in-Exile, depending on which dashboard you trusted. The system maintained three Emirs simultaneously to ensure high availability. This caused no confusion at all. The Emir du Jour governed by instinct and volume. Each ...

Agentic AI and The Mythical Agent-Month

Image
The premise of this position paper is appealing . We know Brooks' Law : adding manpower to a late software project makes it later. That is, human engineering capacity grows sub-linearly with headcount due to communication overhead and ramp-up time. The authors propose that AI agents offer a loophole: "Scalable Agency". Unlike humans, agents do not need days/weeks to ramp up, they load context instantly. So, theoretically, you can spin up 1,000 agents to explore thousands of design hypotheses in parallel, compressing the Time to Integrate (TTI: duration required to implement/integrate new features/technologies into infrastructure systems) for complex infrastructure from months to days. The paper calls this vision Self-Defining Systems (SDS), and suggests that thanks to Agentic AI future infrastructure will design, implement, and evolve itself. I began reading with great excitement, but by the final sections my excitement soured into skepticism. The bold claims of the intro...

Rethinking the University in the Age of AI

Image
Three years ago, I wrote a post titled "Getting schooled by AI, colleges must evolve" . I argued that as we entered the age of AI, the value of "knowing" was collapsing, and the value of "doing" was skyrocketing. (See Bloom's taxonomy. ) Today, that future has arrived. Entry-level hiring has stalled because AI agents absorb the small tasks where new graduates once learned the craft. So how do we prepare students for this reality? Not only do I stand by my original advice, I am doubling down. Surviving this shift requires more than minor curriculum tweaks; it requires a different philosophy of education. I find two old ideas worth reviving: a systems design mindset that emphasizes holistic foundations , and alternative education philosophies of the 1960s that give students real agency and real responsibility. Holistic Foundations Three years ago, I begged departments: "Don't raise TensorFlow disk jockeys. Teach databases! Teach compilers! Tea...

Cloudspecs: Cloud Hardware Evolution Through the Looking Glass

Image
This paper (CIDR'26) presents a comprehensive analysis of cloud hardware trends from 2015 to 2025, focusing on AWS and comparing it with other clouds and on-premise hardware. TL;DR: While network bandwidth per dollar improved by one order of magnitude (10x), CPU and DRAM gains (again in performance per dollar terms) have been much more modest. Most surprisingly, NVMe storage performance in the cloud has stagnated since 2016. Check out the NVMe SSD discussion below for data on this anomaly. CPU Trends Multi-core parallelism has skyrocketed in the cloud. Maximum core counts have increased by an order of magnitude over the last decade. The largest AWS instance u7in now boasts 448 cores. However, simply adding cores hasn't translated linearly into value. To measure real evolution, the authors normalized benchmarks (SPECint, TPC-H, TPC-C) by instance cost. SPECint benchmarking shows that cost-performance improved roughly 3x over ten years. A huge chunk of that gain comes from AWS G...

The Sauna Algorithm: Surviving Asynchrony Without a Clock

Image
While sweating it out in my gym's sauna recently, I found a neat way to illustrate the happened-before relationship in distributed systems. Imagine I suffer from a medical condition called dyschronometria , which makes me unable to perceive time reliably, such that 10 seconds and 10 minutes feel exactly the same to me. In this scenario, the sauna lacks a visible clock. I'm flying blind here, yet I want to leave after a healthy session. If I stay too short, I get no health benefits. If I stay too long, I risk passing out on the floor. The question becomes: How do I, a distributed node with no local clock, ensure operating within a safety window in an asynchronous environment? Thankfully, the sauna has a uniform arrival of people. Every couple of minutes, a new person walks in. These people don't suffer from dyschronometria and they stay for a healthy session, roughly 10 minutes. My solution is simple: I identify the first person to enter after me, and I leave when he leaves....

Are Database System Researchers Making Correct Assumptions about Transaction Workloads?

Image
In this blog, we had reviewed quite a number of deterministic database papers, including Calvin , SLOG , Detock , which aimed to achieve higher throughput and lower latency. The downside of these systems is sacrificing transaction expressivity. They rely on two critical assumptions: first, that transactions are "non-interactive", meaning they are sent as a single request (one-shot) rather than engaging in a multi-round-trip conversation with the application, and second, that the database can know a transaction's read/write set before execution begins (to lock data deterministically). So when these deterministic database researchers write a paper to validate how these assumptions hold in the real world, we should be skeptical and cautious in our reading. Don't get me wrong, this is a great and valuable paper. And we still need to be critical in our reading.  Summary The study employed a semi-automated annotation tool to analyze 111 popular open-source web applications...

Too Close to Our Own Image?

Image
Recent work suggests we may be projecting ourselves onto LLMs more than we admit. A paper in Nature reports that GPT-4 exhibits "state anxiety". When exposed to traumatic narratives (such as descriptions of accidents or violence), the model's responses score much higher on a standard psychological anxiety inventory. The jump is large, from "low anxiety" to levels comparable to highly anxious humans. The same study finds that therapy works: mindfulness-style relaxation prompts reduce these scores by about a third, though not back to baseline. The authors argue that managing an LLM's emotional state may be important for safe deployment, especially in mental health settings and perhaps in other mission-critical domains. Another recent paper argues that LLMs can develop a form of brain rot. Continual training on what the authors call junk data (short, viral, sensationalist content typical of social media) leads to models developing weaker reasoning, poorer lon...

The Agentic Self: Parallels Between AI and Self-Improvement

2025 was the year of the agent. The goalposts for AGI shifted; we stopped asking AI to merely "talk" and demanded that it "act". As an outsider looking at the architecture of these new agents and agentic system, I noticed something strange. The engineering tricks used to make AI smarter felt oddly familiar. They read less like computer science and more like … self-help advice . The secret to agentic intelligence seems to lie in three very human habits: writing things down, talking to yourself, and pretending to be someone else. They are almost too simple. The Unreasonable Effectiveness of Writing One of the most profound pieces of advice I ever read as a PhD student came from Prof. Manuel Blum, a Turing Award winner. In his essay "Advice to a Beginning Graduate Student", he wrote: "Without writing, you are reduced to a finite automaton. With writing you have the extraordinary power of a Turing machine." If you try to hold a complex argument enti...

Rethinking the Cost of Distributed Caches for Datacenter Services

Image
This paper (HOTNETS'25) re-teaches a familiar systems lesson: caching is not just about reducing latency, it is also about saving CPU! The paper makes this point concrete by focusing on the second-order effect that often dominates in practice: the monetary cost of computation. The paper shows that caching --even after accounting for the cost of DRAM you use for caching-- still yields 3–4x better cost efficiency thanks to the reduction in CPU usage. In today's cloud pricing model, that CPU cost dominates. DRAM is cheap. Well, was cheap... The irony is that after this paper got presented, the DRAM prices jumped by 3-4x ! Damn Machine Learning ruining everything since 2018! Anyways, let's ignore that point conveniently to get back to the paper. Ok, so caches do help, but when do they help the most? Many database-centric or storage-side cache designs miss this point. Even when data is cached at the storage/database cache, an application read still needs to travel there, pay fo...

Randomer Things

I aspire to get bored in the new year I've realized that chess has been eating my downtime. Because it lives on my phone (Lichess), it is frictionless to start a bullet game, and get a quick dopamine hit. The problem is that I no longer get bored. That is bad. I need to get bored so I can start to imagine, daydream, think, self-reflect, plan, or even get mentally prepared for things (like the Stoics talked about). I badly need that empty space back. So bye chess. Nothing personal. I will play only when teaching/playing with my daughters. I may occasionally cheat and play a bullet game on my wife's phone. But no more chess apps on my phone. While I was at it, I installed the  Website Blocker extension for Chrome. I noticed my hands typing reddit or twitter at the first hint of boredom. The blocker is easy to disable, but that is fine. I only need that slight friction to catch myself before opening the site on autopilot. I am disappointed by online discourse In 2008, Reddit had a...

LeaseGuard: Raft Leases Done Right!

Image
Many distributed systems have a leader-based consensus protocol at their heart. The protocol elects one server as the "leader" who receives all writes. The other servers are "followers", hot standbys who replicate the leader’s data changes. Paxos and Raft are the most famous leader-based consensus protocols. These protocols ensure consistent state machine replication , but reads are still tricky. Imagine a new leader L1 is elected, while the previous leader L0 thinks it's still in charge. A client might write to L1, then read stale data from L0, violating Read Your Writes . How can we prevent stale reads? The original Raft paper recommended that the leader communicate with a majority of followers before each read, to confirm it's the real leader. This guarantees Read Your Writes but it's slow and expensive. A leader lease is an agreement among a majority of servers that one server will be the only leader for a certain time. This means the leader can run...

Popular posts from this blog

Hints for Distributed Systems Design

TLA+ modeling tips

Foundational distributed systems papers

Optimize for momentum

The Agentic Self: Parallels Between AI and Self-Improvement

Learning about distributed systems: where to start?

Advice to the young

Cloudspecs: Cloud Hardware Evolution Through the Looking Glass

My Time at MIT

Agentic AI and The Mythical Agent-Month