Posts

TLA+ as a Design Accelerator: Lessons from the Industry

Image
After 15+ years of using TLA+, I now think of it is a design accelerator. One of the purest intellectual pleasures is finding a way to simplify and cut out complexity. TLA+ is a thinking tool that lets you do that. TLA+ forces us out of implementation-shaped and operational reasoning into mathematical declarative reasoning about system behavior. Its global state-transition model and its deliberate fiction of shared memory make complex distributed behavior manageable. Safety and liveness become clear and compact predicates over global state. This makes TLA+ powerful for design discovery. It supports fast exploration of protocol variants and convergence on sound designs before code exists. TLA+ especially shines for distributed/concurrent complex systems. In such systems, complexity exceeds human intuition very quickly. (I often point out to very simple interleaving/nondeterministic execution puzzles to show how much we suck at reasoning about such systems.) Testing is inadequate for su...

Building a Database on S3

Image
Hold your horses, though. I'm not unveiling a new S3-native database. This paper is from 2008. Many of its protocols feel clunky today. Yet it nails the core idea that defines modern cloud-native databases: separate storage from compute. The authors propose a shared-disk design over Amazon S3, with stateless clients executing transactions. The paper provides a blueprint for serverless before the term existed. SQS as WAL and S3 as Pagestore The 2008 S3 was painfully slow, and 100 ms reads weren't unusual. To hide that latency, the database separates "commit" from "apply". Clients write small, idempotent redo logs to Amazon Simple Queue Service (SQS) instead of touching S3 directly. An asynchronous checkpoint by a client applies those logs to B-tree pages on S3 later. This design shows strong parallels to modern disaggregated architectures . SQS becomes the write-ahead log (WAL) and logstore. S3 becomes the pagestore. Modern Aurora follows a similar logic : t...

800th blog post: Write that Blog!

Image
I had given an email interview to the "Write That Blog!" newsletter. That came out today , which coincided with my 800th blog post. I am including my answers also here.  Why did you start blogging – and why do you continue? In 2010, when I was a professor, one of my colleagues in the department was teaching a cloud computing seminar. I wanted to enter that field coming from theory of distributed systems, and later wireless sensor networks fields. So I attended the seminar. As I read the papers, I started blogging about them. That is how I learn and retain concepts better, by writing about them. Writing things down helps crystalize ideas for me. It lets me understand papers more deeply and build on that understanding.  The post on MapReduce , the first paper discussed in the seminar, seems to have opened the floodgates of my blogging streak, which has been going strong for 15 years. I think a big influence on me has been  the EWD documents . I remember the day I came acros...

Popular posts from this blog

Hints for Distributed Systems Design

The F word

The Agentic Self: Parallels Between AI and Self-Improvement

Learning about distributed systems: where to start?

Foundational distributed systems papers

Cloudspecs: Cloud Hardware Evolution Through the Looking Glass

Advice to the young

Agentic AI and The Mythical Agent-Month

Are We Becoming Architects or Butlers to LLMs?

Welcome to Town Al-Gasr