The Two Abstractions of System Design: Hide or Reduce

When talking about TLA+, I keep referring to "abstraction" as the most important thing to learn. It is also about the hardest thing to learn.

But a contradiction has been bugging me. Aren't CS people already supposed to be good at abstraction? Isn't abstraction supposed to be at the root of OS, networking, and software engineering? Abstract Data Types (ADTs) are a staple of every CS curriculum. So why do I (and every other formal methods/modeling person) see such a large skill gap in abstraction, and flag it as the core, make-or-break skill for modeling?

I think I finally got to the root of this cognitive dissonance. Two kinds of "abstraction" are conflated under the same umbrella term.

  • Modularity abstraction: This is the traditional abstraction taught in CS curricula as ADTs, APIs, layered design, etc. It is all about encapsulation, drawing boundaries, and hiding internals.
  • Modeling abstraction: This is what I talk about when I talk about abstraction in the context of modeling. This is the same sense of abstraction mathematicians and physicists use when building models for thinking and reasoning. The goal is to find the minimal and most elegant description that preserves the property you care about. It is all about cutting away everything orthogonal to the essence of that property.

These two couldn't be further apart in terms of their goal! Let me try to explain in the next two sections.


Modularity abstraction hides. Modeling abstraction reduces.

Modularity abstraction is about interfaces that hide internals. Modeling abstraction is about behaviors, and about reducing a system to its minimal behavioral skeleton for the property you care about.

Modularity abstraction encapsulates, draws a vertical boundary, and hides the layer below. Modeling abstraction is crosscutting: it slices the system along a behavioral plane and keeps only what is absolutely relevant to the property under investigation, and even then in the form of "what", not "how". This slice usually looks nothing like the system's organization.


Modularity abstraction hides concurrency. Modeling abstraction exposes it.

Modularity abstraction is all about sealing the leaks, hence Joel Spolsky's famous post "The Law of Leaky Abstractions", which laments that all non-trivial abstractions are, to some degree, leaky. (Note that his list is all about modularity abstraction: TCP (hide IP), string libraries (hide character arrays), file systems (hide spinning disks), virtual memory / flat address space (hide MMU and paging), SQL (hide query plans), NFS / SMB (hide the network), C++ string classes (hide char*).) Modularity abstraction aspires to hide the interleavings and present operations as if they were atomic. Its goal is to make the module easy to use, but in doing so it forgoes exposing concurrency or efficiency opportunities.

In stark contrast, modeling abstraction is about identifying what should leak and leveraging it! It exposes the fine-grained actions and orderings, and proves that invariants hold despite the interleavings. The payoff for this work is harvesting the maximum safe concurrency from the system.
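To make the contrast concrete, here is a toy sketch in Python (not TLA+, and not how a real model checker like TLC works internally). It models two processes incrementing a shared counter, once as a single atomic action (the modularity view) and once as separate read and write actions (the modeling view), and brute-forces every interleaving. All names here (`next_states`, `final_values`, the state layout) are made up for illustration.

```python
from typing import Iterator, Set, Tuple

# State = (shared x, program counter per process, local tmp per process).
# pc values: 0 = about to read, 1 = about to write, 2 = done.
State = Tuple[int, Tuple[int, ...], Tuple[int, ...]]


def _set(t: Tuple[int, ...], i: int, v: int) -> Tuple[int, ...]:
    """Return a copy of tuple t with position i replaced by v."""
    return t[:i] + (v,) + t[i + 1:]


def next_states(state: State, atomic: bool) -> Iterator[State]:
    """Yield every state reachable by letting one process take one step."""
    x, pcs, tmps = state
    for p, pc in enumerate(pcs):
        if pc == 2:
            continue  # this process has finished
        if atomic:
            # Modularity view: increment is one indivisible action.
            yield (x + 1, _set(pcs, p, 2), tmps)
        elif pc == 0:
            # Modeling view, step 1: read shared x into a local tmp.
            yield (x, _set(pcs, p, 1), _set(tmps, p, x))
        else:
            # Modeling view, step 2: write tmp + 1 back to shared x.
            yield (tmps[p] + 1, _set(pcs, p, 2), tmps)


def final_values(n_procs: int, atomic: bool) -> Set[int]:
    """Explore all interleavings; return the set of possible final x values."""
    init: State = (0, (0,) * n_procs, (0,) * n_procs)
    seen, stack, finals = {init}, [init], set()
    while stack:
        s = stack.pop()
        succ = list(next_states(s, atomic))
        if not succ:  # no process can move: terminal state
            finals.add(s[0])
        for t in succ:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return finals


print(sorted(final_values(2, atomic=True)))   # [2]
print(sorted(final_values(2, atomic=False)))  # [1, 2]
```

With the atomic action, the only outcome is 2. Once the read and write steps are exposed, the lost-update interleaving (both processes read 0, both write 1) shows up immediately. A model at this granularity is where you would then ask which interleavings are actually safe to allow.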


Examples of modeling abstraction 

There is an abundance of modeling abstraction in the distributed systems field. It feels like almost all protocols are designed this way.

  • Lamport logical clocks: throw away wall-clock time, keep happens-before
  • Hybrid logical clocks: keep wall-clock and causality, throw away the rest
  • TrueTime: time as a bounded-uncertainty interval 
  • Consensus: agree on a single decision. The way Lamport designed Paxos is a masterclass in abstraction: from Consensus, to Voting, to the final protocol.
  • Linearizability (and really all consistency models): throw away replication, caching, retries
  • The "log is the database" idea: throw away materialized state as the source of truth; keep only the ordered, append-only sequence of events.
  • MapReduce/Spark: throw away orchestration, parallelism, scheduling, and fault tolerance. Keep a DAG of deterministic transforms over partitioned data, and let the framework reconstitute the rest from that skeleton.

Sometimes it may look like the two definitions overlap (e.g., linearizability, consensus, log is the database, MapReduce). But this is actually reuse rather than overlap. A really well-designed artifact can serve simultaneously as a spec to refine against (modularity) and a skeleton to reason from (modeling). This just means the two coincided on one artifact; the abstraction roles still remain distinct.
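To make the first item in the list above concrete, here is a minimal Python sketch of a Lamport logical clock. The class and method names (`LamportClock`, `tick`, `send`, `receive`) are my own choices for illustration, and there is no real network here; a "message" is just the integer timestamp passed from one clock to another. Notice what is absent: there is no wall-clock time anywhere. All that survives is a counter that respects happens-before.

```python
class LamportClock:
    """A logical clock: no wall-clock time, only a counter that respects happens-before."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event: advance the counter by one."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event too; return the timestamp to attach to the message."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """On receipt, jump past both the local clock and the message timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


# Two processes, a and b. Process a does a local event, then sends to b.
a, b = LamportClock(), LamportClock()
e1 = a.tick()        # a's local event
m = a.send()         # a sends a message stamped with m
b.tick()             # b's unrelated local event (concurrent with e1)
e3 = b.receive(m)    # b receives the message
e4 = b.tick()        # b's next local event

# Every happens-before edge is reflected in the timestamps.
assert e1 < m < e3 < e4
```

The converse does not hold: a smaller timestamp does not imply happens-before (b's first local event and `e1` both get timestamp 1 and are concurrent). That is exactly the information this abstraction chose to throw away.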
