BugBash'26 Morning of Day 1

Continuing with notes from the BugBash talks. Yes, all of this goodness, including Will Wilson's keynote, happened before lunch on the first day.


Where all the ladders start

Peter Alvaro, Associate Professor of Computer Science @ UC Santa Cruz

In this talk, Peter reflects on his 20 years of distributed systems work. The cover image is Don Quixote (Peter) attacking the windmill (robust distributed systems) with a spear (some singular solution, often borrowed from databases).

The first attack used arcane algebras: a purist approach aimed at getting it right the first time. This was during Peter's PhD at UC Berkeley, where Neil Conway was a peer and collaborator.

By his own admission, this was driven by a naive framing of what makes distributed systems difficult. The target was uncertainty about order and timing, which causes distributed consistency problems and requires coordination. But distributed coordination comes at the cost of latency, performance variability, and decreased availability. Peter said they were influenced by James Hamilton (his LADIS'08 talk) and by Pat Helland's work on avoiding coordination.

Peter then talked about two similar-sounding problems in a partitioned, replicated distributed system: deadlock detection and garbage collection. While both are formulated around strongly connected components, the first is a monotonic problem and the second is not! Deadlock detection looks for the presence of something (a cycle), so new information can only add evidence for it. Garbage collection looks for an absence (no remaining references), so newly arriving information can overturn an earlier conclusion.
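To make the monotonicity distinction concrete, here is a minimal Python sketch. It is my own toy illustration, not code from the talk: `deadlocked` looks for a cycle in a waits-for graph, and `garbage` returns the objects unreachable from a set of roots. The function names and the single-process setup are assumptions; in the setting Peter described, these graphs would be spread across partitions and replicas, which is exactly where the monotonicity difference starts to matter.

```python
def reachable(edges, start):
    """Return every node reachable from `start` (inclusive) along directed edges."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, set()).add(dst)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def deadlocked(waits_for):
    """Monotone query: is there a cycle in the waits-for graph?

    Adding edges can only create cycles, never remove them, so a True
    answer computed on partial information stays True.
    """
    for src, dst in waits_for:
        # The edge src -> dst closes a cycle if src is reachable from dst.
        if src in reachable(waits_for, dst):
            return True
    return False


def garbage(refs, roots, objects):
    """Non-monotone query: which objects are unreachable from every root?

    A newly learned reference can make an object reachable again, so an
    answer computed on partial information may have to be retracted.
    """
    live = set()
    for root in roots:
        live |= reachable(refs, root)
    return set(objects) - live


# Deadlock: the conclusion survives new information.
partial = [("T1", "T2"), ("T2", "T1")]
print(deadlocked(partial))                    # True
print(deadlocked(partial + [("T3", "T1")]))   # True

# Garbage: the conclusion is overturned by new information.
refs = [("root", "a")]
print(garbage(refs, ["root"], ["a", "b"]))                  # {'b'}
print(garbage(refs + [("a", "b")], ["root"], ["a", "b"]))   # set()
```

The asymmetry is the whole point: a node can safely act on `deadlocked` returning True as soon as it sees it, but acting on `garbage` requires knowing no more references are in flight, and that knowledge is what costs coordination.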

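They formulated the CALM theorem (Consistency As Logical Monotonicity) to capture how monotonicity translates into properties of programs: monotonic programs can be computed consistently without coordination. This sounds grand and very promising, but it did not come close to hitting the target of solving the robust distributed systems problem.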

The first problem is that disorder is not the whole story! Partial failures, rather than just clean binary failures, throw a wrench in the works. The second problem is that nobody wants to buy declarative languages.

The next attack was lineage-driven fault injection. This builds on the premise that fault tolerance is fundamentally about redundancy. It was a practical and useful approach: it found real bugs in real systems at Netflix, eBay, and Uber. But it relied on sufficient observability and white-box instrumentation. A second limitation is that redundancy is not always a good thing, as it comes with both performance and monetary costs.
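The core idea can be sketched in a few lines. Lineage-driven fault injection works backwards from a successful run: each way the good outcome was supported is a redundant path, and the interesting faults to inject are those that break every path at once. The Python below is a toy illustration under that reading, not the tooling from the talk. The supports are hard-coded sets of hypothetical event names (`msg_A`, `ack_A`, and so on), and minimal fault sets are found by brute-force enumeration of hitting sets, whereas a real tool would extract lineage from traces and would more likely hand the search to a solver.

```python
from itertools import combinations


def minimal_fault_sets(supports, max_faults):
    """Return the minimal sets of faults (size <= max_faults) that break
    every support, i.e. the minimal hitting sets of `supports`."""
    events = sorted(set().union(*supports))
    found = []
    for k in range(1, max_faults + 1):
        for combo in combinations(events, k):
            faults = set(combo)
            # A fault set is a candidate only if it hits every redundant support.
            if all(support & faults for support in supports):
                # Skip supersets of an already-found (smaller) fault set.
                if not any(prev <= faults for prev in found):
                    found.append(faults)
    return found


# Hypothetical lineage: the write is acknowledged via replica A or replica B.
supports = [
    {"msg_A", "ack_A"},
    {"msg_B", "ack_B"},
]

print(minimal_fault_sets(supports, 1))   # [] -> no single fault breaks the outcome
for faults in minimal_fault_sets(supports, 2):
    print(sorted(faults))                # each is a candidate fault-injection experiment
```

Redundancy shows up directly as extra supports: adding a third disjoint replica path would push the smallest breaking fault set from two faults to three, which is both the appeal of the approach and, per Peter's second limitation, a reminder that every extra path has a cost.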

Another attack was simulation to address metastability problems. This was joint work with Rupak and Rebecca at AWS. Peter talked about continuous-time Markov chains (CTMCs), attractors, and how these relate to metastability: falling into the bad place and recovering to the good place. He said this is beautiful work, but it does not work! The limitation is that the simulation needs to be very finely calibrated, which can be very hard.

Recently, he has been working on yet another attack, Descartes: deterministic testing with Rupak Majumdar. The idea here is to feel out safety margins and do stability testing.

Is this Sisyphus at work? No, this doesn't feel like punishment. This is fun, and we are making progress with each attack. Yes, there is no silver bullet, but that won't stop us from prodding and searching for unifying approaches.

Peter was also good at making people laugh, with inside jokes about the "postmortem party" and the unhelpful, misguided suggestions offered at postmortems. We have a long way to go as a discipline.


From dams to data: how to think about infrastructure

Deb Chachra, Professor of Engineering @ Olin College

Here are some out-of-context quotes to summarize the talk. Piecing them together is an exercise left to the reader.

"Technology is the active human interface with the material world." "Infrastructure is relational, it is based on relationships." "Infrastructure has a trajectory." "The gifts of nature are for the public." "Niagara Falls: 1903 Triscuit, baked by electricity; Sir Henry Pellatt the financier." "Robert R. Moses, boo! Utilitarian ethics." "We are going to build a system: a small number of people will be harmed, but a lot of people will benefit." "Unfortunately, we are spectacularly bad at that decision making." "Renewable energy is incredibly abundant: Earth is not a closed system for energy." "Renewables are inherently decentralized and distributed." "Infrastructure becomes a political right." "If you can't read in a world where 80% cannot, you are probably fine. But if you can't read in a world where 80% can... you are not fine."

Yes, this was not a software talk. Yes, there are many parallels to technology and AI here, but the speaker did not go into them.


Lightning Talks 

What 20 years of kernel bugs taught us about finding the next one

Jenny Qu, AI Researcher @ Pebblebed

30% of kernel bugs hide for 5+ years. These share the same pattern: the control and data paths are implicit. There is a shared channel, assumed sequencing, and no enforcement. In other words, these are mostly concurrency bugs, which are hard to reproduce. LLMs are going to make this a lot worse! (Maybe the understatement of the year.) Every data channel is also a channel that can reach the control path, so there is no separate control path anymore. What does Jepsen look like for LLM-generated systems?


Formal verification in the web dev workflow

Fernanda Graciolli, Co-Founder @ Midspiral

This talk was about proofs. Thanks to AI, we can do them, and because of AI, we must do them! Claude can write a Dafny proof, now what? She talked about a Dafny-to-React-components workflow: pure logic is proved before compiling. Logic lives in specs, so you iterate on specs, not code. The structured spec lives in a spec.yaml file.


Old Tom Bombadil is a merry fuzzer!

Oskar Wickström, Senior Software Engineer @ Antithesis

This talk described property-based testing for web apps written in TypeScript/JavaScript. It introduced the Bombadil project from Antithesis and demoed it.
