BugBash'26: Day 2

Ok, finally getting some time to put my butt down to write about day 2 of BugBash.


Why do so few buildings fall down?

Brian Potter, Senior Infrastructure Fellow @ Institute for Progress, author of the Construction Physics newsletter.

Buildings rarely collapse. The rate of major structural failure is between 1/100K and 1/1 million. (This is how I know this is a serious statistic: it is an interval.)

Why don't more buildings fall down?

There are some technical reasons for this: buildings are simple structures with no (or few) moving parts. Buildings exhibit a limited number of behaviors when you load their structure: stress, deflection, vibration, creep, etc. And these behaviors are proportional to the force you put in. Finally, buildings are designed for 2X-3X of the expected load.
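To make that last point concrete, here is a small worked example (the numbers are mine, for illustration; they are not from the talk):

```latex
% Factor of safety: size the member for a multiple of the expected load.
\mathrm{FoS} = \frac{\text{design capacity}}{\text{expected load}}
\quad\Rightarrow\quad
\text{design capacity} = 2.5 \times 100\,\mathrm{kN} = 250\,\mathrm{kN}
```

So a beam expected to carry 100 kN gets sized for 250 kN, and the expected load has to be wrong by 2.5X before anything fails.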

Let's go deeper into structural elements. We have good theories for how structural elements behave, and individual components are tested extensively and are standardized. A building is exposed to a bounded load by default, and the maximum design forces (earthquakes, hurricanes) rarely occur. And many buildings are stitched together with alternate load paths, providing redundancy if a major element fails.

There are also cultural reasons why buildings don't fall down. We have building codes in place: the International Building Code, plus residential, fire, mechanical, and plumbing codes. These codes enforce and improve best practices over time through reactive updates. There is a saying in civil engineering: building codes are written in blood. A recent example is Boston's Big Dig, which changed the code on anchoring blocks to tunnel ceilings.

Other cultural reasons: builders are required to be licensed in many countries, and the profession has a strong culture of risk aversion and conservatism. Civil engineers are really conservative people. They don't attempt to build flying spinning restaurants, for example. They still like to rely on hand calculations as a backup. (There is a lesson here for the AI era.) Brian himself worked for 5 years designing just parking garages, and then 10 years designing just apartment buildings.

When these things stop being true, building collapses become more common. For example, when we had less knowledge of building techniques, failures were more common. There are studies that tie the failure rate of bridges to the lack of engineering knowledge, showing a 10X improvement as knowledge increased.

Leaving large safety margins is also a big part of this. For building types that have little margin of safety (e.g., offshore platforms, since they need to be submersible), the failure rate increases. For buildings with unusual or out-of-sample typologies, failure risk also increases. A famous example is the Citicorp Center tower: it had an unusual structure, and a small design modification was later determined to be a wind hazard; the modification was undone before a major storm could hit.

Another famous example is the Tacoma Narrows Bridge, which collapsed due to extreme wind-induced oscillations and became an important lesson.


Gary Marcus fireside chat

Gary is a Prof. Emeritus of Psychology and Neural Science at NYU. Will Wilson interviewed him for this fireside chat.

Gary says AI researchers don't want him on fireside chats because he called bullshit on AI. He says that AI does do something, but does it badly. You cannot just pour in more money and expect to achieve AGI. We might get to AGI by other means, but not through LLMs; neurosymbolic AI may be the way to get there. When pressed on LLMs recently crossing a threshold and taking over a lot of programming tasks, Gary tied this back to neurosymbolic AI. He argued Claude Code is not a pure LLM: you cannot scale pure LLMs, so they are using a lot of harnesses and tools, which apparently counts as neurosymbolic stuff. Later in the conversation, Gary said Claude Code is a bad attempt at symbolic AI (if-else statements, regular expressions, etc.) and may only count as a neurosymbolic hybrid. Gary gave off some Richard Stallman GNU/Linux vibes by trying to rename the advances in AI as neurosymbolic AI.

Most successful AI is narrow AI (Deep Blue, the Jeopardy computer), as opposed to broad AI, which is AGI. AGI requires multidimensional intelligence. Is memorization smart? Is being useful for technical tasks smart? Calculators are smart under that definition as well. The definition should be cognitive; just being economically useful doesn't cut it as a definition of AGI.

Neural networks are parallel statistical computation. In 1967, there was a lot of excitement that NNs would change the world. There was no proof that a NN trained with backpropagation converges to a specified outcome, but it was still a useful system. In 2001, Gary wrote a book on why you are not gonna get there without techniques from symbolic AI. Gary said Geoff Hinton ridiculed him and claimed he would be happy to debate him, but lied about that. Gary said he remains open to the debate, but Hinton was very hostile to having any symbol manipulation and kept saying they just need scaling.

What happened in the last few years is LLMs being wrapped in harnesses, since on their own they are like bulls in a china shop. Will pushed back, asking why it matters what we call it: this type of LLM scales well enough to get a lot of use out of it already. Gary replied that we are still looking at one tiny corner, and if you care about science, you want to know what matters and what doesn't.

Gary also said that the winner-takes-all bull case for AI investment is wrong. First, it looks like there won't be a single winner capturing everything. Second, government nationalization is a risk for investment. Finally, the models could get stolen. Gary argued that studies show people don't get ROI on their AI investments.

Gary also predicted there won't be AGI until 2027 or 2028. Phew! He said that an AGI would be able to watch a new movie and understand it, but current AI systems would not be able to do this for a new movie outside their training set, for example, watch "One Battle After Another" and understand the Sean Penn character. I don't know, man... Maybe somebody should give this a try. Similarly, he said, current AI tech won't be able to read a new novel and understand it. Again, I think this underestimates current LLMs. Gary made a bet on this on Substack back in 2024, listing 10 things that won't happen by 2027.

Gary says his predictions come from cognitive analysis, and he doesn't see any clue that there is "world building" in the models. It is only image space over time, with no world building. They are not able to do reasoning; the conceptual/algorithmic breakthrough may not come soon, and we would need 5-10 breakthroughs. World models require being neurosymbolic, but being neurosymbolic doesn't give you world models; you also need ontologies.

There is also the question of what happens if the barking dog finally catches the car. Suppose AGI does get created: what happens to society? Gary said we should endow it with human values. If AGI can be jailbroken the way LLMs can, we are in for a bad time.


Building confidence in an always-in-motion distributed streaming system

Frank McSherry is famous for originating the timely-dataflow/differential-dataflow work and for bringing SQL view materialization to market at Materialize, with 1PB of deployed capacity. He is also famous for his "Scalability! But at what COST?" work.

Frank said he and his company get a 10-100x benefit from AI (sorry Gary, neurosymbolic AI). But building confidence is a process, and the talk gave his opinions on building confidence in using AI for coding.

Systems that work, work for a reason: Frank said he is a theory person, and theory people in CS (unlike those in physics) made the practice possible. He says there is one reason timely-dataflow/differential-dataflow works: virtual time (Jefferson'85)! Virtual time was originally suggested for discrete-event simulations, with concurrency control as a second use case. Materialized collections are built on the (time, diff, data) abstraction. The changelog yields a specific collection at each time. Operations transform changelogs while preserving virtual time. The operations compose, and this means SQL plans compose. This way Materialize pre-resolves nondeterminism at the boundary and removes logical contention from the critical path.
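To make the (time, diff, data) idea concrete, here is a minimal Rust sketch (my toy reconstruction, not Materialize's or differential dataflow's actual API): a changelog is a list of triples, the collection at virtual time t is the accumulation of all diffs at times <= t, and an operator like map rewrites the data while leaving times and diffs untouched, which is exactly why operators compose.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// One changelog entry: (data, virtual time, diff).
// A diff of +1 inserts a record, -1 retracts it.
type Update<D> = (D, u64, i64);

// Reconstruct the collection as of virtual time `t`: sum the diffs of
// every entry at time <= t and keep records with a nonzero count.
fn collection_at<D: Eq + Hash + Clone>(log: &[Update<D>], t: u64) -> HashMap<D, i64> {
    let mut counts: HashMap<D, i64> = HashMap::new();
    for (data, time, diff) in log {
        if *time <= t {
            *counts.entry(data.clone()).or_insert(0) += *diff;
        }
    }
    counts.retain(|_, c| *c != 0);
    counts
}

// A map-style operator transforms each record but preserves (time, diff),
// so the output is again a well-formed changelog and operators compose.
fn map_op<D, E>(log: &[Update<D>], f: impl Fn(&D) -> E) -> Vec<Update<E>> {
    log.iter().map(|(d, t, r)| (f(d), *t, *r)).collect()
}

fn main() {
    let log = vec![("alice", 1, 1), ("bob", 2, 1), ("alice", 3, -1)];
    // At time 2 the collection is {alice, bob}; at time 3 alice is retracted.
    assert_eq!(collection_at(&log, 2).len(), 2);
    assert_eq!(collection_at(&log, 3).len(), 1);
    let upper = map_op(&log, |name| name.to_uppercase());
    assert_eq!(upper[0], ("ALICE".to_string(), 1, 1));
}
```

Every operator built this way agrees on what the collection was at each virtual time, which is the "pre-resolving nondeterminism at the boundary" point.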

Abstraction is a superpower: Frank said that when he started as a software systems builder, he thought his job was to be smart and clever, but he eventually concluded that his job is to provide effective abstractions. The job is to manage, delete, and package complexity (which no one wants). Virtual time is a great abstraction for Materialize: it is hard to misuse components that respect virtual time. The composability of virtual time made all the difference; it removed the logical contention and cleared the deck.

Use it or lose it: Dogfood your own work. Benchmark and communicate its value. Confidence is something you provide to others.


Lightning Talks

Borrowing FoundationDB's simulator for layer development

Pierre Zemb of Clever Cloud talked about his journey from HBase operational nightmares (network splits, manual repairs with hbck) to building Rust-based layers on FoundationDB. He was motivated by FDB's deterministic simulator, which abstracts every fallible interaction (network, disk, time, randomness) behind swappable interfaces. He and his team figured out how to inject their own Rust code into FDB's simulated cluster, initially just to verify transactional consistency, but eventually testing increasingly rich workloads. This surfaced bugs everywhere and made them switch to simulation-first development.
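The pattern is worth sketching. Here is a minimal Rust illustration (my own toy, not FDB's or Clever Cloud's code) of "every fallible interaction behind a swappable interface": the layer code depends only on a trait, and the simulated implementation drives time and faults from a fixed seed, so any bug it finds replays deterministically.

```rust
// Every fallible interaction the layer performs goes through this trait,
// so the same code runs against the real OS in production and against a
// deterministic fake in simulation. (Names here are mine, not FDB's.)
trait Env {
    fn now_millis(&mut self) -> u64;
    fn random_u64(&mut self) -> u64;
    fn send(&mut self, node: usize, msg: &[u8]) -> Result<(), String>;
}

// Simulated environment: a logical clock plus a seeded xorshift PRNG.
// The seed alone determines every "random" fault, so failures replay.
struct SimEnv {
    clock: u64,
    seed: u64, // must be nonzero for xorshift
}

impl Env for SimEnv {
    fn now_millis(&mut self) -> u64 {
        self.clock += 1; // time advances only when the simulator says so
        self.clock
    }
    fn random_u64(&mut self) -> u64 {
        self.seed ^= self.seed << 13;
        self.seed ^= self.seed >> 7;
        self.seed ^= self.seed << 17;
        self.seed
    }
    fn send(&mut self, _node: usize, _msg: &[u8]) -> Result<(), String> {
        // Inject a network fault roughly 10% of the time, deterministically.
        if self.random_u64() % 10 == 0 {
            Err("injected network fault".to_string())
        } else {
            Ok(())
        }
    }
}

fn main() {
    let mut env = SimEnv { clock: 0, seed: 42 };
    // A workload that retries sends until one succeeds; under a fixed
    // seed, the exact sequence of faults is reproducible.
    let mut attempts = 0;
    while env.send(0, b"hello").is_err() {
        attempts += 1;
    }
    println!("succeeded after {attempts} retries at t={}", env.now_millis());
}
```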

I didn't take notes for these two lightning talks, so I just mention them by title.

Symbolic execution for invariant discovery (not just bug finding); Anish Agarwal, Head of Product @ Olympix 

Fuzzamoto: Full system fuzzing for Bitcoin nodes; Niklas Gögge, Security Engineer @ Brink


Lightning Talks

CUDA over TCP: reverse engineering the CUDA API; Shivansh Vij, CEO @ Loophole Labs

Verifying Cedar Policy’s correctness with PBT & differential response testing; Lucas Käldström, Staff Engineer @ Upbound

Keeping up with code being written 24/7; Josh Ip, Founder & CEO @ Ranger

Hacking kiosks


Behaviors as the backbone of software correctness

Gabriela Moreira, CEO of Quint, talked about her path from Informal Systems and the blockchain space into building Quint as a friendlier alternative to TLA+. She said she loves TLA+ and its core abstraction of modeling systems as states and transitions, but only about 10% of her colleagues ever adopted it, citing the syntax as the sticking point. (I would like to propose a rule: anybody who can read and write Rust syntax doesn't get to complain about TLA+ syntax. You literally need to learn 5-10 keywords, and that's it.)

Gabriela said that Quint keeps the underlying power but offers a different syntax plus type checking, and `quint run` performs random simulation of the state space (something TLA+ also supports), which tends to be faster than full model checking. The big question here is how you know you're done. In the rest of the talk, Gabriela explained that this confidence should come from understanding, sanity checks, testing with failures, and witnesses such as vacuity checks and traces toward a property. She said reproducible examples are central: tests written as `init.then(...).then(...).expect(...)` chains also serve as documentation you can actually trust (see the sketch below). Behaviors should become the backbone across the whole software development life cycle, enabling model-based testing, trace validation, and hybrid approaches. The AI angle gives this fresh urgency, of course. Spec-driven development and model-based testing become both more necessary and considerably easier with AI in the loop.
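To show what those chains look like in code, here is a rough Rust analogue I sketched (not Quint's actual API): a Run value threads explicit state through actions, and expect asserts a property, so the test reads as a little story about the system's behavior.

```rust
// A toy "behavior runner" mirroring the init.then(...).expect(...) style:
// each `then` applies an action to the state, each `expect` checks a
// property, and the chain itself documents the scenario.
struct Run<S>(S);

impl<S> Run<S> {
    fn init(state: S) -> Self {
        Run(state)
    }
    fn then(self, action: impl FnOnce(S) -> S) -> Self {
        Run(action(self.0))
    }
    fn expect(self, pred: impl FnOnce(&S) -> bool) -> Self {
        assert!(pred(&self.0), "expectation failed");
        self
    }
}

fn main() {
    // A toy counter spec: two increments, then a property check.
    Run::init(0)
        .then(|n| n + 1)
        .then(|n| n + 1)
        .expect(|n| *n == 2);
}
```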


Steel, Rust, and Truth

Steve Klabnik, co-author of The Rust Programming Language, opened by talking about Pittsburgh. He then talked about his grandfather Keith and his father, a tool-and-die man who literally checked tools for rust. Klabnik counted himself lucky that his own passion turned out to be economically viable. After 40 minutes of this, he posed the question hanging over the room: in this moment, in 2026, which one are you: the grandfather who rode out the change, the father who didn't make it out, or Pittsburgh itself, which had to become something else entirely? This was a talk about feelings, which is awkward territory for software people, who haven't been big on this topic but couldn't do much to dodge this existential thinking over the last year or so.

Steve traced the history of correctness: Descartes stripping away every uncertain thing, Leibniz with his calculemus, Hilbert trying to formalize mathematics, and Gödel arriving to say sorry bro. The thread continues into our field: Hoare triples, Milner and type theory, O'Hearn and separation logic. These all aimed at the question of whether we can prove programs correct. The pragmatic answer for decades has been "it ran, is that enough?" 

Derrida in 1967 said meaning is never fully present in the sign. We never really knew reality anyway, but we'd settled on "good enough" because we wrote the code, understood it, and tried it. AI broke all three of those at once.

Steve said that the formal methods community has been taking correctness seriously for sixty years with contracts, specs, invariants, refinements, types, and proofs. He argued that this crowd already knows how to wield the tools every programmer is now going to need. Harking back to Derrida: you can't fully understand reality, but you can try to improve your grasp of it. And this craft suddenly matters to everyone.

Steve is a great speaker. The talk almost felt like stand-up at times, and a philosophy class at others. He also spoke some uneasy truths to the software engineering crowd. We had been telling ourselves we were making the world a better place while "disrupting" other industries, and now the disruption has finally come to us.
