Showing posts from October, 2018

Everything is broken

Last Wednesday, I attended one of the monthly meetings of the "Everything is Broken" meetup in Seattle. It turns out I picked a great meeting to attend, because both speakers, Charity Majors and Tammy Butow, were excellent. Here are some selected quotes without context.

Observability-driven development - Charity Majors

Chaos engineering is testing code in production.

"What if I told you: you could test both in and before production."

Deploying code is not a binary switch; deploying code is a process of increasing your confidence in your code.

"Microservices are hard!" as a caption for a figure comparing the LAMP stack of 2005 with the complexity of the Parse stack of 2015.

We are all distributed systems engineers, and unknowns outnumber the knowns!

Distributed systems have an infinite number of almost-impossible failures!

Without observability you don't have chaos engineering, you have chaos.

Monitoring systems have not changed signi

Debugging designs with TLA+

This post talks about why you should model your systems and exhaustively test these models/designs with the TLA+ framework. In the first part, I will discuss why modeling your designs is important and beneficial, and in the second part I will explain why TLA+ is a very suitable framework for modeling, especially for distributed and concurrent systems.

Modeling is important

If you have worked on a large software system, you know that such systems are prone to corner cases, failed assumptions, race conditions, and cascading faults. There are many corner cases because there are many parameters, and these interfere with each other in unanticipated ways. The corner cases violate your seemingly reasonable implicit assumptions about the system components and environment, e.g., "1-hop is faster than 2-hops", "0-hop is faster than 1-hop", and "processes work at the same rate". There are abundant race conditions because today (with the rise of SOA, cloud, and
