Showing posts from October, 2018

Everything is broken

Last Wednesday, I attended one of the monthly meetings of the "Everything is Broken" meet up at Seattle. It turns out I selected a great meeting to attend, because both speakers, Charity Majors and Tammy Butow, were excellent.

Here are some select quotes without context.

Observability-driven development - Charity Majors
Chaos engineering is testing code in production. "What if I told you: you could test both in and before production."

Deploying code is not a binary switch; deploying code is a process of increasing your confidence in your code.

"Microservices are hard!" as a caption for a figure comparing the LAMP stack 2005 versus the complexity of the Parse stack 2015.

We are all distributed systems engineers and unknowns outnumber the knowns!
Distributed systems have an infinite number of almost-impossible failures!

Without observability you don't have chaos engineering, you have a chaos.

Monitoring systems have not changed significantly in 20 years…

Debugging designs with TLA+

This post talks about why you should model your systems and exhaustively test these models/designs with the TLA+ framework. In the first part, I will discuss why modeling your designs is important and beneficial, and in the second part I will explain why TLA+ is a very suitable framework for modeling, especially for distributed and concurrent systems.

Modeling is important If you have worked on a large software system, you know that they are prone to corner cases, failed assumptions, race conditions, and cascading faults.

There are many corner cases because there are many parameters, and these do interfere in unanticipated ways with each other. The corner cases violate your seemingly reasonable implicit assumptions about the system components and environment, e.g.,"1-hop is faster than 2-hops", "0-hop is faster than 1-hop", and "processes work with the same rate". There are abundant race conditions because today (with the rise of SOA, cloud, and microserv…

Popular posts from this blog

I have seen things

SOSP19 File Systems Unfit as Distributed Storage Backends: Lessons from 10 Years of Ceph Evolution

PigPaxos: Devouring the communication bottlenecks in distributed consensus

Learning about distributed systems: where to start?

My Distributed Systems Seminar's reading list for Fall 2020

Fine-Grained Replicated State Machines for a Cluster Storage System

My Distributed Systems Seminar's reading list for Spring 2020

Cross-chain Deals and Adversarial Commerce

Book review. Tiny Habits (2020)

Zoom Distributed Systems Reading Group