Showing posts from October, 2018

Everything is broken

Last Wednesday, I attended one of the monthly meetings of the "Everything is Broken" meetup in Seattle. It turns out I picked a great meeting to attend, because both speakers, Charity Majors and Tammy Butow, were excellent. Here are some select quotes, without context.

Observability-driven development - Charity Majors

Chaos engineering is testing code in production.
"What if I told you: you could test both in and before production."
Deploying code is not a binary switch; deploying code is a process of increasing your confidence in your code.
"Microservices are hard!" as a caption for a figure comparing the LAMP stack of 2005 with the complexity of the Parse stack of 2015.
We are all distributed systems engineers, and the unknowns outnumber the knowns!
Distributed systems have an infinite number of almost-impossible failures!
Without observability you don't have chaos engineering, you have chaos.
Monitoring systems have not changed signi…

Debugging designs with TLA+

This post explains why you should model your systems and exhaustively test these models/designs with the TLA+ framework. In the first part, I discuss why modeling your designs is important and beneficial, and in the second part I explain why TLA+ is a very suitable framework for modeling, especially for distributed and concurrent systems.

Modeling is important

If you have worked on a large software system, you know that such systems are prone to corner cases, failed assumptions, race conditions, and cascading faults. There are many corner cases because there are many parameters, and these interfere with each other in unanticipated ways. The corner cases violate your seemingly reasonable implicit assumptions about the system components and environment, e.g., "1-hop is faster than 2-hops", "0-hop is faster than 1-hop", and "processes work at the same rate". There are abundant race conditions because today (with the rise of SOA, cloud, and…
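To make "exhaustively testing a design" concrete, here is a minimal sketch in Python rather than TLA+: a toy explicit-state checker, the kind of breadth-first search that TLA+'s TLC model checker automates. The design being checked is a hypothetical one of my choosing: two processes each increment a shared counter using a separate read step and write step. The names `next_states`, `invariant`, and `check` are illustrative assumptions, not part of TLA+. The search visits every reachable interleaving and returns the shortest trace that violates the invariant "once both processes finish, the counter equals 2".

```python
from collections import deque

# Hypothetical toy design (not from TLA+): each process does read-then-write.
# State = (counter, pcs, tmps); pc 0 = about to read, 1 = about to write, 2 = done.
N_PROCS = 2


def initial_state():
    return (0, (0,) * N_PROCS, (0,) * N_PROCS)


def next_states(state):
    """Yield (action_label, successor) for every enabled step of every process."""
    counter, pcs, tmps = state
    for p in range(N_PROCS):
        if pcs[p] == 0:  # copy the shared counter into a local register
            new_pcs = pcs[:p] + (1,) + pcs[p + 1:]
            new_tmps = tmps[:p] + (counter,) + tmps[p + 1:]
            yield (f"p{p} reads {counter}", (counter, new_pcs, new_tmps))
        elif pcs[p] == 1:  # write back the (possibly stale) local value + 1
            new_pcs = pcs[:p] + (2,) + pcs[p + 1:]
            yield (f"p{p} writes {tmps[p] + 1}", (tmps[p] + 1, new_pcs, tmps))


def invariant(state):
    """Once every process is done, no increment may have been lost."""
    counter, pcs, _ = state
    return not all(pc == 2 for pc in pcs) or counter == N_PROCS


def check():
    """BFS over all reachable states; return a counterexample trace, or None."""
    start = initial_state()
    parent = {start: None}  # state -> (previous state, action label)
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            # Walk parent links back to the initial state to rebuild the trace.
            trace = []
            while parent[state] is not None:
                prev, label = parent[state]
                trace.append(label)
                state = prev
            return list(reversed(trace))
        for label, succ in next_states(state):
            if succ not in parent:  # visit each distinct state only once
                parent[succ] = (state, label)
                queue.append(succ)
    return None
```

Running `print(check())` reports the lost-update race as `['p0 reads 0', 'p1 reads 0', 'p0 writes 1', 'p1 writes 1']`: both processes read the old value before either writes. A test run of the implementation might never hit this interleaving, but an exhaustive search over the model cannot miss it.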
