Posts

Showing posts from October, 2018

Everything is broken

Last Wednesday, I attended one of the monthly meetings of the "Everything is Broken" meetup in Seattle. It turns out I picked a great meeting to attend, because both speakers, Charity Majors and Tammy Butow, were excellent. Here are some select quotes without context.

Observability-driven development - Charity Majors

Chaos engineering is testing code in production.
"What if I told you: you could test both in and before production."
Deploying code is not a binary switch; deploying code is a process of increasing your confidence in your code.
"Microservices are hard!" as a caption for a figure comparing the LAMP stack of 2005 with the complexity of the Parse stack of 2015.
We are all distributed systems engineers, and unknowns outnumber the knowns!
Distributed systems have an infinite number of almost-impossible failures!
Without observability you don't have chaos engineering, you have chaos.
Monitoring systems have not changed signi...

Debugging designs with TLA+

This post talks about why you should model your systems and exhaustively test these models/designs with the TLA+ framework. In the first part, I will discuss why modeling your designs is important and beneficial, and in the second part I will explain why TLA+ is a very suitable framework for modeling, especially for distributed and concurrent systems.

Modeling is important

If you have worked on a large software system, you know that such systems are prone to corner cases, failed assumptions, race conditions, and cascading faults. There are many corner cases because there are many parameters, and these interfere with each other in unanticipated ways. The corner cases violate your seemingly reasonable implicit assumptions about the system components and environment, e.g., "1-hop is faster than 2-hops", "0-hop is faster than 1-hop", and "processes work at the same rate". There are abundant race conditions because today (with the rise of SOA, cloud, and ...

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Making database systems usable

Looming Liability Machines (LLMs)

Advice to the young

Foundational distributed systems papers

Distributed Transactions at Scale in Amazon DynamoDB

Linearizability: A Correctness Condition for Concurrent Objects

Understanding the Performance Implications of Storage-Disaggregated Databases

Designing Data Intensive Applications (DDIA) Book