Showing posts from 2018

Two-phase commit and beyond

In this post, we model and explore the two-phase commit protocol using TLA+.

The two-phase commit protocol is practical and is used in many distributed systems today. Yet it is still brief enough that we can model it quickly, and learn a lot from modeling it. In fact, we see that through this example we get to illustrate a fundamental impossibility result in distributed systems directly.

The two-phase commit problem A transaction is performed over resource managers (RMs). All RMs must agree on whether the transaction is committed or aborted.

The transaction manager (TM) finalizes the transaction with a commit or abort decision. For the transaction to be committed, each participating RM must be prepared to commit it. Otherwise, the transaction must be aborted.

Some notes about modeling We perform this modeling in the shared memory model, rather than in message passing, to keep things simple. This also ensures that the model checking is fast. But we add nonatomicity to the "read fr…

Master your questioning skills

My new years resolution for 2018 was to "ask more/better/crazier questions."

To realize this resolution, I knew I needed to implement it as a system. So I committed to "ask at least a couple of MAD questions in each blog post." After close to 12 months and 70 posts, these are what I learned from this exercise.

1. Questioning is very useful Adding the MAD questions section significantly extended/complemented my posts, and with little effort. Just by adding this section, I went for another 15-20 minutes, forcing myself out of my default mode. Sometimes I grudgingly did it, just because I had committed to it, but almost always I was surprised by how useful it was.

Many times the MAD questions section got longer than the main post body. And on average it was about 25% of the main body. Questioning also improved the main content, and I had incorporated the findings/leads from some of the MAD questions in the main body of the post. I keep many half-baked posts in my blog…

Year in Review

How to fix scholarly peer review

All academics invariably suffer from an occasional bad and unfair review. We pour our sweat (and sometimes tears) over a paper for many months to watch it summarily and unfairly shot-down by a reviewer. Opening a decision email to read such a review feels so sickening that it hurts in the stomach. Even after 20 years, I am unable to desensitize myself to the pain of being shot-down by an unfair review. I suspect quite many people quit academia being frustrated over bad reviews.

Peer review is a complex and sensitive topic, and this post will inevitably fall short of capturing some important aspects of it. But I believe this topic is so important that it deserves more attention and conversation. Here I first write a bit about how to handle bad reviews. Then I outline some problems with our current peer review process and suggest some fixes to start a conversation on this.

The first rule of review club The first rule of review club is to always give the benefit of doubt to the reviewers…

HotNets'18: Networking in Space

HotNets'18 was held at Microsoft Research, Building 99. This is walking distance to my office at Cosmos DB, where I am working at my sabbatical. So I got tempted and crushed this workshop for a couple sessions. And oh my God, am I happy I did it. The space session was particularly very interesting, and definitely worth writing about.

My God, it is full of satellites! According to a 2018 estimate, there are 4,900 satellites in orbit, of which about 1,900 operational, while the rest have lived out their useful lives and become space debris. Approximately 500 operational satellites are in low-Earth orbit, 50 are in medium-Earth orbit (at 20,000 km), and the rest are in geostationary orbit (at 36,000 km).

The low earth orbit LEO satellites are not stationary and fast moving around the earth at 1.5 hour per rotation. We are talking about the lowest ring in this picture, where International Space Station (ISS) resides.

Since LEO satellites are close to Earth, this makes their communicat…

My Emacs journey

This is a follow up to my "Master your tools" post. As an example of one of my tools, I talk about my Emacs use.

I have been using Emacs for the last 20 years. At this point, I don't even know Emacs, my fingers do. If you ask me the shortcut for something, I will need to let my fingers do it and try to observe what they are doing. And sometimes ---as in the story of the caterpillar who forgets how to walk when asked to demonstrate it--- I forget about how to do something when I try to attempt it consciously.
Four stages of competence:

4. Unconscious Competence (Right Intuition)
3. Conscious Competence (Right Analysis)
2. Conscious Incompetence (Wrong Analysis)
1. Unconscious Incompetence (Wrong Intuition) — Colby Serpađź’ˇ (@ColbySerpa) November 6, 2018

From text-editing to text-wrangling I have been learning Emacs at a glacial pace, but I think that worked for me better. I figured I can internalize so much at a time, so I d…

Is Twitter causally-consistent?

For the past 7-8 years, several research papers have used this example to motivate causal consistency. You must have read this example, right?

Alice removes her boss from her friend list, and posts that on her feed that she is looking for a new job. Tom removes his mom from the friend list, and posts his Spring Break photos.
Well, being the empirical researchers we are, Aleksey and I wanted to put Twitter in to test for this scenario.

On September 25, we performed this test. (I also have a video recording of this. But since I can't stand to hear myself talk in recordings, I am not posting it. I sound really weird, man.)

I first blocked Aleksey on my Twitter account, and then tweeted that Aleksey drinks a lot of tea (it's true). When we checked Aleksey's timeline, we saw that his timeline indeed did not display my tweet.

@alekseycharapko drinks way too much tea. — Murat Demirbas (@muratdemirbas) September 25, 2018 So, this was kind of an anticlimax. Twitter passed the causal…

How to be a good machine learning product manager

There are a lot of interesting meetups at Seattle, and I try to attend one every couple weeks. Ruben Lozano Aguilera was the speaker for this meetup on Oct 17. Ruben is a product manager at Google Cloud, and before that he was a product manager at Amazon.

What is ML? Programming transforms data + rules into answers.

Machine learning turns data + answers into rules.

When should you use ML? Use ML if the problem:

handles complex logicscales up really fastrequires specialized personalizationadapts in real-time
For example ML is a good fit for the "search" problem.  Search requires complex logic, for which it is not easy to develop rules.  It scales up really fast in terms of new keywords, combinations and content. It requires personalization depending on the context, and has some real-time adaptation component as well.

Another important point is that the problem should have existing examples of actual answers. When you bootstrap from a good enough dataset, you can scale further,…

SDPaxos: Building efficient semi-decentralized geo-replicated state machines

In the last decade, the Paxos protocol family grew with the addition of new categories.

Rotating leader: MenciusLeaderless: EPaxos, Fast PaxosPaxos federations: Spanner, vertical Paxos Dynamic key-leader: WPaxos
This paper, which appeared in SOCC 18, proposes SDPaxos which prescribes separating the control plane (single leader) from the replication plane (multiple leaders). SD in SDPaxos stands for "semi-decentralized".

The motivation for this stems from the following observation. Single leader Paxos approach has a centralized leader and runs into performance bottleneck problems. On the other hand, the leaderless (or opportunistic multileader) approach is fully decentralized but suffers from the conflicting command problems. Taking a hybrid approach to capture the best of both worlds, SDPaxos makes the command-leaders to be decentralized (the closest replica can lead the command), but the ordering-leader (i.e., the sequencer) is still centralized/unique in the system.

Below I …

An unexpected phone call at the elevator

Sometime ago, I was visiting an apartment block. This was a relatively new apartment block, I think less than 5 years old.

I entered the elevator, and I heard "Hello, hello... Are you there?" booming from the speakers. This never happened to me before of course; elevators don't talk to me often. I thought maybe someone had pressed the call for help button, and left the elevator, and I can help sort this out.

I said, "Hi, yes!"

The voice continued saying:
-"The reason I am calling you is because of your last energy bill".

After a second of cognitive dissonance, I started laughing:
-"Umm, I don't know what happened, but I am in an elevator now, and your voice is literally streaming from the elevator."

The voice stopped for a brief couple of seconds, then continued explaining something about the bill, following the script he was given.

-"Dude, this is my floor, and I got to go now. Ok, bye."

It turns out most elevator phones ---wh…

Popular posts from this blog

I have seen things

SOSP19 File Systems Unfit as Distributed Storage Backends: Lessons from 10 Years of Ceph Evolution

PigPaxos: Devouring the communication bottlenecks in distributed consensus

Frugal computing

Fine-Grained Replicated State Machines for a Cluster Storage System

Learning about distributed systems: where to start?

My Distributed Systems Seminar's reading list for Spring 2020

Cross-chain Deals and Adversarial Commerce

Book review. Tiny Habits (2020)

Zoom Distributed Systems Reading Group