Posts

Showing posts from January, 2018

Paxos derived

Lamport's fault-intolerant state machine replication algorithm
In 1978, Lamport published his classic "Time, Clocks, and the Ordering of Events in a Distributed System". As an application of logical clocks, he presented a distributed replicated state machine algorithm (and then he instantiated that algorithm to solve mutual exclusion as an example). Lamport complains that no one seemed to be aware of the distributed replicated state machine algorithm introduced in the paper:
"This is my most often cited paper. Many computer scientists claim to have read it. But I have rarely encountered anyone who was aware that the paper said anything about state machines. People seem to think that it is about either the causality relation on events in a distributed system, or the distributed mutual exclusion problem. People have insisted that there is nothing about state machines in the paper. I’ve even had to go back and reread it to convince myself that I really did remember wha…

Modeling the DAO attack in PlusCal

Maurice Herlihy's paper, "Blockchains from a distributed computing perspective", explains the DAO attack as follows:

"Figure 1 shows a fragment of a DAO-like contract, illustrating a function that allows an investor to withdraw funds. First, the function extracts the client's address (Line 2), then checks whether the client has enough funds to cover the withdrawal (Line 3). If so, the funds are sent to the client through an external function call (Line 4), and if the transfer is successful, the client’s balance is decremented (Line 5).  This code is fatally  flawed. In June 2016, someone exploited this function to steal about $50 million funds from the DAO. As noted, the expression in Line 3 is a call to a function in the client's contract. Figure 2 shows the client's code. The client's contract immediately calls withdraw() again (Line 4). This re-entrant call again tests whether the client has enough funds to cover the withdrawal (Line 3), and becaus…
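
The re-entrancy described in the quoted passage can be reproduced with a toy simulation. What follows is a minimal Python sketch, not the contract code from Herlihy's Figure 1 and not the PlusCal model from this post. The names `Bank`, `SafeBank`, `Attacker`, `vault`, and `run_attack`, as well as the deposit amounts, are all hypothetical and chosen only for illustration. `SafeBank` shows one commonly used mitigation (update the balance before making the external call), which is an assumption on my part rather than something stated in the excerpt.

```python
class Bank:
    """Toy stand-in for the DAO-like contract (hypothetical, not Solidity)."""

    def __init__(self, deposits):
        self.balance = dict(deposits)        # per-client bookkeeping
        self.vault = sum(deposits.values())  # funds the contract actually holds

    def withdraw(self, client, amount):
        # Check that the client's recorded balance covers the withdrawal.
        if self.balance[client.name] >= amount and self.vault >= amount:
            self.vault -= amount
            client.receive(self, amount)         # external call: control passes to the client
            self.balance[client.name] -= amount  # bookkeeping happens too late


class SafeBank(Bank):
    """Same contract, but the balance is updated before the external call."""

    def withdraw(self, client, amount):
        if self.balance[client.name] >= amount and self.vault >= amount:
            self.balance[client.name] -= amount
            self.vault -= amount
            client.receive(self, amount)


class Attacker:
    """Client whose receive hook immediately calls withdraw() again."""

    def __init__(self, name):
        self.name = name
        self.loot = 0

    def receive(self, bank, amount):
        self.loot += amount
        if bank.vault >= amount:  # re-enter while there is money left
            bank.withdraw(self, amount)


def run_attack(bank_cls):
    # The attacker deposited 10; another (hypothetical) client deposited 90.
    bank = bank_cls({"attacker": 10, "alice": 90})
    attacker = Attacker("attacker")
    bank.withdraw(attacker, 10)
    return attacker.loot, bank.vault, bank.balance["attacker"]


if __name__ == "__main__":
    print(run_attack(Bank))      # (100, 0, -90)
    print(run_attack(SafeBank))  # (10, 90, 0)
```

With `Bank`, every nested call passes the balance check because the decrement has not run yet, so the attacker drains all 100 units. With `SafeBank`, the second call fails the check and only the attacker's own 10 units leave the vault. Python integers simply go negative here; the sketch does not model how a real contract's integer type would behave.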

Spring 18 Distributed Systems Seminar

Erasable pens for editing papers

I recently discovered the Pilot Frixion pens and I like them a lot. (I am not getting any advertisement money from them, I swear :-)

The pens have erasable ink, so they are great for marking your comments/edits on a paper while reading. They erase via heat. Each pen comes with a plastic nub at the top, and if you rub the page with the nub, the friction erases the writing, mostly cleanly. A word of caution though: this means that if you leave your writing in a hot car, you will find it erased, which you can remedy by putting it in a freezer. I am not kidding.

So, don't use it for writing you want to keep permanently, but it is great for writing comments and marking on a paper when you are reading.

I print the research paper I am reading and I do a lot of marking on paper. If I use a regular pen, I cross out some of my guesswork, nonsensical questions, or misinformed comments, and it messes up the paper. But using Frixion pens, I erase and modify my comments without creatin…

Remember peer-to-peer systems?

Traditionally, computer systems use the client-server model. This is more of a centralized approach: the server sits there and responds to clients' requests. If one server is not enough for computation/analysis, a "hierarchical" organization of servers is adopted, as in datacenter and cloud computing. One node becomes the master, and the other nodes act as workers. This is called the master-worker model. This simple model makes sense if you have an infrastructure. A centralized control architecture is simple, so you can keep the coordination simple and efficient.
The peer-to-peer model is on the other end of the spectrum: it calls for a fully decentralized system model. There is no distinguished master. Each node acts as both server and client; each node is a peer. This model does not require stable infrastructure and it can self-organize with what is presently available. As such, peer-to-peer systems are great for circumventing laws, bans, and censorship. In the 2000s, peer-to-peer systems were all the rage. Peer-…

Paper summary. A Berkeley view of systems challenges for AI

This position paper from Berkeley identifies an agenda for systems research in AI for the next 10 years. The paper also serves to publicize/showcase their research, and steer interest towards these directions, which is why you really write position papers.

The paper motivates the systems agenda by discussing how systems research/development played a crucial role in fueling AI’s recent success. It says that the remarkable progress in AI has been made possible by a "perfect storm" emerging over the past two decades, bringing together: (1) massive amounts of data, (2) scalable computer and software systems, and (3) the broad accessibility of these technologies.

The rest of the paper talks about the trends in AI and how those map to their systems research agenda for AI.

Trends and challenges
The paper identifies 4 basic trends in the AI area:
Mission-critical AI: Design AI systems that learn continually by interacting with a dynamic environment in a timely, robust, and secure man…

The Lambda and the Kappa Architectures

This article, by Jimmy Lin, looks at the Lambda and Kappa architectures, and through them considers a larger question: Can one size fit all?

The answer, it concludes, is that it depends on what year you ask! The pendulum swings between the apex of one tool to rule them all, and the other apex of multiple tools for maximum efficiency. Each apex has its drawbacks: one tool leaves efficiency on the table, while multiple tools spawn integration problems.

In the RDBMS world, we already saw this play out. One size RDBMS fitted all, until it couldn't anymore. Stonebraker declared "one size does not fit all", and we have seen a split into dedicated OLTP and OLAP databases connected by extract-transform-load (ETL) pipelines. But in the last couple of years we are seeing a lot of one-size-fits-all "Hybrid Transactional/Analytical Processing (HTAP)" solutions being introduced again.

Lambda and Kappa
OK, back to telling the story from the Lambda and Kappa architectures perspective. What a…

Paper summary. TFX: A TensorFlow-Based Production-Scale Machine Learning Platform

This paper from Google appeared in the KDD 2017 Applied Data Science track. The paper discusses Google's quality assurance extensions to their machine learning (ML) platforms, called TensorFlow Extended (TFX). (Google is not very creative with names; they should take a cue from Facebook.)

TFX supports continuous training and serving pipelines and integrates best practices to achieve production-level reliability and scalability. You can argue that the paper does not have a deep research component or a novel insight/idea. But you can argue the same thing for The Checklist Manifesto by Atul Gawande, which nevertheless does not detract from its effectiveness, usefulness, and impact.

On the other hand, the paper could definitely have been written much more succinctly. In fact, I found this blog post by Martin Zinkevich, the last author of the paper, much easier to follow than the paper. (Are we pushed to make papers artificially obfuscated to be publication-worthy?) This blog post on serving s…

Why you should use modeling [with TLA+/PlusCal]

I recently gave a two-day seminar on "debugging your designs with TLA+/PlusCal" at Dell. So I wanted to write down some of the motivation for modeling and debugging your models while this is still fresh in my mind.

You need modeling
No, not that kind of modeling! Actually the naming clash is not accidental after all: fashion designers need models to test/showcase their designs.

You need modeling because:

1. Failing to plan is planning to fail
2. Everything is a distributed system
3. The corner cases ... they are so many
4. Do it for the development process
5. Being smart does not scale

Failing to plan is planning to fail
This is from the paper, "Use of formal methods at Amazon Web Services, 2014".
"Before launching any complex service, we need to reach extremely high confidence that the core of the system is correct. We have found that the standard verification techniques in industry (deep design reviews, code reviews, static code analysis, stress testing, fault-injection testing, et…

Salute to Prof. Mohamed Gouda: Elegance in computing

A couple of months ago, I attended a special half-day workshop organized to honor Prof. Mohamed Gouda's contributions to computer science, and particularly to the self-stabilizing systems community.

Mohamed is the Mike A. Myers Centennial Professor at the University of Texas at Austin. He has been at Austin since 1980, for almost 40 years. His research contributions to distributed systems have been phenomenal (borrowing a word Mohamed likes to use for things that excite him). I am proud that Mohamed is my academic grandfather; he was the PhD advisor of my PhD advisor, Prof. Anish Arora. I wrote about "how to find your advisor" in my previous post, and I hope the elegance/rigor of Mohamed and Anish rubbed off on me a bit.

At the workshop, there were about 10 talks, technical in nature, but at the end of each talk, the speaker mentioned how their research and career have been enriched by Mohamed's contributions/help.

I talked about my technical report on "Does the cloud n…

How to find your advisor

I had tweeted this earlier about "Rocking your PhD":

1. Find a hardworking advisor
2. Get a senior PhD student mentor
3. Read a lot of papers critically
4. Write a lot, get feedback
5. Publish your first paper early to build confidence
6. Publish your 2nd, 3rd, 4th papers
7. To sustain, exercise regularly

It is that simple. This is actually a concise version of longer advice I provided earlier.

Since I haven't talked about it before, I would now like to write some suggestions on finding an advisor.

How to find your advisor
Ask around and get advice from senior PhD students in the department about faculty as potential advisors.

In the first semester of your graduate studies, take 3-4 classes you are interested in. This provides a good opportunity to meet and impress your prospective advisor. If there is a class project, go overboard and exceed expectations. Try to improve on an algorithm mentioned in the class, and discuss this with the prospective advisor.

Before you commit to an advisor, make sure it…

Logical clocks and Vector clocks modeling in TLA+/PlusCal

In a distributed system, there is no shared state, counter, or any other kind of global clock. So we cannot implicitly associate an event with its time, because one node may have a different clock than another. Time synchronization is not easy to achieve, and failures complicate things.

It turns out we care about time because of its utility in ordering events. Using this observation, in 1978, Leslie Lamport offered a time-free definition for "happens before": Event A happens before event B (denoted as A hb B) if and only if A can causally affect B.

In the context of distributed systems, A hb B iff
1. A and B are on the same node and A is earlier in computation than B
2. A is the send of a message and B is the receive event for that message
3. There is some third event C, such that A hb C and C hb B.

This also suggests a definition for the "concurrent" relation: events A and B are concurrent iff $\neg( A ~hb~ B) \land \neg( B ~hb~ A)$.
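
The three rules and the definition of concurrency can be checked mechanically with vector timestamps. What follows is a minimal Python sketch, not the TLA+/PlusCal model from this post. The names `Node`, `happens_before`, `concurrent`, and `demo` are hypothetical, and the sketch assumes the textbook vector clock rules: increment your own entry on every event, and on receive take the componentwise maximum with the message's timestamp before incrementing.

```python
def happens_before(u, v):
    """u hb v iff u <= v in every component and u != v."""
    return all(a <= b for a, b in zip(u, v)) and u != v


def concurrent(u, v):
    """Neither timestamp happens before the other."""
    return not happens_before(u, v) and not happens_before(v, u)


class Node:
    def __init__(self, index, n):
        self.index = index   # this node's position in the vector
        self.vc = [0] * n    # one counter per node in the system

    def local_event(self):
        self.vc[self.index] += 1
        return tuple(self.vc)

    def send(self):
        # A send is an event; its timestamp travels with the message.
        return self.local_event()

    def receive(self, msg_vc):
        # Merge what the sender knew, then count the receive event itself.
        self.vc = [max(a, b) for a, b in zip(self.vc, msg_vc)]
        self.vc[self.index] += 1
        return tuple(self.vc)


def demo():
    p0, p1 = Node(0, 2), Node(1, 2)
    a = p0.local_event()   # (1, 0)
    s = p0.send()          # (2, 0)
    c = p1.local_event()   # (0, 1)
    r = p1.receive(s)      # (2, 2)
    return a, s, c, r


if __name__ == "__main__":
    a, s, c, r = demo()
    print(happens_before(a, s))  # True: same node, a is earlier (rule 1)
    print(happens_before(s, r))  # True: send and matching receive (rule 2)
    print(happens_before(a, r))  # True: transitivity through s (rule 3)
    print(concurrent(a, c))      # True: no causal path in either direction
```

In this run, `a` and `c` are concurrent because neither node had heard from the other when those events occurred, which is exactly the $\neg( A ~hb~ B) \land \neg( B ~hb~ A)$ case.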

To capture the hb relation, …

Mad questions

I am a very curious person by nature. Every child starts asking lots of questions around age 3-4, but according to my mom, I took that to another level, constantly asking "but why?" and driving her crazy. On the other hand, I believe I owe my curiosity to my mom. She was an elementary school teacher (a damn good one), and was instrumental in my development. She was (and still is) a very curious person, and she taught me how to ask more and better questions. For example, while traveling, she would notice different plants and ask me why the landscape was different there. And we would make guesses.

The Turkish education system was not big on asking questions (these days it is waaaay waaaaay worse). Since the lazy path is to memorize and regurgitate answers, that is what it demanded from the students. But I think my questioning skills mostly survived. Among my friends, I was famous for replying to questions with questions of my own, and if not, my answer was often "I don't …
