Posts

Showing posts from 2016

Selected blog posts from 2016

This is my 42nd post for the year. As is the custom in a year-end post, I mention some highlights among my 41 posts in 2016.

Machine Learning
Learning Machine Learning: A beginner's journey. I wrote this to quickly recap about what I found confusing and useful while I learned about machine learning. I had no idea this would blow up. This received the most pageviews: 45,000 and counting.TensorFlow: A system for large-scale machine learning. 

Facebook papers
Realtime Data Processing at Facebook.Measuring and Understanding Consistency at Facebook.Holistic Configuration Management at Facebook.

Fault-tolerance
Why Does the Cloud Stop Computing? Lessons from Hundreds of Service Outages. TaxDC: A Taxonomy of nondeterministic concurrency bugs in datacenter distributed systems.

Distributed Coordination
Modular Composition of Coordination Services. Make sure you read the comment at the end, where  one of the authors Kfir Lev-Ari provides answers the questions raised in my post. Implementing Lin…

Learning Machine Learning: A beginner's journey

Image
I have been learning about machine learning and deep learning (ML/DL) for the last year. I think ML/DL is here to stay. I don't think this is a fad or bubble! Here is why:

ML/DL has results. It is hard to argue against success.ML/DL has been on the rise organically since 1985 (with the backpropagation algorithms) and went through another phase of acceleration after 2005 (with the wide availability of big data and distributed data processing platforms). The rise of ML/DL is following a rising curve pattern, not the pattern for a hyped ephemeral bubble. Since it grew gradually over many years, I am betting it will be around for at least the same amount of time. ML/DL has been co-developed with applications. It has developed very much on the practice side with trial and error, and its theory is still lagging a bit behind and is unable explain many things. According to Nassim Taleb's heuristics ML/DL is antifragile.ML/DL has the market behind it. Big money provides big incentive an…

Book review: Intuition pumps and other tools for thinking

The title of this book grabbed my attention immediately. Intuition pumps is a very visual term, and who doesn't like to learn about tools for thinking. The premise of the book is given in the first quote:
"You can't do much carpentry with your bare hands and you can't do much thinking with your bare brain." -- Bo Dahlbom.
The book is by Philosopher Daniel Dennett. The book is surprisingly readable for a philosophy book, which are full of jargon and big words. Dennett took special care in writing in a simple an clean way. For this, he recruited help from undergraduate students in his university, Tufts. The book content was discussed at an undergraduate seminar Dennett offered, and he then got help from these students review the book. This is later revealed as one of Dennett's thinking tools: "Explain to nonexperts: use a decoy audience". I think that worked: the book is accessible to an undergraduate, but a motivated one.

The first chapter explained …

TLA+ project2 solution (2016)

Image
In a previous post, I had given the description of project2.Here is the TLA+ project2 solution made available on github. Below I will briefly explain the solution.

If you are not familiar with the chain-replication protocol, read this brief summary. 

The setup and variables
The first part is about the setup. We declare the process IDs for storage nodes (denoted as Nodes), the clients, and the configurator process.


The nodes maintain "db" each, where the item is updated and queried. For simplicity, our distributed key-value store maintains only a single item, and initialized to ver=-1, val=-1, cli=-1.

The nodes also have an auxiliary variable "up", which is used for modeling the status of the node. A node can go "down" at any time provided that less than FAILNUM nodes are currently down.

Each node and client has a message box, initially all message boxes are empty.

Finally, there is a global variable chain. Initially it is an empty sequence. As the configur…

Feedback from my distributed systems class

I am done with my distributed systems class this semester. At the end of the semester, I polled my students for feedback about the class. I was especially interested in hearing about their feedback on using TLA+ in the course and course projects, and using research paper reviewing for homework assignments. Of course, I didn't mention these when I solicited their feedback. I asked them to write a brief review for what they have learned in my distributed systems class this semester.

I received overwhelmingly positive feedback. Below I share some of the comments about TLA+ and the paper review assignments. (I removed the flattering feedback about my teaching, while I enjoyed reading them.  I put a lot of effort in my teaching and I am happy when the students recognize how much I care. That keeps me motivated.)

Feedback about using TLA+ I assigned two TLA+ projects. The first project was modeling Voldemort key-value store with client-side routing, and the second project was about exte…

Modeling a Replicated Storage System in TLA+, Project 2

Image
Two weeks ago, I had described phase 1 of the TLA+ project I assigned in my distributed systems class. I promised to give you the solution soon. I put the solution for phase1 on github.

In this post, I will describe the phase 2 of the project. In phase 2 of the project, we use Chain Replication to ensure persistence and consistency of the objects stored. Read my chain replication summary to refresh on the algorithm.


Before performing the write, the client reads from the tail of chain to learn the highest versioned write completed. The client then increments the version number, and performs the write to the head of chain with this version number. Since a node failure may lead to a request being lost (be it a read request, or update request), the client needs to repeat the request until a response is returned from the tail. The client cannot start a new request before it receives a reply to its earlier request.

A configurator process (an infallible process which would in practice be imp…

Emerging trends in big data software

Image
Mike Franklin, a famous expert on data science, had visited our department at University at Buffalo in May to talk about emerging trends in big data software. I had taken some notes during his talk, and decided to summarize and share them here.

Mike recently joined University of Chicago as the chair of Computer Science Department, and before that he was the head of UC Berkeley's AMPLab (Algorithms, Machines and People Laboratory), where he was involved with the Spark and Mesos projects which had wide academic and industrial impact. Naturally, the talk included a lot of discussion about AMPLab projects, and in particular Spark.

Mike described AMPLab as a federation of faculty that collaborate around an interesting emerging trend. AMPLab has in total 90 faculty and students. Half of the funding comes from government, and the other half from industrial sponsors. The industrial sponsors also provide constant feedback about what the lab works on and how it matters for them. As an AMPLab…

How does one get motivated for teaching?

A friend recently complained that, even after years in academia, he never got quite adjusted to teaching. He said he doesn't see much incentive in teaching and found it boring, and asked about how one can get motivated for teaching.

This is actually a good question and it is important to get in to the habit of questioning and checking oneself. Here is how I answer this question.

I don't need to get motivated for teaching. Teaching is a responsibility bestowed upon me as an academician. So I look at this from a responsibility perspective. I also have responsibilities as a parent, as a husband, or as a driver (if I am the designated driver), and for matters of responsibility, I don't need motivation. I know I should rise to the occasion.Teaching is my service to the students, who really need it. Today students have many alternatives. They can watch a lecture on YouTube, and can find a variety of study material on the Internet. But it is only I that stand in front of them in p…

My Distributed Systems Seminar's reading list for Spring 2017

Modeling a Replicated Storage System in TLA+, Project 1

Image
Why a TLA+ project? The first project assignment in my distributed systems class this semester was modeling a replicated storage system in TLA+. Assigning a TLA+ project makes me a rarity among distributed systems professors. A common project would be a MapReduce programming assignment or a project to implement a simple distributed service (such as a key-value store) in Java.

I think that a MapReduce deployment project does not teach much about distributed systems, because MapReduce is a very crude abstraction and hides all things distributed under the hood. Using MapReduce for the distributed systems class project would be like handing people a mechanics certification upon successful completion of a driving test.

Implementing a simple distributed service, on the other hand, would teach students that indeed programming and debugging distributed systems is very hard. However, I would suspect that much of the hardship in that project would be due to accidental complexities of the imple…

Book Review: Einstein (by Walter Isaacson)

I read this book recently and liked it a lot. It was a surprisingly engaging read. I thought I knew a lot about Einstein, and the book would be redundant and bland. But the book proved me wrong. I learned a lot of new things about Einstein, especially about his personality and his research perspective and style. The book also did a great job on explaining Einstein's theories, scientific achievements in an accessible and interesting manner.

Einstein's personality Let's get this out of the way. Einstein didn't fail math. He was always a very smart and hardworking student. In primary school, he was at the top of his class and "far above the school requirements" in math. And before he was 15 he had mastered differential and integral calculus.

What may have helped propagate the myth was Einstein's unruly and defiant personality. Einstein would do great at the things he likes, and not good at the things he doesn't like. He had excelled in his university cla…

How Complex Systems Fail

This is a 4 page report about the nature of failures in complex systems. It is a gloomy report. It says that complex systems are always ridden with faults, and will fail when some of these faults conspire and cluster. In other words, complex systems constantly dwell on the verge of failures/outages/accidents.

The writing of the report is peculiar. It is written as a list of 18 items (ooh, everyone loves lists). But the items are not independent. For example, it is hard to understand items 1 and 2, until you read item 3. Items 1 and 2 are in fact laying the foundations for item 3.

The report is written by an MD, and is primarily focused on healthcare related complex systems, but I think almost all of the points also apply for other complex systems, and in particular cloud computing systems. In two recent posts (Post1, Post2), I had covered papers that investigate failures in cloud computing systems, so I thought this report would be a nice complement to them.


1) Complex systems are int…

Modeling Paxos and Flexible Paxos in Pluscal and TLA+

Image
The first part of this post describes modeling Paxos in Pluscal. The second part shows how to modify that model to achieve a flexible quorum Paxos. The idea for flexible quorums was introduced in a paper in 2016 summer. This simple and surprising result says "majority agreement is not required in Paxos and the sets of acceptors required to participate in agreement (known asquorums) do not even need to intersect with each other".

Modeling Paxos in Pluscal While there are many examples of Paxos modeling in TLA+ available, I haven't found any Pluscal modeling of Paxos, except this one from Lamport, which helped me come up with my Pluscal model below. The problem with TLA+ is that it is too low-level (i.e., too declarative and math-like) for writing--and reading-- distributed algorithms. The PlusCal language provides a higher-level pseudocode, which is easier to follow.

However, as you go through my Pluscal model below, you will find that it doesn't follow your expectati…

Popular posts from this blog

I have seen things

SOSP19 File Systems Unfit as Distributed Storage Backends: Lessons from 10 Years of Ceph Evolution

PigPaxos: Devouring the communication bottlenecks in distributed consensus

Frugal computing

Learning about distributed systems: where to start?

Fine-Grained Replicated State Machines for a Cluster Storage System

My Distributed Systems Seminar's reading list for Spring 2020

Cross-chain Deals and Adversarial Commerce

Book review. Tiny Habits (2020)

Zoom Distributed Systems Reading Group