Posts

Showing posts from November, 2017

What I learn from teaching

After I teach a class I am debilitated/incapacitated for about an hour. I can't do anything productive in that time. This is because teaching demands a lot of mental power (at least for me). I am sure, being an introvert doesn't help either; I need some recovery time after too much exposure. When I am lecturing though, I let go (at least after the first 5-10 minutes or so), and I don't think about myself or being introvert. After the initial warm up period, I get excited about what I am teaching, and all I can think of is the message I am trying to convey. This is actually good. When I find myself lost in the moment (or maybe that is getting into the teaching zone), I know I am doing good. I had written about this earlier on presenting: you should center on the message not on yourself. Oh wow, actually I wrote twice on that topic.

So, in that down hour after teaching, what do I do? I have regret-flashbacks about some parts of the lecture I delivered. I have remorse about h…

Paper summary. Blazes: Coordination analysis and placement for distributed programs

Image
This paper is by Peter Alvaro, Neil Conway, Joseph M. Hellerstein, and David Maier. It appears in October 2017 issue of ACM Transactions on Database Systems. A preliminary conference version appeared in ICDE 2014. 

This paper builds a theory of dataflow/stream-processing programs, which cover the Spark, Storm, Heron, Tensorflow, Naiad, TimelyDataflow work.

The paper introduces compositional operators of labels, and shows how to infer the coordination points in dataflow programs. When reading below, pay attention to the labeling section, the labels "CR, CW, OR, OW", and the section on reduction rules on labels.

To figure out these coordination points, the Blazes framework relies on annotations of the dataflow programs to be supplied as an input. This is demonstrated in terms of a Twitter Storm application and a Bloom application.

The paper has many gems.  It says at the conclusion section in passing that when designing systems, we should pay attention to coordination locality…

On dataflow systems, Naiad and Tensorflow

The below definition for dataflow programming is from Wikipedia (with some small edits):
"Traditionally, a program is modeled as a series of operations happening in a specific order; this may be referred to as sequential, procedural, or imperative programming. The program focuses on commands for programming, where data is normally /at rest/.

In contrast, dataflow programming emphasizes the movement of data; it models a program as a series of connections with explicitly defined inputs and outputs, and connect operations. An operation runs as soon as all of its inputs become available. Thus, dataflow languages are inherently parallel and can work well in large, decentralized systems."

Some examples of dataflow frameworks are map-reduce, Spark, Storm & Heron, GraphX, GraphLab, Naiad (now implemented in Rust as Timely-Dataflow), and Tensorflow.

Map-Reduce uses a very simple directed-acyclic-graph (DAG) with only two operations: map and reduce. Spark extends map-reduce to a co…

Two paper summaries on scheduling in heterogenous computing

Today, I have two short paper summaries from HotCloud'17 on heterogenous computing.

Heterogeneous GPU reallocationThis paper appeared in HOTCloud'17, and the authors are James Gleeson and Eyal de Lara, University of Toronto.

It looks like they have developed a GPU virtualization tool called Crane recently. "General purpose GPU (GPGPU) computing in virtualized environments leverages PCI passthrough to achieve GPU performance comparable to bare-metal execution. However, GPU passthrough prevents service administrators from performing virtual machine migration between physical hosts. Crane is a new technique for virtualizing OpenCL-based GPGPU computing that achieves within 5.25% of passthrough GPU performance while supporting VM migration. Crane interposes a virtualization-aware OpenCL library that makes it possible to reclaim and subsequently reassign physical GPUs to a VM without terminating the guest or its applications. Crane also enables continued GPU operation while th…

Paper summary: A Computational Model for TensorFlow

Image
This paper appeared in MAPL 17. It is written by Martin Abadi, Michael Isard, and Derek G. Murray at Google Brain. It is a 7-page paper, and the meat of the paper is in Section 3.

I am interested in the paper because it talks about TLA+ modeling of TensorFlow graphs and uses that for creating an operational semantics for TensorFlow programs. In other words, the paper provides a conceptual framework for understanding the behavior of TensorFlow models during training and inference.

As you recall, TensorFlow relies on dataflow graphs with mutable state. This paper describes a simple and elementary semantics for these dataflow graphs using TLA+. The semantics model does not aim to account for implementation choices: it defines what outputs may be produced, without saying exactly how. A framework of this kind does not just have theoretical/academical value; it can be useful to assess correctness of TensorFlow's dataflow graph (symbolic computation graph) rewriting optimizations such as…

The YouTube clips I show in my distributed systems class

I made it a tradition to start my CSE 4/586: Distributed Systems lectures with a short YouTube video. As the students trickle into their seats, I play a YouTube video related to the lecture of the day.

Video is an underutilized medium in teaching. I don't understand why more professors do not use them. They are visual and they wake the students up, and get them interested in the class. After the students are primed with the video clip about the context of the lecture, they listen the class more attentively. I get overwhelmingly positive feedback from the students over many years about this practice.

Here are some of the videos I show in my class.

This is the video I show in my first class when I introduce the definition of a distributed system. I tell them a murmuration of starlings can be considered a distributed system, because the definition fits: A collection of autonomous nodes communicating to address a problem collectively, with no shared memory, no common physical clock, an…

Paper summary. Dynet: The Dynamic Neural Network Toolkit

Image
The programming model that underlies several popular toolkits such as TensorFlow uses a static declaration approach: they separate declaration and execution of the network architecture.

Static declaration has a number of advantages. After the computation graph is defined, it can be optimized in a number of ways so that the subsequent repeated executions of computation can be performed as quickly as possible. This also simplifies distribution of computation across multiple devices, as in TensorFlow. But static declaration is inconvenient for the following:

variably sized inputsvariably structured inputsnontrivial inference algorithmsvariably structured outputs
Of course, it is possible to process variable sized inputs if the computation graphs can represent objects whose size is unspecified at declaration time. Flow control operations such as conditional execution and iteration can be added to the inventory of operations supported by the computation graph. For example, to run an RNN ove…

Spring 18 seminar, "blast from the past" edition

It is a bad sign if the references section of a paper fails to cite any recent papers. That means, either the authors are not aware of any recent work on the area, or the area is not of interest to other researchers.

But it is also a bad sign if the references section of a paper fails to cite any old papers. That means, the authors likely do not know enough about the fundamental/foundational work in the area.

There is a case to be made for working from the fundamentals, the first principles. Elon Musk made that case in his work, and showed that you can make transformative work, even in the commercial technology world, by working from the first principals.

Working from the first principles is also essential for research. It is not uncommon to get your best ideas when preparing for a class. Sometimes reviewing fundamental work in a topic, you notice a gap, some weird under-explained assumption, and go "huh, why is it that way". Or sometimes the students (or outsiders of the fi…

Book review: The Growth mindset

I had read this book a couple years ago . This was a fulfilling book. It is written by an eminent Stanford psychiatrist Prof. Carol Dweck. You can listen to her speak about the ideas in the book here.

The growth mindset is best understood with its antithesis: the fixed mindset. The fixed mindset says that your abilities and capacity are predetermined at birth. The growth mindset says they can be altered and improved.

So, why is this important? This sounds like just an issue of perspective. But a perspective change (a paradigm shift) is capable of changing a lot of things.

The book argues that fixed mindset thinking leads people to play defensive. The fixed mindset people are scared of failures and embarrassment as they would show to the world their capacity (limited and unchanged). So in order  not to fail and save face they don't attempt things.

In contrast, the growth mindset people (i.e., the people who embraces the growth mindset/perspective) love challenges. They are not scar…

Book review. The Undoing Project: A friendship that changed our minds

I have recently read this 2017 book about the collaboration between two psychologists, Amos Tversky and Daniel Kahneman. The book is by Michael Lewis, a truly great storyteller. From his pen, the academic collaboration story of two social scientists becomes a love story and a thriller. I wouldn't ever imagine social science and behavioral economics could be this exciting. So I plan to read more of Michael Lewis's books: Flash Boys, The Big Short, Moneyball, The Blind Side, and Liar's Poker.

Here are some short notes from the book.

Danny Kahneman had a very tough childhood. His family survived (except the father) through Nazi prosecution and World War 2, and were able immigrate to Israel in 1948. He was a gifted child and starred in academia, although through out his life he was always had doubts about his talents and was always unsure of himself.

Amos Tversky was born in Israel and served in the Israel army for many years. He got educated at US for graduate school in psychol…

Paper summary. Towards distributed machine learning in shared clusters: a dynamically partitioned approach

Image
This paper (by Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, and Shengen Yan) has been put on Arxiv on April 2017.


This paper was a little confusing to read. I think it could have been presented better to make its contributions more clear. The paper aims to enable multiple distributed ML frameworks, say TensorFlow, Petuum, MxNet, share the same cluster.



Enterprises have clusters, managed by  a cluster management systems (CMSs). The paper starts with a review of existing CMSs, and mentions shortcomings with each. It is unhappy with application-level scheduling, because there each application reserves and keeps all allocated resources until completion, and this leads to low utilization of the resources as the scheduling is done for the peak/maximum resource needs of the application.

In the task-level scheduling mode, applications use acquired resources to run a single task, release them as soon as the task completes, and petition for new resources to launch uncompleted tasks. The paper …

TLA+/PlusCal modeling of Synchronized Round Consensus Algorithm: Solution

Image
The other day I posed the synchronized round consensus question.

Here is the solution on GitHub, and some brief explanation of the relevant part below.

Single round consensus algorithm
The code above is pretty straightforward. The while loop between lines 36-42 models how a node sends its vote to other nodes one by one. The sender node can fail in this while loop after some of the nodes received the vote. So if we model check with FAILNUM=1, the agreement invariant is violated in this single round algorithm as seen in the error trace below.

The blue highlighted line, state 15, is the last line in the error trace, and the value of the state variables are listed in the window below. If you inspect "up" you can see node 1 is down. Checking "mb" you can see node 2 received node 1's vote, but node 3 did not receive node 1's node. As a result, the decision "d" for node 2 is "1", whereas node 3 decides "2", and both decisions are final…

Paper summary. SLAQ Quality-driven scheduling for distributed machine learning

Image
This paper (by Haoyu Zhang, Logan Stafman, Andrew Or,  and Michael J. Freedman) appeared  at SOCC'17.

When you assign a distributed machine learning (ML) application resources at the application level, those resources are allotted for many hours. However, loss improvements usually occur during the first part of the application execution, so it is very likely that the application is underutilizing the resources for the rest of the time. (Some ML jobs are retraining of an already trained DNN, or compacting of a DNN by removing unused parameters, etc., so blindly giving more resources at the beginning and pulling some back later may not work well.)

To avoid this, SLAQ allocates resources to ML applications at the task level, leveraging the iterative nature of ML training algorithms. Each iteration of the ML training algorithm submits tasks to the scheduler with running times around 10ms-100ms. This is how Spark based systems operate readily anyways. (The Dorm paper criticized this it…

TLA+/PlusCal modeling of Synchronized Round Consensus Algorithm

Image
In my distributed systems class for Fall 17, I assigned modeling of the synchronized round consensus algorithm as the first project. I have been assigning TLA+/PlusCal modeling projects in my class for the last 4 years and releasing the projects and their solutions. I believe this is useful for the distributed systems community, because at this point the barrier before wider adoption of TLA+ tools seems to be the lack of more TLA+ modeling examples of algorithms/systems. My goal is to provide a TLA+/PlusCal example for everything I teach in the class. This way the students will get a hands-on experience in algorithms design and dealing with the intrinsic complexities of distributed systems: concurrent execution, asymmetry of information, concurrency bugs, and a series of untimely failures.

Here is some previous discussion/context about why I started assigning TLA+/PlusCal modeling projects in distributed systems classes.

Timing of the project I think I timed this project well. In the f…

HPTS'17 day 2

On HPTS day 2, there were 4 presentation sessions (I mention 2 below) and an evening session on remembering Jim Gray and Ed Lassettre.

(Follow these links for HPTS'17 day 0 and day 1.)

Verification of Systems sesion The first talk was Jepsen VIII by Kyle Kingsbury, who breaks databases for a living. He gave a very fast paced talk. A good rule of thumb for presentations is to go with 2 minutes per slide. Kyle flips this rule upside down and then goes even further to present 5 slides per minute. He presented 150+ slides in less than 30 minutes, and somehow he made this work. I can't believe how quickly/smoothly he was able to transition from a slide to the next, and how he managed to memorize all those transitions.

Kyle's Jepsen toolkit tests databases as blackboxes using a client to submit overlapping operations, where the start/end of operations define the operation intervals. To prepare these tests, Kyle first carefully reads through the documentation to see which guarante…

Popular posts from this blog

I have seen things

SOSP19 File Systems Unfit as Distributed Storage Backends: Lessons from 10 Years of Ceph Evolution

PigPaxos: Devouring the communication bottlenecks in distributed consensus

Frugal computing

Learning about distributed systems: where to start?

Fine-Grained Replicated State Machines for a Cluster Storage System

My Distributed Systems Seminar's reading list for Spring 2020

Cross-chain Deals and Adversarial Commerce

Book review. Tiny Habits (2020)

Zoom Distributed Systems Reading Group