Showing posts from 2016

Selected blog posts from 2016

This is my 42nd post for the year. As is the custom in a year-end post, I mention some highlights among my 41 posts in 2016. Machine Learning Learning Machine Learning: A beginner's journey. I wrote this to quickly recap about what I found confusing and useful while I learned about machine learning. I had no idea this would blow up. This received the most pageviews: 45,000 and counting. TensorFlow: A system for large-scale machine learning.  Facebook papers Realtime Data Processing at Facebook.   Measuring and Understanding Consistency at Facebook.   Holistic Configuration Management at Facebook.   Fault-tolerance Why Does the Cloud Stop Computing? Lessons from Hundreds of Service Outages.  TaxDC: A Taxonomy of nondeterministic concurrency bugs in datacenter distributed systems.   Distributed Coordination Modular Composition of Coordination Services. Make sure you read the comment at the end, where  one of the authors Kfir Lev-Ari provides answer

Learning Machine Learning: A beginner's journey

I have been learning about machine learning and deep learning (ML/DL) for the last year. I think ML/DL is here to stay. I don't think this is a fad or bubble! Here is why: ML/DL has results. It is hard to argue against success. ML/DL has been on the rise organically since 1985 (with the backpropagation algorithms) and went through another phase of acceleration after 2005 ( with the wide availability of big data and distributed data processing platforms ). The rise of ML/DL is following a rising curve pattern, not the pattern for a hyped ephemeral bubble. Since it grew gradually over many years, I am betting it will be around for at least the same amount of time.  ML/DL has been co-developed with applications. It has developed very much on the practice side with trial and error, and its theory is still lagging a bit behind and is unable explain many things. According to Nassim Taleb's heuristics ML/DL is antifragile. ML/DL has the market behind it. Big money provides big i

Book review: Intuition pumps and other tools for thinking

The title of this book grabbed my attention immediately. Intuition pumps is a very visual term, and who doesn't like to learn about tools for thinking. The premise of the book is given in the first quote: "You can't do much carpentry with your bare hands and you can't do much thinking with your bare brain." -- Bo Dahlbom. The book is by Philosopher Daniel Dennett. The book is surprisingly readable for a philosophy book, which are full of jargon and big words. Dennett took special care in writing in a simple an clean way. For this, he recruited help from undergraduate students in his university, Tufts. The book content was discussed at an undergraduate seminar Dennett offered, and he then got help from these students review the book. This is later revealed as one of Dennett's thinking tools: "Explain to nonexperts: use a decoy audience". I think that worked: the book is accessible to an undergraduate, but a motivated one. The first chapter expla

TLA+ project2 solution (2016)

In a previous post, I had given the description of project2. Here is the TLA+ project2 solution made available on github. Below I will briefly explain the solution. If you are not familiar with the chain-replication protocol, read this brief summary.  The setup and variables The first part is about the setup. We declare the process IDs for storage nodes (denoted as Nodes), the clients, and the configurator process. The nodes maintain "db" each, where the item is updated and queried. For simplicity, our distributed key-value store maintains only a single item, and initialized to ver=-1, val=-1, cli=-1. The nodes also have an auxiliary variable "up", which is used for modeling the status of the node. A node can go "down" at any time provided that less than FAILNUM nodes are currently down. Each node and client has a message box, initially all message boxes are empty. Finally, there is a global variable chain. Initially it is an empty se

Feedback from my distributed systems class

I am done with my distributed systems class this semester . At the end of the semester, I polled my students for feedback about the class. I was especially interested in hearing about their feedback on using TLA+ in the course and course projects, and using research paper reviewing for homework assignments. Of course, I didn't mention these when I solicited their feedback. I asked them to write a brief review for what they have learned in my distributed systems class this semester. I received overwhelmingly positive feedback. Below I share some of the comments about TLA+ and the paper review assignments. (I removed the flattering feedback about my teaching, while I enjoyed reading them.  I put a lot of effort in my teaching and I am happy when the students recognize how much I care.  That keeps me motivated. ) Feedback about using TLA+ I assigned two TLA+ projects. The first project was modeling Voldemort key-value store with client-side routing , and the second project was a

Modeling a Replicated Storage System in TLA+, Project 2

Two weeks ago, I had described phase 1 of the TLA+ project I assigned in my distributed systems class. I promised to give you the solution soon. I put the solution for phase1 on github. In this post, I will describe the phase 2 of the project. In phase 2 of the project, we use Chain Replication to ensure persistence and consistency of the objects stored. Read my chain replication summary to refresh on the algorithm. Before performing the write, the client reads from the tail of chain to learn the highest versioned write completed. The client then increments the version number, and performs the write to the head  of chain with this version number. Since a node failure may lead to a request being lost (be it a read request, or update request), the client needs to repeat the request until a response is returned from the tail. The client cannot start a new request before it receives a reply to its earlier request. A configurator process ( an infallible process which would in pra

Emerging trends in big data software

Mike Franklin , a famous expert on data science, had visited our department at University at Buffalo in May to talk about emerging trends in big data software. I had taken some notes during his talk, and decided to summarize and share them here. Mike recently joined University of Chicago as the chair of Computer Science Department , and before that he was the head of UC Berkeley's AMPLab (Algorithms, Machines and People Laboratory), where he was involved with the Spark and Mesos projects which had wide academic and industrial impact. Naturally, the talk included a lot of discussion about AMPLab projects, and in particular Spark. Mike described AMPLab as a federation of faculty that collaborate around an interesting emerging trend. AMPLab has in total 90 faculty and students. Half of the funding comes from government, and the other half from industrial sponsors. The industrial sponsors also provide constant feedback about what the lab works on and how it matters for them. As an

How does one get motivated for teaching?

A friend recently complained that, even after years in academia, he never got quite adjusted to teaching. He said he doesn't see much incentive in teaching and found it boring, and asked about how one can get motivated for teaching. This is actually a good question and it is important to get in to the habit of questioning and checking oneself. Here is how I answer this question. I don't need to get motivated for teaching. Teaching is a responsibility bestowed upon me as an academician. So I look at this from a responsibility perspective. I also have responsibilities as a parent, as a husband, or as a driver (if I am the designated driver), and for matters of responsibility, I don't need motivation. I know I should rise to the occasion. Teaching is my service to the students, who really need it. Today students have many alternatives. They can watch a lecture on YouTube, and can find a variety of study material on the Internet. But it is only I that stand in front of

My Distributed Systems Seminar's reading list for Spring 2017

Below is the first draft list of papers I plan to discuss in my distributed systems seminar in the Spring semester. If you have some suggestions on some good/recent papers to cover, please let me know in the comments. Datacenter Operating System Firmament: Fast, Centralized Cluster Scheduling at Scale (OSDI 16) Large-scale cluster management at Google with Borg (Eurosys 15) Apache Hadoop YARN: yet another resource negotiator (SOCC 13) Slicer: Auto-Sharding for Datacenter Applications (OSDI 16) Monitoring Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems (SOSP 15) Shasta: Interactive Reporting At Scale (SIGMOD 16) Adaptive Logging: Optimizing Logging and Recovery Costs in Distributed In-memory Databases (SIGMOD 16) Consistency The many faces of consistency (2016) The SNOW Theorem and Latency-Optimal Read-Only Transactions (OSDI 16) Incremental Consistency Guarantees for Replicated Objects  (OSDI 16) Just Say NO to Paxos Overhead: Replacing Consensus wi

Modeling a Replicated Storage System in TLA+, Project 1

Why a TLA+ project? The first project assignment in my distributed systems class this semester was modeling a replicated storage system in TLA+. Assigning a TLA+ project makes me a rarity among distributed systems professors. A common project would be a MapReduce programming assignment or a project to implement a simple distributed service (such as a key-value store) in Java. I think that a MapReduce deployment project does not teach much about distributed systems, because MapReduce is a very crude abstraction and hides all things distributed under the hood. Using MapReduce for the distributed systems class project would be like handing people a mechanics certification upon successful completion of a driving test. Implementing a simple distributed service, on the other hand, would teach students that indeed programming and debugging distributed systems is very hard. However, I would suspect that much of the hardship in that project would be due to accidental complexities of the i

Book Review: Einstein (by Walter Isaacson)

I read this book recently and liked it a lot. It was a surprisingly engaging read. I thought I knew a lot about Einstein, and the book would be redundant and bland. But the book proved me wrong. I learned a lot of new things about Einstein, especially about his personality and his research perspective and style. The book also did a great job on explaining Einstein's theories, scientific achievements in an accessible and interesting manner. Einstein's personality Let's get this out of the way. Einstein didn't fail math. He was always a very smart and hardworking student. In primary school, he was at the top of his class and "far above the school requirements" in math. And before he was 15 he had mastered differential and integral calculus. What may have helped propagate the myth was Einstein's unruly and defiant personality. Einstein would do great at the things he likes, and not good at the things he doesn't like. He had excelled in his universi

How Complex Systems Fail

This is a 4 page report about the nature of failures in complex systems. It is a gloomy report. It says that complex systems are always ridden with faults, and will fail when some of these faults conspire and cluster. In other words, complex systems constantly dwell on the verge of failures/outages/accidents. The writing of the report is peculiar. It is written as a list of 18 items (ooh, everyone loves lists). But the items are not independent. For example, it is hard to understand items 1 and 2, until you read item 3. Items 1 and 2 are in fact laying the foundations for item 3. The report is written by an MD, and is primarily focused on healthcare related complex systems, but I think almost all of the points also apply for other complex systems, and in particular cloud computing systems. In two recent posts ( Post1 , Post2 ), I had covered papers that investigate failures in cloud computing systems, so I thought this report would be a nice complement to them. 1) Complex syste

Modeling Paxos and Flexible Paxos in Pluscal and TLA+

The first part of this post describes modeling Paxos in Pluscal. The second part shows how to modify that model to achieve a flexible quorum Paxos . The idea for flexible quorums was introduced in a paper in 2016 summer.  This simple and surprising result says "majority agreement is not required in Paxos and the sets of acceptors required to participate in agreement (known as   quorums) do not even need to intersect with each other" . Modeling Paxos in Pluscal While there are many examples of Paxos modeling in TLA+ available, I haven't found any Pluscal modeling of Paxos, except this one from Lamport , which helped me come up with my Pluscal model below. The problem with TLA+ is that it is too low-level (i.e., too declarative and math-like) for writing--and reading-- distributed algorithms. The PlusCal language provides a higher-level pseudocode, which is easier to follow. However, as you go through my Pluscal model below, you will find that it doesn't follow yo

Popular posts from this blog

I have seen things

PigPaxos: Devouring the communication bottlenecks in distributed consensus

My Distributed Systems Seminar's reading list for Fall 2020

Learning about distributed systems: where to start?

Fine-Grained Replicated State Machines for a Cluster Storage System

Book review. Tiny Habits (2020)

Zoom Distributed Systems Reading Group

My Distributed Systems Seminar's reading list for Spring 2020

How to run effective paper reading groups

Paxos unpacked