Posts

Showing posts from December, 2016

Selected blog posts from 2016

This is my 42nd post for the year. As is the custom in a year-end post, I mention some highlights among my 41 posts in 2016. Machine Learning Learning Machine Learning: A beginner's journey. I wrote this to quickly recap about what I found confusing and useful while I learned about machine learning. I had no idea this would blow up. This received the most pageviews: 45,000 and counting. TensorFlow: A system for large-scale machine learning.  Facebook papers Realtime Data Processing at Facebook.   Measuring and Understanding Consistency at Facebook.   Holistic Configuration Management at Facebook.   Fault-tolerance Why Does the Cloud Stop Computing? Lessons from Hundreds of Service Outages.  TaxDC: A Taxonomy of nondeterministic concurrency bugs in datacenter distributed systems.   Distributed Coordination Modular Composition of Coordination Services. Make sure you read the comment at the end, where  one of the author...

Learning Machine Learning: A beginner's journey

Image
I have been learning about machine learning and deep learning (ML/DL) for the last year. I think ML/DL is here to stay. I don't think this is a fad or bubble! Here is why: ML/DL has results. It is hard to argue against success. ML/DL has been on the rise organically since 1985 (with the backpropagation algorithms) and went through another phase of acceleration after 2005 ( with the wide availability of big data and distributed data processing platforms ). The rise of ML/DL is following a rising curve pattern, not the pattern for a hyped ephemeral bubble. Since it grew gradually over many years, I am betting it will be around for at least the same amount of time.  ML/DL has been co-developed with applications. It has developed very much on the practice side with trial and error, and its theory is still lagging a bit behind and is unable explain many things. According to Nassim Taleb's heuristics ML/DL is antifragile. ML/DL has the market behind it. Big money provides big i...

Book review: Intuition pumps and other tools for thinking

The title of this book grabbed my attention immediately. Intuition pumps is a very visual term, and who doesn't like to learn about tools for thinking. The premise of the book is given in the first quote: "You can't do much carpentry with your bare hands and you can't do much thinking with your bare brain." -- Bo Dahlbom. The book is by Philosopher Daniel Dennett. The book is surprisingly readable for a philosophy book, which are full of jargon and big words. Dennett took special care in writing in a simple an clean way. For this, he recruited help from undergraduate students in his university, Tufts. The book content was discussed at an undergraduate seminar Dennett offered, and he then got help from these students review the book. This is later revealed as one of Dennett's thinking tools: "Explain to nonexperts: use a decoy audience". I think that worked: the book is accessible to an undergraduate, but a motivated one. The first chapter expla...

TLA+ project2 solution (2016)

Image
In a previous post, I had given the description of project2. Here is the TLA+ project2 solution made available on github. Below I will briefly explain the solution. If you are not familiar with the chain-replication protocol, read this brief summary.  The setup and variables The first part is about the setup. We declare the process IDs for storage nodes (denoted as Nodes), the clients, and the configurator process. The nodes maintain "db" each, where the item is updated and queried. For simplicity, our distributed key-value store maintains only a single item, and initialized to ver=-1, val=-1, cli=-1. The nodes also have an auxiliary variable "up", which is used for modeling the status of the node. A node can go "down" at any time provided that less than FAILNUM nodes are currently down. Each node and client has a message box, initially all message boxes are empty. Finally, there is a global variable chain. Initially it is an empty se...

Feedback from my distributed systems class

I am done with my distributed systems class this semester . At the end of the semester, I polled my students for feedback about the class. I was especially interested in hearing about their feedback on using TLA+ in the course and course projects, and using research paper reviewing for homework assignments. Of course, I didn't mention these when I solicited their feedback. I asked them to write a brief review for what they have learned in my distributed systems class this semester. I received overwhelmingly positive feedback. Below I share some of the comments about TLA+ and the paper review assignments. (I removed the flattering feedback about my teaching, while I enjoyed reading them.  I put a lot of effort in my teaching and I am happy when the students recognize how much I care.  That keeps me motivated. ) Feedback about using TLA+ I assigned two TLA+ projects. The first project was modeling Voldemort key-value store with client-side routing , and the second project ...

Modeling a Replicated Storage System in TLA+, Project 2

Image
Two weeks ago, I had described phase 1 of the TLA+ project I assigned in my distributed systems class. I promised to give you the solution soon. I put the solution for phase1 on github. In this post, I will describe the phase 2 of the project. In phase 2 of the project, we use Chain Replication to ensure persistence and consistency of the objects stored. Read my chain replication summary to refresh on the algorithm. Before performing the write, the client reads from the tail of chain to learn the highest versioned write completed. The client then increments the version number, and performs the write to the head  of chain with this version number. Since a node failure may lead to a request being lost (be it a read request, or update request), the client needs to repeat the request until a response is returned from the tail. The client cannot start a new request before it receives a reply to its earlier request. A configurator process ( an infallible process which would in...

Emerging trends in big data software

Image
Mike Franklin , a famous expert on data science, had visited our department at University at Buffalo in May to talk about emerging trends in big data software. I had taken some notes during his talk, and decided to summarize and share them here. Mike recently joined University of Chicago as the chair of Computer Science Department , and before that he was the head of UC Berkeley's AMPLab (Algorithms, Machines and People Laboratory), where he was involved with the Spark and Mesos projects which had wide academic and industrial impact. Naturally, the talk included a lot of discussion about AMPLab projects, and in particular Spark. Mike described AMPLab as a federation of faculty that collaborate around an interesting emerging trend. AMPLab has in total 90 faculty and students. Half of the funding comes from government, and the other half from industrial sponsors. The industrial sponsors also provide constant feedback about what the lab works on and how it matters for them. As an ...

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Making database systems usable

Looming Liability Machines (LLMs)

Advice to the young

Foundational distributed systems papers

Distributed Transactions at Scale in Amazon DynamoDB

Linearizability: A Correctness Condition for Concurrent Objects

Understanding the Performance Implications of Storage-Disaggregated Databases

Designing Data Intensive Applications (DDIA) Book