Showing posts from December, 2017

Best of 2017 in MuratBuffalo

Paper summary. Real-Time Machine Learning: The Missing Pieces

This paper, dated March 11, 2017 on arxiv, is from UB Berkeley. Here is Prof. Michael Jordan's Strata Hadoop conference talk on this.

The paper first motivates the need for real-time machine learning. For this it mentions in-situ reinforcement learning (RL) that closes the loop by taking actions that affect the sensed environment. (The second paragraph mentions that such RL can be trained more feasibly by using simulated/virtual environments: by first trying multiple actions [potentially in parallel] to see their affect in simulation before interacting with the real world. Again this requires real-time performance as the simulation should be performed faster than real-time interaction.)

Based on this application scenario, here are their desired requirement from the ML platform.
R1: low latency
R2: high throughput
R3: dynamic task creation (RL primitives such as Monte Carlo tree search may generate new tasks during execution)
R4: heterogeneous tasks (tasks would have widely differe…

TensorFlow-Serving: Flexible, High-Performance ML Serving

This paper by Google appeared at NIPS 2017. The paper presents a system/framework to serve machine learning (ML) models.

The paper gives a nice motivation for why there is a need for productizing model-serving using a reusable, flexible, and extendable framework. ML serving infrastructure were mostly ad-hoc non-reusable solutions, e.g. "just put the models in a BigTable, and write a simple server that loads from there and handles RPC requests to the models."
However, those solutions quickly get complicated and intractable as they add support for:
+ model versioning (for model updates with a rollback option)
+ multiple models (for experimentation via A/B testing)
+ ways to prevent latency spikes for other models or versions concurrently serving, and
+ asynchronous batch scheduling with cross-model interleaving (for using GPUs and TPUs).

This work reminded me of the Facebook Configerator. It solves the configuration management/deployment problem but for ML models.
"What i…

WPaxos: a wide area network Paxos protocol (Part 1)

Paxos is great for solving the fault-tolerant coordination problem in a datacenter. But for cross datacenter coordination (which is needed for distributed databases, filesystems, configuration management, etc.), it hits  the WAN latency barrier. The multi-decree Paxos (Multi-Paxos) algorithm, implemented in variants like Raft and Zab, relies on electing a distinguished leader to serialize updates and hence cannot deal with write-intensive scenarios across the wide area networks (WAN).

An alternative is the leaderless Paxos algorithms. Generalized Paxos and EPaxos employ opportunistic leaders for non-interfering commands and are able to reduce 3 message delays to 2, and allow concurrent leaders to commit. But the fast agreement incurs the cost of a much larger quorum named fast-quorum (3/4ths of the nodes) and hits the WAN latency barrier as well.

Another alternative (as employed in Google Spanner) is to use multiple Paxos groups with partitioned data/object space. In order to provide …

Retroscope: Retrospective cut-monitoring of distributed systems (part 3)

This post continues the discussion on monitoring distributed systems with Retroscope. Here we focus on cut monitoring approach Retroscope uses. (This post is jointly written with Aleksey Charapko and Ailidani Ailijiang.)

Retroscope is a monitoring system for exploring global/nonlocal state history of a distributed system. It differs from other monitoring tools due to the way it inspects the system state. While request tracers inspect the system by following the trace of a request (i.e. request r in the figure), Retroscope performs cut monitoring and examines the system at consistent global cuts, observing the state across many machines and requests. It moves along the system history and scans a progression of states one cut at a time, checking cut  Ts1 and then Ts2 and so on.

Retroscope’s cut monitoring approach is complementary to the request tracing solutions, and brings a number of advantages. First, by exposing the nonlocal state, Retroscope enables users to examine nonlocal prope…

Useful podcasts update

1.5 years ago, I had posted a list of the useful podcasts I subscribe to. This is a good time to update that list with some recent favorites.

Listening to insightful podcasts is a great way to patch/upgrade your personal operating system. So if you have some good ones you can recommend, let me know.

Masters of Scale with Reid Hoffman  This podcast talks about startups and scaling them. Reid Hoffman is a master of this domain, and he brings and interviews great people. He had Mark Zuckerberg, Sheryl Sandberg, Peter Thiel, and Eric Schmidt in his series.  Every episode is great. The last one I listened to was Part 1 with Barry Diller. I relate very close to  Barry's approach to learning by deconstructing and understanding from the fundamental elements, and feeling inadequate until good insight and understanding is gained.

DILLER:​ ​By purpose or by temperament, I’m only interested in those things where I haven’t figured it out, and I really do think that however it happened, that whe…

Upgrade your personal operating system

It is not hard to make analogies, because all analogies are to some extent inaccurate. When Paul Graham made the analogy about hackers and painters, many people criticized and said that there are many other hackers and vocation analogies. But analogies, while inaccurate, can also be useful. Analogies can give you a new perspective, and can help you translate things from one domain to the other to check if it is possible to learn something from that.

Computer and brain analogy has been made for several decades now. It is not surprising that trendy technologies of the times (hydraulics, mechanics, steam engines, clocks, electricity, telegraph) have been historically used as metaphors explaining the brain. So the computer metaphor is in all likelihood as laughable as the earlier ones. But again that doesn't mean it can't be useful.

The Getting Things Done system has made the brain and RAM analogy. It stated that unrecorded tasks waste space in the RAM, and you need to move those o…

Distributed system exams

Last week was the finals week at UB. I gave my final exam on distributed systems on Monday.

Many factors go into designing the exam. In a written exam, especially when you have a large class (mine was 75 students), you have to consider the ease of reading/grading the questions. So instead of asking open ended questions, I ask short answer questions such that the space provided under each question is enough to give complete answers. Secondly, since it is impossible to cover everything the course taught, I try to hit the salient points. Finally, I make sure that I ask sufficient number of straightforward questions so that the students that did a decent job of following the course and studying can get about 70 points out 100.

This link shows the final with solutions. I solve the final during the exam while my two TAs and occasionally I proctor the exam.

And this link shows the midterm I gave earlier in the semester. Note the Paxos question, question # 4. I tried to keep the question brief …

TLA+/Pluscal solution for modeling of 2-phase commit transactions

I had posted about the second TLA+ project I assigned for the distributed systems class the other day. Here is the solution to the project. The solution is pretty simple and straightforward. (This was built by modifying the /P2TCommit.tla/ at Lamport's site to add a TM and a BTM agent).

Initial definitions
Lines 12-13 set the initial RMstates for each replica and the initial TMstate.

Lines 15-18 define the canCommit and canAbort conditions. If all RMs are "prepared" (they are ready for a commit decision), the TM uses this to set tmState="commit". CanCommit is also true if there is an RM that already has "committed" state. This is possible if the TM already made the "commit" decision and an RM went ahead with it and transitioned to "committed" from "prepared". Then if the TM failed, and BTM is to make a decision, the BTM sets tmState="commit" in order to keep Consistency.

If there exists an RM with "abort&qu…

Paper writing woes

I am busy submitting papers for the ICDCS deadline. Today was a long day filled with just polishing the writing of the papers. At this point all I can think of is the dorodangos.

And this Peugot commercial. For no particular reason. None at all.

Reasoning compositionally about security

Prof. Andrew Myers from Cornell visited our department at UB couple weeks ago, and gave a talk. I had taken some notes during the talk, and figured I should organize them a little and post here.

I was delighted to see Andrew had a blog. I just wish he posted more frequently. I especially liked the upgoer-five-editor composed "What I do" post, the "GashlyCode-Tinies" post, and the "Deserialization considered harmful" posts.

I like it when researchers and professors blog. Blogging gives us a different venue with an informal format to present our opinions. This enables us to provide intuitive and simpler explanations, opine open-endedly without needing to provide proof/validation, and even show our humorous and human-sides. Blogging is ideal for professing, I think.

Anyways back to Andrew's talk. The talk was about a couple recent papers
"Secure Information Flow Verification with Mutable Dependent Types" and "Verification of a practical ha…

TLA+/Pluscal modeling of 2-phase commit transactions

For the second project in my distributed systems class Fall 17, I assigned modeling of the two-phase transaction commit protocol. I ask students to model what happens when the initiator/transaction manager (TM) fails, how would a backup (TM) take over, and what type of problems could arise.

Here is the description of the project. I will post the solution later on.

2 phase commit In a distributed system, a transaction is performed by a collection of processes called resource managers (RMs), each executing on a different node. The transaction ends when the transaction manager (TM) issues a request either to commit or to abort the transaction. For the transaction to be committed, each participating RM must be willing to commit it. Otherwise, the transaction must be aborted. The fundamental requirement is that all RMs must eventually agree on whether the transaction is committed or aborted.

Here is a model of 2-phase commit. (This was built by modifying the /P2TCommit.tla/ at Lamport'…

Paper Summary. The Case for Learned Index Structures

This paper was put on Arxiv yesterday and is authored by Tim Kraska, Alex Beutel, Ed Chi, Jeff Dean, Neoklis Polyzotis.

The paper aims to demonstrate that "machine learned models have the potential to provide significant benefits over state-of-the-art database indexes".

If this research bears more fruit, we may look back and say, the indexes were first to fall, and gradually other database components (sorting algorithms, query optimization, joins) were replaced with neural networks (NNs).

In any case this is a promising direction for research, and the paper is really thought provoking.

Motivation Databases started as general, one-size fits all blackboxes. Over time, this view got refined to "standardized sizes" to OLAP databases and OLTP databases.

Databases use indexes to access data quickly. B-Trees and Hash-maps are common techniques to implement indexes. But along with the blackbox view, the databases treat the data as opaque, and apply these indexes blindly wi…

Enabling API Virtualization on Android for Platform Openness

This morning I was at Taeyeon Ki's dissertation proposal. He presented his work on enabling openness (hackability/tinkerability) in Android platform-side.

He defined openness as being able to participate in innovation and distribution, and gave SDN and FUSE are examples of openness.

He argued that mobile vendors control their platforms tightly, and this prevents tinkering the platform and obstructs innovation. Even though you can innovate at the app side, the OS/platform side is closed for tinkering. He posed the question: how can we enable anyone to easily develop distribute new platform-level functionality?

To answer this question, he presented his work in two projects: Reptor and Mimic.

ReptorReptor is a bytecode instrumentation tool enabling api virtualization on Android. Reptor does this by intercepting calls and redirecting them to a new implementation of the method called.

Intersecting calls to methods is not straightforward. Doing just method name substitution cannot handl…


Open Computing Language (OpenCL) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), or field-programmable gate arrays (FPGAs). Heterogeneous computing refers to systems that use more than one kind of processor or cores for high performance or energy efficiency.

OpenCL views a computing system as consisting of a number of compute devices (GPUs, CPUs, FPGAs) attached to a host processor (a CPU). It defines a C-like language for writing programs. Functions executed on an OpenCL device are called /kernels/. A single compute device typically consists of several compute units, which in turn comprise multiple processing elements (PEs). A single kernel execution can run on all or many of the PEs in parallel.

In addition to its C-like programming language, OpenCL defines an API that allows programs running on the host to launch kernels on the compute d…

Popular posts from this blog

I have seen things

SOSP19 File Systems Unfit as Distributed Storage Backends: Lessons from 10 Years of Ceph Evolution

PigPaxos: Devouring the communication bottlenecks in distributed consensus

Frugal computing

Fine-Grained Replicated State Machines for a Cluster Storage System

Learning about distributed systems: where to start?

My Distributed Systems Seminar's reading list for Spring 2020

Cross-chain Deals and Adversarial Commerce

Book review. Tiny Habits (2020)

Zoom Distributed Systems Reading Group