Wednesday, November 20, 2019

Seventh grade open house

Recently, I went to my son's 7th grade open house. It was a nice setup. They had us follow my son's daily routine, visiting the same classrooms in the same order, but with 10-minute class periods in which the teachers gave information about what they would be doing this year.

Overall I am impressed by what I saw and heard. My main takeaway was that the school has embraced a hands-on learning curriculum, and is making very good/wise use of technology.

My impression from the Math, Science, and ELA classes was that the middle school is using the flipped-classroom model to a great extent. There is little lecturing and a lot of group work.

The science class presentation was exciting. Instead of memorizing, the class emphasizes hypothesizing, testing, and reasoning.

Art class also sounded exciting. They added new media modules. This is not surprising given that half of 12-year-olds list being a YouTuber as their career choice. The art teacher said that their goal is to get the students excited about and comfortable with making/producing things.

The school offers many interesting clubs in addition to the curriculum. I wish I were a student and had the opportunity to learn in this environment.

Technology use

The school makes very good use of technology. Each student gets a Chromebook for the year. They use it at home, charge it overnight, and bring it to school to use in classes all day. These Chromebooks have very good battery life, and they are very durable.

Everything is on the web, and accessible via the Chromebooks. The students have access to grading information via the school's web portal. Many classes use apps that enable the students to create flash cards for studying. The students get to take practice tests. And almost every class makes use of videos, including YouTube videos.

In each class, the students are responsible for organizing their agendas and keeping track of homework deadlines. The students access all their class material, and view and submit homework online, using their Chromebooks with Google's office suite and other Google tools.

It looks like Google has a monopoly on the mindshare of the new generation. And Microsoft is missing the boat on this one big time. By the time these students graduate college and start jobs, they may not be very willing to adopt the Windows/PC ecosystem.

Each classroom has a smartboard. These smartboards are actually computer screens projected onto a regular whiteboard. The smartboard functionality comes from software that tracks hand movements and pen touches on the whiteboard via a camera. The teacher can display any website on the smartboard, and can scroll, click, and zoom with ease.

I am left with a lurking jealousy and guilt after the open house. Compared to the middle school's technology use, our universities' and our CS department's technology use is very lame... Why is that?

In my university, we keep to traditional lecturing. In our classes, there are nice monitors and projection technology, but there are no smartboards. I think clickers are available in some classrooms, but that is ancient tech and an unnatural way to interact. The university makes us use the God-awful (and even more ancient) Blackboard software for managing class interaction and grades.

Monday, November 18, 2019

Book Review: Digital Minimalism: Choosing a Focused Life in a Noisy World

"Digital Minimalism: Choosing a Focused Life in a Noisy World" is Cal Newport's new book. The topic of the book is clear from the title. You should be quitting your Facebook checking, Twitter chatting, Imgur browsing, phone fiddling habits. No more monkey business. It is time to get to work. Deep work.

In Chapter 4 of the book, Calvin coins the term *solitude deprivation*. Solitude deprivation is at the other end of the spectrum from solitary confinement, but over a long duration it can also be bad for you. The book argues that today we all experience solitude deprivation. The smartphones, laptops, and screens do not give us time to be alone with our thoughts and process things at our own speed. I had heard a nice story in which Amazon natives recruited for an expedition into the jungle would take long breaks after doing some walking. They would say they were waiting for their souls to catch up to their bodies. Today we don't give time for our souls to catch up to us, and to process, both emotionally and rationally, the flood of events/news we are bombarded with every day.

So I really liked Chapter 4, which talked about solitude deprivation. This chapter made me really worried that online connectivity, and never getting bored, could be doing more harm to me than I thought. This chapter made a much more convincing case for quitting social media than the first couple of chapters did, in my view. But maybe that is because I am more of an abstract-ideas guy.

Calvin's previous book "Deep Work" had a lot of impact. I think "Digital Minimalism" may not have that much impact. (Well, Digital Minimalism has already become a New York Times, Wall Street Journal, Publishers Weekly, and USA Today bestseller... I guess I mean more impact than that ;-) ) Deep Work had a positive message, "embrace deeper concentration", whereas Digital Minimalism has a negative message, "prevent digital clutter". I know, I know... For each book, you could simply switch the statement from positive to negative and vice versa. I am just referring to the tone/mood of the books. Digital Minimalism is more of a self-help/how-to book. It prescribes lists of things to do and not to do in a somewhat patronizing voice. Deep Work was more conceptual and thought-provoking, and less of a how-to self-help book. I have listened to Deep Work at least three times. I don't see that happening with "Digital Minimalism". I would have liked to read a book titled "Deep Solitude" from Calvin, which I am sure I would be re-reading several times.
If you want to build a ship, don't drum up people to collect wood and don't assign them tasks and work, but rather teach them to long for the endless immensity of the sea.
--Antoine de Saint-Exupéry

In any case, I think this is a book you should definitely check out. I wish Calvin the best of luck with getting these ideas adopted. They are very timely and important. In 2005, I was labmates with Calvin in Nancy Lynch's theory of distributed systems group. Calvin is like a real-life Captain America: always responsible, kind, tidy, and disciplined. He would arrange his working hours carefully and would optimize everything. He is super smart and productive. His publication record is impressive. He is a theory and abstract-thinking person at heart. He thinks clearly and soundly, with much deliberation. He is both a successful professor and a successful writer. He is walking the walk as well. So we should pay attention when he is talking.

MAD questions

1. Is social media more dangerous than TV was? 
It may be so, because it is algorithmically manipulated to be addictive. Technology companies have closed the loop, and this is a quick feedback loop fed by millions of people participating in the experiment. On the other hand, I have heard the hypothesis that, since the kids are growing up with this technology, they will develop ways to be immune to it. But I don't know if I am convinced by that argument. Parents should definitely regulate/restrict these technologies. And I think even governments should be regulating these technologies. The Digital Minimalism book cites that mental health is at crisis levels for millennials raised with this technology available to them. I see students on their phones walking in the halls and fiddling with their phones even in class. They are always busy catching up with what is on their screens, but they are missing out on things happening around them, and most importantly, happening within them.

2. Is it possible to have a good social media tool?
Things have a way of going south quickly for social media and common collaboration tools. Quora, once a home for insightful articles, is a cesspool now. I guess we chalk it up to human nature.

I like my Twitter-verse. It is nicely curated to give me a chance to observe interesting people chat and think. It is like watching passersby at a coffee shop. It is not deep conversation, but it is still useful for keeping me inspired and informed about these people's interests. I wish we could write paragraphs on Twitter, but then, maybe people wouldn't write and interact as much.

Saturday, November 16, 2019

SOSP19. I4: Incremental Inference of Inductive Invariants for Verification of Distributed Protocols

This paper is by Haojun Ma, Aman Goel, Jean-Baptiste Jeannin, Manos Kapritsos, Baris Kasikci, and Karem A. Sakallah (all at University of Michigan).

This paper is about formal verification of distributed systems. Writing proofs manually is cumbersome. Existing tools for formal verification all require the human to find the inductive invariant.

I4 combines the power of Ivy (a tool for interactive verification of infinite-state systems) and model checking in order to find the inductive invariant without relying on human intuition. Ivy takes as input a protocol description and a safety property, and guides the user interactively to discover an inductive invariant. The goal of finding an inductive invariant is to prove that the safety property always holds. An inductive proof has a base case, which proves that the initial state is safe, and an inductive step, which proves that if state k is safe, then state k+1 is safe. Once that inductive invariant is found, Ivy automatically verifies that it is indeed inductive.

The insight in I4 is that the safety/correctness behavior of a distributed system does not fundamentally change as its size increases. I witness this regularly in my use of TLA+ for model checking protocols. TLA+ is able to identify any problem (sometimes requiring up to 40 steps) by finding a counterexample involving three nodes. Three nodes is often what it takes. One node initiates a coordination operation, and the other two nodes see different perspectives of the ongoing computation, maybe due to exchanging messages with each other (i.e., doing stale reads) at inopportune times, and arrive at conflicting decisions that violate the goal of the coordination operation.

I4 takes inductive invariants from small instances, applies/generalizes them to large instances, and automates this with model checking. More specifically, I4 first creates a finite instance of the protocol; uses a model checking tool to automatically derive the inductive invariant for this finite instance; and generalizes this invariant to an inductive invariant for the infinite protocol. This improves on the Ivy approach in that it automates the inductive invariant discovery process. It improves on the model checking approach as well: while model checking is fully automated, it doesn't scale to distributed systems. I4 applies model checking to small, finite instances and then generalizes the result to all instances.

The figure above shows an overview of the I4 flow for the invariant generation on a finite instance.
Given a protocol description--written in Ivy--and an initial size, I4 first generates a finite instance of that protocol with a given initial size. For example, ... I4 will generate a finite instance of the protocol with one server and two clients. It then uses the Averroes model checker to either generate an inductive invariant that proves the correctness of the protocol for that particular instance, or produce a counterexample demonstrating how the protocol can be violated and which can be used to debug the protocol. If the protocol is too complex, the model checker may fail to produce an answer within a reasonable amount of time or it may run out of memory. If this occurs, the finite encoding is simplified—using a concretization technique—to further constrain it and make it easier for the model checker to run to completion. This step is currently done manually but is easily automatable. Once an inductive invariant has been identified, I4 generalizes it to apply not only to the finite instance that produced it, but also to all instances of the protocol.
It is important to note that if the safety invariant does not hold, Averroes produces a counterexample, and the human should work on the protocol to come up with a safety invariant that holds for the protocol. I4 is automatic in that, if the protocol safety invariant holds, then the inductive invariant is generated automatically by the Averroes tool. But wait, what is the difference between a safety invariant and an inductive invariant? Isn't the safety invariant already inductive?

Safety property P may be an invariant but not an inductive one. "The verification proof requires the derivation of additional invariants that are used to constrain P until it becomes inductive. These additional invariants are viewed as strengthening assertions that remove those parts of P that are not closed under the system's transition relation." In other words, while the safety property holds for reachable states, it may not be closed under program actions outside the reachable states. This makes the safety invariant unsuitable for verification, since proving properties is not constrained to the reachable states (as it is hard to enumerate/identify reachable states in a proof). So, the inductive invariant is a version of the safety property that is closed under the program actions. The figure below illustrates this relationship. I think this concept is explored further in the Ivy paper.
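A toy example (mine, not from the paper) makes the distinction concrete. Take a counter that starts at 0 and increments by 2 until it reaches 10. The safety property x &lt;= 10 holds on every reachable state, but it is not closed under the transition relation: the unreachable state x = 9 satisfies it, yet steps to x = 11. Strengthening it with "x is even" makes it inductive:

```python
# Toy illustration (not from the I4 paper): safety invariant vs inductive invariant.
# System: x starts at 0; transition: if x < 10 then x := x + 2.

def initial(x):
    return x == 0

def step(x):
    return x + 2 if x < 10 else x

def is_inductive(inv, domain):
    # Base case: every initial state satisfies the invariant.
    base = all(inv(x) for x in domain if initial(x))
    # Inductive step: closure under transitions, checked over ALL states
    # in the domain, not just the reachable ones.
    closed = all(inv(step(x)) for x in domain if inv(x))
    return base and closed

domain = range(-2, 13)
safety = lambda x: x <= 10                             # safety property P
strengthened = lambda x: x % 2 == 0 and 0 <= x <= 10   # P constrained until inductive

print(is_inductive(safety, domain))        # False: x=9 satisfies P but steps to 11
print(is_inductive(strengthened, domain))  # True
```

The strengthening clause "x is even" plays exactly the role of the additional invariants quoted above: it removes the parts of P (the odd states) that are not closed under the transition relation.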

If the safety property holds, then Averroes generates an inductive invariant for the finite instance; minimizes the invariant by removing redundant clauses; and then passes it on to the next step to be generalized. However, occasionally the finite instance may still be too large for the Averroes model checker, and it may run out of memory. This is where human involvement is needed again. The human helps concretize the small finite version of the protocol further to avoid state-space explosion. Symmetry plays a big role here. FIRST is the keyword that denotes the node that sends the first message. The model checker would try instances where any of the three nodes in the finite instance might be the one that sends the first message. The human can notice the symmetry and set "FIRST = Node1" to help reduce the state space. (The team is working on automating this step as well.)

Then I4 uses Ivy for the proof generation as shown below, and the verification is complete.

I4 is available as open source. They applied I4 to several examples, as shown in the table.

I4 improves on manual verification using Coq and on interactive verification using Ivy.

A restriction in I4 is that it applies to verification of safety properties, and not to liveness properties.

I am happy to find so many verification papers at SOSP. This paper appeared in the distributed systems session in the afternoon of Day 2. In the morning of Day 2, there was a session on verification which included four papers. I had reviewed two of these papers earlier: "Scaling Symbolic Evaluation for Automated Verification of Systems Code with Serval" and "Verifying Concurrent, Crash-safe Systems with Perennial". It looks like the verification community at SOSP is quick to take results from more general and theoretical verification conferences, and to integrate and improve upon those tools to put them to use for the verification of practical systems.

Thursday, November 14, 2019

SOSP19 Lineage Stash: Fault Tolerance Off the Critical Path

This paper is by Stephanie Wang (UC Berkeley), John Liagouris (ETH Zurich), Robert Nishihara (UC Berkeley), Philipp Moritz (UC Berkeley), Ujval Misra (UC Berkeley), Alexey Tumanov (UC Berkeley), Ion Stoica (UC Berkeley).

I really liked this paper. It has a simple idea, which has a good chance of getting adopted by real world systems. The presentation was very well done and was very informative. You can watch the presentation video here.

Low-latency processing is very important for data processing, stream processing, graph processing, and control systems. Recovering after failures is also important for them, because for systems composed of 100s of nodes, node failures are part of daily operation.

It seems like there is a tradeoff between low latency and recovery time. The existing recovery methods either have low runtime overhead or low recovery overhead, but not both.
  • Global checkpoint approach to recovery achieves a low runtime overhead, because a checkpoint/snapshot can be taken asynchronously and off the critical path of the execution. On the other hand, the checkpoint approach has high recovery overhead because the entire system needs to be rolled back to the checkpoint and then start from there again.
  • The logging approach to recovery has high runtime overhead, because it synchronously records/logs every nondeterministic event since the last checkpoint. On the flip side of the coin, it can achieve low recovery overhead, because only the failed processes need to be rolled back a little and resumed from there. 
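The tradeoff in the two bullets above can be captured with a toy cost model (my own back-of-the-envelope sketch, not from the paper): checkpointing pays nothing per operation but replays everything since the snapshot on recovery, while synchronous logging pays a write per operation but replays almost nothing.

```python
# Toy cost model (mine, not from the paper) for the runtime/recovery tradeoff.
OPS = 1000            # operations executed since the last checkpoint
LOG_COST = 1.0        # cost of one synchronous log write (runtime overhead)
REPLAY_COST = 0.5     # cost to re-execute one operation during recovery

# Checkpointing: snapshots are taken asynchronously, so there is no per-op
# runtime cost, but recovery must roll back and replay everything.
ckpt_runtime = 0.0
ckpt_recovery = OPS * REPLAY_COST

# Synchronous logging: every op pays a log write on the critical path,
# but recovery only replays the one operation in flight at the crash.
log_runtime = OPS * LOG_COST
log_recovery = 1 * REPLAY_COST

print(f"checkpointing: runtime overhead={ckpt_runtime}, recovery cost={ckpt_recovery}")
print(f"logging:       runtime overhead={log_runtime}, recovery cost={log_recovery}")
```

Neither column wins both metrics, which is exactly the gap lineage stash targets.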

Can we have a recovery approach that achieves both low runtime overhead and low recovery overhead? The paper proposes the "lineage stash" idea to achieve that. The idea behind lineage stash is simple.

The first part of the idea is to reduce the amount of data logged by logging only the lineage. Lineage stash logs pointers to the data messages instead of the data, and also logs task descriptions in case that data needs to be recreated by the previous operation. Lineage stash also logs the order of execution.

The second part of the idea is to do this lineage logging asynchronously, off the critical path of execution. The operators/processes now include a local volatile cache for lineage, which is asynchronously flushed to the underlying remote global lineage storage. The global lineage store is a sharded and replicated key-value datastore.

But then the question becomes, is this still fault tolerant? If we are doing the logging to the global lineage store asynchronously, what if the process crashes before sending the message, and we lose the log information?

The final part of the idea is to use a causal logging approach, and piggybacking the uncommitted lineage information to the other processes/operations for them to store in their stashes as well. So this kind of resembles a tiny decentralized blockchain stored in the stashes of interacting processes/operations.

In the figure, the filter process had executed some tasks and then passed messages to the counter process. Since the logging is off the critical path, the lineage for these tasks was not yet replicated to the global lineage store. But as part of the rule, the lineage was piggybacked on the messages sent to the counter, so the counter also has a copy of the lineage in its stash when the filter process crashes. Then, during recovery, the counter process helps by flushing this uncommitted lineage to the global lineage store for persistence. The recovering filter process can then retrieve and replay this lineage to achieve a correct and quick (on the order of milliseconds) recovery.
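The piggybacking rule can be sketched in a few lines of Python (a simplification of mine, not the paper's implementation): each process keeps a volatile stash of uncommitted lineage, attaches it to every outgoing message, and any surviving peer can flush it to the global store when the sender crashes.

```python
# Minimal sketch (mine, not the paper's code) of the lineage stash rule:
# uncommitted lineage rides along on messages, so a receiver can flush it
# to the global lineage store if the sender crashes before its async flush.

global_lineage_store = {}   # stands in for the sharded, replicated KV store

class Process:
    def __init__(self, name):
        self.name = name
        self.stash = {}      # local volatile cache of uncommitted lineage

    def execute(self, task_id, description):
        # Record the task description locally; the flush to the global
        # store happens asynchronously, off the critical path.
        self.stash[task_id] = description

    def send(self, receiver, payload):
        # Piggyback all currently uncommitted lineage on the message.
        receiver.receive(payload, dict(self.stash))

    def receive(self, payload, piggybacked_lineage):
        self.stash.update(piggybacked_lineage)

    def flush(self):
        global_lineage_store.update(self.stash)
        self.stash.clear()

filter_p, counter_p = Process("filter"), Process("counter")
filter_p.execute("t1", "filter(batch1)")
filter_p.send(counter_p, "msg1")
# The filter crashes before flushing; the counter flushes on its behalf,
# so the recovering filter can retrieve and replay the lineage.
counter_p.flush()
print(global_lineage_store)   # {'t1': 'filter(batch1)'}
```

The task names and message payloads here are illustrative; the real system logs pointers to data and execution order, as described above.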

The lineage stash idea was implemented and evaluated in Apache Flink for a stream-processing word-count application over 32 nodes. It was compared against the default global checkpoint recovery, and the default augmented with synchronous logging.

As the figure above shows, by using the asynchronous logging approach, lineage stash is able to avoid the runtime latency overhead of synchronized logging and matches that of the asynchronous checkpointing approach. Moreover, as the figure below shows, the recovery latency of checkpointing is very high. The lineage stash approach achieves recovery latency similar to that of the synchronized logging approach.

The lineage stash looks very promising for providing lightweight (off the critical path) fault-tolerance for fine-grain data processing systems. I really like the simplicity of the idea. I feel like I have seen a related idea somewhere else as well. But I can't quite remember it.

Monday, November 11, 2019

SOSP19 Verifying Concurrent, Crash-safe Systems with Perennial

This paper is by Tej Chajed (MIT CSAIL), Joseph Tassarotti (MIT CSAIL), Frans Kaashoek (MIT CSAIL), Nickolai Zeldovich (MIT CSAIL).

Replicated disk systems, such as file systems, databases, and key-value stores, need both concurrency (to provide high performance) and crash safety (to keep your data safe). The replicated disk library is subtle, but the paper shows how to systematically reason about all possible executions using verification. (This work considers verification of a single-computer storage system with multiple disks --not a distributed storage system.)

Existing verification frameworks support either concurrency (CertiKOS [OSDI ’16], CSPEC [OSDI ’18], AtomFS [SOSP ’19]) or crash safety (FSCQ [SOSP ’15], Yggdrasil [OSDI ’16], DFSCQ [SOSP ’17]).

Combining verified crash safety and concurrency is challenging because:
  • Crash and recovery can interrupt a critical section,
  • Crash can wipe in-memory state, and
  • Recovery logically completes crashed threads' operations. 

Perennial introduces 3 techniques to address these three challenges:
  • leases to address crash and recovery interrupting a critical section,
  • memory versioning to address crash wiping in-memory state, and
  • recovery helping to address problems due to interference from recovery actions.

The presentation deferred to the paper for the first two techniques and explained the recovery helping technique.

To show that the implementation satisfies the high-level specification, a forward simulation is shown under an abstraction relation. The abstraction relation maps the concrete/implementation state to the high-level abstract specification state. Perennial adopted the abstraction relation: "if not locked (due to an operation in progress), then the abstract state matches the concrete state on both disks".

The problem is "crashing" breaks the abstraction relation. To fix this problem, Perennial separates crash invariant (which refers to interrupted spec operations) from the abstraction invariant. The recovery proof relies on the crash invariant to restore the abstraction invariant.

The crash invariant says "if the disks disagree, some thread was writing the value on the first disk". Then the recovery helping technique helps recovery commit writes from before the crash. The recovery proof shows the code restores the abstraction relation by completing all interrupted writes. As a result, users get correct behavior and atomicity.
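The replicated-disk scenario can be sketched as follows (my own toy model, not Perennial's verified Go code): a write goes to disk 1 and then disk 2; a crash in between leaves the disks disagreeing, and recovery restores the abstraction relation by completing the interrupted write, i.e., copying disk 1's value over.

```python
# Toy model (mine, not Perennial's code) of the crash invariant and
# recovery helping for a two-disk replicated write.

disk1, disk2 = {}, {}

def write(addr, value, crash_between=False):
    disk1[addr] = value        # write to the first disk...
    if crash_between:
        return                 # crash! disk2 never sees the write
    disk2[addr] = value        # ...then to the second

def recover():
    # Crash invariant: if the disks disagree at an address, some thread
    # was writing the value found on the first disk. Recovery completes
    # that interrupted write, restoring the abstraction relation.
    for addr, value in disk1.items():
        if disk2.get(addr) != value:
            disk2[addr] = value

write(0, "a")
write(1, "b", crash_between=True)   # interrupted write
assert disk1 != disk2               # abstraction relation broken
recover()
assert disk1 == disk2               # restored: the write committed atomically
print("recovery completed the interrupted write")
```

In the real proof this reasoning is carried out in separation logic, not by executing code; the sketch only shows why completing interrupted writes is what restores the relation.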

The Perennial proof framework was written in 9K lines of Coq, and provides crash reasoning: leases, memory versioning, and recovery helping. Perennial is built on top of the Iris concurrency framework (for concurrency reasoning), which is built on top of Coq. (Iris: R. Krebbers, R. Jung, A. Bizjak, J.-H. Jourdan, D. Dreyer, and L. Birkedal. The essence of higher-order concurrent separation logic. In Proceedings of the 26th European Symposium on Programming Languages and Systems, pages 696–723, Uppsala, Sweden, Apr. 2017.)

The authors have developed Goose for reasoning about Go implementations, but they also defer this to the paper. The developer writes Go code, and the Goose translator (written in 2K lines of Go code) translates this to Perennial proof, where it is machine checked with Coq.

As an evaluation of the Perennial framework, they verified a mail server written in Go. They argue that compared to a verification in CSPEC [OSDI '18] (their earlier verification framework), the verification in Perennial takes less effort and is done in fewer lines of proof.

The software is publicly available.

MAD questions

1. Is this an instance of a convergence refinement relation? 
In 2001, I was thinking about fault-tolerance-preserving refinements as a graduate student working on graybox design of self-stabilization. The question was: if we design fault tolerance at the abstract level, what guarantee do we have that after the abstract code is compiled and implemented in the concrete, the fault tolerance still holds/works?

It is easy to see that fault-tolerance would be preserved by an "everywhere refinement" that preserves the abstraction relation (between concrete and abstract) at any state, including the states outside the invariant states that are not reachable in the absence of faults. But the problem is that outside the invariant, the abstraction relation may not hold due to recovery actions being different than normal actions. That is pretty much the dilemma the Perennial work faced in verifying the recovery of replicated disks above.

OK, I said, let's relax the everywhere refinement to an "everywhere eventual refinement", and that would work for preserving fault tolerance. Yes, it works, but it is not easy to prove that the concrete is an everywhere eventual refinement of the abstract, because there is a lot of freedom in this type of refinement and not much structure to leverage. The proof becomes as hard as proving fault tolerance of the concrete from scratch. So, what I ended up proposing was a "convergence refinement", where the actions of the concrete provide a compacted version of the actions of the abstract outside the invariant. In other words, the forward simulation outside the invariant would be skipping states in the concrete. Perennial, faced with the same dilemma, chose to use a different abstraction relation, whereas the convergence refinement idea is to keep the same abstraction relation but allow it to contract/skip steps in the computations outside the invariant states. I wonder if this could be applicable to the Perennial problem.

My reasoning for compacting steps in the refinement outside the invariant was that it is safer than expanding the computation: if you show recovery in the abstract, then by skipping steps (and not adding new ones) the concrete is also guaranteed to preserve that recovery.

Here is the abstract of my 2002 paper on convergence refinement. I just checked, and this paper got only 19 citations in 19 years. It did not age well after getting a best paper award at ICDCS'02. In comparison, some of the papers we wrote quickly and published as short papers or workshop papers got 150-900 citations in less than 10 years. Citations are a funny business.
Refinement tools such as compilers do not necessarily preserve fault-tolerance. That is, given a fault-tolerant program in a high-level language as input, the output of a compiler in a lower-level language will not necessarily be fault-tolerant. In this paper, we identify a type of refinement, namely "convergence refinement", that preserves the fault-tolerance property of stabilization. We illustrate the use of convergence refinement by presenting the first formal design of Dijkstra’s little-understood 3-state stabilizing token-ring system. Our designs begin with simple, abstract token-ring systems that are not stabilizing, and then add an abstract "wrapper" to the systems so as to achieve stabilization. The system and the wrapper are then refined to obtain a concrete token-ring system, while preserving stabilization. In fact, the two are refined independently, which demonstrates that convergence refinement is amenable for "graybox" design of stabilizing implementations, i.e., design of system stabilization based solely on system specification and without knowledge of system implementation details.

Saturday, November 9, 2019

SOSP19 Day 2, Scaling Symbolic Evaluation for Automated Verification of Systems Code with Serval

The verification session was the first session of Day 2. I like formal methods, and I did enjoy these papers. In this post I will only talk about the first paper in the session, the Serval paper. (You can read about SOSP19 Day 1 here.)

This paper is by Luke Nelson (University of Washington), James Bornholt (University of Washington), Ronghui Gu (Columbia University), Andrew Baumann (Microsoft Research), Emina Torlak (University of Washington), Xi Wang (University of Washington).

This paper received a best paper award at SOSP19, and the software is publicly available.

SOSP has a tradition of publishing systems verification papers, such as seL4 (SOSP'09), Ironclad Apps (OSDI'14), FSCQ (SOSP'15), CertiKOS (PLDI'16), and Komodo (SOSP'17). A downside of systems verification is that it is very effort-intensive. The CertiKOS manual proof consisted of more than 200K lines.

To help address this problem, this paper introduces Serval, a framework for developing automated verifiers for systems software. Serval accomplishes this by lifting interpreters written by developers into automated verifiers. It also provides a systematic approach to identifying and repairing verification performance bottlenecks using symbolic profiling and optimizations.

Wait, wait... What is an interpreter? And what is lifting?

In prior work on automated verification (such as Hyperkernel, SOSP'17), a verifier implements symbolic evaluation for a specific system, and the verifier is not reusable or generalizable. To make the verifier reusable and general, in Serval the developers write an interpreter for an instruction set using Rosette, an extension of the Racket language for symbolic reasoning. Serval leverages Rosette to "lift" an interpreter into a verifier, which means to "transform a regular program to work on symbolic values". The developers also give the system specifications to be verified.
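Here is a tiny illustration of what "lifting" means (mine, in Python rather than Rosette, with a made-up two-instruction ISA): an interpreter written against ordinary operators also runs unchanged on symbolic values with overloaded operators, and then produces symbolic expressions instead of concrete results.

```python
# Tiny illustration (mine, not Rosette/Serval) of lifting an interpreter:
# the same interpreter code, fed symbolic values, builds expressions
# instead of computing numbers.

class Sym:
    def __init__(self, expr):
        self.expr = expr
    def __add__(self, other):
        rhs = other.expr if isinstance(other, Sym) else other
        return Sym(f"({self.expr} + {rhs})")
    def __mul__(self, other):
        rhs = other.expr if isinstance(other, Sym) else other
        return Sym(f"({self.expr} * {rhs})")
    def __repr__(self):
        return self.expr

def interpret(program, env):
    # A made-up two-instruction "ISA": ('add', dst, a, b) and ('mul', dst, a, b).
    for op, dst, a, b in program:
        x, y = env[a], env[b]
        env[dst] = x + y if op == "add" else x * y
    return env

prog = [("add", "r2", "r0", "r1"), ("mul", "r3", "r2", "r2")]

# Concrete run:
print(interpret(prog, {"r0": 3, "r1": 4})["r3"])        # 49

# "Lifted" run: same interpreter, symbolic inputs.
print(interpret(prog, {"r0": Sym("x"), "r1": Sym("y")})["r3"])
# ((x + y) * (x + y))
```

Rosette does this lifting automatically and feeds the resulting expressions to an SMT solver to check the specifications; the sketch only shows the one idea of the same interpreter working over symbolic values.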

In the Serval framework, the verifier consists of the lifted interpreter and the symbolic optimizations. The steps are: write a verifier as an interpreter, have Serval perform symbolic profiling to find bottlenecks, and apply optimizations until verification becomes feasible.

Serval uses symbolic execution to avoid the state-space explosion problem. But the program counter (PC) becoming symbolic is bad, as it unnecessarily opens up the search space. Serval prevents this with symbolic optimizations:

  • peephole optimization
  • fine-tune symbolic evaluation
  • use the domain language to reduce the concrete values the PC can take, and avoid the path explosion problem.

Unfortunately I didn't understand much about the first two optimizations from listening to the presentation.

Using Serval, the authors build automated verifiers for the RISC-V, x86-32, LLVM, and BPF instruction sets. Targeting low level end of compiling stack can be an advantage for verification, because we don't need to trust higher level language toolkits. Future work will consider how the low-level-guarantees identified and verified by Serval could be connected to high level data structures for proof verification.

To show that existing systems can be retrofitted for Serval, they retrofitted CertiKOS and Komodo. They mention this takes around 4 weeks for a new system. They also found 15 new bugs in the Linux BPF JIT.

I will read the paper carefully to understand Serval better. It seems promising for scaling verification to practical systems. Of course the process still requires expertise and several weeks' worth of effort, but Serval improves on the state of the art, which required many months of effort.

Thursday, November 7, 2019

SOSP19 Day 1 wrap up

It was only 3 sessions into day 1, and my brain was fried.
Conferences are tiring because you are exposed to so many new ideas in a short time. It was clear I would not be able to pay attention to the papers in the last session, so I skipped that session (the privacy session) to go for a walk at the golf course behind the conference center.

After the privacy session, there was a poster session and reception from 5-7:30pm. The poster session was nice for asking authors questions about the papers and having more in-depth conversation.

A student had told me he doesn't know how to start conversations with other conference attendees. I told him, "That's easy. Just ask them what they are working on these days." A better way to start deeper conversations is to listen to the paper presentations, come up with genuine questions about future work or some extension or connection, and go discuss them with the authors at coffee breaks, lunch, or the poster session.

In the free-roaming poster session and reception, I had a chance to meet many colleagues and catch up on what they are working on these days. When they returned the question, I had to talk for 3-5 minutes about what I am working on these days. I found that my "elevator pitch" got better and better as I answered this question many times over.

I am a shy person, but at conferences my curiosity works in my favor, and I approach people to learn about their current work, and what they think of this paper versus that paper. I really enjoy talking to fellow researchers, each of whom is an expert in a small part of a big field. We may have different opinions on things, they may not like the papers/ideas I like, but I get to learn about their perspectives and file them in my brain without having to agree or disagree with them for now.

General impressions about SOSP 

SOSP is single track, so 500+ people were in the same big conference room for the sessions. The first half of the room had tables, and the second half just chairs. If you sat at a table row, you could rest your laptop on the table and type comfortably. I sat at the very front row and took notes. Interestingly, there was little contention for the front rows. Another advantage of sitting at the front row is that I was not distracted by seeing other audience members checking Facebook, Twitter, and mail on their laptops.

(Rant: This is my pet peeve. This drives me nuts. What is the point of flying a long distance, then driving at least two more hours from the airport to the conference, only to check mail and social media all day long? You disrupted your whole week to travel to this conference and now you are not "at the conference". This I will never understand. Be here now!)

To comment on the papers and presentations on the first day: I found all the sessions very interesting. I don't have a favorite; all three sessions I attended were very good.

Most of the presentations were given by graduate students. The quality of most of the presentations was very good. It is obvious a lot of effort went into rehearsing those presentations. Almost all presenters had written (and memorized) extensive speaker notes, and while the slides were displayed on the screen, they had the presenter notes open on their laptops. Some of the presenters just read from their notes. Those presentations were not very engaging, but at least the slides were very well organized, and the messages were distilled down to the important points and easy to follow.

Each presentation was about 20 minutes, including a 2-3 minute Q&A slot at the end. (I think the SOSP conferences I attended before reserved a 5-minute Q&A slot, but this time the Q&A was not as rigidly reserved and enforced.)

Most of the presenters covered around 40 slides in 20 minutes. That is an astonishingly large number: the usual rule of thumb is 10 slides for a 20-minute talk. But being well prepared, with smoothly flowing presentations, the presenters were somehow able to pull this off. I guess it takes its toll on the listeners though. I felt overwhelmed and exhausted after three sessions of being bombarded with so many ideas, concepts, and acronyms.

I have written two posts about how to present, in case you are looking for advice in that department.

Tuesday, November 5, 2019

SOSP19 Day 1, Debugging session

This session was the first session after lunch and had four papers on debugging in large scale systems.

CrashTuner: Detecting Crash Recovery Bugs in Cloud Systems via Meta-info Analysis

This paper is by Jie Lu (The Institute of Computing Technology of the Chinese Academy of Sciences), Chen Liu (The Institute of Computing Technology of the Chinese Academy of Sciences), Lian Li (The Institute of Computing Technology of the Chinese Academy of Sciences), Xiaobing Feng (The Institute of Computing Technology of the Chinese Academy of Sciences), Feng Tan (Alibaba Group), Jun Yang (Alibaba Group), Liang You (Alibaba Group).

Crash recovery code can be buggy and often results in catastrophic failure. Random fault injection is ineffective for detecting crash-recovery bugs because they are rarely exercised. Model checking at the code level is not feasible due to the state space explosion problem. As a result, crash-recovery bugs are still widely prevalent. Note that the paper is not about "crash" bugs, but about "crash recovery" bugs, where the recovery code interferes with normal code and causes the error.

CrashTuner introduces new approaches to automatically detect crash-recovery bugs in distributed systems. The paper observes that crash-recovery bugs involve "meta-info" variables: variables denoting nodes, jobs, tasks, applications, containers, attempts, sessions, etc. I guess these are critical metadata; the paper probably describes them in more detail.

The insight in the paper is that crash-recovery bugs can be easily triggered when nodes crash before reading meta-info variables and/or crash after writing meta-info variables.

Using this insight, CrashTuner inserts crash points at reads/writes of meta-info variables. This yields a 99.91% reduction in crash points compared with previous testing techniques.
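
As a rough sketch of the idea (my reconstruction from the talk, not the authors' code, and with a made-up heuristic for recognizing meta-info variable names), crash points are enumerated only around meta-info accesses rather than at every instruction:

```python
# Enumerate crash points only around reads/writes of meta-info
# variables. The name-matching heuristic below is my own placeholder.

META_INFO_HINTS = ("node", "job", "task", "app", "container",
                   "attempt", "session")

def is_meta_info(var_name):
    return any(h in var_name.lower() for h in META_INFO_HINTS)

def crash_points(trace):
    """trace: list of (op, var) events, op in {'read', 'write'}.
    Return indices where a crash should be injected: just before a
    meta-info read, and just after a meta-info write."""
    points = []
    for i, (op, var) in enumerate(trace):
        if not is_meta_info(var):
            continue
        points.append(i if op == "read" else i + 1)
    return points

trace = [("write", "nodeId"), ("read", "buffer"), ("read", "taskStatus")]
print(crash_points(trace))  # [1, 2]
```

Only 2 of the 4 possible crash positions survive, which is the source of the drastic reduction in crash points to test.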

They evaluated CrashTuner on Yarn, HDFS, HBase, and ZooKeeper, and found 116 crash-recovery bugs. 21 of these were new crash-recovery bugs (including 10 critical bugs), and 60 of the bugs have already been fixed.

The presentation concluded by saying that meta-info is a well-suited abstraction for distributed systems. After the presentation, I still have questions about identifying meta-info variables: what would the false-positive and false-negative rates be for finding meta-info variables via a heuristic definition like the one above?

The Inflection Point Hypothesis: A Principled Debugging Approach for Locating the Root Cause of a Failure

This paper is by Yongle Zhang (University of Toronto), Kirk Rodrigues (University of Toronto), Yu Luo (University of Toronto), Michael Stumm (University of Toronto), Ding Yuan (University of Toronto).

This paper is about debugging to find the root cause of a failure. The basic approach is to collect a large number of traces (via failure reproduction and failure execution), find the strongest statistical correlation with the fault, and identify that as the root cause. The paper asks: what is the fundamental property of a root cause that allows us to build a tool to automatically identify it?

The paper defines the root cause as "the most basic reason for the failure, which, if corrected, will prevent the fault from occurring." This has two parts: "if changed, would result in correct execution" and "the most basic cause".

Based on this, the paper defines the inflection point as the first point in the failure execution that differs from the instructions in the nonfailure execution, and develops Kairux, a tool for automated root cause localization. The idea in Kairux is to construct the nonfailure execution that has the longest common prefix with the failure execution. To this end, it uses unit tests, stitches unit tests together to construct the nonfailure execution, and modifies existing unit tests to obtain a longer common prefix. It then uses dynamic slicing to obtain a partial order.
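
The inflection-point definition itself is simple to state in code. Here is a minimal sketch (my illustration; Kairux works on instruction-level traces and does much more to construct the nonfailure execution):

```python
# The inflection point: the first point where the failure execution
# diverges from the nonfailure execution sharing the longest common
# prefix with it.

def inflection_point(failure_trace, nonfailure_traces):
    """Return (index, step) of the first divergence from the
    nonfailure execution with the longest common prefix."""
    def common_prefix_len(a, b):
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    best = max(nonfailure_traces,
               key=lambda t: common_prefix_len(failure_trace, t))
    i = common_prefix_len(failure_trace, best)
    return i, failure_trace[i]

failure = ["open", "lock", "write", "unlock", "delete", "crash"]
good_runs = [["open", "lock", "write", "unlock", "close"],
             ["open", "read", "close"]]
print(inflection_point(failure, good_runs))  # (4, 'delete')
```

The heavy lifting in the real system is constructing `good_runs` with long common prefixes out of unit tests, which is where the modification of existing tests comes in.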

The presentation gave a real-world example from HDFS 10453, a delete-block thread bug. It took the developer one month to figure out the root cause of the bug. Kairux does this automatically.

Kairux was evaluated on 10 cases from JVM-based distributed systems, including HDFS, HBase, and ZooKeeper. It successfully found the root cause for 7 of the 10 cases. For the 3 unsuccessful cases, the paper says this was because the root cause location could not be reached by modifying unit tests.

This paper was similar to the previous paper in the session, in that it had a heuristic insight applicable in a reasonably focused, narrow domain. I think the tool support would be welcomed by developers. Unfortunately, I couldn't find the code or tool available as open source anywhere.

Finding Semantic Bugs in File Systems with an Extensible Fuzzing Framework

This paper is by Seulbae Kim (Georgia Institute of Technology), Meng Xu (Georgia Institute of Technology), Sanidhya Kashyap (Georgia Institute of Technology), Jungyeon Yoon (Georgia Institute of Technology), Wen Xu (Georgia Institute of Technology), Taesoo Kim (Georgia Institute of Technology).

The presentation opened by asking "Can file systems be bug free?" and answered in the negative, citing that filesystem codebases are massive (40K-100K lines of code) and constantly evolving. This paper proposes using fuzzing as an umbrella solution that unifies existing bug checkers for finding semantic bugs in filesystems.

The idea in fuzzing is to give crashes as feedback to the fuzzer. The challenge for finding semantic bugs with fuzzers, however, is that semantic bugs are silent and won't be detected by crashes alone. So we need a checker to go through the test cases, check the validity of the return values, and give this feedback to the fuzzer.
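
Here is a toy sketch of that feedback loop (my illustration, not Hydra's code; the toy "filesystem", its bug, and the checker are all made up for demonstration):

```python
# Fuzzing loop where a pluggable checker turns silent semantic bugs
# into fuzzer feedback, alongside the usual coverage signal.
import random

def mutate(testcase, rng):
    ops = list(testcase) + [rng.choice(["mkdir", "rename", "fsync"])]
    return tuple(ops)

def fuzz(execute, checkers, seeds, rounds=200):
    rng = random.Random(0)
    corpus = list(seeds)
    bugs = []
    for _ in range(rounds):
        testcase = mutate(rng.choice(corpus), rng)
        result = execute(testcase)
        signals = [c(testcase, result) for c in checkers]
        if any(signals):
            bugs.append((testcase, [s for s in signals if s]))
        if result["new_coverage"] or any(signals):
            corpus.append(testcase)   # interesting input: keep it
    return bugs

def execute(testcase):
    # Toy system under test: 'rename' after 'mkdir' silently returns
    # the wrong error code (a semantic bug: no crash, wrong result).
    if "mkdir" in testcase and "rename" in testcase:
        return {"ret": -22, "new_coverage": True}
    return {"ret": 0, "new_coverage": False}

def spec_checker(testcase, result):
    # Pluggable checker: validates return values against the spec.
    if result["ret"] != 0:
        return f"spec violation: ret={result['ret']} for {testcase}"
    return None

bugs = fuzz(execute, [spec_checker], seeds=[("mkdir",)])
print(len(bugs) > 0)  # True
```

Without the checker, none of these inputs would look interesting to the fuzzer, since nothing ever crashes.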

To realize this insight, they built Hydra. Hydra is available as open source.

Hydra uses checker-defined signals, and automates input space exploration, test execution, and incorporation of test cases. Hydra is extensible via pluggable checkers: for POSIX spec violations (SibylFS), for logic bugs, for memory-safety bugs, and for crash-consistency bugs (SymC3).

So far, Hydra has discovered 91 new bugs in Linux file systems, including several crash-consistency bugs. Hydra even found a bug in a verified file system (FSCQ), because it had used an unverified function in its implementation.

The presenter said that Hydra generates better test cases, and that its minimizer can reduce the number of steps in a crashing test case from the 70s down to the 20s. The presentation also included a live demo of Hydra in action with SymC3.

Efficient and Scalable Thread-Safety Violation Detection --- Finding thousands of concurrency bugs during testing

This paper is by Guangpu Li (University of Chicago), Shan Lu (University of Chicago), Madanlal Musuvathi (Microsoft Research), Suman Nath (Microsoft Research), Rohan Padhye (Berkeley).

This paper received a best paper award at SOSP19. It provides an easy, push-button way of finding concurrency bugs.

The paper deals with thread safety violations (TSVs). A thread safety violation occurs when two threads concurrently invoke conflicting methods on the same object. For example, the C# List data structure has a contract that says two Adds cannot be concurrent. Unfortunately, thread safety violations still exist, and they are hard to find via testing because they don't show up in most test runs. The presentation mentioned a major bug that led to Bitcoin losses.

Thread safety violations are very similar to data races, and it is possible to use data-race detection tools, in a manually intensive process, to find some of these bugs at small scale. To reduce the manual effort, it is possible to adopt dynamic data-race analysis while running the program under test inputs, but these approaches require a lot of false-positive pruning.

At large scale, these don't work. CloudBuild at Microsoft involves 100 million tests from 4K teams on up to 10K machines. At this scale, there are three challenges: integration, overhead, and false positives.

The paper presents TSVD, a scalable, push-button dynamic analysis tool. You provide TSVD only with the thread safety contract, and it finds violations with zero false positives. TSVD was deployed in Azure, and it found more than 1000 bugs in a short time. The tool is available as open source.

To achieve zero false positives, TSVD uses a very interesting trick. A potential violation (i.e., a call site that code analysis identified as potentially violating the thread safety contract) is retried in many test executions, injecting delays to try to trigger a real violation. If a real violation is found, it is a true bug. Otherwise, it was a false positive and is ignored.
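
To make the delay-injection trick concrete, here is a toy sketch (my illustration, not TSVD's implementation): the wrapper below enforces a "no concurrent calls" contract, and the injected sleep stretches one call so that a genuinely racy second call is forced to overlap with it, turning a near-miss into a reproduced true positive.

```python
# Inject a delay at a flagged call site so that a real thread safety
# violation, if possible, actually manifests during the test run.
import threading
import time

class ContractChecked:
    """Detects two overlapping calls, which violate a
    'no concurrent invocations' thread safety contract."""
    def __init__(self):
        self._busy = False
        self.violations = 0

    def add(self, delay=0.0):
        if self._busy:
            self.violations += 1   # another call in flight: true TSV
        self._busy = True
        time.sleep(delay)          # injected delay at the flagged site
        self._busy = False

obj = ContractChecked()
t1 = threading.Thread(target=obj.add, kwargs={"delay": 0.3})
t2 = threading.Thread(target=obj.add)
t1.start()
time.sleep(0.05)   # t2 arrives while t1 is still inside add()
t2.start()
t1.join()
t2.join()
print(obj.violations)  # 1: the injected delay exposed the overlap
```

Without the injected delay, the two calls would almost always miss each other and the test would silently pass.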

But how do we identify these potentially unsafe calls at which to insert delays? TSVD uses another interesting trick: it looks for conflicting calls with close-by physical timestamps. It flags likely racing calls, where two conflicting calls from different threads to the same object occur within a short physical time window. This is more efficient and scalable than doing a happened-before analysis to find calls with concurrent logical timestamps: just identify likely racing calls.
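
A minimal sketch of that likely-race detection (again my reconstruction; the window size is an assumed placeholder):

```python
# Flag pairs of conflicting calls on the same object from different
# threads whose physical timestamps fall within a short window.
# No happened-before tracking is needed.

WINDOW = 0.100  # seconds; an assumed proximity threshold

def likely_races(calls, window=WINDOW):
    """calls: list of (timestamp, thread_id, object_id, method)."""
    flagged = []
    calls = sorted(calls)
    for i, (t1, th1, o1, m1) in enumerate(calls):
        for t2, th2, o2, m2 in calls[i + 1:]:
            if t2 - t1 > window:
                break                   # sorted: later calls are farther
            if o1 == o2 and th1 != th2:  # same object, different threads
                flagged.append(((t1, th1, o1, m1), (t2, th2, o2, m2)))
    return flagged

calls = [(0.000, 1, "list#7", "Add"),
         (0.020, 2, "list#7", "Add"),   # within 100ms of thread 1's call
         (5.000, 2, "dict#3", "Get")]
print(len(likely_races(calls)))  # 1
```

The flagged pairs are exactly the call sites that then get the delay-injection treatment.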

OK, but what if there is actual synchronization between the two potentially conflicting calls with close-by physical timestamps? Why waste energy trying to break something the synchronization prevents? It won't lead to a real bug. To avoid this, TSVD uses synchronization inference (another neat trick!): if m1 is synchronized before m2, a delay added to m1 leads to the same delay in m2. If this close correlation is observed in the delays, TSVD infers synchronization. This way it also infers whether a program is running sequentially, which calls are more likely to lead to problems, and so on.

They deployed TSVD at Microsoft for several months. It was given the thread safety contracts of 14 system classes in C#, including List, Dictionary, etc. It was tested on 1600 projects, run 1 or 2 times each, and found 1134 thread safety violations. During validation, they found that 96% of the TSVs were previously unknown to developers, and 47% would eventually cause severe customer-facing issues.

TSVD beats other approaches, including random delay injection, DataCollider, and happened-before (HB) tracking. 96% of all violations were caught by running TSVD 50 times, and 50% of the violations were caught by running TSVD just once! It beats the other tools with little overhead.

One drawback of the TSVD approach is that the added random delay may cause a false negative. But when you run the tool multiple times, those missed violations are caught, thanks to the different random delays tried.

Yep, this paper definitely deserved a best paper award. It used three very interesting insights/heuristics to make the problem manageable, built a tool around them, and presented exhaustive evaluations of the tool.

Sunday, November 3, 2019

SOSP19 Day 1, Blockchain session

The second session on Day 1 was on blockchains. There were three papers on this session.

Teechain: A Secure Payment Network with Asynchronous Blockchain Access

This paper is by Joshua Lind (Imperial College London), Oded Naor (Technion), Ittay Eyal (Technion), Florian Kelbert (Imperial College London), Emin Gun Sirer (Cornell University), Peter Pietzuch (Imperial College London).

Bitcoin's throughput is a measly 4 transactions per second, as limited by Nakamoto consensus. This motivates work on off-chain scaling via payment networks, e.g., the Lightning Network. A payment network consists of nodes that establish pairwise payment channels; to pay across multiple hops, these pairwise channels are chained through intermediaries. A payment network has three phases: set up a multihop connection, process payments for some time, and perform a settlement where the final balance is written back to the blockchain.

To guard against roll-back attacks and ensure that the correct balance is written, the blockchain is used as the root of trust. If a node misbehaves and writes an incorrect amount, the other party reacts within delta time: it corrects the balance by providing proof of the Lightning transactions, resolves the situation, and collects an incentive for doing so.

The problem is that the reaction time delta requires synchronous blockchain access. Spam/congestion attacks can make some transactions take more than 7 days to be written to the chain. So what should the reaction time delta be? The security of a payment channel should not rely on blockchain read/write latencies.

To address this issue, the paper proposes Teechain, the first payment network with asynchronous blockchain access. Teechain removes the blockchain as the root of trust by introducing another one: a treasury. The treasury controls funds, balances, and payments. But how do we realize treasuries for blockchains while avoiding centralization and trust?

Teechain uses committees to realize treasuries. A treasury committee consists of n parties in the network, and requires m out of n parties to agree before funds can be accessed. But how large should m and n be? Large committees are problematic for scaling. To reduce committee sizes, the paper brings in trusted execution environments (TEEs), such as Intel SGX enclaves. The enclaves guard against software and hardware attacks, and TEEs are used to secure the committee members. The paper does not take TEEs as the entire solution, because some attacks, like Foreshadow, are still possible against TEEs. Instead, the paper uses TEEs to raise the cost of attacks, and combines TEEs with treasury committees to solve the problem securely.

OK, how does the treasury committee maintain agreement? Chain replication is used to propagate transactions to the committee. Even though the committee members use TEEs, some nodes may still be Byzantine, so there is an m-out-of-n agreement at the end. If chain replication were applied naively, some attacks would be possible, so Teechain uses a variant called force-freeze chain replication: if the chain configuration changes or there is a fault, the decision is to abort, and the state is dumped.
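
The m-out-of-n treasury rule itself is easy to illustrate (a toy sketch of my own, not Teechain's code): funds move only when enough committee members approve, so up to m-1 compromised members cannot release funds on their own.

```python
# m-out-of-n approval check for a treasury committee.

def approve_payment(approvals, committee, m):
    """approvals: set of member ids that signed off.
    committee: the full set of n member ids."""
    valid = approvals & committee   # ignore signatures from outsiders
    return len(valid) >= m

committee = {"A", "B", "C", "D"}    # n = 4
print(approve_payment({"A", "B", "C"}, committee, m=3))  # True
print(approve_payment({"A", "X", "B"}, committee, m=3))  # False: X is not a member
```

The interesting engineering in the paper is everything around this check: securing the members with TEEs and keeping their replicated state consistent.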

The paper includes evaluations showing that Teechain scales out. Evaluations were done using a complete graph and a hub-and-spoke graph. Committee sizes of n=3 or 4 were used, and it was shown that 1 million tx/sec is possible using 30 machines.

Teechain is available as open source.

Fast and Secure Global Payments with Stellar

This paper is by Marta Lokhava (Stellar), Giuliano Losa (Galois), David Mazières (Stanford), Graydon Hoare (Stellar), Nicolas Barry (Stellar), Eliezer Gafni (UCLA), Jonathan Jove (Stellar), Rafał Malinowski (Stellar), Jed McCaleb (Stellar).

In 2018, I had written a review of the Stellar arXiv paper. This paper extends that work with formal verification, evaluation, and lessons learned from several years of deployment.

However, the presentation was not technical. It was a very high-level presentation that avoided describing the protocol and the formal verification. I think the presenter, David, must have given this presentation a hundred times before to VCs, because it was pitch perfect. Every possible criticism of the protocol was proactively flipped and explained as an advantage. Here is how the presentation went.

Things we take for granted, such as a bank account in a stable currency, access to well-regulated investments, cheap international money transfers, and globally accepted fee-free credit cards, are not available in many places. Stellar provides more equitable access to assets. It is the first solution to provide:
  1. open membership
  2. issuer-enforced finality: still need secure servers, but issuer owns or designates them
  3. cross-issuer atomicity
The Stellar transaction model is based on replicated state machines (RSMs). Each RSM executes transactions to maintain the ledger state, and transactions are atomic. But now that the RSM is distributed, how does Stellar guarantee ledger integrity? Stellar uses a shared RSM: the idea is to follow the graph of trust transitively until it converges. The hypothesis is that any two nodes transitively follow a common node. This holds true for the Internet with its hierarchical domains.

Byzantine agreement in Stellar follows from this hypothesis. The key idea is that the broadcast protocol steps are conditioned on other nodes' steps: take the step if all nodes are mutually satisfied. For availability, "follows" must be generalized to sets of peers, called quorum slices. A quorum is a set of nodes that contains one slice from each nonfaulty member.
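
To make the quorum-slice definition concrete, here is a minimal sketch (my illustration based on the talk, not Stellar's code): a set Q is a quorum if every node in Q has at least one of its quorum slices entirely inside Q.

```python
# Quorum check over per-node quorum slices.

def is_quorum(q, slices):
    """slices: maps node -> list of slices (each slice a set of nodes)."""
    return bool(q) and all(
        any(s <= q for s in slices[n]) for n in q
    )

slices = {
    "a": [{"a", "b"}],
    "b": [{"b", "c"}, {"a", "b"}],
    "c": [{"b", "c"}],
}
print(is_quorum({"a", "b"}, slices))  # True: each member has a slice inside
print(is_quorum({"a", "c"}, slices))  # False: no slice of 'a' fits in {a, c}
```

Each node picks its own slices, which is what makes membership open; agreement then hinges on these locally chosen slices transitively overlapping.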

Stellar has top-tier, middle-tier, and leaf-tier servers. As on the Internet, no central authority appoints the top tier. The production network has been running for 4 years.

In the Q&A period, one question was: "How are you dealing with dynamic reconfiguration of quorums?" David said, "We get it for free, and can unilaterally change quorum slices at any time." But this answer is not clear to me, because reconfiguration could reduce safety. As far as I remember from the Stellar arXiv report, the onus is on the user to ensure her quorum slices are set up correctly, so reconfiguration should be more involved than that.

Another question was: "Can you do smart contracts where state lies in different slices?" David claimed it is possible, but again, without reading the paper, I don't understand this answer. The state could be partitioned (rather than fully replicated) across all nodes, and ordering across all slices may be involved.

Notary: A Device for Secure Transaction Approval

This paper is by Anish Athalye (MIT CSAIL), Adam Belay (MIT CSAIL), Frans Kaashoek (MIT CSAIL), Robert Morris (MIT CSAIL), Nickolai Zeldovich (MIT CSAIL).

Smartphones suffer from bugs and attacks, so hardware wallets (or cryptocurrency wallets) are adopted for approving critical financial transactions. The Ledger app store contains 50+ third-party apps. Unfortunately, existing hardware wallets have OS bugs and potential hardware bugs as well.

The contribution of this paper is Notary, a device for secure transaction approval. Notary uses an agent separation architecture, and the authors have developed a physical hardware wallet prototype for it.

The separation architecture provides isolation: there is a kernel system-on-chip (SoC) and a separate agent SoC. The kernel SoC does not run any third-party code; third-party code runs on the agent SoC. The agent SoC does not have access to the OS, nor full access to the hardware, as it lacks access to persistent storage.

There are only 2 wires between the kernel SoC and the agent SoC: UART and RST. Using RST, the kernel can reset the agent SoC. More specifically, the kernel clears the state in the agent SoC after every transaction. This way, Notary uses a clean-slate deterministic start to ensure noninterference across transactions and avoid bugs/attacks carrying over.

The Verilog implementation is verified to show that Notary clears the registers on every path; an SMT-compatible format is used for symbolic circuit simulation. They developed a RISC-V based prototype and evaluated it with two agents: Bitcoin and web-app approval.

Friday, November 1, 2019

SOSP19 Day 1, machine learning session

In this post, I cover the papers that appeared in the first session of the conference: machine learning. (Here is my account of Day 0 and the opening remarks if you are interested.)

I haven't read any of these papers yet and I go by my understanding of these papers from their presentations. I might have misunderstood some parts. The good news is all of these papers are available as open access, and I include links to the papers in my notes. Please check the papers when in doubt about my notes.

The machine learning session contained four papers. I found all of them very interesting. They applied principled systems design techniques to machine learning and provided results with broader applicability than a single application. I wanted to enter the machine learning research area three years ago, but I was unsuccessful and concluded that the area is not very amenable to principled systems work. It looks like I admitted defeat prematurely. After seeing the papers in this session, I am excited again about the fusion and synergy of the machine learning and principled distributed systems areas.

PipeDream: Generalized Pipeline Parallelism for DNN Training

This paper is by Deepak Narayanan (Stanford University), Aaron Harlap (Carnegie Mellon University), Amar Phanishayee (Microsoft Research), Vivek Seshadri (Microsoft Research), Nikhil R. Devanur (Microsoft Research), Gregory R. Ganger (CMU), Phillip B. Gibbons (Carnegie Mellon University), Matei Zaharia (Stanford University).

There are two prevalent approaches to parallelizing DNN training: data parallelism and model parallelism. This paper proposes pipeline-parallel training, which combines data and model parallelism with pipelining. In this approach, the model is first divided into sequential stages, and these stages are pipelined over the workers. To optimize the pipeline, the bottleneck stages are identified and data-parallelized across multiple workers, preventing the pipeline from stalling while waiting for results from previous stages.
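
A toy version of that load-balancing idea (my illustration, not PipeDream's profiler/optimizer, which also accounts for communication): replicate the bottleneck stage with data parallelism until the stages' effective throughputs roughly match.

```python
# Greedy worker assignment: repeatedly give a worker to the stage with
# the highest effective per-minibatch time (time / replicas).

def balance(stage_times, total_workers):
    """stage_times: per-stage compute time for one minibatch.
    Returns the number of replicas per stage."""
    replicas = [1] * len(stage_times)
    for _ in range(total_workers - len(stage_times)):
        effective = [t / r for t, r in zip(stage_times, replicas)]
        bottleneck = effective.index(max(effective))
        replicas[bottleneck] += 1
    return replicas

# Stage 1 is 3x slower than the others; with 6 workers it gets 3 replicas.
print(balance([1.0, 3.0, 1.0], total_workers=6))  # [2, 3, 1]
```

After balancing, the pipeline's throughput is set by the slowest effective stage, so equalizing effective times is what keeps every worker busy.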

The authors identify three big challenges to realize this idea and develop techniques for addressing these challenges.
  • Partitioning and load-balancing operators across workers: They developed a profiler and optimizer to load-balance compute and reduce communication.
  • Scheduling of forward and backward passes of different inputs: They use a bi-directional pipeline, where an input minibatch proceeds through the computation pipeline first forward and then backward. Each active minibatch in the pipeline may be in a different stage, either in the forward pass or backward pass. Once in steady state, each stage alternates between performing its forward pass for a minibatch and its backward pass for an earlier minibatch. This one-forward-one-backward (1F1B) scheduling ensures that every GPU is occupied with a minibatch in a balanced pipeline, with each stage producing outputs in aggregate at roughly the same rate.
  • Managing weights and activation versions for effective learning: The presenter mentioned that naive pipelining leads to weight version mismatches. To prevent this, they store multiple <weight, activation> versions, which they call stashed weights.
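
The 1F1B steady state can be sketched as follows (my illustration of the scheduling idea, not PipeDream's code; warm-up depth per stage is my simplification):

```python
# Generate one stage's 1F1B op sequence: after a short warm-up of
# forward passes, the stage alternates a forward pass for a new
# minibatch with a backward pass for an earlier one.

def one_f1b_schedule(stage, num_stages, num_minibatches):
    """Return the op sequence for one stage: 'F3' = forward minibatch 3."""
    warmup = num_stages - stage - 1   # earlier stages warm up with more Fs
    ops, fwd, bwd = [], 0, 0
    for _ in range(warmup):
        ops.append(f"F{fwd}")
        fwd += 1
    while bwd < num_minibatches:
        if fwd < num_minibatches:
            ops.append(f"F{fwd}")     # forward pass for a new minibatch
            fwd += 1
        ops.append(f"B{bwd}")         # backward pass for an earlier one
        bwd += 1
    return ops

print(one_f1b_schedule(stage=0, num_stages=2, num_minibatches=3))
# ['F0', 'F1', 'B0', 'F2', 'B1', 'B2']
```

Note how minibatch 0's backward pass (B0) runs while minibatch 2 has not even started, which is exactly why weight versions must be stashed.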

They integrated PipeDream with PyTorch using 3000 lines of Python code, hooking into PyTorch's communication library. The project is open source.

Their evaluation results show that PipeDream can train up to 5 times faster than data-parallel training. The reason for the speedup is that PipeDream reduces communication among workers: it can incur up to an order of magnitude less communication than data parallelism.

There were some good questions from the audience following the talk.
  • "PipeDream is evaluated on models that are sequential. What about branched models, where multiple things need to finish before the next phase?" The presenter answered that the techniques/setup in PipeDream can generalize to handle them, but added that most models are sequential.
  • "What about PipeDream's memory footprint?" The presenter said they are looking to reduce it.
  • "As sparsity changes, it may be possible to do asynchronous training faster than synchronous training. Would it be possible to beat these results using asynchronous training rather than the synchronous training pipedream performs?" I don't have notes for the answer, but I think the answer is that for DNN training synchronous training is the method that works best. 

A Generic Communication Scheduler for Distributed DNN Training Acceleration

This paper is by  Yanghua Peng (The University of Hong Kong), Yibo Zhu (ByteDance Inc.), Yangrui Chen (The University of Hong Kong), Yixin Bao (The University of Hong Kong), Bairen Yi (ByteDance Inc.), Chang Lan (ByteDance Inc.), Chuan Wu (The University of Hong Kong), Chuanxiong Guo (ByteDance Inc.).

The paper presents a scheduler for deep learning training, so it has some relevance to the PipeDream paper as well. This paper focuses only on data parallelism for scaling out training, and shows how to optimize it.

The FIFO scheduling strategy does not overlap communication with computation well. Although there has been work to improve scheduling, such as P3 and TicTac, those were limited because they are coupled to specific framework implementations, MXNet and TensorFlow respectively. In contrast, this work presents a generic tensor scheduling framework, ByteScheduler, which was implemented and evaluated for MXNet, PyTorch, and TensorFlow. The project is available as open source.

The basic insights in this work are:
  • Communication for the earlier layers of a neural network has higher priority and can preempt communication for the later layers
  • It is beneficial to partition large layers and merge small layers
This work uses Bayesian optimization to auto-tune the partitioning and scheduling control. The presenter mentioned a technique called "dependency proxy" for getting the scheduling control right, and said that ByteScheduler adapts to different bandwidths, environments, and tasks.
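
The two insights can be sketched together (my illustration, not ByteScheduler's code; a real scheduler interleaves this with tensor arrival and preempts in-flight transfers): split large tensors into chunks, then transmit chunks in earlier-layer-first priority order, since the next forward pass needs the earliest layers' updated parameters first.

```python
# Partition gradient tensors into chunks and send them in priority
# order: lower layer index = earlier layer = higher priority.
import heapq

def schedule(tensors, chunk_bytes=4_000_000):
    """tensors: list of (layer_index, size_bytes), given in
    backward-pass (reverse-layer) production order.
    Returns the chunk transmission order as (layer, chunk) pairs."""
    ready = []
    for layer, size in tensors:
        nchunks = -(-size // chunk_bytes)   # ceiling division
        for c in range(nchunks):
            heapq.heappush(ready, (layer, c))
    return [heapq.heappop(ready) for _ in range(len(ready))]

# The backward pass produces layer 2's gradient first, but layer 0's
# chunks are transmitted first because the next iteration needs them
# earliest.
order = schedule([(2, 8_000_000), (1, 2_000_000), (0, 6_000_000)])
print(order[:3])  # [(0, 0), (0, 1), (1, 0)]
```

Chunking is what makes preemption possible at all: without it, a huge low-priority tensor would monopolize the link once its transfer starts.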

Parity Models: Erasure-Coded Resilience for Prediction Serving Systems

This work is by Jack Kosaian (Carnegie Mellon University), K. V. Rashmi (Carnegie Mellon University), Shivaram Venkataraman (University of Wisconsin-Madison).

Tail latency in inference serving is a problem that results in missed deadlines. This work shows how to use erasure codes to reduce tail latency in ML inference: if one replica is slow, the system queries the parity model and decodes quickly to make the deadline.

In this setup, the inference servers host multiple identical models, and a query needs to reach k of them. The system jointly codes queries from different users, sending them to different inference servers and to a parity server, which holds the parity model. If an answer is missing, the system decodes the results of the inference servers and the parity server to reconstruct/approximate the missing answer and still make the deadline. The reconstructed output only comes into play when the original predictions are slow or fail. Meeting the deadline requires fast encoding and decoding, and the key to this is the design of the parity model.
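
A toy sketch of why coded inference is easy for linear models (my illustration; the paper's actual contribution is learning a parity model that makes this work for nonlinear networks, which this example does not capture): for a linear model f, f(q1 + q2) = f(q1) + f(q2), so a "parity query" q1 + q2 lets us recover a straggler's prediction by subtraction.

```python
# Erasure-coded inference for a linear model: recover the missing
# prediction from the parity prediction and the surviving one.

def f(q):
    """A toy linear 'model': a fixed dot product with weights w."""
    w = [2.0, -1.0, 0.5]
    return sum(wi * qi for wi, qi in zip(w, q))

q1, q2 = [1.0, 0.0, 2.0], [0.0, 3.0, 4.0]
parity_query = [a + b for a, b in zip(q1, q2)]

# The server handling q2 is slow or failed; reconstruct its prediction
# from the parity server's result and q1's result.
reconstructed = f(parity_query) - f(q1)
assert abs(reconstructed - f(q2)) < 1e-9
print(reconstructed)  # -1.0
```

For a neural network this subtraction no longer works, which is why the parity "model" itself is trained so that decoding still yields a good approximation of the missing prediction.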

The challenge here is that, while handcrafting erasure codes is straightforward for linear operations, it is hard for neural networks, which are nonlinear and complex. To solve this problem, the authors apply a learning-based approach to achieve erasure-coded resilience for NNs. This approach reconstructs approximations, which is appropriate for machine learning inference.

The code used for training and evaluating parity models is available as open source. The paper showcases parity models in the presence of resource contention, and includes an extensive evaluation.

One of the questions for the presenter was: "For mission-critical tasks, such as self-driving cars, would this model accuracy difference from the parity model be confusing?" The answer is yes, it could be, so this may be more suitable for applications like ad serving than for mission-critical ones. I personally think the technique here is more impressive than the problem solved. The parity models idea brings the benefits of erasure codes to ML inference, and the technique should be applicable and customizable/specializable to a rich set of problems in the domain. I was wondering if it could be applied to decomposing two images from a given stereoscopic image.

Jack, a third year PhD student at CMU, presented this paper. He gave one of the best presentations of the conference, with confidence, great stage presence, and great communication skills. I later learned that this was his first presentation at any conference ever.

TASO: Optimizing Deep Learning Computation with Automated Generation of Graph Substitutions

This paper is by Zhihao Jia, Oded Padon, James Thomas, Todd Warszawski, Matei Zaharia, and Alex Aiken (all from Stanford University).

Existing rule-based DNN optimizers rewrite/substitute a subgraph with a more efficient version. In TensorFlow this is implemented by 2,000 rewrite rules in 53K lines of code. Unfortunately, with this approach, adding new operators and graph structures requires an escalating number of rules. Moreover, these heuristics do not apply across all DNN hardware: the rewrite rules miss subtle optimizations for specific DNNs and hardware, and different hardware needs different optimizations; one optimization does not work for all. (To motivate the hardware dependence of the optimizations, the presentation gave an example with TASO where the final graph is 30% faster on V100 but 10% slower on K80.)

To address these pain points, this work proposes TASO, a tensor algebra super-optimizer. TASO replaces manually designed graph optimizations with automated generation and verification of substitutions. TASO can then feasibly produce optimizations tailored to the hardware backend, rather than relying on generic rules. While TensorFlow currently contains approximately 53,000 lines of manual optimization rules, the operator specifications needed by TASO are only 1,400 lines of code.

The challenge in developing TASO is twofold: how to generate potential substitutions, and how to verify their correctness.

There are 66 million graphs with up to 4 operators. To make this number manageable, the graph substitution generator computes output fingerprints and considers only pairs of graphs with identical output fingerprints. It finds 29K substitutions, of which only 743 remain after pruning to eliminate redundant substitutions. These 743 substitutions are generated in 5 minutes and verified against 43 operators in 10 minutes. This is done per hardware backend per operator, so supporting a new operator requires only a few hours, with the engineer providing the operator specifications.
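The fingerprint idea can be sketched in a few lines. This is a toy illustration of mine, with small integer expressions standing in for tensor graphs: evaluate each candidate graph on shared random inputs, hash the outputs, and treat only pairs with identical fingerprints as candidate substitutions, which would then be formally verified.

```python
import itertools
import random

# Toy sketch of TASO-style substitution generation (my illustration):
# candidate "graphs" are small expressions over (x, y). Two graphs with
# the same output fingerprint on random inputs become a candidate
# substitution pair; real TASO then formally verifies candidates
# against the operator specifications.

graphs = {
    "x*(y+y)":   lambda x, y: x * (y + y),
    "2*(x*y)":   lambda x, y: 2 * (x * y),
    "(x+y)*y":   lambda x, y: (x + y) * y,
    "x*y + y*y": lambda x, y: x * y + y * y,
    "x+y":       lambda x, y: x + y,
}

random.seed(1)
# Integer inputs keep the arithmetic exact, avoiding float-rounding issues.
inputs = [(random.randint(1, 100), random.randint(1, 100)) for _ in range(8)]

def fingerprint(f):
    # Hash of the outputs on the shared random inputs.
    return hash(tuple(f(x, y) for x, y in inputs))

by_fp = {}
for name, f in graphs.items():
    by_fp.setdefault(fingerprint(f), []).append(name)

candidates = [pair for group in by_fp.values() if len(group) > 1
              for pair in itertools.combinations(group, 2)]
print(candidates)   # candidate pairs, ready for formal verification
```

The pruning and formal verification steps that follow in TASO are what make this feasible at the scale of 66 million graphs.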

In sum, TASO provides lower engineering effort, better performance, and formal verification for optimizing deep learning computation graphs. The code for TASO is available on GitHub.

Thursday, October 31, 2019

SOSP'19 Day 0

I attended SOSP this week. SOSP is the premier conference on computer systems, where 500+ researchers follow 30+ paper presentations in a single-track event over three days. SOSP is held every two years, on odd years, alternating with OSDI, which is held on even years.

This was my third SOSP. I had been to SOSP 2009 at Big Sky, Montana and SOSP 2013 at Nemacolin, Pennsylvania. It is always a treat to attend SOSP. I get to learn new ideas and meet interesting people. SOSP never fails to impress me with the quality of its papers and presentations. I attended almost all of the sessions on Monday and Tuesday and took a lot of notes. I will be releasing these notes over the next 4-5 posts as I find some free time to edit and post them. You can see the entire program here and download the papers freely as open access.

Today, I will just talk about day 0 and the opening of day 1.

Driving to SOSP

Abutalib, a final-year PhD student at CMU, had offered to give me a ride to the conference; Buffalo was on his way from Pittsburgh, PA to Huntsville, Ontario, Canada. Talib and his labmate/coauthor Michael came to pick me up at 2pm on Sunday. We ate lunch at my house and left at 3pm. We had 4 hours of driving from Buffalo to Huntsville. We reached the Canadian border in 30 minutes; there were no lines, and the border check took less than a minute.

After the border, we had 3.5 hours of driving left. Talib had a paper at the conference titled "File Systems Unfit as Distributed Storage Backends: Lessons from 10 Years of Ceph Evolution". (Talib is on the academic job market this year, and he is an excellent candidate for you to hire.) I listened to Talib's practice run of the presentation, then Talib retired to work on it some more. I then moved to the front seat and chatted with Michael for the rest of the drive. It was an interesting chat, and we didn't notice the time passing.

Michael is a third-year PhD student. We talked about CMU's PhD course requirements. CMU requires just 6 courses for the PhD, as they like the students to focus on research and not get too distracted by coursework. On the other hand, CMU has a TAship requirement, which many other schools do not: regardless of their funding, PhD students are required to TA for two semesters. I think this is good training for becoming faculty.

Michael had taken the distributed systems class, where they had an influential distributed systems paper to read every single day. He had also taken an advanced cloud computing course which covered topics like MapReduce, Spark, Yarn, etc. The course was co-taught by three instructors, who alternated covering certain topics. The course put a big emphasis on "cost", as that is an essential component of the cloud computing model. The students were given limited cloud credits and had to complete the three course projects within this budget. After all, an engineer is one who can do for a dollar what any fool can do for two.

We talked about the big data processing ecosystem. Spark seems to be unrivaled for big data processing, but there is still a gap in distributed machine learning. TensorFlow is not easy to use for distributed machine learning, and PyTorch may be gaining ground there. Michael's PhD research will focus on using efficient compression algorithms for machine learning and DNNs, and we briefly chatted about work on those topics.

We talked a lot about blockchains and their value proposition. I think I yakked a lot about Ben-Or, Texel, and decentralized consensus in blockchains.  We talked about TLA+, and its value proposition.

Day 0 evening

We were at the Deerhurst Resort in Huntsville, Ontario by 7:30pm. The SOSP registration desk was still open, so we picked up our SOSP badges and walked to the hotel reception to check in. While checking in, I ran into Robbert van Renesse, and we talked about his work on Texel consensus for a while.

After I got to my room and dropped my bags, I went down to the lobby, because many SOSP attendees were chatting there. It seemed like we had the entire hotel to ourselves, and 500+ geeks would be the main event for the coming three days.

I saw Marcos and Indranil, whom I knew from before, so I joined their conversation. Mahesh from Facebook was also there, so I had already found the distributed systems folks at the conference. We chatted for more than thirty minutes, standing in the lobby. We talked about Paxos variants. Both Marcos and Mahesh are working with product groups, so we talked about how great engineers enrich projects. We talked about the California power outages. We talked about how hard it is to get a paper accepted at SOSP, and how hard it is to receive and process negative reviews, especially bad and unfair ones. We talked about the need for a dedicated distributed *systems* conference.

Day 1 SOSP opening

After breakfast, SOSP'19 kicked off at 8:45am with a short address from the general chairs, Tim Brecht (University of Waterloo) and Carey Williamson (University of Calgary). Then PC Chairs Remzi H. Arpaci-Dusseau (University of Wisconsin) and Yuanyuan Zhou (University of California San Diego) took the stage for brief remarks, thanks, and acknowledgment.

SOSP’19 received a total of 276 paper submissions, reviewed by a program committee (PC) of 55 reviewers. All papers received three reviews in the first round. Based on the first-round reviews, 154 papers were selected to proceed to the second round, where each received additional reviews. In total, the PC wrote more than 1200 reviews, totaling over 1 million words of feedback. Ultimately the conference accepted 38 papers, for an acceptance rate of 13%. The best paper awards for the conference were awarded to

This year, for the first time, SOSP organized artifact evaluation, led by Profs. Baris Kasikci and Vijay Chidambaram and organized by Supreeth Shastri. 23 of the 38 papers were submitted for artifact evaluation. 22 of these earned at least one badge, 11 papers earned all badges, and 12 papers had all their results reproduced.

This year there was also a student mentorship program, organized by Natacha Crooks and Malte Schwarzkopf. The program matched PhD students with postdoctoral researchers and faculty members in their research areas of interest, to meet at the conference. Of course, most PhD students already get mentorship from their advisors, but this program was valuable for showing the students that the community is a welcoming one.

Tuesday, October 22, 2019

Book review. A Mind at Play: How Claude Shannon Invented the Information Age

This book, by Jimmy Soni and Rob Goodman, was published in 2017. It is 385 pages long and informative.

This is a good book. Shannon is an extremely interesting researcher, so the book is interesting. The writing is good, and gets very good in some places, but it is not top-class writing. A master storyteller like Michael Lewis (e.g., "The Undoing Project"), Walter Isaacson, or Steven Levy would have made this book excellent. I guess the difference is that these masters would put an order of magnitude more research into the subject, do dozens of interviews, and conduct an extensive archive search. They would also distill the story and build the book around a single strong theme, with some side themes tying into it, and tell a much more engaging story. They would not leave a stone unturned. They would go the extra mile to point us to the insights they gathered, without explicitly showing them, gently nudging us toward them to make us think we came up with those insights ourselves.

Claude Shannon is a giant. It won't be an overstatement to say Shannon is the father of the digital era and the information age. In his master's thesis at MIT, in 1937, when he was only 21 years old, he came up with digital circuit design theory, demonstrating the electrical applications of Boolean algebra. In my book review of "Range: Why Generalists Triumph in a Specialized World", I had included this paragraph about Shannon.
[Shannon] launched the Information Age thanks to a philosophy course he took to fulfill a requirement at the University of Michigan. In it, he was exposed to the work of self-taught nineteenth-century English logician George Boole, who assigned a value of 1 to true statements and 0 to false statements and showed that logic problems could be solved like math equations. It resulted in absolutely nothing of practical importance until seventy years after Boole passed away, when Shannon did a summer internship at AT&T’s Bell Labs research facility.

For his PhD in 1940, under the supervision of Vannevar Bush, Shannon developed a mathematical formulation for Mendelian genetics, titled "An Algebra for Theoretical Genetics".

Then in 1948, eleven years after his MS thesis, he founded information theory with a landmark paper, "A Mathematical Theory of Communication", with applications to digital communication, storage, and cryptography.

He was also very influential in cryptography (with his proof of the security of one-time pads), artificial intelligence (with his electromechanical mouse Theseus), chess-playing computers, and in providing a mathematical theory of juggling, among other things.

The tragic part of this story is how it ends. After a progressive decline from Alzheimer's disease over more than 15 years, Shannon died at the age of 84, on February 24, 2001.

As is my custom in book reviews, here are some of my highlights from the book. The book has many more interesting facts about Shannon, so I strongly recommend it if you want to learn more about him.

Shannon's research approach

Of course, information existed before Shannon, just as objects had inertia before Newton. But before Shannon, there was precious little sense of information as an idea, a measurable quantity, an object fitted out for hard science. Before Shannon, information was a telegram, a photograph, a paragraph, a song. After Shannon, information was entirely abstracted into bits.

He was a man immune to scientific fashion and insulated from opinion of all kinds, on all subjects, even himself, especially himself; a man of closed doors and long silences, who thought his best thoughts in spartan bachelor apartments and empty office buildings.
It is a puzzle of his life that someone so skilled at abstracting his way past the tangible world was also so gifted at manipulating it. Shannon was a born tinkerer: a telegraph line rigged from a barbed-wire fence, a makeshift barn elevator, and a private backyard trolley tell the story of his small-town Michigan childhood. And it was as an especially advanced sort of tinkerer that he caught the eye of Vannevar Bush—soon to become the most powerful scientist in America and Shannon’s most influential mentor—who brought him to MIT and charged him with the upkeep of the differential analyzer, an analog computer the size of a room, “a fearsome thing of shafts, gears, strings, and wheels rolling on disks” that happened to be the most advanced thinking machine of its day.

Shannon’s study of the electrical switches directing the guts of that mechanical behemoth led him to an insight at the foundation of our digital age: that switches could do far more than control the flow of electricity through circuits—that they could be used to evaluate any logical statement we could think of, could even appear to "decide." A series of binary choices—on/off, true/false, 1/0—could, in principle, perform a passable imitation of a brain. That leap, as Walter Isaacson put it, “became the basic concept underlying all digital computers.” It was Shannon’s first great feat of abstraction. He was only twenty-one.
And yet Shannon proved that noise could be defeated, that information sent from Point A could be received with perfection at Point B, not just often, but essentially always. He gave engineers the conceptual tools to digitize information and send it flawlessly (or, to be precise, with an arbitrarily small amount of error), a result considered hopelessly utopian up until the moment Shannon proved it was not.
Having completed his pathbreaking work by the age of thirty-two, he might have spent his remaining decades as a scientific celebrity, a public face of innovation: another Bertrand Russell, or Albert Einstein, or Richard Feynman, or Steve Jobs. Instead, he spent them tinkering. An electronic, maze-solving mouse named Theseus. An Erector Set turtle that walked his house. The first plan for a chess-playing computer, a distant ancestor of IBM’s Deep Blue. The first-ever wearable computer. A calculator that operated in Roman numerals, code-named THROBAC (“Thrifty Roman-Numeral Backward-Looking Computer”). A fleet of customized unicycles. Years devoted to the scientific study of juggling.

Claude’s gifts were of the Einsteinian variety: a strong intuitive feel for the dimensions of a problem, with less of a concern for the step-by-step details. As he put it, “I think I’m more visual than symbolic. I try to get a feeling of what’s going on. Equations come later.” Like Einstein, he needed a sounding board, a role that Betty played perfectly. His colleague David Slepian said, “He didn’t know math very deeply. But he could invent whatever he needed.” Robert Gallager, another colleague, went a step further: “He had a weird insight. He could see through things. He would say, ‘Something like this should be true’ . . . and he was usually right. . . . You can’t develop an entire field out of whole cloth if you don’t have superb intuition.”

I had what I thought was a really neat research idea, for a much better communication system than what other people were building, with all sorts of bells and whistles. I went in to talk to him [Shannon] about it and I explained the problems I was having trying to analyze it. And he looked at it, sort of puzzled, and said, “Well, do you really need this assumption?” And I said, well, I suppose we could look at the problem without that assumption. And we went on for a while. And then he said, again, “Do you need this other assumption?” And I saw immediately that that would simplify the problem, although it started looking a little impractical and a little like a toy problem. And he kept doing this, about five or six times. I don’t think he saw immediately that that’s how the problem should be solved; I think he was just groping his way along, except that he just had this instinct of which parts of the problem were fundamental and which were just details. At a certain point, I was getting upset, because I saw this neat research problem of mine had become almost trivial. But at a certain point, with all these pieces stripped out, we both saw how to solve it. And then we gradually put all these little assumptions back in and then, suddenly, we saw the solution to the whole problem. And that was just the way he worked. He would find the simplest example of something and then he would somehow sort out why that worked and why that was the right way of looking at it.

“What’s your secret in remaining so carefree?” an interviewer asked Shannon toward the end of his life. Shannon answered, “I do what comes naturally, and usefulness is not my main goal. I keep asking myself, How would you do this? Is it possible to make a machine do that? Can you prove this theorem?” For an abstracted man at his most content, the world isn’t there to be used, but to be played with, manipulated by hand and mind. Shannon was an atheist, and seems to have come by it naturally, without any crisis of faith; puzzling over the origins of human intelligence with the same interviewer, he said matter-of-factly, “I don’t happen to be a religious man and I don’t think it would help if I were!” And yet, in his instinct that the world we see merely stands for something else, there is an inkling that his distant Puritan ancestors might have recognized as kin.

University of Michigan

A’s in math and science and Latin, scattered B’s in the rest: the sixteen-year-old high school graduate sent his record off to the University of Michigan, along with an application that was three pages of fill-in-the-blanks, the spelling errors casually crossed out.
Engineering’s rising profile began to draw the attention of deans in other quarters of the university, and disciplinary lines began to blur. By the time Shannon began his dual degrees in mathematics and engineering, a generation later, the two curricula had largely merged into one.


The job—master’s student and assistant on the differential analyzer at the Massachusetts Institute of Technology—was tailor-made for a young man who could find equal joy in equations and construction, thinking and building. "I pushed hard for that job and got it."
It was Vannevar Bush who brought analog computing to its highest level, a machine for all purposes, a landmark on the way from tool to brain. And it was Claude Shannon who, in a genius accident, helped obsolete it.
In Michigan, Shannon had learned (in a philosophy class, no less) that any statement of logic could be captured in symbols and equations—and that these equations could be solved with a series of simple, math-like rules. You might prove a statement true or false without ever understanding what it meant. You would be less distracted, in fact, if you chose not to understand it: deduction could be automated. The pivotal figure in this translation from the vagaries of words to the sharpness of math was a nineteenth-century genius named George Boole, a self-taught English mathematician whose cobbler father couldn’t afford to keep him in school beyond the age of sixteen.

Finished in the fall of 1937, Shannon’s master’s thesis, “A Symbolic Analysis of Relay and Switching Circuits,” was presented to an audience in Washington, D.C., and published to career-making applause the following year.
A leap from logic to symbols to circuits: “I think I had more fun doing that than anything else in my life,” Shannon remembered fondly. An odd and wonkish sense of fun, maybe—but here was a young man, just twenty-one now, full of the thrill of knowing that he had looked into the box of switches and relays and seen something no one else had. All that remained were the details. In the years to come, it would be as if he forgot that publication was something still required of brilliant scientists; he’d pointlessly incubate remarkable work for years, and he’d end up in a house with an attic stuffed with notes, half-finished articles, and “good questions” on ruled paper. But now, ambitious and unproven, he had work pouring out of him.
Armed with these insights, Shannon spent the rest of his thesis demonstrating their possibilities. A calculator for adding binary numbers; a five-button combination lock with electronic alarm—as soon as the equations were worked out, they were as good as built. Circuit design was, for the first time, a science. And turning art into science would be the hallmark of Shannon’s career.
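The core observation of the thesis can be demonstrated in a few lines: switches in series behave like AND, switches in parallel behave like OR, so any Boolean expression maps onto a relay circuit. Here is a toy sketch of that idea (my notation and naming, not Shannon's):

```python
from itertools import product

# Toy sketch of the thesis insight (my illustration): model a relay
# circuit as switches, where series composition computes AND and
# parallel composition computes OR. Exhaustively checking the circuit
# against the Boolean expression is deduction made automatic.

def series(*branches):    # current flows only if every switch conducts: AND
    return lambda env: all(b(env) for b in branches)

def parallel(*branches):  # current flows if any branch conducts: OR
    return lambda env: any(b(env) for b in branches)

def switch(name, negated=False):  # a relay contact, possibly normally-closed
    return lambda env: env[name] != negated

# Circuit for (a AND b) OR (NOT a AND c) -- a 1-bit selector.
circuit = parallel(series(switch("a"), switch("b")),
                   series(switch("a", negated=True), switch("c")))

for a, b, c in product([False, True], repeat=3):
    env = {"a": a, "b": b, "c": c}
    assert circuit(env) == ((a and b) or (not a and c))
```

Once the circuit and the equation are interchangeable like this, "as soon as the equations were worked out, they were as good as built."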

Vannevar Bush

More to the point, it was a matter of deep conviction for Bush that specialization was the death of genius. “In these days, when there is a tendency to specialize so closely, it is well for us to be reminded that the possibilities of being at once broad and deep did not pass with Leonardo da Vinci or even Benjamin Franklin,” Bush said in a speech at MIT.
Somewhere on the list of Vannevar Bush’s accomplishments, then, should be his role in killing American eugenics. As president of the Carnegie Institution of Washington, which funded the Eugenics Record Office, he forced its sterilization-promoting director into retirement and ordered the office to close for good on December 31, 1939.
But the poison tree bore some useful fruit. (Few scientists had compiled better data on heredity and inheritance than eugenicists.) And Shannon was there, in its last months, to collect what he could of it [for his PhD thesis titled "An Algebra for Theoretical Genetics".]

Bell Labs

If Google’s “20 percent time”—the practice that frees one-fifth of a Google employee’s schedule to devote to blue-sky projects—seems like a West Coast indulgence, then Bell Labs’ research operation, buoyed by a federally approved monopoly and huge profit margins, would appear gluttonous by comparison.
Bell researchers were encouraged to think decades down the road, to imagine how technology could radically alter the character of everyday life, to wonder how Bell might “connect all of us, and all of our new machines, together.” One Bell employee of a later era summarized it like this: “When I first came there was the philosophy: look, what you’re doing might not be important for ten years or twenty years, but that’s fine, we’ll be there then.”
Claude Shannon was one of those who thrived. Among the institutions that had dotted the landscape of Shannon’s life, it’s hard to imagine a place better suited to his mix of passions and particular working style than the Bell Laboratories of the 1940s. “I had freedom to do anything I wanted from almost the day I started,” he reflected. “They never told me what to work on.”

War time           

Things were moving fast there, and I could smell the war coming along. And it seemed to me I would be safer working full-time for the war effort, safer against the draft, which I didn’t exactly fancy. I was a frail man, as I am now... I was trying to play the game, to the best of my ability. But not only that, I thought I’d probably contribute a hell of a lot more.

“I think he did the work with that fear in him, that he might have to go into the Army, which means being with lots of people around which he couldn’t stand. He was phobic about crowds and people he didn’t know.”
If anything, his reaction to the war work was quite the opposite: the whole atmosphere left a bitter taste. The secrecy, the intensity, the drudgery, the obligatory teamwork—all of it seems to have gotten to him in a deeply personal way. Indeed, one of the few accounts available to us, from Claude’s girlfriend, suggests that he found himself largely bored and frustrated by wartime projects, and that the only outlet for his private research came on his own time, late at night. “He said he hated it, and then he felt very guilty about being tired out in the morning and getting there very late. . . . I took him by the hand and sometimes I walked him to work—that made him feel better.” It’s telling that Shannon was reluctant, even decades later, to talk about this period in any kind of depth, even to family and friends. In a later interview, he would simply say, with a touch of disappointment in his words, that “those were busy times during the war and immediately afterwards and [my research] was not considered first priority work.” This was true, it appears, even at Bell Labs, famously open-minded though it may have been.
As in other areas of Shannon’s life, his most important work in cryptography yielded a rigorous, theoretical underpinning for many of a field’s key concepts. This paper, “A Mathematical Theory of Cryptography—Case 20878,” contained important antecedents of Shannon’s later work—but it also provided the first-ever proof of a critical concept in cryptology: the “one-time pad.”

According to Turing’s biographer, Andrew Hodges, Shannon and Turing met daily over tea, in public, in the conspicuously modest Bell Labs cafeteria.
Shannon, for his part, was amazed by the quality of Turing’s thinking. “I think Turing had a great mind, a very great mind,” Shannon later said.

Information theory                

On the evenings he was at home, Shannon was at work on a private project. It had begun to crystallize in his mind in his graduate school days. He would, at various points, suggest different dates of provenance. But whatever the date on which the idea first implanted itself in his mind, pen hadn’t met paper in earnest until New York and 1941. Now this noodling was both a welcome distraction from work at Bell Labs and an outlet to the deep theoretical work he prized so much, and which the war threatened to foreclose.
The work wasn’t linear; ideas came when they came. “These things sometimes... one night I remember I woke up in the middle of the night and I had an idea and I stayed up all night working on that.” To picture Shannon during this time is to see a thin man tapping a pencil against his knee at absurd hours. This isn’t a man on a deadline; it’s something more like a man obsessed with a private puzzle, one that is years in the cracking. “He would go quiet, very very quiet. But he didn’t stop working on his napkins,” said Maria. “Two or three days in a row. And then he would look up, and he’d say, ‘Why are you so quiet?’ ”
The real measure of information is not in the symbols we send—it’s in the symbols we could have sent, but did not. To send a message is to make a selection from a pool of possible symbols, and “at each selection there are eliminated all of the other symbols which might have been chosen.”
The information value of a symbol depends on the number of alternatives that were killed off in its choosing. Symbols from large vocabularies bear more information than symbols from small ones. Information measures freedom of choice.
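The measure the book is describing is Shannon entropy. A quick sketch using the standard formula, H = -Σ p·log₂(p) bits (my example, not from the book): more equally likely alternatives mean more bits per symbol.

```python
import math

# Sketch of Shannon entropy (standard formula, not from the book):
# H = -sum(p * log2(p)). More equally likely alternatives => more bits
# per symbol; a predictable source carries less information.

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

coin   = [0.5, 0.5]    # 2 alternatives -> 1 bit per symbol
die    = [1/6] * 6     # 6 alternatives -> log2(6) ~ 2.585 bits
biased = [0.9, 0.1]    # mostly predictable -> well under 1 bit

assert entropy(coin) == 1.0
assert abs(entropy(die) - math.log2(6)) < 1e-12
assert entropy(biased) < entropy(coin)
```

"Information measures freedom of choice" is exactly this: the fair die, with more live alternatives per roll, carries more bits than the coin, and the biased coin carries fewer.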
Shannon proposed an unsettling inversion. Ignore the physical channel and accept its limits: we can overcome noise by manipulating our messages. The answer to noise is not in how loudly we speak, but in how we say what we say.
Shannon showed that the beleaguered key-tappers in Ireland and Newfoundland had essentially gotten it right, had already solved the problem without knowing it. They might have said, if only they could have read Shannon’s paper, “Please add redundancy.” In a way, that was already evident enough: saying the same thing twice in a noisy room is a way of adding redundancy, on the unstated assumption that the same error is unlikely to attach itself to the same place two times in a row. For Shannon, though, there was much more. Our linguistic predictability, our congenital failure to maximize information, is actually our best protection from error.
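The repetition trick the key-tappers stumbled onto is the simplest error-correcting code. A minimal sketch (my example): repeat each bit three times and decode by majority vote, so any single flip within a triple is corrected.

```python
import random

# Minimal sketch of redundancy defeating noise (my example): a 3x
# repetition code with majority-vote decoding corrects any single bit
# flip within each group of three transmitted bits.

def encode(bits):
    return [b for b in bits for _ in range(3)]

def decode(coded):
    # Majority vote over each triple.
    return [int(sum(coded[i:i + 3]) >= 2) for i in range(0, len(coded), 3)]

message = [1, 0, 1, 1, 0]
coded = encode(message)

# Noisy channel: flip exactly one bit in each triple.
random.seed(42)
for i in range(0, len(coded), 3):
    coded[i + random.randrange(3)] ^= 1

assert decode(coded) == message   # every single-bit error is corrected
```

Shannon's insight went far beyond this: rather than paying a fixed 3x cost, cleverer codes can drive the error rate arbitrarily low at a fixed rate, as long as that rate stays below the channel capacity.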
The information theorist Sergio Verdú offered a similar assessment of Shannon’s paper: “It turned out that everything he claimed essentially was true. The paper was weak in what we now call ‘converses’ . . . but in fact, that adds to his genius rather than detracting from it, because he really knew what he was doing.” In a sense, leaving the dots for others to connect was a calculated gamble on Shannon’s part: had he gone through that painstaking work himself, the paper would have been much longer and appeared much later.


Theseus was propelled by a pair of magnets, one embedded in its hollow core, and one moving freely beneath the maze. The mouse would begin its course, bump into a wall, sense that it had hit an obstacle with its “whiskers,” activate the right relay to attempt a new path, and then repeat the process until it hit its goal, a metallic piece of cheese. The relays stored the directions of the right path in “memory”: once the mouse had successfully navigated the maze by trial and error, it could find the cheese a second time with ease. Appearances to the contrary, Theseus the mouse was mainly the passive part of the endeavor: the underlying maze itself held the information and propelled Theseus with its magnet. Technically, as Shannon would point out, the mouse wasn’t solving the maze; the maze was solving the mouse. Yet, one way or another, the system was able to learn.
Shannon would later tell a former teacher of his that Theseus had been “a demonstration device to make vivid the ability of a machine to solve, by trial and error, a problem, and remember the solution.” To the question of whether a certain rough kind of intelligence could be “created,” Shannon had offered an answer: yes, it could. Machines could learn. They could, in the circumscribed way Shannon had demonstrated, make mistakes, discover alternatives, and avoid the same missteps again. Learning and memory could be programmed and plotted, the script written into a device that looked, from a certain perspective, like an extremely simple precursor of a brain. The idea that machines could imitate humans was nothing new.
In a life of pursuits adopted and discarded with the ebb and flow of Shannon’s promiscuous curiosity, chess remained one of his few lifelong pastimes. One story has it that Shannon played so much chess at Bell Labs that “at least one supervisor became somewhat worried.” He had a gift for the game, and as word of his talent spread throughout the Labs, many would try their hand at beating him. “Most of us didn’t play more than once against him,” recalled Brockway McMillan.
He went on: “you can make a thing that is smarter than yourself. Smartness in this game is made partly of time and speed. I can build something which can operate much faster than my neurons.”
I think man is a machine. No, I am not joking, I think man is a machine of a very complex sort, different from a computer, i.e., different in organization. But it could be easily reproduced—it has about ten billion nerve cells. And if you model each one of these with electronic equipment it will act like a human brain. If you take [Bobby] Fischer’s head and make a model of that, it would play like Fischer.

MIT professorship

MIT made the first move: in 1956, the university invited one of its most famous alumni, Claude Shannon, to spend a semester back in Cambridge as a visiting professor. Returning to his graduate school haunts had something of a revivifying effect on Claude, as well as Betty. For one thing, the city of Cambridge was a bustle of activity compared to the comparatively sleepy New Jersey suburbs. Betty remembered it as an approximation of their Manhattan years, when going out to lunch meant stepping into the urban whirl. Working in academia, too, had its charms. “There is an active structure of university life that tends to overcome monotony and boredom,” wrote Shannon. “The new classes, the vacations, the various academic exercises add considerable variety to the life here.” Reading those impersonal lines, one might miss the implication that Shannon himself had grown bored.

I am having a very enjoyable time here at MIT. The seminar is going very well but involves a good deal of work. I had at first hoped to have a rather cozy little group of about eight or ten advanced students, but the first day forty people showed up, including many faculty members from M.I.T., some from Harvard, a number of doctorate candidates, and quite a few engineers from Lincoln Laboratory. . . . I am giving 2 one and a half hour sessions each week, and the response from the class is exceptionally good. They are almost all following it at 100 percent. I also made a mistake in a fit of generosity when I first came here of agreeing to give quite a number of talks at colloquia, etc., and now that the days are beginning to roll around, I find myself pretty pressed for time.
In a lecture titled “Reliable Machines from Unreliable Components,” Shannon presented the following challenge: “In case men’s lives depend upon the successful operation of a machine, it is difficult to decide on a satisfactorily low probability of failure, and in particular, it may not be adequate to have men’s fates depend upon the successful operation of single components as good as they may be.” What followed was an analysis of the error-correcting and fail-safe mechanisms that might resolve such a dilemma.
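The premise of that talk — building dependable systems out of failure-prone parts — is the same idea behind modern fault-tolerance techniques such as triple modular redundancy. Here is a minimal sketch of that general idea in Python (an illustration only, not Shannon's own construction; the failure rate of 10% is an arbitrary choice for the demo):

```python
import random

def unreliable_gate(x, p_fail=0.1):
    """An identity "gate" that flips its one-bit output with probability p_fail."""
    return x ^ 1 if random.random() < p_fail else x

def majority_vote(bits):
    """Return the value reported by a majority of the replicated outputs."""
    return 1 if sum(bits) > len(bits) // 2 else 0

def reliable_gate(x, replicas=3, p_fail=0.1):
    """Run several unreliable copies of the component and vote on the result."""
    return majority_vote([unreliable_gate(x, p_fail) for _ in range(replicas)])

random.seed(42)
TRIALS = 100_000
single = sum(unreliable_gate(1) != 1 for _ in range(TRIALS)) / TRIALS
voted = sum(reliable_gate(1) != 1 for _ in range(TRIALS)) / TRIALS
print(f"single-component error rate: {single:.3f}")  # ≈ 0.100
print(f"triple-voted error rate:     {voted:.3f}")   # ≈ 0.028
```

With a single component failing 10 percent of the time, the voted version errs only when at least two of the three replicas fail — about 3 × 0.1² × 0.9 + 0.1³ ≈ 2.8 percent — and adding more replicas drives the rate lower still.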

When an offer came for a full professorship and a permanent move to Massachusetts, it was hard to decline. If he accepted, Shannon would be named a Professor of Communication Sciences, and Professor of Mathematics, with permanent tenure, effective January 1, 1957, with a salary of $17,000 per year (about $143,000 in 2017).
After accepting the MIT offer, the Shannons left for Cambridge via California—a year-long detour for a fellowship at Stanford’s Center for Advanced Study in the Behavioral Sciences. Prestigious as the appointment was, the Shannons mainly treated it as an excuse to see the country. They made the leisurely drive through the West’s national parks to California, and back, in a VW bus.
Before setting off for the West, though, Claude and Betty purchased a house at 5 Cambridge Street in Winchester, Massachusetts, a bedroom community eight miles north of MIT. Once their California year was complete, they returned to their new home. In Winchester, the Shannons were close enough to campus for a quick commute but far enough away to live an essentially private life. They were also living in a piece of history—an especially appropriate one in light of Shannon’s background and interests.
The house would figure prominently in Shannon’s public image. Nearly every story about him, from 1957 on, situated him at the house on the lake—usually in the two-story addition that the Shannons built as an all-purpose room for gadget storage and display, a space media profiles often dubbed the “toy room,” but which his daughter Peggy and her two older brothers simply called “Dad’s room.” The Shannons gave their home a name: Entropy House. Claude’s status as a mathematical luminary would make it a pilgrimage site for students and colleagues, especially as his on-campus responsibilities dwindled toward nothing.
Even at MIT, Shannon bent his work around his hobbies and enthusiasms. “Although he continued to supervise students, he was not really a co-worker, in the normal sense of the term, as he always seemed to maintain a degree of distance from his fellow associates,” wrote one fellow faculty member. With no particular academic ambitions, Shannon felt little pressure to publish academic papers. He grew a beard, began running every day, and stepped up his tinkering. What resulted were some of Shannon’s most creative and whimsical endeavors. There was the trumpet that shot fire out of its bell when played. The handmade unicycles, in every permutation: a unicycle with no seat; a unicycle with no pedals; a unicycle built for two. There was the eccentric unicycle: a unicycle with an off-center hub that caused the rider to move up and down while pedaling forward and added an extra degree of difficulty to Shannon’s juggling. (The eccentric unicycle was the first of its kind. Ingenious though it might have been, it caused Shannon’s assistant, Charlie Manning, to fear for his safety—and to applaud when he witnessed the first successful ride.) There was the chairlift that took surprised guests down from the house’s porch to the edge of the lake. A machine that solved Rubik’s cubes. Chess-playing machines. Handmade robots, big and small. Shannon’s mind, it seems, was finally free to bring its most outlandish ideas to mechanical life. Looking back, Shannon summed it all up as happily pointless: “I’ve always pursued my interests without much regard to financial value or value to the world. I’ve spent lots of time on totally useless things.” Tellingly, he made no distinction between his interests in information and his interests in unicycles; they were all moves in the same game.
One professor, Hermann Haus, remembered a lecture of his that Shannon attended. “I was just so impressed,” Haus recalled, “he was very kind and asked leading questions. In fact, one of those questions led to an entire new chapter in a book I was writing.”
He was not the sort of person who would give a class and say “this was the essence of such and such.” He would say, “Last night, I was looking at this and I came up with this interesting way of looking at it.”
Shannon became a whetstone for others’ ideas and intuitions. Rather than offer answers, he asked probing questions; instead of solutions, he gave approaches. As Larry Roberts, a graduate student of that time, remembered, “Shannon’s favorite thing to do was to listen to what you had to say and then just say, ‘What about...’ and then follow with an approach you hadn’t thought of. That’s how he gave his advice.” This was how Shannon preferred to teach: as a fellow traveler and problem solver, just as eager as his students to find a new route or a fresh approach to a standing puzzle.
Even with his aversion to writing things down, the famous attic stuffed with half-finished work, and countless hypotheses circulating in his mind—and even when one paper on the scale of his “Mathematical Theory of Communication” would have counted as a lifetime’s accomplishment—Shannon still managed to publish hundreds of pages’ worth of papers and memoranda, many of which opened new lines of inquiry in information theory. That he had also written seminal works in other fields—switching, cryptography, chess programming—and that he might have been a pathbreaking geneticist, had he cared to be, was extraordinary.

Stock market           

By then the family had no need of the additional income from stock picking. Not only was there the combination of the MIT and Bell Labs pay, but Shannon had been on the ground floor of a number of technology companies. One former colleague, Bill Harrison, had encouraged Shannon to invest in his company, Harrison Laboratories, which was later acquired by Hewlett-Packard. A college friend of Shannon, Henry Singleton, put Shannon on the board of the company he created, Teledyne, which grew to become a multibillion-dollar conglomerate. As Shannon retold the story, he made the investment simply because “I had a good opinion of him.”

The club benefited from Shannon as well, in his roles as network node and informal consultant. For instance, when Teledyne received an acquisition offer from a speech recognition company, Shannon advised Singleton to turn it down. From his own experience at the Labs, he doubted that speech recognition would bear fruit anytime soon: the technology was in its early stages, and during his time at the Labs, he’d seen much time and energy fruitlessly sunk into it. The years of counsel paid off, for Singleton and for Shannon himself: his investment in Teledyne achieved an annual compound return of 27 percent over twenty-five years.
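Compounding makes that figure remarkable. A quick back-of-the-envelope check (the 27 percent rate and twenty-five-year span come from the text; the multiple is just arithmetic):

```python
# 27% compounded annually for 25 years multiplies the initial
# stake by (1 + 0.27) ** 25 -- roughly 394-fold.
growth = (1 + 0.27) ** 25
print(round(growth))  # → 394
```

In other words, every dollar Shannon put into Teledyne at the start grew to roughly four hundred dollars over those twenty-five years.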
The stock market was, in some ways, the strangest of Shannon’s late-life enthusiasms. One of the recurrent tropes of recollections from family and friends is Shannon’s seeming indifference to money. By one telling, Shannon moved his life savings out of his checking account only when Betty insisted that he do so. A colleague recalled seeing a large uncashed check on Shannon’s desk at MIT, which in time gave rise to another legend: that his office was overflowing with checks he was too absentminded to cash. In a way, Shannon’s interest in money resembled his other passions. He was not out to accrue wealth for wealth’s sake, nor did he have any burning desire to own the finer things in life. But money created markets and math puzzles, problems that could be analyzed and interpreted and played out.
