Wednesday, November 20, 2019

Seventh grade openhouse

Recently, I went to my son's 7th grade openhouse. It was a nice setup. They made me follow my son's daily routine, visiting the same classrooms in the same order, but with 10 minute class time where the teachers gave information about what they would be doing this year.

Overall I am impressed by what I saw and heard. My main takeaways was that the school has embraced a hands-on learning curriculum, and is making very good/wise use of technology.

My impression from Math, Science, ELA classes was that, the middle school is using the flipped-classroom model to a great extent. There is little lecturing and a lot of group work.

The science class presentation was exciting. Instead of memorizing, the class emphasizes hypothesizing, testing, and reasoning.

Art class also sounded exciting. They added new media modules. This is not surprising given that half the 12 years old population list being a YouTuber as their career choice. The art teacher said that their goal is to get the students excited and comfortable with making/producing things.

The school offers many interesting clubs in addition to the curriculum. I wish I was a student, and had the opportunity to learn in this environment.

Technology use

The school makes very good use of technology. Each student gets a ChromeBook for the year. They use it at home, charge overnight, and bring it to school to use in the classes all day. These ChromeBooks have very good battery life, and they are very durable.

Everything is on the web, and accessible via the ChromeBooks. The students have access to grading information via the schools webportal for them. Many classes use apps that enable the students to create flash cards for studying. The students get to take practice tests. And almost every class  make use of videos, including YouTube videos.

In each class, the students are responsible for organizing their agendas and keep track of the homework deadlines. The students reach all their class material and view and submit homework online using their ChromeBooks and Google office and other Google tools.

It looks like Google has the monopoly on the mindshare of the new generation. And Microsoft is missing the boat on this one big time. By the time these students graduate college and start jobs, they may not be very willing to adopt into the Windows/PC ecosystem.


Each classroom has a smartboard. These smartboards are actually computer screens projected over to a regular whiteboard. The smartboard functionality comes from the software that tracks hand movements and pen touches to the whiteboard via a camera. The teacher can display any website on the smartboard, and can scroll, click, zoom, with ease.

I am left with a lurking and jealousy guilt after the openhouse. Compared to the middleschool's technology use, our universities and our CS department's technology use is very lame... Why is that?

In my university, we keep on traditional lecturing. In our classes, there is a nice monitor and projecting technology, but there are no smartboards. I think there are clickers available in some classrooms, but that is ancient tech and unnatural way to interact. The university makes us use the God-awful (and even more ancient) Blackboard software for interacting/managing the class and grades.

Monday, November 18, 2019

Book Review. Digital minimalism: Choosing a Focused Life in a Noisy World

"Digital Minimalism: Choosing a Focused Life in a Noisy World" is Cal Newport's new book. The topic of the book is clear from the title. You should be quitting your Facebook checking, Twitter chatting, Imgur browsing, phone fiddling habits. No more monkey business. It is time to get to work. Deep work.

In Chapter 4 of the book, Calvin forms the term *solitude deprivation*. Solitude deprivation is on the other end of the spectrum to solitary confinement, but it can also be bad for you as well over a long duration. The book argues that today we all experience solitude deprivation. The smartphones, laptops, and screens do not give us time to be alone with our thoughts and process things in our speeds. I had heard a nice story, where the Amazon natives recruited for an expedition in to the jungle would take long breaks after doing some walking. They would say they are waiting for their soul to catch up to their bodies. Today we don't give time for our souls to catch up to us, and process both emotionally and rationally the flood of events/news we are bombarded with every day.

So I really liked Chapter 4 that talked about solitude deprivation. This chapter made me really worried that online connectivity, and never getting bored, could be doing more harm to me than I thought.  This chapter made a much more convincing case for the need for quitting social media than the first couple chapters in my view. But maybe it is because I am more of an abstract ideas guy.

Calvin's previous book "Deep Work" had a lot of impact. I think "Digital Minimalism" may not have that much impact. (Well, Digital Minimalism already has become a New York Times, Wall Street Journal, Publishers Weekly, and USA Today bestseller... I guess I mean more impact than that ;-) Deep Work had a positive message, "embrace deeper concentration", whereas Digital Minimalism has a negative message, "prevent digital clutter". I know, I know... For each book, you could simply switch the statements from positive to negative and vice versa. I am just referring to the tone/mood of the books. Digital Minimalism is more of a self-help/how-to book. It prescribes lists of things to do and not to do in a somewhat patronizing voice. The Deep Work book was more conceptual and thought-provoking, and less of a how-to self-help book. I have listened to Deep Work at least three times. I don't see that happening with the "Digital Minimalism" book. I would have liked to read a book titled "Deep Solitude" from Calvin, which I am sure I would be re-reading several times.
If you want to build a ship, don't drum up people to collect wood and don't assign them tasks and work, but rather teach them to long for the endless immensity of the sea.
--Antoine de Saint-Exupery

In any case, I think this is a book you should definitely check out. I wish Calvin best of luck with getting these ideas adopted. They are very timely and important.  In 2005, I was labmates with Calvin at Nancy Lynch's theory of distributed systems group. Calvin is like a real life Captain America. Always responsible, kind, tidy, and disciplined. He would arrange his working hours carefully and would optimize everything. He is super smart and productive. His publication record is impressive. He is a theory and abstract thinking person at heart. He thinks clearly and in a sound manner with much deliberation. He is both a successful professor and a successful writer. He is walking the walk as well. So we should pay attention when he is talking.

MAD questions

1. Is social media more dangerous than TV was? 
It may be so because it is algorithmically manipulated to be addictive. Technology companies completed the loop and this is a quick feedback loop fed by millions of people participating in the experiment.  On the other hand, I have heard of a hypothesis that, since the kids are raising up with this technology, they will develop ways to be immune to it. But I don't know if I am convinced by that argument. Parents should definitely regulate/restrict for these technologies. And I think even the governments should be regulating for these technologies. The Digital Minimalism book cites that mental health is at crisis level for millennials raised with this technology available to them. I see students on their phones walking in the halls and fiddling with their phones even in class. They are always busy catching up to what is on their screens, but they are missing up on things happening around them, and most importantly happening within them.

2. Is it possible to have a good social media tool?
Things have a way of going south quickly for social media and common collaboration tools. Quora, which was once where insightful articles reside, is a cesspool now. I guess we chalk it up to human nature.

I like my Twitter-verse. It is nicely curated to give me a chance to observe interesting people chat and think. It is like watching passersby at a coffee. It is not deep conversation, but it is still useful to keep me inspired and informed on these people's interests. I wish we could write paragraphs on Twitter, but then, maybe people wouldn't write and interact that much.

Saturday, November 16, 2019

SOSP19. I4: Incremental Inference of Inductive Invariants for Verification of Distributed Protocols

This paper is by Haojun Ma (University of Michigan), Aman Goel (University of Michigan), Jean-Baptiste Jeannin (University of Michigan), Manos Kapritsos (University of Michigan), Baris Kasikci (University of Michigan), Karem A. Sakallah (University of Michigan).

This paper is about formal verification of distributed systems. Writing proofs manually is cumbersome. Existing tools for formal verification all require the human to find the inductive invariant.

I4 combines power of Ivy (a tool for interactive verification of infinite-state systems) and model checking in order to find inductive invariant without relying on human intuition. Ivy takes as input a protocol description and a safety property, and guides the user interactively to discover an inductive invariant. The goal for finding an inductive invariant is to prove that the safety property always holds. An inductive proof has a base case, which proves initial state is safe, and an inductive step, which proves if state k is safe, prove state k+1 is safe. Once that inductive invariant is found, Ivy automatically verifies that it is indeed inductive.

The insight in I4 is that the safety/correctness behavior of a distributed system does not fundamentally change as the size increases. I witness this regularly in my use of TLA+ for model checking protocols. TLA+ is able to identify any problem (sometimes requiring upto 40 steps) by finding a counterexample involving three nodes. Three nodes is often what it takes. One node initializes a coordination operation, and the other two nodes see a different perspective of the ongoing computation, maybe due to exchanging messages with each other (i.e. doing stale reads) at inopportune times, and arrive to conflicting decisions that violate the goal of the coordination operation.

I4 uses inductive invariants from small instances and apply/generalize to large instances, and automates this with model-checking. More specifically, I4 first creates a finite instance of the protocol; use a model checking tool to automatically derive the inductive invariant for this finite instance; and generalize this invariant to an inductive invariant for the infinite protocol. This amends the I4 approach in that it automates the inductive invariant discovery process. This amends the model checking approach as well. While model checking is fully automated, it doesn’t scale to distributed systems. I4 applies model checking to small, finite instances and then generalizes the result to all instances.


The figure above shows an overview of the I4 flow for the invariant generation on a finite instance.
Given a protocol description--written in Ivy--and an initial size, I4 first generates a finite instance of that protocol with a given initial size. For example, ... I4 will generate a finite instance of the protocol with one server and two clients. It then uses the Averroes model checker to either generate an inductive invariant that proves the correctness of the protocol for that particular instance, or produce a counterexample demonstrating how the protocol can be violated and which can be used to debug the protocol. If the protocol is too complex, the model checker may fail to produce an answer within a reasonable amount of time or it may run out of memory. If this occurs, the finite encoding is simplified—using a concretization technique—to further constrain it and make it easier for the model checker to run to completion. This step is currently done manually but is easily automatable. Once an inductive invariant has been identified, I4 generalizes it to apply not only to the finite instance that produced it, but also to all instances of the protocol.
It is important to note that if the safety invariant does not hold, Averroes produces a counterexample and the human should work on the protocol to come up with a safety invariant that holds for the protocol. I4 is automatic, in that if the protocol safety invariant holds, then the inductive invariant is generated automatically by the Averroes tool.  But, wait, what is the difference between safety invariant and inductive invariant? Isn't safety invariant already inductive?

Safety property P may be an invariant but not an inductive one. "The verification proof requires the derivation of additional invariants that are used to constrain P until it becomes inductive. These additional invariants are viewed as strengthening assertions that remove those parts of P that are not closed under the system's transition relation." In other words, while the safety property holds for reachable states, it may not be closed under program actions outside the reachable states. This makes safety invariant unsuitable for verification since proving properties is not constrained to the reachable states (as it is hard to enumerate/identify reachable states in a proof). So, the inductive invariant is a version of the safety property that is closed under the program actions. The figure below illustrates this relationship. I think this concept is explored further in the Ivy paper.


If the safety property holds, then Averroes generates an inductive invariant for the finite instance; minimizes the invariant by removing redundant clauses; and then passes it on to the next step to be generalized. However, occasionally the finite instance may still be too large for the Averroes model checker, and it may run out of memory. This is where human involvement is needed again. The human helps concretize the small finite version of the protocol further to avoid state space explosion. Symmetry plays a big role here. FIRST is the keyword that denotes the node that sends the first message. The model checker can try instances where all the three nodes in the finite instances might be the one that sends the message. The human can notice a symmetry and set "FIRST = Node1" to help reduce the state space. (The team is working on automating this step as well.)

Then I4 uses Ivy for the proof generation as shown below, and the verification is complete.


I4 is available as opensource at https://github.com/GLaDOS-Michigan/I4. They applied I4 to several examples as shown in the table.


I4 improves on manual verification via using Coq and interactive verification using Ivy.


A restriction in I4 is that it applies to verification of safety properties, and not to liveness properties.

I am happy to find so many verification papers at SOSP. This paper appeared in the distributed systems in the afternoon of Day 2. In the morning of Day 2, there was a session on verification which included four papers. I had reviewed two of these papers earlier: "Scaling Symbolic Evaluation for Automated Verification of Systems Code with Serval" and "Verifying Concurrent, Crash-safe Systems with Perennial". It looks like the verification community at SOSP is quick to take results from more general and theoretic verification conferences, and integrate those tools and improve upon them to put them in use for verification of practical systems.

Thursday, November 14, 2019

SOSP19 Lineage Stash: Fault Tolerance Off the Critical Path

This paper is by Stephanie Wang (UC Berkeley), John Liagouris (ETH Zurich), Robert Nishihara (UC Berkeley), Philipp Moritz (UC Berkeley), Ujval Misra (UC Berkeley), Alexey Tumanov (UC Berkeley), Ion Stoica (UC Berkeley).

I really liked this paper. It has a simple idea, which has a good chance of getting adopted by real world systems. The presentation was very well done and was very informative. You can watch the presentation video here.

Low-latency processing is very important for data processing, stream processing, graph processing, and control systems. Recovering after failures is also important for them, because for systems composed of 100s of nodes, node failures are part of daily operation.

It seems like there is a tradeoff between low latency and recovery time. The existing recovery methods either have low runtime overhead or low recovery overhead, but not both.
  • Global checkpoint approach to recovery achieves a low runtime overhead, because a checkpoint/snapshot can be taken asynchronously and off the critical path of the execution. On the other hand, the checkpoint approach has high recovery overhead because the entire system needs to be rolled back to the checkpoint and then start from there again.
  • Logging approach to recovery has high runtime overhead, because it synchronously records/logs every data about any nondeterministic execution after the last checkpoint. On the flip side of the coin, it can achieve low overhead to recovery because only the failed processes need to be rolled back a little and resume from there. 


Can we have a recovery approach that achieves both low runtime overhead and low recovery overhead? The paper proposes the "lineage stash" idea to achieve that. The idea behind lineage stash is simple.

The first part of the idea is to reduce the amount of data logged by only logging the lineage. Lineage stash logs the pointers to the data messages instead of the data, and also logs task descriptions in case that data needs to be recreated by the previous operation. Lineage stash also logs the order of execution.


The second part of the idea is to do this lineage logging asynchronously, off the critical path of execution. The operators/processes now include a local volatile cache for lineage, which is asynchronously flushed to the underlying remote global lineage storage. The global lineage store is a sharded and replicated key-value datastore.


But then the question becomes, is this still fault tolerant? If we are doing the logging to the global lineage store asynchronously, what if the process crashes before sending the message, and we lose the log information?

The final part of the idea is to use a causal logging approach, and piggybacking the uncommitted lineage information to the other processes/operations for them to store in their stashes as well. So this kind of resembles a tiny decentralized blockchain stored in the stashes of interacting processes/operations.


In the figure, the filter process had executed some tasks and then passed messages to the counter process. Since the logging is off the critical path, the lineage for these tasks was not yet replicated to the global lineage stash. But as part of the rule, the lineage was piggybacked to the messages sent to the counter, so the counter has also a copy of the lineage in its stash, when the filter process crashed. Then in the recovery, the counter process helps by flushing this uncommitted lineage to the global lineage storage for persistence. The recovering filter process can then retrieve and replay this lineage to achieve a correct and quick (on the order of milliseconds) recovery.

Lineage stash idea was implemented and evaluated in Apache Flink for a stream processing word count application over 32 nodes. It was compared against the default global checkpoint recovery, and the default augmented with synchronous logging.


As the figure above shows, by using asynchronous logging approach, linear stash is able to avoid the runtime latency overhead of synchronized logging and matches that of the asynchronous checkpointing approach. Moreover, as the figure below shows, the recovery latency of checkpointing is very high. The lineage stash approach reaches similar recovery latency as the syncronized logging approach.


The lineage stash looks very promising for providing lightweight (off the critical path) fault-tolerance for fine-grain data processing systems. I really like the simplicity of the idea. I feel like I have seen a related idea somewhere else as well. But I can't quite remember it.

Monday, November 11, 2019

SOSP19 Verifying Concurrent, Crash-safe Systems with Perennial

This paper is by Tej Chajed (MIT CSAIL), Joseph Tassarotti (MIT CSAIL), Frans Kaashoek (MIT CSAIL), Nickolai Zeldovich (MIT CSAIL).

Replicated disk systems, such as file systems, databases, and key-value stores, need both concurrency (to provide high performance) and crash safety  (to keep your data safety). The replicated disk library is subtle, but the paper shows how to systematically reason about all possible executions using verification. (This work considers verification of a single computer storage system with multiple disk --not a distributed storage system.)

Existing verification frameworks support either concurrency (CertiKOS [OSDI ’16], CSPEC [OSDI ’18], AtomFS [SOSP ’19]) or crash safety (FSCQ [SOSP ’15], Yggdrasil [OSDI ’16], DFSCQ [SOSP ’17]).

Combining verified crash safety and concurrency is challenging because:
  • Crash and recovery can interrupt a critical section,
  • Crash can wipe in-memory state, and
  • Recovery logically completes crashed threads' operations. 

Perennial introduces 3 techniques to address these three challenges:
  • leases to address crash and recovery interrupting a critical section,
  • memory versioning to address crash wiping in-memory state, and
  • recovery helping to address problems due to interference from recovery actions.

The presentation deferred to the paper for the first two techniques and explained the recovery helping technique.

To show that the implementation satisfies the high-level specification a forward simulation is shown under an abstraction relation. The abstraction relation maps the concrete/implementation state to the high-level abstract specification state. Perennial adopted the abstraction relation as: "if not locked (due to an operation in progress), then the abstract state matches the concrete state in both disks".

The problem is "crashing" breaks the abstraction relation. To fix this problem, Perennial separates crash invariant (which refers to interrupted spec operations) from the abstraction invariant. The recovery proof relies on the crash invariant to restore the abstraction invariant.


Crash invariant says "if disks disagree, some thread was writing the value on the first disk". Then the recovery helping technique helps recovery commit writes from before the crash. The recovery proof shows the code restores the abstraction relation by completing all interrupted writes. As a result users get correct behavior and atomicity.

The Perennial proof framework was written in 9K lines of coq which provides crash reasoning: leases, memory versioning, and recovery helping. Perennial is built on top of Iris concurrency framework (for concurrency reasoning), which is built on top of Coq. (Iris:  R. Krebbers, R. Jung, A. Bizjak, J.-H. Jourdan, D. Dreyer, and L. Birkedal. The essence of higher-order concurrent separation logic. In Proceedings of the 26th European Symposium on Programming Languages and Systems, pages 696–723, Uppsala, Sweden, Apr. 2017.)

The authors have developed Goose for reasoning about Go implementations, but they also defer this to the paper. The developer writes Go code, and the Goose translator (written in 2K lines of Go code) translates this to Perennial proof, where it is machine checked with Coq.

As evaluation of Perennial framework, they verified a mail server written in Go. They argue that compared to a verification in CSCSPEC [OSDI ’18] (their earlier verification framework), the verification in Perennial takes less effort and is done in less number of lines of proof.


The software is available at https://chajed.io/perennial.

MAD questions


1. Is this an instance of a convergence refinement relation? 
In 2001, I was thinking on fault-tolerance preserving refinements as a graduate student working on graybox design of self-stabilization. The question was that: If we design fault-tolerance at the abstract, what guarantee do we have that after the abstract code is compiled and implemented in concrete, the fault-tolerance still holds/works?

It is easy to see that fault-tolerance would be preserved by an "everywhere refinement" that preserves the abstraction relation (between concrete and abstract) at any state, including the states outside the invariant states that are not reachable in the absence of faults. But the problem is that outside the invariant, the abstraction relation may not hold due to recovery actions being different than normal actions. That is pretty much the dilemma the Perennial work faced in verifying the recovery of replicated disks above.

OK, I said, let's relax the everywhere refinement to an "everywhere eventual refinement" and that would work for preserving fault-tolerance. Yes, it works, but it is not easy to prove that the concrete is an everywhere eventual refinement of the abstract because there is a lot of freedom in this type of refinement, and not much of a structure to leverage. The proof becomes as hard as proving fault-tolerance of the concrete from scratch. So, what I ended up proposing was a "convergent refinement", where the actions of the concrete provides a compacted version of the actions of the abstract outside the invariant. In other words, the forward simulation outside the invariant would be skipping states in the concrete. Perennial faced with the same dilemma chose to use a different abstraction relation. Whereas the convergence refinement idea is to keep the same abstraction relation but allow it to contract/skip steps in the computations outside the invariant states. I wonder if this could be applicable in the Perennial problem.

My reasoning with going compacting steps in refinement outside invariant was because it is safer than expanding the computation: if you show recovery in states in the abstract, by skipping steps (and not adding new ones) the concrete is also guaranteed to preserve that recovery.

Here is the abstract of my 2002 paper on convergence refinement. I just checked and this paper only got 19 citations in 19 years. It did not age well after getting a best paper award at ICDCS'02. In comparison, some of the papers we wrote quickly and published as short paper or as a workshop paper got more than 150-900 citations in less than 10 years. Citations is funny business.
Refinement tools such as compilers do not necessarily preserve fault-tolerance. That is, given a fault-tolerant program in a high-level language as input, the output of a compiler in a lower-level language will not necessarily be fault-tolerant. In this paper, we identify a type of refinement, namely "convergence refinement", that preserves the fault-tolerance property of stabilization. We illustrate the use of convergence refinement by presenting the first formal design of Dijkstra’s little-understood 3-state stabilizing token-ring system. Our designs begin with simple, abstract token-ring systems that are not stabilizing, and then add an abstract "wrapper" to the systems so as to achieve stabilization. The system and the wrapper are then refined to obtain a concrete token-ring system, while preserving stabilization. In fact, the two are refined independently, which demonstrates that convergence refinement is amenable for "graybox" design of stabilizing implementations, i.e., design of system stabilization based solely on system specification and without knowledge of system implementation details.

Saturday, November 9, 2019

SOSP19 Day 2, Scaling Symbolic Evaluation for Automated Verification of Systems Code with Serval

Verification session was the first session for Day 2. I like formal methods, and I did enjoy these papers. In this post I will only talk about the first paper in the session, the Serval paper. (You can read about SOSP19 Day 1 here.)

This paper is by Luke Nelson (University of Washington), James Bornholt (University of Washington), Ronghui Gu (Columbia University), Andrew Baumann (Microsoft Research), Emina Torlak (University of Washington), Xi Wang (University of Washington).

This paper received a best paper award at SOSP19, and the software is publicly available at https://unsat.cs.washington.edu/projects/serval/.

SOSP has a tradition of publishing systems verification papers, such as seL4 (SOSP’09), Ironclad Apps (OSDI’14), FSCQ (SOSP’15), CertiKOS (PLDI’16), Komodo (SOSP’17). A downside of systems verification is it is very effort-intensive. The Certikos manual proof consisted of more than 200K lines.

To help address this problem,  this paper introduces Serval, a framework for  developing automated verifiers for systems software. Serval accomplishes this by lifting interpreters written by developers into automated verifiers. It also provides a systematic approach to identifying and repairing verification performance bottlenecks using symbolic profiling and optimizations.

Wait, wait... What is an interpreter? And what is lifting?

In prior work on automatic verification (such as Hyperkernel SOSP17), a verifier implements symbolic evaluation for specific systems, and the verifier is not reusable/generalized. To make the verifier reusable and general, in Serval, the developers write an interpreter for an instruction set using Rosette, an extension of the Racket language for symbolic reasoning. Serval leverages Rosette to "lift" an interpreter into a verifier; which means to "transform a regular program to work on symbolic values". The developers also give the system specifications to be verified.


In the Serval framework the verifier consists of the lifted interpreter and the symbolic optimization. The steps are: write a verifier as interpreter, then Serval performs symbolic profiling to find bottleneck, and apply optimizations until verification becomes feasible.


Serval uses symbolic execution to avoid the state space explosion problem. But the program counter (PC) becoming symbolic is bad as it unnecessarily opens up search space. Serval prevents this with symbolic optimizations:

  • peephole optimization
  • fine-tune symbolic evaluation
  • use domain language to reduce the concrete values PC can take, and avoid path explosion problem.

Unfortunately I didn't understand much about the first two optimizations from listening to the presentation.

Using Serval, the authors build automated verifiers for the RISC-V, x86-32, LLVM, and BPF instruction sets. Targeting low level end of compiling stack can be an advantage for verification, because we don't need to trust higher level language toolkits. Future work will consider how the low-level-guarantees identified and verified by Serval could be connected to high level data structures for proof verification.

To show that existing systems can be retrofitted for Serval, they Retrofitted CertiKOS and Komodo for Serval. They mention this takes around 4 weeks for a new system. They also found 15 new bugs in Linux BPF JIT.

I will read the paper carefully to understand Serval better. It seems promising for scaling verification to practical systems. Of course the process still requires expertise and several weeks worth of effort, but Serval improves on the state-of-the-art with many months of effort.

Thursday, November 7, 2019

SOSP19 Day 1 wrap up

It was only 3 sessions into day 1, and my brain was fried.
Conferences are tiring because you are exposed to so many new ideas in a short time. It was clear I would not be able to pay attention to the papers in the last session, so I skipped that session (the privacy session which included the following three papers) to go for a walk at the golf park behind the conference center.



After the privacy session, there was a poster session and reception from 5-7:30pm. The poster session was nice for asking authors questions about the papers and having more in-depth conversation.

A student had told me he doesn't know how to start conversations with other conference attendees. I told him "That's easy.. Just ask them about what they are working on these days." A better way to start deeper conversations is to listen to the paper presentations, and have genuine questions about future work, or some extension and connection, and go discuss with them at coffee breaks, lunch, or poster session.

In the free-roaming poster session and reception, I had a chance to meet many colleagues and catch up on what they are working these days. When they returned the question, I had to talk for 3-5 minutes about what I am working on these days. I found that my "elevator pitch" got better and better as I had to answer this question many times.

I am a shy person, but at conferences my curiosity works in my favor, and I approach people to learn about their current work, and what they think of this paper versus that paper. I really enjoy talking to fellow researchers, each of whom is an expert in a small part of a big field. We may have different opinions on things, they may not like the papers/ideas I like, but I get to learn about their perspectives and file them in my brain without having to agree or disagree with them for now.

General impressions about SOSP 

SOSP is single track, so 500+ people were in the same big conference room for the sessions. The first half of the room had tables, and the second half just chairs. If you sat at a table row, you can rest your laptop on the table and type comfortably. I sat at the very front row and took notes. Interestingly, there is little contention for the front rows. Another advantage of sitting at the front row is that I am not distracted by seeing other audience members checking Facebook, Twitter, and mail on their laptops.

(Rant: This is my pet peeve. This drives me nuts. What is the point of flying over a long distance, driving at least two more hours from the airport to come to the conference, and check mail and social media all day long? You disrupted your whole week to travel to this conference and now you are not "at the conference". This I will never understand. Be here now!)

To comment on the papers and presentations on the first day, I found all the sessions very interesting. I don't have a favorite, all the three sessions I attended were very good.

Most of the presentations were given by graduate students. The quality of most of the presentations were very good. It is obvious a lot effort went into the rehearsals of those presentation. Almost all presenters had written (and memorized) extensive speaker notes and while the presentation view was displayed on the curtain, they had the presenter notes open on their laptops. Some of the presenters just read from their presentation notes. Those presentations were not very engaging. But at least the slides were very well organized, and the messages were distilled down to important points and were easy to follow.

Each presentation was about 20 minutes including a 2-3 minutes question answering slot at the end. (I think the SOSP conferences I attended before had 5 minutes reserved Q&A slot, but for this one the Q&A was not as rigidly reserved and enforced.)

Most of the presenters were able to cover around 40 slides in 20 minutes. This is an astonishingly large number. Normally the rule of thumb is to have 10 slides to present in 20 minutes. But being well prepared for a smooth flowing presentation, the presenters were somehow able to pull this off. I guess this takes its toll on the listeners though. I felt overwhelmed and exhausted after three sessions being bombarded by too many ideas, concepts, and acronyms.

I had written two posts about how to present in case you are looking for advice in that department.
http://muratbuffalo.blogspot.com/2017/08/on-presenting-well.html
http://muratbuffalo.blogspot.com/2015/02/how-to-present-your-work.html

Two-phase commit and beyond

In this post, we model and explore the two-phase commit protocol using TLA+. The two-phase commit protocol is practical and is used in man...