Notes from USENIX NSDI Days 2 & 3

Before talking about Days 2 & 3, here are my notes from NSDI day 1 if you are interested.

Again all of the papers mentioned (and omitted) below are accessible publicly at the NSDI web page. Enjoy!

NSDI day 2 

Day 2 had the following sessions:

  • web and video
  • performance isolation and scaling
  • congestion control
  • cloud

Here are the papers I took detailed notes on. I ordered them by how interesting I found them.

Towards Battery-Free HD Video Streaming

This paper is by Saman Naderiparizi, Mehrdad Hessar, Vamsi Talla, Shyamnath Gollakota, and Joshua R Smith, University of Washington.

This work was mind blowing for me. This is a strong group that has developed the Wisp motes before, but even then the presented battery-free video streaming sensors was very impressive and the talk included a live demo of them.

So here is the deal. The goal of this work is to design sticker form factor, battery-free cameras. But that is crazy right? Video streaming is power hungry, how can you go battery-free?

The paper looks closely at the costs of the components of a video streaming camera. It turns out the image sensor, at 85microWatt, is very low power. On the other hand the radio at 100milliWatt is very high power.

If we could only offload the task of communication away from the sensor, we can pull this off!

The inspiration for the idea comes from the Russian great seal bug. Inside the great seal was a very thin membrane that vibrated when there is talking. Then a directional remote radio was used to receive those analog vibrations and reconstruct noise by the Russian spies. The following is from the Wikipedia page on this seal bug, called "the Thing":
The "Thing" consisted of a tiny capacitive membrane connected to a small quarter-wavelength antenna; it had no power supply or active electronic components. The device, a passive cavity resonator, became active only when a radio signal of the correct frequency was sent to the device from an external transmitter. Sound waves (from voices inside the ambassador's office) passed through the thin wood case, striking the membrane and causing it to vibrate. The movement of the membrane varied the capacitance "seen" by the antenna, which in turn modulated the radio waves that struck and were re-transmitted by the Thing. A receiver demodulated the signal so that sound picked up by the microphone could be heard, just as an ordinary radio receiver demodulates radio signals and outputs sound.
The group applies the same principle to cameras to make them battery-free. They showed on the stage the first demo of analog video backscatter that sends pixels directly to the antenna. The range for the prototype of the system was 27 feet, and it streamed 112*112 resolution of real-time video. A software defined radio was used to recreate the backscattered analog video.

For the analog hardware, the speakers said they got inspiration from human brain signals and created pulse width modulated pixels. More engineering went into performing intra-frame compression leveraging that the adjacent pixels are fairly similar.

Vesper: Measuring Time-to-Interactivity for Web Pages

This paper is by  Ravi Netravali and Vikram Nathan, MIT CSAIL; James Mickens, Harvard University; Hari Balakrishnan, MIT CSAIL.

This work asks the question of "what does it mean for a page to load quickly?" In other words, how should we define the load time?

The way page loads work likes this. The url typed in browser invokes the server to send the html. The browser runs html + javascript (requesting other embedded items as it encounters them in the html, but that is the topic of Ravi and James's other paper in the session). In the browser there is a javascript engine and a rendering engine which constructs the DOM tree. The js engine and dom tree interact via dom api.

Often the page load metric used is the page load time. Of course, this is very conservative, because only some of the page content is immediately visible, the invisible part doesn't matter, so why care about the load time of the invisible parts? Making this observation, Google came up with speed index: time to render above-the-fold, i.e., in the visible part of the browser.

But the speed index is also deficient because it doesn't talk about the js code running time. The js code running time affects the user experience, say via autocomplete, etc. An interactive page is not usable without js, and today a large fraction of the pages are interactive. The talk gave some statistics about median 182 handlers, and 95th percentile 1252 handlers in the pages surveyed.

To improve on the speed index, this work comes up with the ready index, which is defined as page time-to-interactivity in terms of visibility and functionality.

But the challenge is, nobody knows a good way to automatically identify the interactive state: are the java scripts working yet?

The system, Vesper, uses a 2 phase approach

  • identify visible elements event handlers, and state handlers access when fired
  • track loading progress of interactive state from phase 1

As a side note, the speaker, Ravi, spoke in a relaxed and clear way. The talk didn't feel rushed, but covered a lot of stuff. I really loved the delivery.

Performance Analysis of Cloud Applications

This paper is by Dan Ardelean, Amer Diwan, and Chandra Erdman, Google.

This work from Google considers the question of how we evaluate a change before we deploy it in production? Yes, the accepted approach is to use A/B testing over a small fraction of the users, but is that enough?

The abstract has this to say:
"Many popular cloud applications are large-scale distributed systems with each request involving tens to thousands of RPCs and large code bases. Because of their scale, performance optimizations without actionable supporting data are likely to be ineffective: they will add complexity to an already complex system often without chance of a benefit. This paper describes the challenges in collecting actionable data for Gmail, a service with more than 1 billion active accounts.
Using production data from Gmail we show that both the load and the nature of the load changes continuously. This makes Gmail performance difficult to model with a synthetic test and difficult to analyze in production. We describe two techniques for collecting actionable data from a production system. First, coordinated bursty tracing allows us to capture bursts of events across all layers of our stack simultaneously. Second, vertical context injection enables us combine high-level events with low-level events in a holistic trace without requiring us to explicitly propagate this information across our software stack."
The vertical context injection roughly means collecting the trace at the kernel level, using ftrace, where the layers above the kernel injects information into the kernel via stylized syscalls with payload.

The paper concludes with this observations. For meaningful performance experiments:

  • do experiments in production
  • use controlled A-B tests with 10 millions of users (less is not very meaningful)
  • use long-lived tests to capture the changing mix of requests
  • use creative approaches (vertical context injections) for collecting rich data cheaply.

LHD: Improving Cache Hit Rate by Maximizing Hit Density

This paper is by Nathan Beckmann, Carnegie Mellon University; Haoxian Chen, University of Pennsylvania; Asaf Cidon, Stanford University and Barracuda Networks.

Who knew... Cache eviction policies still require work and you can achieve big improvements there.

To motivate the importance of the cache hit rate research, Asaf mentioned the following. The  key-value cache is 100x faster than database. For Facebook if you can improve its cache hit rate of 98%, by just another additional 1%, the performance would improve 35%.

Here is the abstract:
Cloud application performance is heavily reliant on the hit rate of datacenter key-value caches. Key-value caches typically use least recently used (LRU) as their eviction policy, but LRU’s hit rate is far from optimal under real workloads. Prior research has proposed many eviction policies that improve on LRU, but these policies make restrictive assumptions that hurt their hit rate, and they can be difficult to implement efficiently.
We introduce least hit density (LHD), a novel eviction policy for key-value caches. LHD predicts each object’s expected hits-per-space-consumed (hit density), filtering objects that contribute little to the cache’s hit rate. Unlike prior eviction policies, LHD does not rely on heuristics, but rather rigorously models objects’ behavior using conditional probability to adapt its behavior in real time.
To make LHD practical, we design and implement RankCache, an efficient key-value cache based on memcached. We evaluate RankCache and LHD on commercial memcached and enterprise storage traces, where LHD consistently achieves better hit rates than prior policies. LHD requires much less space than prior policies to match their hit rate, on average 8x less than LRU and 2–3x less than recently proposed policies. Moreover, RankCache requires no synchronization in the common case, improving request throughput at 16 threads by 8x over LRU and by 2x over CLOCK.

Poster session

There was a poster session at the end of day 2. I wish there were more of the preliminary but bold work in the poster session, because most of the posters were just a poster accompanying the presented paper in the main track.

I liked these two posters the most. They are both very interesting works in progress.
  • Distributed Test Case Generation using Model Inference. Stewart Grant and Ivan Beschastnikh, University of British Columbia
  • High Performance and Usable RPCs for Datacenter Fabrics. Anuj Kalia, Carnegie Mellon University; Michael Kaminsky, Intel Labs; David G. Andersen, Carnegie Mellon University 

NSDI Day 3

The day 3 had these sessions:

  • Network monitoring and diagnosis
  • Fault-Tolerance
  • Physical Layer
  • Configuration Management

Since the sessions were networking specific, and since I was tired of the firehose of information spewed at me in the first 2 days, I didn't take much notes on day 3.  So I will just include my notes on the Plover paper from the fault-tolerance session.

PLOVER: Fast, Multi-core Scalable Virtual Machine Fault-tolerance

This paper is by Cheng Wang, Xusheng Chen, Weiwei Jia, Boxuan Li, Haoran Qiu, Shixiong Zhao, and Heming Cui, The University of Hong Kong.

This work builds on the Remus paper which appeared in NSDI08 and received a test-of-time award this year at NSDI.

The two limitations of the REMUS primary/backup VM replication approach was that:

  • too many memory pages needs to be copied and transferred, and
  • a split brain is possible due to partitioning.

This work, Plover, uses Paxos to address these problems. Paxos helps with both problems. By using 3 nodes, it doesn't suffer from the split brain the primary-backup approach suffers. By totally-ordering the requests seen by replicas, it can avoid copying memory pages. Replicas executing the same sequence of inputs should have the same state --- well, of course, assuming deterministic execution that is.

The drawback with Paxos is: it cannot handle non-determinism in request execution. To fix this, Plover invokes the VM page synchronization periodically before releasing replies.

The good news is that using Paxos to totally-order requests makes most memory pages same: the paper reports 97% of the pages being the same. So the VM synchronization is lightweight because it only needs to take care of the remaining 3% pages.

Plover is available on github.

I wonder, since Plover already does VM synchronization, does it really need to use a 100% totally-ordered requests delivered to the replicas via Paxos? Would it be possible to use a relaxed but faster solution? The Tapir project explored relaxed ordering of operations for storage systems, and some of the lessons may be applicable here.

MAD questions

Ok the MAD questions today picks up the thread from the last time. How do you improve the conference experience? Are conferences cramming to many technical sessions in a day? What can be done differently to improve the interactivity and networking of the conference participants?

A major reason I go to conferences is to meet people doing interesting work and converse with them, learn better about their perspectives and thought-processes.

During the 3 days at NSDI, I have talked with 14 people. That is not a bad number for me. I am not from the networking and NSDI community, so I don't know most people there. I get more chances to interact with people if I go to a conference where I know more people. Unfortunately, since I kept switching research areas (theoretical distributed systems 98-00, wireless sensor networks 00-10, smartphones/crowdsourcing 10-13, cloud computing 13-18) I don't have a home conference, where I know most people.

Out of these 14 people, I only knew 3 of them before. From the remaining, I knew a couple of them from interacting on Twitter, but the remaining majority were cold-hello first-time interactions.

The cold-hello interactions are hard, and as an introvert and shy person (except when I am curious) I had to force myself to have these first-time interactions. I assume the people I approach are also interested in talking to people (that is what conferences are supposed to be about), and we can have nice interesting conversations since we have some shared interest on distributed systems and at least on research. I would say 75% of the conversations I had  were interesting and not-superficial. But sometimes it bombs and that gets awkward. And instead of being happy about the nice interactions you have, it is easy to focus on the awkward ones and feel bad about them.

Although I am happy with meeting 14 interesting people, this is so much lower than the people I meet and talk with at HPTS. If you look at my posts about HPTS, you can see that I made it a point to emphasize how much I enjoyed the interactivity of HPTS.

I think a major way HPTS makes this happen is it sets the intentions clear and states this explicitly the first day. Pat Helland takes the stage and says that "the point of HPTS is to meet other people and interact, and the sessions are just a break from meeting/networking with other people". Since HPTS makes the cold-hello the norm, it does not feel weird anymore. I never had an awkward conversation at HPTS.

I am sure there are many ways to build interactivity and networking in the conferences. Why don't we make the posters session a long session in the afternoon, rather than after 6pm? Are there any ice-breaker activities that the conferences can adapt? I remember that at an activity with 20 people, the moderator asked everyone to say something uniquely quirky about themselves. That broke the ice pretty quickly; I still remember some of the quirks people mentioned. Maybe to scale for larger groups, it may be possible to have open-mike crazy ideas, dangerous ideas, and hot-takes sessions. Maybe we need to get some professionals to help, say from improv comedy people or capable event ice-breaker people. (I assume big companies like Google should have skilled HR people to help tech people interact better, right?)

Ok, this can get long, and I am not really knowledgeable in this area, so I will stop here. But here is my ask. Next time if you see me at a conference, please approach me and say hello. I am sure we will have a nice conversation and we will have things to learn from each other.

Maybe next time I should make a badge to make this explicit: "Please say Hi to me, I love to meet and talk to you."


Popular posts from this blog

Graviton2 and Graviton3

Foundational distributed systems papers

Learning a technical subject

Learning about distributed systems: where to start?

Strict-serializability, but at what cost, for what purpose?

CockroachDB: The Resilient Geo-Distributed SQL Database

Amazon Aurora: Design Considerations + On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes

Warp: Lightweight Multi-Key Transactions for Key-Value Stores

Anna: A Key-Value Store For Any Scale

Your attitude determines your success