Sunday, November 27, 2016

My Distributed Systems Seminar's reading list for Spring 2017

Below is the first draft of the list of papers I plan to discuss in my distributed systems seminar in the Spring semester. If you have suggestions for good or recent papers to cover, please let me know in the comments.

Datacenter Operating System

Firmament: Fast, Centralized Cluster Scheduling at Scale (OSDI 16)
Large-scale cluster management at Google with Borg (EuroSys 15)
Apache Hadoop YARN: yet another resource negotiator (SOCC 13)
Slicer: Auto-Sharding for Datacenter Applications (OSDI 16)

Monitoring

Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems (SOSP 15)
Shasta: Interactive Reporting At Scale (SIGMOD 16)
Adaptive Logging: Optimizing Logging and Recovery Costs in Distributed In-memory Databases (SIGMOD 16)

Consistency

The many faces of consistency (2016)
The SNOW Theorem and Latency-Optimal Read-Only Transactions (OSDI 16)
Incremental Consistency Guarantees for Replicated Objects (OSDI 16)
Just Say NO to Paxos Overhead: Replacing Consensus with Network Ordering (OSDI 16)
FaSST: Fast, Scalable and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs (OSDI 16)

BFT

The Honey Badger of BFT Protocols (2016)
The Bitcoin Backbone Protocol: Analysis and Applications (2015)
XFT: Practical Fault Tolerance beyond Crashes (OSDI 16)

Links

2016 Seminar reading list
2015 Seminar reading list

6 comments:

Punya said...

Thanks for posting this. I believe the pivot tracing link might be pointing to the wrong URL - could you check?

蓝葻 said...

+1
This seems to be the correct link:
http://sigops.org/sosp/sosp15/current/2015-Monterey/printable/122-mace.pdf

Murat said...

I corrected the link. Thank you.

Michael Hausenblas said...

Nice collection! Also, I was wondering, are you aware of https://dcos.io/ which is the actual Datacenter Operating System (DC/OS)?

Sam BESSALAH said...

Great list.
I would add two papers to the Datacenter OS section. One is DRF (Dominant Resource Fairness), https://people.eecs.berkeley.edu/~alig/papers/drf.pdf, which is the mechanism that underpins Mesos (https://people.eecs.berkeley.edu/~alig/papers/mesos.pdf).

Sam BESSALAH said...

Oh, and one interesting paper is the Dapper paper on distributed systems tracing from Google: http://static.googleusercontent.com/media/research.google.com/en//archive/papers/dapper-2010-1.pdf. A close open source implementation is the Zipkin project.