Spring 18 seminar, "blast from the past" edition

It is a bad sign if the references section of a paper fails to cite any recent papers. It means that either the authors are not aware of any recent work in the area, or the area is not of interest to other researchers.

But it is also a bad sign if the references section of a paper fails to cite any old papers. It means that the authors likely do not know enough about the foundational work in the area.

There is a case to be made for working from the fundamentals, from first principles. Elon Musk made that case in his work, and showed that you can do transformative work, even in the commercial technology world, by working from first principles.

Working from first principles is also essential for research. It is not uncommon to get your best ideas when preparing for a class. Sometimes, while reviewing fundamental work on a topic, you notice a gap, some weird under-explained assumption, and go "huh, why is it that way?". Or sometimes students (or outsiders to the field) ask a basic question from left field, and that may start an investigation. And sometimes you see an old idea or algorithm as a promising fit for new emerging applications. The recent flexible quorums extension to Paxos is a good example of working from first principles. Nobody expected that result after 30 years of Paxos.

Back to the seminar

Every Spring, I teach a distributed systems seminar where we cover recent interesting papers in the field. But this Spring, I think I will experiment with a special "blast from the past" edition.

Reading these golden oldies could be a chance to revisit the fundamentals of the field. When reading these papers, we can note how they aged (which parts aged well, which did not) and how the context changed from then to now. We can use Google Scholar to investigate how these papers got cited in the intervening years. We can also consider whether some of these algorithms can find new applications in modern systems.

Our distributed systems seminars include discussion and group work sessions, so I am sure we will be able to come up with new insights and perspectives on these papers with the advantage of hindsight.

But I am not sure what the selected papers will be yet. Here is my first brain dump on this. It may take several weeks to finalize the list. If you have suggestions, please let me know in the comments. What should be the cutoff date for these papers? 2000 seems to be a reasonable cutoff date. We may even stretch this up to 2007 to include a paper or two.
  1. Lamport. Time, clocks, and the ordering of events in a distributed system, 1978.
  2. Lampson & Sturgis, Crash Recovery in a Distributed Data Storage System, 1979 
  3. Dijkstra. Self-stabilization in spite of distributed control, 1982.
  4. Ben-Or, "Another Advantage of Free Choice: Completely Asynchronous Agreement Protocols" 1983.
  5. Rabin, "Randomized Byzantine Generals" 1983.
  6. Oki & Liskov, Viewstamped replication: A new primary copy method to support highly-available distributed systems, 1988.
  7. Dwork, Lynch, Stockmeyer. Consensus in the presence of partial synchrony, 1988.
  8. Some UNITY papers from Chandy & Misra.
  9. Birman, Virtual Synchrony paper and debate.
  10. Andrew File System: Scale and performance in a distributed file system, 1988 
  11. Schneider, "Implementing fault-tolerant services using the state machine approach: a tutorial", 1990.
  12. Awerbuch and Peleg, "Sparse Partitions", 1990.
  13. Arora, Gouda. Distributed reset, 1990.  Closure and convergence: A foundation of fault-tolerant computing, 1993. 
  14. Herlihy & Moss, "Transactional Memory: Architectural Support for Lock-Free Data Structures", 1993.
  15. Plan 9 from Bell Labs, 1995 
  16. Lampson, "How to Build a Highly Available System Using Consensus", 1996. 
  17. Chandra, Toueg "Unreliable Failure Detectors for Reliable Distributed Systems", 1996.
  18. Flexible update propagation for weakly consistent replication, 1997. 
  19. Afek & Dolev, "Local stabilizer" 1997.  
  20. Cluster-Based Scalable Network Services, 1997. 
  21. Scalable, distributed data structures for internet service construction, 2000. 
  22. Rosenblum and Garfinkel. Virtual Machine Monitors: Current Technology and Future Trends, 2005. 



Werner Vogels's blog has a "Back to the Basics Reading" label, which includes interesting papers.
