DistSys Reading Group second meeting: WormSpace

We had our second Zoom DistSys Reading Group on Wednesday. The meeting is open to all who are working on distributed systems. Join our Slack channel for paper discussion and meeting links (password protected).

I had summarized the WormSpace paper before the meeting. It is a great paper. This week I was the presenter, and we started the meeting with my 30-minute presentation. Here is a link to my slides.

In the presentation I made sure to emphasize the benefit provided by WormSpace. It is an abstraction that enables developers to use distributed consensus as a building block for applications. Developers don't need to understand how distributed consensus via Paxos works. WormSpace hides the details and complexity of Paxos behind a data-centric API: capture, write, and read. The API is at a low enough level to enable efficient designs on top (as demonstrated for WormTX) without the need to open the Paxos box. Bunching WORs into a WOS was also a very useful decision for improving the programmability of the WormSpace library. In short, WormSpace enables you to remix distributed consensus in your applications. While WormSpace is not groundbreaking in terms of novelty, it is a very important contribution, because providing the right abstractions enables an explosion of applications in a field, e.g., MapReduce, Spark, and TensorFlow.
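To make the capture/write/read API concrete, here is a minimal single-process sketch in Python. It is not the WormSpace implementation: there is no replication and no Paxos underneath, and the names (`ToyWOR`, `WriteRejected`, `UnwrittenError`) and the integer token are assumptions made up for illustration. It only models the contract a client sees: a capture can be preempted by a later capture, a write succeeds only with the latest capture, and a register is written at most once.

```python
class WriteRejected(Exception):
    """Raised when a write loses to a later capture or an earlier write."""


class UnwrittenError(Exception):
    """Raised when reading a register that has no value yet."""


class ToyWOR:
    """Local, non-replicated stand-in for a write-once register (hypothetical)."""

    def __init__(self):
        self._latest_token = 0   # token of the most recent capture
        self._written = False
        self._value = None

    def capture(self):
        """Take (preemptable) ownership and return a fresh token."""
        self._latest_token += 1
        return self._latest_token

    def write(self, token, value):
        """Write once; only the holder of the latest capture may write."""
        if self._written:
            raise WriteRejected("register already written")
        if token != self._latest_token:
            raise WriteRejected("capture was preempted by a later one")
        self._value = value
        self._written = True

    def read(self):
        """Return the written value, or raise if nothing was written."""
        if not self._written:
            raise UnwrittenError("register not written yet")
        return self._value


if __name__ == "__main__":
    wor = ToyWOR()
    old = wor.capture()
    new = wor.capture()          # preempts the first capture
    try:
        wor.write(old, "stale")
    except WriteRejected as err:
        print("rejected:", err)
    wor.write(new, "decided")
    print(wor.read())
```

The application code above never mentions ballots, quorums, or acceptors, which is the point of the abstraction: in the real system those details would live behind the same three calls.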

After the presentation, we had a general discussion about the paper, answering questions that participants had posted in the linked Google Doc. Some interesting ones included:
  • Q: How does WormSpace's use of registers compare with Heidi Howard’s generalized solution to distributed consensus, which builds on top of write-once registers (WORs)?
  • A: WormSpace implements a WOR via Paxos, whereas Heidi's work assumes WORs are provided by the middleware and builds Paxos protocols on top of them. WormSpace registers are distributed registers. In contrast, Heidi's registers are local implementations: the middleware provides many rounds per register, and distributed registers are implemented over these via Paxos variants.
  • Q: What is the purpose of a WOS? Is WOS just at a coarser granularity than WOR?
  • A: Yes, a WOS provides a batching opportunity over WORs, which helps both performance and programmability. As an example of the programmability improvement, consider WormLog. A client (the sequencer) can allocate a WOS, batch capture all the WORs in it, and hand these out in sequence, as write tokens for the tail of the log, to the other clients talking to this sequencer.
  • [comment] Figs. 9 & 10 compare WormPaxos with CPaxos (classical Paxos). But CPaxos uses neither the MultiPaxos stable-leader optimization nor piggybacking of commits. In contrast, WormPaxos has a stable leader: the client who allocates the WOS. This makes the comparison unfair.
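The WormLog answer above (a sequencer allocates a WOS, batch captures it, and hands out write tokens) can be sketched in a few lines of Python. This is a toy, single-process model under stated assumptions: `ToySegment`, `ToySequencer`, and `next_write_token` are hypothetical names, the capture token is a plain integer, and `None` stands for "unwritten". Nothing here is replicated or taken from the WormSpace code.

```python
class ToySegment:
    """Toy stand-in for a WOS: a fixed-size batch of write-once slots."""

    def __init__(self, size):
        self._values = [None] * size   # None means "not written yet"
        self._token = 0                # token of the latest batch capture

    def __len__(self):
        return len(self._values)

    def batch_capture(self):
        """Capture every slot in the segment in one step."""
        self._token += 1
        return self._token

    def write(self, index, token, value):
        """Write slot `index` once, if `token` is the latest capture."""
        if token != self._token or self._values[index] is not None:
            return False
        self._values[index] = value
        return True

    def read(self, index):
        return self._values[index]


class ToySequencer:
    """Captures a whole segment up front, then hands out slots in order."""

    def __init__(self, segment):
        self._token = segment.batch_capture()
        self._size = len(segment)
        self._next = 0

    def next_write_token(self):
        """Return (log position, capture token) for the next tail slot."""
        if self._next >= self._size:
            raise RuntimeError("segment exhausted; allocate a new one")
        position = self._next
        self._next += 1
        return position, self._token


if __name__ == "__main__":
    segment = ToySegment(4)
    sequencer = ToySequencer(segment)
    first = sequencer.next_write_token()
    second = sequencer.next_write_token()
    # Clients write directly, in any order, using the tokens they were given.
    print(segment.write(*second, value="entry B"))
    print(segment.write(*first, value="entry A"))
    print([segment.read(i) for i in range(len(segment))])
```

In this sketch the sequencer pays for one batch capture per segment, and each client then writes its own slot without going back through the sequencer, which is the batching benefit the answer describes.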

In the discussion, we decided on two questions to explore in more depth in the breakout session.
  • WormSpace can be viewed as an easy-to-use Paxos-as-a-library. Of the many variants of Paxos that exist, which can be implemented using WormSpace’s API? What extensions to WormSpace would be needed to support the others? (E.g., Mencius, Generalized Paxos, Flexible Paxos, EPaxos, {Cheap, Vertical, Stoppable} Paxos?)
  • WormSpace proposes WORs/Paxos as the fault-tolerant, replicated base on which to build applications and services. Other work (TAPIR, Replicated Commit, ?) suggests that replication should be fused with higher-level protocols/applications for maximum performance. How do these approaches compare?

Here is the YouTube video of our presentation and discussion. After the breakouts there was another discussion, but we couldn't record it.

After I put people into the breakout rooms, our neighborhood had a power outage. It lasted for two hours. I tell you, it is not fun to have a blackout when you are in quarantine and worried about whether the world is falling apart.
