There is plenty of room at the bottom

This is a pun on the saying "there is always room at the top". This is also the title of a famous Feynman lecture from 1959, where he made a case for nanotechnology. 

In this post, I will try to argue that there is plenty of room at the bottom for distributed algorithms. Most work on distributed algorithms is done at a high abstraction plane. These high-level solutions do not transfer well to the implementation level, and this leaves a lot of space to explore there.

But this is not just an opportunistic argument. It is imperative to target the implementation level with our distributed algorithms work; otherwise it remains theoretical, unused, and inapplicable.

Let me try to demonstrate using consensus protocols as examples. Someone else can make the same case using another subdomain.

Be mindful of what is swept under the rug

What is succinct at the high level could be very hard to implement and get right at the low level. Paxos seems simple, but even after several papers on it, it remained difficult to implement. In fact, the whole point of Raft was to show how to implement Paxos-style consensus while limiting its generality.

What seems to be a harmless assumption could be the source of a liveness bug that brings down the deployment. Assuming nodes are pairwise strongly connected, Raft does not have a liveness problem, and this does not seem like a big deal at the design level. But when that assumption is violated in deployment, you get a 6-hour outage impacting production.
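A toy sketch of why this bites, assuming nothing about any real Raft codebase: a node that is cut off from the leader but can still reach the other nodes keeps timing out and bumping its term, and because Raft nodes step down when they see a higher term, it repeatedly dethrones a perfectly healthy leader.

```python
# Toy sketch (not real Raft): one node cannot reach the leader, so it keeps
# timing out and incrementing its term. Since Raft nodes step down on seeing
# a higher term, the healthy leader is repeatedly dethroned. Names and logic
# are illustrative only.

def simulate(rounds):
    """Count leader disruptions caused by one partially connected node."""
    cluster_term = 1    # term the healthy majority currently agrees on
    isolated_term = 1   # term of the node cut off from the leader
    disruptions = 0
    for _ in range(rounds):
        isolated_term += 1                 # missed heartbeats: election timeout
        if isolated_term > cluster_term:
            cluster_term = isolated_term   # higher term forces leader step-down
            disruptions += 1
    return disruptions

print(simulate(10))  # 10: every timeout of the isolated node costs a leader
```

The standard mitigations (Pre-Vote and CheckQuorum) essentially restore the strong-connectivity assumption inside the protocol itself, which is exactly the kind of implementation-level concern this post is about.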

What seems to be an implementation detail can determine the scalability of the algorithm. By employing intermediate proxy nodes to relay the communication between the leader and followers, PigPaxos is able to achieve a 3.5-fold improvement in Paxos throughput.
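A back-of-envelope sketch of why relaying helps, with illustrative numbers (not PigPaxos's actual evaluation): the leader's per-round message load drops from one message per follower to one message per relay group.

```python
# Back-of-envelope sketch: relay nodes shrink the leader's fan-out.
# Numbers and function names are illustrative, not from PigPaxos.

def leader_messages_direct(followers):
    # Classic Paxos/Raft: leader sends to (and hears from) every follower.
    return followers

def leader_messages_relayed(followers, groups):
    # PigPaxos-style: leader talks to one relay per group; relays fan out
    # to their group members and aggregate the responses back.
    assert followers % groups == 0
    return groups

print(leader_messages_direct(8))      # 8 messages per round at the leader
print(leader_messages_relayed(8, 2))  # 2 messages per round at the leader
```

The bottleneck at the leader is what caps throughput, so cutting its message count is what buys the improvement, at the cost of an extra communication hop through the relays.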

What is fast under low contention could be sluggish as contention increases. EPaxos uses dependency tracking to achieve more concurrency and throughput, but when contention increases, the benefits disappear quickly.
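A toy sketch of the dependency-tracking idea, assuming commands conflict when they touch the same key (the helper below is illustrative, not EPaxos's actual data structures): under low contention the dependency sets stay empty and commands commit on the fast path, but under high contention every command drags along a growing chain of dependencies that must be ordered before execution.

```python
# Toy sketch of EPaxos-style dependency tracking. Each new command records
# the earlier commands it conflicts with (here: same key). Illustrative only.

def dependencies(commands):
    """For each command (the key it touches), list indices of earlier conflicts."""
    deps = []
    for i, key in enumerate(commands):
        deps.append([j for j in range(i) if commands[j] == key])
    return deps

low = dependencies(["a", "b", "c", "d"])   # all distinct keys
high = dependencies(["x", "x", "x", "x"])  # everyone hits the same key

print(low)   # [[], [], [], []]: fast path, nothing to order
print(high)  # [[], [0], [0, 1], [0, 1, 2]]: dependency chains to resolve
```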

What seems to be an operational/deployment problem can require a new consensus read algorithm to be developed. The PQR algorithm enables clients to perform linearizable reads using only the followers, and relieves the leader of the job of serving reads, which frees it to serve more writes.
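A minimal sketch of the follower-only quorum read idea, in the spirit of PQR; the function and data layout are hypothetical, and the real protocol additionally handles in-flight (not-yet-committed) values with a follow-up "rinse" phase.

```python
# Hypothetical sketch: the client asks a majority of followers and keeps
# the value with the highest slot number it hears, never touching the leader.

def quorum_read(follower_states, quorum_size):
    """follower_states: list of (slot, value) pairs, one reply per follower."""
    replies = follower_states[:quorum_size]  # any majority of replies will do
    return max(replies)                      # pair with the highest slot wins

followers = [(5, "v5"), (4, "v4"), (5, "v5")]  # one follower lags at slot 4
print(quorum_read(followers, 2))  # (5, 'v5')
```

The key observation is that any value chosen by consensus is present at a majority of nodes, so a majority read quorum is guaranteed to intersect it.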

What seems to be a crazy relaxation of guaranteed commits can open a new field of consensus algorithms. I am talking about Nakamoto consensus, which amalgamated known techniques to rewrite the book on consensus. It used proof of work to achieve leader election without needing communication, and used a hash-chained multi-synod log to achieve eventual implicit commit finalization.
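A toy illustration of the communication-free leader election part, using a trivially low proof-of-work difficulty; this is a sketch of the idea, not a usable miner.

```python
import hashlib

# Toy proof-of-work in the Nakamoto style: a node "wins" leadership for the
# next block by finding a nonce whose hash clears a difficulty target, with
# no vote messages exchanged. Difficulty here is deliberately trivial.

def mine(block_data, difficulty_prefix="00"):
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}:{nonce}".encode()).hexdigest()
        if digest.startswith(difficulty_prefix):
            return nonce, digest
        nonce += 1

nonce, digest = mine("block-1")
print(digest.startswith("00"))  # True
```

Commit finalization is then implicit and eventual: a block gets "buried" under enough hash-chained successors that rewriting it becomes computationally infeasible.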

The lesson is clear. Be mindful of what is swept under the rug! What is not a concern at the high level can be a feature that makes or breaks the implementation. It is possible to think of other dimensions/concerns, even though I may not be able to match them to Paxos variants. Ok, here goes nothing.

Continuous Integration/Continuous Delivery (CI/CD) concerns are a big thing in the industry, but I haven't seen them addressed in Paxos/consensus variants. Is there a Paxos variant designed to be easier to change/modify?

How about Paxos variants that play nicer with other protocols, such as storage, networking, and commit protocols? (Well, actually, TAPIR did consider consensus together with storage for transactions, and this paper, among others, considered unifying consensus protocols with commit protocols.)

How would disaster tolerance work? Flexible and graceful fault tolerance is better than being robust up to a point and then becoming a brittle mess beyond it, without notice.

How would debuggability/observability work for consensus?

Can we have a fully verified (via automated reasoning) implementation of consensus for a deployment system?

Cloud computational model changes everything

There was plenty of room to begin with, and we now have even more room at the bottom with a drastically changed terrain: algorithm design still uses the traditional computer paradigm from the 1970s, but the cloud computational model has toppled many of the assumptions of the traditional CPU+memory+storage-on-box model.

Cloud has a new resource and cost model based on containers and sometimes even on serverless platforms. The incentive in the traditional on-box model is to use all of the CPU+memory+storage without any of them becoming a bottleneck. Things change in the cloud, where bottlenecks are less of an issue because of disaggregation, and you pay for what you use, so using-all-you-can does not make as much sense. Recently, the "Scalable but Wasteful" paper looked at how Paxos variants designed for the on-box model fare under the cloud model.

Cloud also gives you novel distributed building blocks and primitives. Most cloud environments provide distributed transactional logs as part of the infrastructure used by existing protocols. Why not leverage these to build consensus on top of them? Delos and its follow-up work are examples of this.
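A sketch of the shared-log approach, with a hypothetical SharedLog class standing in for the cloud log service (this illustrates the idea, not Delos's actual API): the log provides a total order, so every replica that applies entries in log order converges to the same state, and the hard part of consensus is delegated to the infrastructure.

```python
# Sketch: state machine replication on top of a shared-log primitive.
# SharedLog is a stand-in for the cloud infrastructure service.

class SharedLog:
    def __init__(self):
        self.entries = []

    def append(self, entry):
        # The log service serializes appends into a single total order.
        self.entries.append(entry)
        return len(self.entries) - 1

    def read(self, from_pos):
        return self.entries[from_pos:]

class Replica:
    def __init__(self, log):
        self.log, self.pos, self.state = log, 0, {}

    def propose(self, key, value):
        self.log.append((key, value))

    def catch_up(self):
        # Apply entries in log order; determinism gives convergence.
        for key, value in self.log.read(self.pos):
            self.state[key] = value
        self.pos = len(self.log.entries)

log = SharedLog()
a, b = Replica(log), Replica(log)
a.propose("x", 1)
b.propose("x", 2)
a.catch_up()
b.catch_up()
print(a.state == b.state)  # True: same log order, same state
```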

Containers and abundant resources in the cloud make it possible to migrate computation to proactively control and strengthen availability. Recently, the Physalia paper investigated how infrastructure-aware placement can significantly reduce the effect of network partitions, infrastructure failures, and even software bugs.

We are just getting started in this line of work. We are two decades into the cloud era, but distributed algorithms are still designed and evaluated with the single-dedicated-box model from the 1970s. This is an anomaly, and it is a cue that there will be a revolutionary-science period where a new paradigm emerges.

Are you saying we should not use abstractions and avoid abstract thinking?

No, I am not saying that. I am just saying: don't abstract away the features that are important to the implementation. Have one foot in the implementation plane. Use tools that force you to deal with implementation- and deployment-relevant concerns.

"The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise." --Dijkstra

