Optimizing Distributed Protocols with Query Rewrites

This paper (Sigmod 2024) formalizes scalability optimizations as rule-driven rewrites, inspired by SQL query optimizations. It focuses on two well-known and popular techniques: decoupling (distributing logic/code across nodes for introducing pipeline parallelism) and partitioning (distributing data across nodes--no new components, but instances of them-- for workload parallelism).

Whittaker et al. (hi!) applied decoupling and partitioning optimizations to Paxos in the Compartmentalized Paxos work. That work deconstructed Paxos and showed how to reconstruct it using decoupling and partitioning to be more scalable by individually focusing on each component/role in Paxos. This is a simple but effective trick. Even after you learn the trick, you still keep getting surprised by how effective it is.

The current paper aims to do this application via query rewrite rules more methodically rather than the traditional/ad-hoc way. 

To this end, the paper utilizes Dedalus, a Datalog¬ dialect with set semantics, supporting SQL-like declarative constructs (joins, selection, projection). Dedalus programs are constrained Datalog¬ programs adhering to three additional syntax rules: space and time in schema, matching space-time variables in body, and space and time constraints in head. 

Dedalus programs are an (unordered) set of queries over (unordered) relations, so the logic for ordering (time, causality, log sequence numbers) is the exception, not the norm, and easy to identify. This becomes important for the correctness reasoning of rewrite rules.

Which brings us to what I think is the best part about this work: the rewrites are correct by construction. The rewrite-rules use localized "peephole" optimizations of dataflow graphs, preserving semantics without requiring holistic reasoning about protocol invariants. The rewrite rules modify existing programs with small local changes, each of which is proven to preserve semantics. As a result, each rewritten subprogram is provably indistinguishable to an observer (or client) from the original.  Moreover, because rewrites are local and preserve semantics, they can be composed to produce protocols with multiple optimizations.

The paper applies the rule-driven rewrites to three seminal distributed protocols: voting, 2PC, and Paxos. The application of the rewrites rulesis not automatic, but methodical. All protocols are implemented as Dedalus programs applying rewrite rules manually, and then are compiled to Hydroflow, a Rust dataflow runtime for distributed systems.  They achieve 2×, 5×, and 3× throughput improvements over baseline (not scaled out) implementations on voting, 2PC, and Paxos respectively, which matches the performance of ad hoc rewrites (e.g., compartmentalized Paxos).


Discussion

There isn't much creativity to applying decoupling and partitioning, as they are well established techniques in distributed systems design and deployment. But being able to do this methodically and to compile from high-level declarative language is awesome. With LLMs writing some glue-code automatically becomes feasible, making this line of building systems more feasible in the future.

This work is an exciting step for realizing the HydroFlow vision. I had written a post about the Hydroflow proposal paper last year. It is worth a review to understand how this work fits there. Here is the last paragraph of that post.

This line was especially catchy: "in the future we will not overfit our programs". If the Hydro project and vision succeeds, we will write them in a declarative language, and will be able to tune the orthogonal facets of availability, consistency independently.

Comments

Popular posts from this blog

Learning about distributed systems: where to start?

Hints for Distributed Systems Design

Foundational distributed systems papers

Metastable failures in the wild

Scalable OLTP in the Cloud: What’s the BIG DEAL?

The end of a myth: Distributed transactions can scale

There is plenty of room at the bottom

Dude, where's my Emacs?

Always Measure One Level Deeper

Distributed Transactions at Scale in Amazon DynamoDB