Unifying Consensus and Atomic Commitment for Effective Cloud Data Management (VLDB19)

This paper is by Sujaya Maiyya, Faisal Nawab, Divyakant Agrawal, and Amr El Abbadi, and appeared in VLDB19. Here is a video of the presentation of the paper from our Zoom DistSys Reading Group.


Atomic commitment protocols (e.g., Two-Phase Commit, 2PC) provide ACID guarantees for transactional access to sharded data.


Consensus protocols (e.g., Paxos) replicate data across servers in a consistent and fault-tolerant manner.

Many prior works have observed the similarities between the commitment and consensus problems. Both consensus and atomic commitment protocols aim to ensure that one outcome is agreed upon in a distributed environment. However, the conditions for achieving this agreement are different in the two cases: Paxos needs only a majority of nodes to be available for a decision to be made, whereas 2PC needs votes from all the participants to decide on the final value.
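To make the contrast concrete, here is a minimal sketch (in Python; the vote representation is my own, not the paper's) of the two decision rules: Paxos can make progress once a majority of acceptors respond, while 2PC commits only if every participant votes yes.

```python
def paxos_can_decide(responses: list, n_acceptors: int) -> bool:
    # Paxos needs responses from only a majority of acceptors; the
    # remaining minority may be slow or crashed.
    return len(responses) > n_acceptors // 2

def two_pc_can_commit(votes: list, n_participants: int) -> bool:
    # 2PC needs a yes-vote from *every* participant; one no-vote
    # (or a missing vote) forces an abort.
    return len(votes) == n_participants and all(votes)

# 3 of 5 Paxos acceptors respond: a decision can be made.
assert paxos_can_decide([True, True, True], 5)
# One of five 2PC participants is silent: the transaction cannot commit.
assert not two_pc_can_commit([True, True, True, True], 5)
```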


This paper proposes a framework that unites and explains several commitment and single-leader-based consensus protocols under one roof, the Consensus and Commitment (C&C) framework, and then derives protocols from it to demonstrate the framework. Many of these derived protocols are generalizations of existing protocols; however, some of them are novel in their own right.
  • Paxos Atomic Commitment (PAC) is a distributed atomic commitment protocol for sharded but non-replicated data. PAC is a variant of 3PC, and it integrates crash recovery and normal operation seamlessly in a simple Paxos-like manner. The leader election phase requires a majority quorum of responses, and the value discovery phase requires responses from all shards, unless one of the responses has its Decision value true or its AcceptVal set (see the sketch after this list).
  • Replicated-PAC (R-PAC) is for fully replicated cloud data management and is similar to Replicated Commit; it is a variant of Gray and Lamport's Paxos Commit. R-PAC resembles PAC, but since all nodes maintain identical data, the leader only needs to wait for replies from a majority of replicas that have the same InitVal.
  • Finally, G-PAC is a novel protocol for sharded and replicated architectures, similar to other recently proposed hybrid protocols such as Janus and TAPIR. G-PAC integrates transaction commitment with the replication of data, and it reduces transaction commit latency by avoiding the unnecessary layering of commitment over consensus.
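To make the PAC value-discovery rule above concrete, here is a minimal sketch in Python. The Response record and its field names (decision, accept_val, init_val) are my illustrative stand-ins for the paper's Decision, AcceptVal, and InitVal state variables, and I assume a node that already knows the decision reports the decided value in accept_val.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Response:
    decision: bool               # node already knows the final outcome
    accept_val: Optional[str]    # value accepted in an earlier round, if any
    init_val: str                # node's initial vote: "commit-yes" or "commit-no"

def pac_value_discovery(responses: List[Response], n_shards: int) -> Optional[str]:
    # If any response carries a decision or a previously accepted value,
    # the new leader adopts that value immediately (assumption: a decided
    # node reports the decided value in accept_val).
    for r in responses:
        if r.decision or r.accept_val is not None:
            return r.accept_val
    # Otherwise the leader must hear from *all* shards before deciding.
    if len(responses) < n_shards:
        return None  # keep waiting
    # All shards replied with no prior state: commit iff everyone voted yes.
    ok = all(r.init_val == "commit-yes" for r in responses)
    return "commit" if ok else "abort"
```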
The paper compares the performance of G-PAC with a Spanner-like protocol, where 2PC is used at the logical data level and Paxos is used for consistently replicating the logical data. Layering 2PC over Paxos increases latency, but flattening them into a single protocol in G-PAC means that the leader needs to communicate with a large number of nodes, which incurs its own overhead: if a transaction T accesses n shards and each shard is replicated on r servers, a total of n ∗ r servers are involved in T. The value-discovery method, which decides the value for the transaction, is described in Algorithm 4 of the paper and uses two newly defined quorum notions:
  • Super-set: a majority of replicas (r/2 + 1) from each of the n shards.
  • Super-majority: a majority of replicas (r/2 + 1) from a majority of the shards (n/2 + 1).
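Here is a small sketch of these two quorum predicates, assuming acks arrive as (shard_id, replica_id) pairs with shard ids 0..n-1; the counting code is mine, the definitions are the paper's.

```python
from collections import Counter

def majority(k: int) -> int:
    # Smallest majority out of k: floor(k/2) + 1.
    return k // 2 + 1

def is_super_set(acks, n_shards: int, r_replicas: int) -> bool:
    # Super-set: a majority of replicas from *each* of the n shards.
    per_shard = Counter(shard for shard, _replica in acks)
    return all(per_shard[s] >= majority(r_replicas) for s in range(n_shards))

def is_super_majority(acks, n_shards: int, r_replicas: int) -> bool:
    # Super-majority: a majority of replicas from a *majority* of shards.
    per_shard = Counter(shard for shard, _replica in acks)
    full = sum(1 for c in per_shard.values() if c >= majority(r_replicas))
    return full >= majority(n_shards)

# Example: 3 shards with 3 replicas each. Replica majorities from 2 of
# the 3 shards form a super-majority, but not a super-set.
acks = [(0, 0), (0, 1), (1, 0), (1, 2)]
assert is_super_majority(acks, 3, 3) and not is_super_set(acks, 3, 3)
```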

The experimental results highlight the low-latency benefits of combining consensus with commitment in a single integrated protocol. However, it looks like the G-PAC evaluation was done not with 15 nodes (i.e., 3 replicas in each of the 5 regions), but with only 5 nodes (1 per region), where each node uses 2 nearby nodes (among these 5 nodes in 5 different regions) for replication.
