Timely Algorithms

This is what I (along with some collaborators) have been thinking about lately. This is still raw, but the idea is exciting enough that I wanted to share early and get feedback, recommendations, pointers.

Using synchronized time for distributed coordination

A fundamental problem in distributed systems is to coordinate the execution of nodes effectively to achieve performance while preserving correctness. Unfortunately, distributed coordination is a notoriously tricky problem. Since the nodes do not have access to a shared state and a common clock, they need to communicate and synchronize state frequently in order to coordinate on the order of events. However, excessive communication hinders the performance/scalability of a distributed system.

A key component of distributed coordination is the enforcement of consistent views at all nodes for the ordering of significant events. Distributed algorithms employed logical clocks and vector clocks predominantly for this coordination. These are essentially event counters, and are entirely divorced from wallclock/physical clocks and real time. While the logical/vector clocks are robust to asynchrony (unbounded divergence in the speed/rate of processing of nodes), explicit communication is required to enable the nodes infer/deduce/learn about each others' logical counters. (That is, communication is the only way to convey information in this approach.)

In our work, we advocate using time from tightly-synchronized clocks to convey information with the passage of time and reduce the communication costs of distributed algorithms. 

Previous work on using synchronized clocks

This is an attractive goal, so it has been pursued before, and some progress has been made. Barbara Liskov's 1991 paper on practical use of synchronized clocks introduced the use of timeouts for leases. However, leases at the nodes do not require tightly synchronized clocks, and only use clocks to measure a timeout period. This is easily implemented because the skew of physical clocks is insignificant for the millisecond and second level timeout periods.

There has not been any significant progress for a long time after Liskov's 1991 paper; adopting synchronized clocks to improve distributed algorithms proved to be challenging. One reason was because synchronization was not tight enough and not reliable. As clock technology improved, synchronization has gotten tighter and its reliability also improved.

The availability of tightly synchronized clocks led to some more progress in adopting synchronized clocks in distributed systems. Google Spanner and Causal Spartan use synchronized clocks to provide extra guarantees. Spanner uses it to provide linearizability of transactions across WANs and Spartan uses it for implementing efficient causal consistency across WANs. Both Spanner and Spartan use availability of synchronized clocks for providing distributed consistent snapshots.

Challenges for using time for distributed coordination

The reason there has not been more examples of using synchronized clocks to convey information to reduce the communication costs of distributed algorithms is the challenges involved. There are hidden factors that can invalidate the information to be conveyed via passage of time:
(a) Crash of nodes,
(b) Loss of messages, and
(c) Asynchrony in clocks.

Consider a silent-consent commit algorithm. The transaction manager (TM) sends a message to the resource managers (RMs) asking them to commit the transaction at time T. If an RM needs to reject the transaction, it sends a message, and then the TM sends the Abort message to all the RMs to abort the transaction. Otherwise the TMs first message to commit at T is the only message employed in the algorithm. Note that if one disagreeing RM has crashed, or if a message loss occurs, the safety of this silent consent algorithm is violated.

Timely Algorithms

In our work, we formulate a systematic approach to circumvent these challenges that enable building next-generation of synchronized clocks based  distributed coordination algorithms, which we call timely algorithms. Our method is to first formally articulate what information we like the time to convey, and then to determine if it is possible to convey that information in the presence of type a, b, and c faults. If it is found that the information to be conveyed is not tolerant to the faults considered, there are three ways to resolve the situation.

One way to resolve this problem is by conveying less-ambitious information and still get some benefit from that. In that case safety holds to the face of faults. An alternative way to resolve this problem is by involving extra/orthogonal mechanisms that mask type faults. For example using a Paxos group instead of one node would mask the type a fault. Using datacenter networks with redundancy would mask type b fault. Using better hardware and software backed clocks would mask type c fault. Then it is possible to provide silent-consent-like efficient timely algorithms.

Finally, another way to resolve the problem is to let the safety be violated by the faults, but use another mechanism to roll-back or reconcile the problem on the occasion when that happens. Provided that these faults are rare, the timely algorithm would still lead to a lot of savings.

Using our method, we give three examples of timely algorithms. We show how these timely algorithms address the faults, save extra communication (avoid incast storm problems), and increase the throughput of the system compared to their asynchronous counterparts. 

Details to come.

Comments

Popular posts from this blog

Learning about distributed systems: where to start?

Hints for Distributed Systems Design

Foundational distributed systems papers

Metastable failures in the wild

The demise of coding is greatly exaggerated

Scalable OLTP in the Cloud: What’s the BIG DEAL?

The end of a myth: Distributed transactions can scale

SIGMOD panel: Future of Database System Architectures

Why I blog

There is plenty of room at the bottom