Do tightly synchronized clocks help consensus?

Let's start with the impossibility results, always a good place to begin a distributed systems discussion. The coordinated attack result says that if the communication channels can drop messages arbitrarily and undetectably, you cannot solve distributed consensus with a deterministic protocol in a finite number of rounds. This result is all about *the curse of asymmetry of information*. Reliable channels are needed for two parties to agree on something; otherwise the two parties are stuck in a Zeno's paradox of "you may not know that I know that you know that I know" information chaining. A classic paper to check out on the topic is "Knowledge and common knowledge in a distributed environment".

Assuming mostly reliable channels and demanding majority acks, it is possible to solve consensus in a partially synchronous system. Paxos shows us how to do this. Paxos uses a quorum of acks to get around the problem of the agreement having to occur in one shot, as in the coordinated attack problem. In Paxos, the quorum acks mark the point where the system agrees, and the stragglers can learn or re-learn this information because the agreement was stored on a quorum of nodes. As a result, the agreed value does not change.
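
To make the quorum-ack mechanism concrete, here is a minimal sketch (illustrative Python, not real Paxos: no ballots, no phase 1, no recovery) of a leader that treats a value as chosen once a majority of acceptors acknowledge storing it:

```python
# Minimal sketch of the quorum-ack idea (not full Paxos: no ballots, no recovery).
# A value counts as chosen once a majority of acceptors have stored it, so any
# later majority read is guaranteed to overlap the majority that stored it, and
# stragglers can re-learn the agreed value.

class Acceptor:
    def __init__(self):
        self.value = None

    def store(self, value):
        self.value = value
        return True                          # ack back to the leader

def replicate(value, acceptors):
    acks = sum(1 for a in acceptors if a.store(value))
    majority = len(acceptors) // 2 + 1
    return acks >= majority                  # chosen iff a majority acked

acceptors = [Acceptor() for _ in range(5)]
print(replicate("v1", acceptors))            # True: 5 acks >= majority of 3
```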

What does this have to do with time synchronization? Specifically, where is time synchronization in Paxos? Nowhere! Paxos preserves safety in an asynchronous system, and achieves progress when the failure detectors do not suspect the eligible and alive leader. For this, it is sufficient to use diamond-S (eventually strong) failure detectors (Chandra and Toueg, 1996), which do not need time synchronization. These detectors are implementable using per-process clocks that measure local time passing, under the assumption that time does not advance at unboundedly different rates across processes.
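
As an illustration, here is a minimal sketch of the kind of heartbeat-timeout failure detector this assumption admits (the class name, timeout values, and back-off policy are made up for the example): each process consults only its own local clock, and doubling the timeout after a false suspicion yields eventual accuracy once message delays stabilize.

```python
import time

# Illustrative heartbeat-based failure detector: each process reads only its
# own monotonic clock, so no cross-process clock synchronization is needed.
# A false suspicion (a heartbeat arrives from a suspected peer) doubles that
# peer's timeout, giving eventual accuracy once delays stabilize.

class HeartbeatDetector:
    def __init__(self, peers, initial_timeout=0.5):
        self.timeout = {p: initial_timeout for p in peers}
        self.last_heard = {p: time.monotonic() for p in peers}
        self.suspected = set()

    def on_heartbeat(self, peer):
        self.last_heard[peer] = time.monotonic()
        if peer in self.suspected:           # we were wrong: back off
            self.suspected.discard(peer)
            self.timeout[peer] *= 2

    def check(self):
        now = time.monotonic()
        for peer, last in self.last_heard.items():
            if now - last > self.timeout[peer]:
                self.suspected.add(peer)
        return self.suspected
```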


Now the question is this: if we had perfect time synchronization, would we be able to solve consensus more easily, more robustly, or more efficiently? No! The curse of asymmetry of information is the core challenge in consensus. The leader needs to hear acks from a quorum in order to anchor consensus. (The quorum should have more than F nodes, and the phase-1 and phase-2 quorums should intersect with each other so that the consensus state persists.)
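
As a quick sanity check on that parenthetical, here is the arithmetic (illustrative numbers) for the common configuration of N = 2F+1 nodes with quorums of size F+1: any two quorums share at least one node, so a new leader's phase-1 quorum always sees the value chosen by an earlier phase-2 quorum.

```python
# With N = 2F + 1 nodes and quorums of size F + 1, any two quorums overlap in
# at least one node (pigeonhole), which is what lets the chosen value survive.

for F in range(1, 5):
    N = 2 * F + 1
    quorum = F + 1
    min_overlap = 2 * quorum - N             # worst-case overlap of two quorums
    print(f"F={F}  N={N}  quorum={quorum}  min_overlap={min_overlap}")
    assert min_overlap >= 1
```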

Time synchronization with a clock uncertainty bound, epsilon, of less than a microsecond is not useful for consensus, because consensus is bound by communication, and communication between computers, even in the same cluster, takes many microseconds.

The guarantee we would require from time synchronization is that causality is not violated by the physical clock timestamps. That means epsilon should be less than the communication latency. If epsilon is larger than the communication latency, we may order events incorrectly. Consider this example. Event A and event B have the same clock timestamp, but they are farther apart in real time because of the epsilon uncertainty. If we were to use A and B as a snapshot of the distributed system, this would be an inconsistent snapshot: another event happens after A and sends a message that affects B, tainting the snapshot. Causality sneaks in between A and B, and the integrity of the snapshot is violated.
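
Here is the arithmetic behind that condition as a small illustrative check (hypothetical numbers; epsilon is taken as the bound on the offset between any two clocks): a causal chain from A to B goes through at least one message, so B's timestamp exceeds A's by at least the minimum one-way latency minus epsilon, and keeping epsilon below that latency means timestamps can never invert a causal order.

```python
# If A causally precedes B through a message that takes at least d_min to
# deliver, then ts(B) - ts(A) >= d_min - epsilon (epsilon = max offset between
# any two clocks). Timestamp order respects causality only if this gap is > 0.

def causal_order_preserved(epsilon_us, d_min_us):
    worst_case_gap = d_min_us - epsilon_us   # smallest possible ts(B) - ts(A)
    return worst_case_gap > 0

print(causal_order_preserved(epsilon_us=10, d_min_us=50))    # True: safe
print(causal_order_preserved(epsilon_us=200, d_min_us=50))   # False: snapshots may be inconsistent
```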

As long as epsilon is less than the communication latency, this issue does not arise, and we won't have trouble building linearizability or consistent snapshots using the synchronized clocks' timestamps as the basis.
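
For example, one standard way to get linearizability out of such clocks is a Spanner-style commit wait; the sketch below is a simplified illustration, not something prescribed here, and the epsilon value and function names are assumptions.

```python
import time

# Simplified commit-wait sketch: assign the write a timestamp from the local
# (synchronized) clock, then hold the acknowledgment until that timestamp is
# guaranteed to be in the past on every node's clock, i.e., wait out epsilon.
# With epsilon in the tens of microseconds this wait is negligible; with
# millisecond epsilon it directly inflates write latency.

EPSILON_S = 50e-6                  # assumed clock uncertainty bound: 50 microseconds

def commit_write(apply_write):
    commit_ts = time.time()        # stand-in for reading a synchronized clock
    apply_write(commit_ts)
    wait = (commit_ts + EPSILON_S) - time.time()
    if wait > 0:
        time.sleep(wait)           # after this, commit_ts is in the past everywhere
    return commit_ts

commit_write(lambda ts: print("applied at", ts))
```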

Don't get me wrong. There could be uses for nanosecond-precise time synchronization in telemetry applications, like measuring network delays, performance hiccups, etc. But that precision would not help consensus, linearizability, or even taking consistent snapshots. Clock synchronization with epsilon on the order of microseconds would be plenty sufficient for them.

A more robust epsilon is more important than a smaller epsilon. A time synchronization protocol that can provide a guaranteed epsilon on the order of tens of microseconds would be preferable to one that provides sub-microsecond epsilon under ideal conditions but may have milliseconds of clock uncertainty in some corner cases.


Some related posts

http://muratbuffalo.blogspot.com/2019/09/do-leases-buy-us-anything.html

http://muratbuffalo.blogspot.com/2021/03/sundial-fault-tolerant-clock.html

http://muratbuffalo.blogspot.com/search?q=failure+detector

http://muratbuffalo.blogspot.com/search?q=time+synchronization


Comments

Jinkun Geng said…
Hi, Murat.
Thanks for the blog post; it is really inspiring. I would like to follow up on your statement: "If we had perfect time synchronization, would we be able to solve consensus more easily, more robustly, or more efficiently? No!"

In consensus, one major task (and bottleneck) is to agree on the ordering of all incoming requests. Typical protocols such as Raft and Multi-Paxos use a single leader to do that, which is why they often suffer from a throughput bottleneck. With good clock synchronization, there are at least two ways to accelerate or improve consensus protocols:

(1) Clock sync enables the shift from a single-leader protocol to a multi-leader protocol: multiple leaders (with good clock sync) independently serialize the incoming requests, and the requests from the multiple leaders are finally merged and committed in timestamp order. A concrete example is Craft (https://www.proquest.com/docview/2467863602?pq-origsite=gscholar&fromopenview=true), which leverages the synced clocks to extend Raft into a multi-leader protocol and increase consensus throughput. In the WAN setting, the LAN RTT (client<->leader) becomes trivial, and compared with Raft (2 WAN RTTs), Craft also saves 1 WAN RTT thanks to clock sync. The later work Ocean Vista (VLDB'19) does something similar.

(2) Clock sync also creates opportunities to design new protocols that require less coordination between leaders and followers. This is especially useful for speculative consensus protocols that have two paths: a fast path and a slow path. When conflicts are heavy, the protocol takes the slow path more frequently, which degrades latency (and in some cases throughput). Clock sync helps improve the fast-commit ratio and mitigate conflict cases. A concrete example is TOQ applied to EPaxos (https://www.usenix.org/system/files/nsdi21-tollman.pdf, Figure 20), which uses synced clocks to decide a common ProcessAt time for requests, improving the fast-commit ratio. Another example is Nezha (https://arxiv.org/abs/2206.03285), which leverages clock sync in a different way: it attaches a common deadline to every request, and if the deadline is properly set, different replicas can rectify the reordering of requests and process them in a consistent order without coordinating with each other. When clock sync is bad, or the one-way delay variance cannot be covered by the deadline, it triggers the slow path. Just as you mentioned in the blog, accurate clock synchronization enables better measurement of one-way delay. From this perspective, clock sync also provides benefit in accelerating consensus.
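
A rough sketch of the deadline-ordering idea (illustrative only, not Nezha's actual implementation; the class and parameter names are made up): each replica buffers a request until its deadline passes on the replica's own synchronized clock, then releases buffered requests in timestamp order, so well-behaved clocks and delays give every replica the same order without coordination.

```python
import heapq

# Illustrative deadline-ordered buffer: a request carries a deadline of
# send_timestamp + delta, where delta is chosen to cover the expected one-way
# delay. Requests are held until their deadline passes on the local
# (synchronized) clock and are then processed in send-timestamp order.

class DeadlineOrderer:
    def __init__(self):
        self.buffer = []                               # min-heap on (deadline, send_ts, request)

    def receive(self, send_ts, deadline, request):
        heapq.heappush(self.buffer, (deadline, send_ts, request))

    def release(self, now):
        ready = []
        while self.buffer and self.buffer[0][0] <= now:
            _deadline, send_ts, request = heapq.heappop(self.buffer)
            ready.append((send_ts, request))
        return sorted(ready)                           # consistent timestamp order
```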


Moving a step further, when we extend from linearizability (i.e., single-shard consensus) to strict serializability (consensus + multi-shard concurrency control), clock sync becomes even more important. Spanner, in order not to violate strict serializability, makes a write block for the max error bound (~7 ms) when it encounters conflicts; if the clock error bound is very large, Spanner suffers very bad performance. CockroachDB does not have such tightly bound clocks, so it has to use a different design (bump the timestamp and retry the read txn), https://www.cockroachlabs.com/blog/living-without-atomic-clocks/. But we can see that even CockroachDB (SIGMOD'20) suffers a (theoretically) bad worst-case latency due to badly synchronized clocks (a large max clock offset). For NTP this could be 250 ms, so the worst case for a CockroachDB retry costs 250 ms. Therefore, I think clock synchronization can indeed play an important role in the domain of consensus and concurrency control.
