The purpose of this paper is to revisit how the logical and physical clock concepts are applied in the context of developing distributed data store systems for the cloud and review the choice of clocks in relation to consistency/performance tradeoffs.
The use of clocks in weak consistency data storesDynamo employs sloppy quorums and hinted hand-off and uses version vector (a special case of vector clocks) to track causal dependencies within the replication group of each key. A version vector contains one entry for each replica (thus the size of clocks grows linearly with the number of replicas). The purpose of this metadata is to detect conflicting updates and to be used in the conflict reconciliation function.
Here is a link to my Dynamo review.
COPS is a geo-replicated datastore and it assigns a scalar clock to each object. Clients maintain the last clock value of all objects read in the causal past. Updates piggyback their dependencies when being propagated to other data centers. When a data center receives an update propagated by another data center, it only makes it visible when its dependencies are satisfied. COPS provides a partial form of transactions called causally consistent read-only transactions which return versions of the read objects that belong to a causally consistent snapshot. A two-round protocol implements these transactions. In the worst case the list of dependencies can grow and slow down the system.
Here is a link to my COPS review.
The GentleRain protocol aims to reduce the metadata piggybacked on updates propagation and to eliminate dependency checking procedures. The idea is to only allow a data center to make a remote update visible once all partitions (within the data center) have seen all updates up to the remote update time stamp. Thus, a client that reads a version is automatically ensured to read causally consistent versions in subsequent reads without the need of explicitly checking dependencies or being forced to wait until a causally consistent version is ready. In other words, GentleRain shoehorns causality in to physical clocks by delaying updates.
ORBE uses vector clocks, organized as a matrix, to represent dependencies. The vector clock has an entry per partition and data center. Physical clocks are used for generating read snapshot times, and ORBE can complete read-only transactions in one round by relying on physical clocks.
The use of clocks in strong consistency data storesClock-SI assumes loosely synchronized clocks that only move forward, and provides Snapshot Isolation consistency, where read-only transactions read from a consistent (possibly multi-versioned) snapshot, and other transactions commit if no object written by them was also written concurrently. To ensure safety against clocks skews, Clock-SI introduces delays to read operations. Here is a link to my Clock-SI review.
Google Spanner employs TrueTime (which employs GPS and atomic clocks), and provides a strong consistency property: external consistency, which is also known as strict serializability. To ensure safety against clock skews, Spanner also introduces delays to read operations, and also delays commits in update operations to provide strict serializability. Here is a link to my Spanner review.
Finally CockroachDB, an open-source clone of Spanner, employs Hybrid Logical Clocks (HLC) in order to serialize transactions and ensure Snapshot Isolation and Serializable Snapshot Isolation. Hybrid Logical Clocks (HLC) is my recent work in collaboration with Dr. Sandeep Kulkarni. HLC couples physical clock with a scalar logical clock in order to efficiently order causally related transactions whose uncertainty intervals overlap. Here is a link to our Hybrid Logical Clocks (HLC) work.
I am quoting from the "On the use of Clocks to Enforce Consistency in the Cloud" paper about HLC: "Unlike Spanner, CockroachDB does not assume the availability of specialized hardware to ensure narrow bounds on clock synchronization, but relies on conventional NTP-based clock synchronization that frequently imposes clock skews of several tens of milliseconds. HLC is hence particularly beneficial in this case, at it allows for ensuring external consistency across causally related transactions while sparing from the costs of commit waits."
DiscussionThe paper has an intriguing discussion section. It makes the observation that we do not fully understand the trade-offs between logical and physical clocks yet, and mentions that HLC is an interesting and promising approach to investigate these tradeoffs. It gives some comparisons of the above protocols to show that time (in terms of its precision and comprehensiveness) is a resource that can be a factor in the performance and consistency tradeoffs in distributed data stores. The paper also talks about the costs of totally-ordered versus concurrent operations in distributed datastores. I found that this discussion make similar points with my "distributed is not necessarily more scalable than centralized" post.
Use of clocks in distributed datastores for consistency/performance tradeoffs is certainly an interesting and fruitful research area nowadays.
So how does your favorite data store use clocks/version-stamps? How would changing to a different clock scheme affect performance versus consistency tradeoffs in that data store?
Earlier I had discussed about the use of clocks in Granola, and how upgrading to HLC can improve performance and throughput.