Tiga: Accelerating Geo-Distributed Transactions with Synchronized Clocks

This paper (to appear at SOSP'25) is one of the latest efforts exploring the dream of a one-round commit for geo-replicated databases. TAPIR tried to fuse concurrency control and consensus into one layer. Tempo and Detock went further using dependency graphs. 

Aleksey and I did our usual thing. We recorded our first blind read of the paper. I also annotated a copy while reading, which you can access here.

We liked the paper overall. This is a thoughtful piece of engineering, not a conceptual breakthrough. It uses future timestamps to align replicas in a slightly new way, and the results are solid. But the presentation needs refinement and stronger formalization. (See our live-reading video for how these problems manifested themselves.) Another study to add to my survey, showing how, with modern clocks, time itself is becoming a coordination primitive.


The Big Idea

Tiga claims to do strictly serializable, fault-tolerant transactions in one wide-area round trip (1-WRTT) most of the time by predicting/tracking the future commit times of the transactions. Instead of waiting for messages to arrive and then ordering them, Tiga assigns each transaction a future timestamp at submission.

If all goes well, the transaction arrives before that timestamp at all replicas, waits until the local clock catches up, and then executes in order.

There is no dependency graph to track. Synchronized clocks and message flight-time prediction still promise strict serializability with 1-WRTT in most cases. Well, at least in more cases than the competition. You don't need to outrun the bear, just the other campers.

This is essentially the Deadline-Ordered Multicast (DOM) idea from the Nezha paper. Figures 1–2 in the paper show the contrast with Tapir. Tapir commits optimistically and fails when transactions arrive in different orders at different regions. Tiga fixes this by giving both transactions predetermined timestamps: all servers delay execution until their clocks reach those timestamps, ensuring consistent order.
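To make the DOM-style release rule concrete, here is a minimal sketch in Python of the buffering and release logic at a replica. This is my illustration, not code from the paper: the class and method names are made up, and it assumes every transaction arrives before its deadline (Tiga's fallback steps handle the case where that assumption breaks).

```python
import heapq
import time

class DeadlineOrderedQueue:
    """Minimal sketch of deadline-ordered release (DOM-style).
    Transactions carry a future timestamp assigned by the coordinator;
    a replica holds each one until its local clock reaches that time,
    then releases them in timestamp order."""

    def __init__(self):
        self._heap = []  # (timestamp, txn_id, txn) tuples, min-heap by timestamp

    def submit(self, timestamp, txn_id, txn):
        heapq.heappush(self._heap, (timestamp, txn_id, txn))

    def release_ready(self, now=None):
        """Pop and return all transactions whose deadline has passed,
        in timestamp order."""
        now = time.time() if now is None else now
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap))
        return ready
```

Because every replica applies the same release rule against (approximately) synchronized clocks, they all drain the queue in the same timestamp order without exchanging any ordering messages.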

Tiga also merges consensus and concurrency control into a single timestamp-based protocol. I think "Unanimous 2PC: Fault-tolerant Distributed Transactions Can be Fast and Simple" is a very relevant protocol to compare against here, but unfortunately, Tiga fails to cite U2PC.


Algorithm in a Nutshell

In the best case (steps 1-3), Tiga commits a transaction in 1-WRTT, essentially by predicting the correct global order instead of discovering it. If the prediction falters, steps 4-6 reconcile timestamps and logs, recovering correctness at the cost of another half to one full round trip. (Minimal sketches of both paths follow the numbered list below.)

1. Timestamp Initialization: The coordinator uses the measured one-way delays (OWDs) to each replica to predict when the transaction should arrive everywhere. It assigns the transaction a future timestamp t = send_time + max_OWD + Δ, where Δ is a small safety headroom (≈10 ms). This t represents the intended global serialization time. The coordinator then multicasts the transaction T and its timestamp to all shards.

2. Optimistic Execution: Upon receipt, each server buffers T in a priority queue sorted by timestamp. When the local clock reaches t, followers simply release T (they do not execute yet) while leaders execute T optimistically, assuming their local timestamp ordering will hold. The green bars in Figure 3 mark this optimistic execution phase.

3. Quorum Check of Fast Path: The coordinator collects fast-replies from a super quorum on each shard (the leader + f + ⌈f / 2⌉ followers). If the replies agree on the same log hash and timestamp, T is fast-committed. This completes the ideal 1-WRTT commit: half a round from coordinator to replicas, half back. (The other protocol I remember with a leader-inclusive fast quorum is Nezha, prior work to this one.)

4. Timestamp Agreement: Sometimes leaders execute with slightly different timestamps due to delays or clock drift. They then exchange their local timestamps to compute a common agreed value (the maximum). If all timestamps already match, the process costs 0.5 WRTT. If some leaders lag, another half round (total 1-WRTT) ensures alignment. If any executed with an older timestamp, that execution is revoked and T is re-executed at the new agreed time (slow path). This phase corresponds to the curved inter-leader arrows in the figure.

5. Log Synchronization: After leaders finalize timestamps, they propagate the consistent log to their followers. Followers update their logs to match the leader’s view and advance their sync-point. This ensures replicas are consistent before commit acknowledgment. The figure shows this as another 0.5 WRTT of leader-to-follower synchronization.

6. Quorum Check of Slow Path: Finally, the coordinator verifies that enough followers (≥ f) have acknowledged the synchronized logs. Once that quorum is reached, T is committed via the slow path. Even in this fallback case, the total latency stays within 1.5–2 WRTTs.
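To ground steps 1-3, here is a rough coordinator-side sketch of the fast path in Python. The helper names (multicast, collect_fast_replies, and the reply fields) are placeholders of my own, not the paper's API; the parts taken from the paper are the timestamp formula t = send_time + max_OWD + Δ and the per-shard super quorum of the leader plus f + ⌈f/2⌉ followers agreeing on the same log hash and timestamp.

```python
import math
import time

DELTA = 0.010  # ~10 ms safety headroom added to the predicted arrival time

def assign_future_timestamp(owd_by_replica, now=None):
    """Step 1: predict when the transaction reaches the slowest replica
    and add headroom: t = send_time + max_OWD + Δ."""
    now = time.time() if now is None else now
    return now + max(owd_by_replica.values()) + DELTA

def super_quorum_size(f):
    """Fast-path quorum per shard: the leader plus f + ceil(f/2) followers."""
    return 1 + f + math.ceil(f / 2)

def fast_path_commit(txn, shards, f, owd_by_replica, multicast, collect_fast_replies):
    """Steps 1-3 of the fast path (hypothetical helpers, not the paper's API):
    multicast (txn, t), then check that each shard returns a super quorum of
    fast-replies agreeing on the log hash and timestamp."""
    t = assign_future_timestamp(owd_by_replica)
    multicast(txn, t, shards)                      # 0.5 WRTT: coordinator -> replicas
    replies_by_shard = collect_fast_replies(txn)   # 0.5 WRTT: fast-replies back
    for shard in shards:
        replies = replies_by_shard.get(shard, [])  # includes the leader's reply
        agreeing = {(r.log_hash, r.timestamp) for r in replies}
        if len(replies) < super_quorum_size(f) or len(agreeing) != 1:
            return None  # fall back to the slow path (steps 4-6)
    return t  # fast-committed at serialization time t
```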
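And a correspondingly rough sketch of the reconciliation in steps 4-6, again with placeholder helpers: leaders adopt the maximum of their local timestamps, any leader that executed at an older timestamp revokes and re-executes at the agreed time, logs are synced to followers, and the coordinator commits once at least f followers per shard acknowledge.

```python
def agree_on_timestamp(leader_timestamps):
    """Step 4: leaders exchange local timestamps and adopt the maximum."""
    return max(leader_timestamps.values())

def reconcile_leader(leader, txn, agreed_t):
    """A leader that optimistically executed txn at an older timestamp
    revokes that execution and re-executes at the agreed time."""
    if leader.executed_timestamp(txn) < agreed_t:
        leader.revoke(txn)
        leader.execute_at(txn, agreed_t)

def slow_path_commit(txn, shards, f, leaders, collect_sync_acks):
    """Steps 4-6 (placeholder helpers): align timestamps across leaders,
    sync logs to followers, and commit once >= f followers per shard ack."""
    agreed_t = agree_on_timestamp({s: leaders[s].local_timestamp(txn) for s in shards})
    for shard in shards:
        reconcile_leader(leaders[shard], txn, agreed_t)
        leaders[shard].sync_log_to_followers(txn, agreed_t)  # step 5: 0.5 WRTT
    acks_by_shard = collect_sync_acks(txn)                   # step 6: follower acks
    return all(len(acks_by_shard.get(s, [])) >= f for s in shards)
```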

I am skipping details and optimizations. Co-locating the leaders of many shards in the same datacenter/AZ is an optimization that improves the latency of timestamp agreement (an idea this paper seems to have borrowed from the recent OSDI'25 Mako paper). This then opens the door to a preventive flavor of the Tiga workflow, as shown in Figure 6.

Evaluation Highlights

Running on Google Cloud across three regions, Tiga outperforms Janus, Tapir, and Calvin+ by 1.3–7x in throughput while delivering 1.4–4x lower latency. In low-contention microbenchmarks, it easily sustains 1-WRTT commits. Under high contention, Calvin+ catches up somewhat, but with 30% higher latency. Calvin+ replaces Calvin's Paxos-based consensus layer with Nezha, saving at least 1-WRTT in committing transactions. A lot of work must have gone into these evaluation results.



