GaussDB-Global: A Geographically Distributed Database System

This paper, presented in the industry track of ICDE 2024, introduces GaussDB-Global (GlobalDB), Huawei's geographically distributed database system. GlobalDB replaces the centralized transaction management (GTM) of GaussDB with a decentralized scheme based on synchronized global clocks (GClock). This mirrors Google Spanner's TrueTime and its commit-wait technique, which provides externally serializable transactions by waiting out the clock uncertainty interval. Unlike Spanner, however, GlobalDB claims compatibility with commodity hardware, avoiding the need for specialized networking infrastructure to distribute synchronized clocks.

The GClock system uses GPS receivers and atomic clocks as the global time source at each regional cluster. Each node synchronizes its clock with the global time source over TCP every millisecond. Clock deviation stays low because a synchronization round trip over TCP completes within 60 microseconds, and the CPU's clock drift is bounded at 200 parts per million. A GClock timestamp combines a clock reading with an error bound ($T_{err}$) that accounts for network latency and clock drift. GlobalDB generates timestamps using the formula $TS_{GClock} = T_{clock} + T_{err}$, where $T_{err} = T_{sync} + T_{drift}$.
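To make the formula concrete, here is a minimal Python sketch of timestamp construction. The constants come from the figures the paper reports, but all names and the code structure are my own illustration, not GlobalDB's implementation.

```python
import time

# Illustrative constants taken from the paper's reported figures:
SYNC_BOUND_US = 60   # TCP round trip to the time source (microseconds)
DRIFT_PPM = 200      # bound on CPU clock drift (parts per million)

def gclock_timestamp(last_sync_us: int) -> tuple[int, int]:
    """Return (T_clock, T_err) in microseconds.

    T_err = T_sync + T_drift: T_sync is bounded by the TCP round trip,
    and T_drift grows with the time elapsed since the last sync
    (at most ~1 ms worth, given the 1 ms sync interval).
    """
    t_clock = time.monotonic_ns() // 1_000         # local clock reading
    elapsed_us = max(0, t_clock - last_sync_us)    # time since last sync
    t_drift = elapsed_us * DRIFT_PPM // 1_000_000  # drift accrued since sync
    return t_clock, SYNC_BOUND_US + t_drift

# The timestamp used by the protocol is the upper bound:
t_clock, t_err = gclock_timestamp(last_sync_us=time.monotonic_ns() // 1_000)
ts_gclock = t_clock + t_err  # TS_GClock = T_clock + T_err
```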

Synchronizing over TCP every millisecond seems extreme, and the paper does not detail how such tight synchronization is achieved or how reliable $T_{err}$ is. The authors make a passing reference to the FaRM paper (SIGMOD 2019), and they seem to have adapted their synchronization mechanism from that work.

Transactions in GlobalDB use the following protocol to obtain timestamps (a sketch follows the list):  

  • Invocation: Wait until $T_{clock} > TS_{GClock}$ and begin the transaction. (Single-shard queries bypass this wait by using the node's last committed transaction timestamp.)  
  • Commit: Wait until $T_{clock} > TS_{GClock}$ and commit.
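
Here is a hedged sketch of this commit-wait step, assuming microsecond-granularity integer timestamps; the function names are hypothetical.

```python
import time

def wait_out_uncertainty(ts_gclock_us: int) -> None:
    """Block until the local clock has passed the timestamp's upper bound.

    Once T_clock > TS_GClock, every correctly synchronized node agrees the
    timestamp lies in the past, so the transaction order is externally
    visible. The wait is at most T_err, i.e., tens of microseconds.
    """
    while time.monotonic_ns() // 1_000 <= ts_gclock_us:
        pass  # spin; in practice the wait is shorter than a scheduler tick

def timestamp_and_wait(t_clock_us: int, t_err_us: int) -> int:
    ts = t_clock_us + t_err_us  # TS_GClock = T_clock + T_err
    wait_out_uncertainty(ts)    # performed at both invocation and commit
    return ts
```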

GlobalDB introduces two major algorithmic components:  

  1. Seamless Transition Between Centralized and Decentralized Modes: GlobalDB supports a DUAL mode, enabling zero-downtime transitions between centralized (GTM) and decentralized (GClock) transaction management.
  2. Flexible Asynchronous Replication: GlobalDB supports asynchronous replication with strong consistency guarantees for reads on replicas. This is achieved through a Replica Consistency Point (RCP), which ensures that all replicas provide a consistent snapshot of the database, even if they are not fully up-to-date.  

Point one, transitioning between centralized and decentralized modes, also seems heavily inspired by the FaRM paper's protocol for this; I explain it below.

The second point, the asynchronous replication scheme, raises questions about durability. If the primary crashes before logs are shipped to replicas, data loss could occur. The paper does not fully address this issue, leaving it unclear how GlobalDB ensures durability in such scenarios.


So what do we get with GlobalDB? Its use of synchronized clocks gives us decentralized transaction management, removing the need for a centralized service to order transactions. This improves throughput, especially in geo-distributed deployments. Another key benefit is that synchronized clocks enhance read performance on asynchronous local replicas, returning consistent responses from a global snapshot of the database at a given time.

OK, that was the overview. Now we can discuss the architecture and protocol details. 


Architecture

GaussDB is a **shared-nothing distributed database** consisting of:  

  • Computing Nodes (CNs): Stateless nodes that handle query parsing, planning, and coordination.  
  • Data Nodes (DNs): Host portions of tables based on hash or range partitioning. Replica DNs are placed remotely for high availability.  
  • Global Transaction Manager (GTM): A lightweight centralized service that provides timestamps for transaction invocation and commit.  

GlobalDB replaces the GTM with decentralized transaction management using GClock, improving scalability and performance, especially in geo-distributed deployments. Primary DNs continuously ship updates to replica nodes in the form of **redo logs**. The paper argues that asynchronous replication avoids the performance penalty of waiting on remote replicas, but it does not discuss the resulting durability gap: the primary may crash before its logs reach the replicas.
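
To illustrate that durability gap, here is a toy sketch (my own, not GlobalDB's code) of asynchronous redo shipping: the commit is acknowledged before the record reaches any replica, so a crash in between loses the transaction on the replica side.

```python
import queue
import threading

redo_queue = queue.Queue()  # redo records in flight to replicas
replica_log = []            # stand-in for a remote replica's log

def commit_on_primary(redo_record: bytes) -> None:
    # The commit is acknowledged to the client as soon as the record is
    # durable locally; shipping to replicas happens in the background.
    redo_queue.put(redo_record)
    # If the primary crashes before the worker sends this record,
    # the replicas (and any failover target) never see the transaction.

def replication_worker() -> None:
    while True:
        replica_log.append(redo_queue.get())  # stands in for a network send

threading.Thread(target=replication_worker, daemon=True).start()
commit_on_primary(b"redo: txn 42")
```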


Clock Transition Between Centralized and Decentralized Modes

As we mentioned above, a key innovation in GlobalDB is the DUAL mode, which enables seamless transitions (in both directions) between centralized (GTM) and decentralized (GClock) transaction management without downtime. This is critical for maintaining system availability during upgrades or failures, such as when global clock synchronization fails. The DUAL mode uses a hybrid timestamp mechanism: $TS_{DUAL} = \max(TS_{GTM}, TS_{GClock}) + 1$

A DUAL mode timestamp $TS_{DUAL}$ is guaranteed to exceed both the most recent GTM timestamp and the GClock upper bound, which ensures monotonicity across both modes. Transitions require waiting for $2 \times T_{err}$ to prevent anomalies like stale reads due to timestamp inversion. By acting as a bridge between the two modes, DUAL mode allows the system to remain fully operational during transitions, supporting live upgrades and fault recovery while meeting enterprise SLAs.
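
A minimal sketch of the DUAL mode timestamp and the transition wait, assuming integer microsecond timestamps; the names are illustrative, not from the paper.

```python
import time

def dual_timestamp(ts_gtm: int, ts_gclock: int) -> int:
    """TS_DUAL = max(TS_GTM, TS_GClock) + 1.

    Taking the max of both sources (plus one) guarantees the new timestamp
    exceeds anything either mode has already handed out, so monotonicity
    holds while both modes are live.
    """
    return max(ts_gtm, ts_gclock) + 1

def switch_modes(t_err_us: int) -> None:
    """Wait 2 * T_err before completing a transition, so that no timestamp
    issued under the old mode can still appear to be in the future under
    the new one (avoiding stale reads from timestamp inversion)."""
    time.sleep(2 * t_err_us / 1_000_000)
```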



Asynchronous Replication with Strong Consistency

GlobalDB employs asynchronous replication with strong consistency guarantees for reads on replicas, achieved through a Replica Consistency Point (RCP). In some ways, this reminds me of the Volume Complete LSN calculation in Amazon Aurora. The RCP is a globally consistent snapshot point, computed as the minimum over all replicas of each replica's maximum applied commit timestamp. Any transaction committed before the RCP is therefore visible and consistent across the system, even if replicas are not fully up-to-date. To keep the RCP advancing, heartbeat transactions prevent it from stagnating on idle replicas. This approach lets GlobalDB offer strong consistency for read-only queries on replicas even in an asynchronous replication setup: users can query nearby replicas for faster response times without worrying about inconsistent or incorrect results.
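
A small sketch of how the RCP could be computed under the min-of-max description above; the data structure and names are my guess, not the paper's code.

```python
def replica_consistency_point(max_commit_ts: dict[str, int]) -> int:
    """RCP = min over replicas of each replica's max applied commit timestamp.

    Every transaction with commit timestamp below the RCP has reached all
    replicas, so a snapshot read at the RCP is consistent system-wide even
    though replication is asynchronous.
    """
    return min(max_commit_ts.values())

# An idle replica pins the RCP until a heartbeat bumps its timestamp:
print(replica_consistency_point({"dn1": 180, "dn2": 175, "dn3-idle": 100}))  # 100
```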

Data Definition Language (DDL) statements (such as CREATE TABLE or DROP INDEX) impose extra restrictions on using the RCP for reads. GlobalDB ensures consistency for read-only queries on replicas (ROR) by requiring that the RCP be greater than the latest DDL timestamp of each table involved in the query, ensuring compatibility with schema changes.
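
The DDL check might look roughly like this (hypothetical names):

```python
def can_read_at_rcp(rcp: int, table_ddl_ts: list[int]) -> bool:
    """A read-only query on replicas may run at the RCP snapshot only if the
    RCP has advanced past the latest schema change of every table it
    touches; otherwise the query must wait or be routed elsewhere."""
    return all(rcp > ddl_ts for ddl_ts in table_ddl_ts)
```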

I think several challenges still remain: calculating RCP across regions introduces latency proportional to replica count, frequent updates can widen gaps between primary and replica timestamps, and asynchronous replication risks data loss if the primary fails before logs are replicated.


Evaluation

The paper evaluates GlobalDB on both single-region and geo-distributed clusters using TPC-C and Sysbench. The results show 14x higher read throughput and 50% higher TPC-C throughput compared to the baseline system. Geo-distributed setups achieve 91% of the throughput of a co-located cluster, despite added network latency. 

However, the evaluation has severe limitations. Tests use Linux `tc` to simulate network delays, which does not capture real-world latency variability and packet loss. Moreover, the baseline is Huawei's prior centralized system, with no comparisons to other distributed databases like Spanner, CockroachDB, or Yugabyte.


