Posts

Showing posts from March, 2022

Amazon Aurora: Design Considerations + On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes

Image
Amazon Aurora is a high-throughput cloud-native relational database. I will summarize its design as covered by the Sigmod 17 and Sigmod 18 papers from the Aurora team. Aurora uses MySQL or PostgreSQL for the database instance at top, and decouples the storage to a multi-tenant scale-out storage service.  In this decoupled architecture, each database instance acts as a SQL endpoint and supports query processing, access methods, transactions, locking, buffer caching, and undo management. Some database functions, including redo logging, materialization of data blocks, garbage collection, and backup/restore, are offloaded to the storage nodes. A big innovation in Aurora is to do the replication among the storage nodes by pushing the redo log; this reduces networking traffic and enables fault-tolerant storage that heals without database involvement. In contrast to CockroachDB and FoundationDB , Aurora manages not to use consensus at all. It uses a primary secondary failover at the comp

FoundationDB: A Distributed Unbundled Transactional Key Value Store (Sigmod 2021)

Image
This paper  from Sigmod 2021 presents FoundationDB, a transactional key-value store that supports multi-key strictly serializable transactions across its entire key-space. FoundationDB (FDB, for short) is opensource. The paper says that: " FDB is the underpinning of cloud infrastructure at Apple, Snowflake and other companies, due to its consistency, robustness and availability for storing user data, system metadata and configuration, and other critical information. " The main idea in FDB is to decouple transaction processing from logging and storage. Such an unbundled architecture enables the separation and horizontal scaling of both read and write handling. The transaction system combines optimistic concurrency control (OCC) and multi-version concurrency control (MVCC) and achieves strict serializability or snapshot isolation if desired. The decoupling of logging and the determinism in transaction orders greatly simplify recovery by removing redo and undo log processing f

CockroachDB: The Resilient Geo-Distributed SQL Database

Image
This paper appeared in Sigmod 2020. Here is a link to the 10 minute, but extremely useful, Sigmod presentation . There is also a 1 hour extended presentation . CockroachDB is open source available via GitHub. The core features of the database are under a Business Source License (BSL), which converts to a fully open-source Apache 2.0 license after three years. Storage layer CockroachDB (or CRDB for short) consists of a distributed SQL layer on top of a distributed key-value store. The key value store is laid out in a single monolithic ordered key space. The logical key space is physically realized by dividing it into contiguous ranges of keys which we call ranges. Ranges are about 64 megabytes in size. Ranges start empty, grow, and split when they get too large, and merge with their neighbors when they get too small. The ranges are sorted, and I will talk about why in the SQL discussion later. As we will discuss soon, CRDB uses multi-version concurrency control. Hybrid logical clocks

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Making database systems usable

Looming Liability Machines (LLMs)

Foundational distributed systems papers

Advice to the young

Linearizability: A Correctness Condition for Concurrent Objects

Understanding the Performance Implications of Storage-Disaggregated Databases

Scalable OLTP in the Cloud: What’s the BIG DEAL?

Designing Data Intensive Applications (DDIA) Book