Posts

Showing posts from April, 2023

Empowering Azure Storage with RDMA

Image
This paper appeared in Usenix'23 last week. The paper presents the experience of deploying across datacenter  (i.e., intra-region) Remote Direct Memory Access (RDMA) to support storage workloads in Azure. The paper reports that around 70% of traffic in Azure is RDMA and intra-region RDMA is supported in all Azure public regions. RDMA is a network technology that offloads the network stack to the network interface card (NIC) hardware. By allowing direct memory access from one computer to another without involving the OS or the CPU, RDMA helps achieve low latency, high throughput and near zero CPU overhead. This means that RDMA frees up CPU cores from processing networking packets, and allows Azure to sell these CPU cycles as customer virtual machines (VMs) or use for application processing. Although RDMA solutions have been around and being deployed at small scales for a decade now, the paper provides an experience report from a large production system, and talks about practical ch...

The end of a myth: Distributed transactions can scale

Image
This paper appeared in VLDB'17. The paper presents NAM-DB, a scalable distributed database system that uses RDMA (mostly 1-way RDMA) and a novel timestamp oracle to support snapshot isolation (SI) transactions. NAM stands for network-attached-memory architecture, which leverages RDMA to enable compute nodes talk directly to a pool of memory nodes. Remote direct memory access (RDMA) allows bypassing the CPU when transferring data from one machine to another. This helps relieve a major factor in scalability of distributed transactions: the CPU overhead of the TCP/IP stack. With so many messages to process, CPU may spend most of the time serializing/deserializing network messages, leaving little room for the actual work. We had seen this phenomena first hand when we were researching the performance bottlenecks of Paxos protocols. This paper reminds me of the "Is Scalable OLTP in the Cloud a Solved Problem? (CIDR 2023)" which we reviewed recently. The two papers share one ...

ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks

Image
This paper is from IBM, 1992. This is a foundational paper in databases area. ARIES achieves long-running transaction recovery in a performant/nonblocking fashion. It is more complicated than simple (write-ahead-log) WAL-based per-action-recovery, as it needs to preserve the Atomicity and Durability properties for ACID transactions. Any transactional database worth its salt (including PostGres, Oracle, MySQL) implements recovery techniques based on the ARIES principles. "I have this condition... It's my memory." --From the movie Memento Background There is memory and there is disk (these days it is SSD, back in the old days it was a rotating hard disk). Memory is fast, but not persistent. Disk is durable, but slow. We want both fast and durable. We might execute and commit a transaction in-memory to achieve fast execution, but a committed transaction should also be durable. Flushing each transaction to the disk would add long I/O stalls before each commit. So it looks li...

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Making database systems usable

Looming Liability Machines (LLMs)

Foundational distributed systems papers

Advice to the young

Linearizability: A Correctness Condition for Concurrent Objects

Understanding the Performance Implications of Storage-Disaggregated Databases

Scalable OLTP in the Cloud: What’s the BIG DEAL?

Designing Data Intensive Applications (DDIA) Book