Showing posts from April, 2023

Empowering Azure Storage with RDMA

This paper appeared in Usenix'23 last week. The paper presents the experience of deploying across datacenter  (i.e., intra-region) Remote Direct Memory Access (RDMA) to support storage workloads in Azure. The paper reports that around 70% of traffic in Azure is RDMA and intra-region RDMA is supported in all Azure public regions. RDMA is a network technology that offloads the network stack to the network interface card (NIC) hardware. By allowing direct memory access from one computer to another without involving the OS or the CPU, RDMA helps achieve low latency, high throughput and near zero CPU overhead. This means that RDMA frees up CPU cores from processing networking packets, and allows Azure to sell these CPU cycles as customer virtual machines (VMs) or use for application processing. Although RDMA solutions have been around and being deployed at small scales for a decade now, the paper provides an experience report from a large production system, and talks about practical ch

The end of a myth: Distributed transactions can scale

This paper appeared in VLDB'17. The paper presents NAM-DB, a scalable distributed database system that uses RDMA (mostly 1-way RDMA) and a novel timestamp oracle to support snapshot isolation (SI) transactions. NAM stands for network-attached-memory architecture, which leverages RDMA to enable compute nodes talk directly to a pool of memory nodes. Remote direct memory access (RDMA) allows bypassing the CPU when transferring data from one machine to another. This helps relieve a major factor in scalability of distributed transactions: the CPU overhead of the TCP/IP stack. With so many messages to process, CPU may spend most of the time serializing/deserializing network messages, leaving little room for the actual work. We had seen this phenomena first hand when we were researching the performance bottlenecks of Paxos protocols. This paper reminds me of the "Is Scalable OLTP in the Cloud a Solved Problem? (CIDR 2023)" which we reviewed recently. The two papers share one

ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks

This paper is from IBM, 1992. This is a foundational paper in databases area. ARIES achieves long-running transaction recovery in a performant/nonblocking fashion. It is more complicated than simple (write-ahead-log) WAL-based per-action-recovery, as it needs to preserve the Atomicity and Durability properties for ACID transactions. Any transactional database worth its salt (including PostGres, Oracle, MySQL) implements recovery techniques based on the ARIES principles. "I have this condition... It's my memory." --From the movie Memento Background There is memory and there is disk (these days it is SSD, back in the old days it was a rotating hard disk). Memory is fast, but not persistent. Disk is durable, but slow. We want both fast and durable. We might execute and commit a transaction in-memory to achieve fast execution, but a committed transaction should also be durable. Flushing each transaction to the disk would add long I/O stalls before each commit. So it looks li

Popular posts from this blog

Learning about distributed systems: where to start?

Hints for Distributed Systems Design

Foundational distributed systems papers

Metastable failures in the wild

The demise of coding is greatly exaggerated

Scalable OLTP in the Cloud: What’s the BIG DEAL?

The end of a myth: Distributed transactions can scale

SIGMOD panel: Future of Database System Architectures

Why I blog

There is plenty of room at the bottom