Posts

New directions in cloud programming

This paper appeared in CIDR'21. It is also about operationalizing the CALM theorem, and is a good companion to the CALM-CRDT paper we covered yesterday. The paper starts by pointing out the challenges of cloud programmability. It says that most developers find it hard to harness the enormous potential of the cloud, and that the cloud has yet to provide a programming environment that exposes the inherent power of the platform. The paper then lays out an agenda for providing a new generation of cloud programming environments to programmers in an evolutionary fashion. I would like to start by challenging the claim that the cloud has not yet provided a noteworthy programming environment. I think there are many examples of successful cloud programming paradigms and frameworks that have emerged in the past decade, such as MapReduce, Resilient Distributed Datasets, the Hadoop and Spark ecosystems, real-time data processing and streaming systems, and distributed machine learning systems…

Keep CALM and CRDT On

This paper is from VLDB'22. It focuses on the read/querying problem of conflict-free replicated data types (CRDTs). To solve this problem, it proposes extending CRDTs with a SQL API query model, applying the CALM theorem to identify which queries are safe to execute locally on any replica. The answer is no surprise: monotonic queries can provide consistent observations without coordination.

CRDTs

To ensure replica consistency in distributed systems, a common method is to enforce strong consistency at the storage layer using traditional distributed coordination techniques such as consensus or transactions. However, for some applications this may create concerns about latency and availability (especially when a quorum is not readily available). Alternatively, developers can use weakly consistent storage models that don't require coordination, but then they must ensure consistency at the application level. This is where CRDTs enter the picture, as they can provide a straightforward…
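To make the monotonicity distinction concrete, here is a minimal Python sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. This is my own illustrative example, not the paper's SQL API: the threshold query is monotonic (once true, it stays true as merges arrive), so by the CALM reasoning any replica can answer it locally, while an exact-equality query is not safe to answer locally.

# Minimal G-Counter CRDT sketch (illustrative; not the paper's SQL API).
# Each replica tracks per-replica increment counts; merge takes the
# element-wise max, which is commutative, associative, and idempotent.
class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> increments seen from that replica

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self):
        return sum(self.counts.values())

    # Monotonic query: once true, it stays true as the CRDT grows,
    # so any replica can answer it locally without coordination.
    def at_least(self, threshold):
        return self.value() >= threshold

    # A non-monotonic query such as "value() == k" can flip from true to
    # false as merges arrive, so a local answer may be inconsistent.

a, b = GCounter("a"), GCounter("b")
a.increment(3); b.increment(2)
a.merge(b)
assert a.at_least(5)   # safe to answer locally on any replica
print(a.value())       # 5 here, but only a lower bound at an un-merged replica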

Open Versus Closed: A Cautionary Tale

This paper appeared in NSDI'06. It explores the behavior of open and closed system models. These two models differ in how they handle new job arrivals. In an open system model, new jobs arrive independently of job completions. In contrast, in a closed system model, new jobs are only triggered by job completions, followed by think time. The paper makes a basic but important point that was missed by many people building workload generators and making system design decisions. In this sense it is a great demonstration of the thesis that "The function of academic writing is NOT to communicate your ideas, but to change the ideas of an existing community." The paper shows that while most workload generators model systems as closed systems, real systems are closer to open systems, and conclusions drawn from the closed-model behavior of a system do not translate to its performance in real-world settings. Here is the gist of the idea…
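The difference is easy to see in a toy simulation. The following is my own illustrative sketch, not the paper's workload generator: a single FIFO server with a fixed service time, driven once by an open Poisson arrival process and once by a closed population of users with think time, tuned to roughly the same offered load. The open model queues up much longer response times at high utilization, because arrivals keep coming regardless of completions, while the closed model self-throttles.

import random

def simulate_open(arrival_rate, service_time, n_jobs=100_000, seed=1):
    """Open model: jobs arrive by a Poisson process, independent of completions."""
    rng = random.Random(seed)
    t = 0.0          # arrival clock
    free_at = 0.0    # when the single FIFO server next becomes idle
    total_resp = 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(arrival_rate)   # next arrival
        start = max(t, free_at)
        free_at = start + service_time
        total_resp += free_at - t            # queueing delay + service
    return total_resp / n_jobs

def simulate_closed(n_users, think_time, service_time, n_jobs=100_000, seed=1):
    """Closed model: each user submits a new job only after the previous one
    completes, plus an exponentially distributed think time."""
    rng = random.Random(seed)
    next_submit = [rng.expovariate(1.0 / think_time) for _ in range(n_users)]
    free_at = 0.0
    total_resp = 0.0
    for _ in range(n_jobs):
        u = min(range(n_users), key=lambda i: next_submit[i])
        arrive = next_submit[u]
        start = max(arrive, free_at)
        free_at = start + service_time
        total_resp += free_at - arrive
        next_submit[u] = free_at + rng.expovariate(1.0 / think_time)
    return total_resp / n_jobs

# Roughly the same offered load (~0.95), very different mean response times:
print(simulate_open(arrival_rate=0.95, service_time=1.0))
print(simulate_closed(n_users=20, think_time=20.0, service_time=1.0))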

Empowering Azure Storage with RDMA

This paper appeared in USENIX NSDI'23 last week. The paper presents the experience of deploying cross-datacenter (i.e., intra-region) Remote Direct Memory Access (RDMA) to support storage workloads in Azure. The paper reports that around 70% of traffic in Azure is RDMA, and intra-region RDMA is supported in all Azure public regions. RDMA is a network technology that offloads the network stack to the network interface card (NIC) hardware. By allowing direct memory access from one computer to another without involving the OS or the CPU, RDMA helps achieve low latency, high throughput, and near-zero CPU overhead. This means that RDMA frees up CPU cores from processing network packets, and allows Azure to sell these CPU cycles as customer virtual machines (VMs) or use them for application processing. Although RDMA solutions have been around and deployed at small scales for a decade now, the paper provides an experience report from a large production system, and talks about practical challenges…

The end of a myth: Distributed transactions can scale

This paper appeared in VLDB'17. The paper presents NAM-DB, a scalable distributed database system that uses RDMA (mostly one-sided RDMA) and a novel timestamp oracle to support snapshot isolation (SI) transactions. NAM stands for the network-attached-memory architecture, which leverages RDMA to enable compute nodes to talk directly to a pool of memory nodes. Remote direct memory access (RDMA) allows bypassing the CPU when transferring data from one machine to another. This helps relieve a major factor limiting the scalability of distributed transactions: the CPU overhead of the TCP/IP stack. With so many messages to process, the CPU may spend most of its time serializing/deserializing network messages, leaving little room for the actual work. We had seen this phenomenon firsthand when we were researching the performance bottlenecks of Paxos protocols. This paper reminds me of the "Is Scalable OLTP in the Cloud a Solved Problem?" (CIDR 2023) paper, which we reviewed recently. The two papers share one…
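To ground the snapshot isolation mechanics, here is a toy single-process Python sketch of SI with a timestamp oracle. It is purely illustrative: NAM-DB's actual design has compute nodes fetch and install versions at the memory servers via one-sided RDMA, and replaces the naive global counter below with a scalable vector-based oracle design.

import itertools

class TimestampOracle:
    """Toy global counter standing in for the timestamp oracle."""
    def __init__(self):
        self._clock = itertools.count(1)
    def next_ts(self):
        return next(self._clock)

class SIStore:
    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), ascending
        self.oracle = TimestampOracle()

    def begin(self):
        return {"read_ts": self.oracle.next_ts(), "writes": {}}

    def read(self, txn, key):
        if key in txn["writes"]:
            return txn["writes"][key]
        # Snapshot read: newest version committed at or before read_ts.
        for ts, val in reversed(self.versions.get(key, [])):
            if ts <= txn["read_ts"]:
                return val
        return None

    def write(self, txn, key, value):
        txn["writes"][key] = value   # buffered until commit

    def commit(self, txn):
        # First-committer-wins: abort on a write-write conflict with any
        # version committed after our snapshot was taken.
        for key in txn["writes"]:
            vs = self.versions.get(key, [])
            if vs and vs[-1][0] > txn["read_ts"]:
                return False
        commit_ts = self.oracle.next_ts()
        for key, val in txn["writes"].items():
            self.versions.setdefault(key, []).append((commit_ts, val))
        return True

db = SIStore()
t1 = db.begin(); t2 = db.begin()
db.write(t1, "x", 1); db.write(t2, "x", 2)
print(db.commit(t1))  # True
print(db.commit(t2))  # False: write-write conflict, t2 aborts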

ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks

This paper is from IBM, 1992. It is a foundational paper in the databases area. ARIES achieves long-running transaction recovery in a performant/nonblocking fashion. It is more complicated than simple write-ahead-log (WAL) based per-action recovery, as it needs to preserve the Atomicity and Durability properties for ACID transactions. Any transactional database worth its salt (including Postgres, Oracle, MySQL) implements recovery techniques based on the ARIES principles.

"I have this condition... It's my memory." --From the movie Memento

Background

There is memory and there is disk (these days it is SSD; back in the old days it was a rotating hard disk). Memory is fast, but not persistent. Disk is durable, but slow. We want both fast and durable. We might execute and commit a transaction in-memory to achieve fast execution, but a committed transaction should also be durable. Flushing each transaction to the disk would add long I/O stalls before each commit. So it looks like…
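Here is a minimal Python sketch of the write-ahead logging idea this background sets up. It is my own illustration, not ARIES itself (ARIES adds LSNs, compensation log records, and a three-pass analysis/redo/undo recovery on top of this): commit forces only a cheap sequential log write, while dirty data pages stay in memory and are flushed lazily.

import os

class WALDatabase:
    """Toy write-ahead log sketch (the log file name is arbitrary).
    Log records are forced to disk at commit; data pages stay in the
    in-memory buffer pool and are flushed lazily in the background."""

    def __init__(self, log_path):
        self.pages = {}                  # in-memory "buffer pool"
        self.log = open(log_path, "a")

    def write(self, txn_id, key, value):
        # WAL rule 1: the log record describing a change is written
        # before the changed page is ever flushed.
        self.log.write(f"UPDATE {txn_id} {key} {value}\n")
        self.pages[key] = value          # dirty page, memory only

    def commit(self, txn_id):
        self.log.write(f"COMMIT {txn_id}\n")
        # WAL rule 2: force the log (one sequential fsync), not the pages.
        self.log.flush()
        os.fsync(self.log.fileno())
        # The transaction is now durable: after a crash, redo replays the
        # logged updates of committed transactions; undo rolls back the rest.

db = WALDatabase("wal.log")
db.write("t1", "x", 42)
db.commit("t1")   # durable after one sequential log write, no random page I/O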

Getting schooled by AI, colleges must evolve

As we enter the age of AI, it becomes more important for knowledge workers to excel in their strengths, and to aim big for some strikes instead of settling for a comfortable average all around. How can colleges reform to better equip graduates? Here are my opinions, for what they are worth.

Human skills for the AI era

In the age of AI, doing rather than knowing becomes more important. Shallow information is worthless, but mastery of principles, critical thinking, and synthesis is priceless. Colleges should teach collaboration, entrepreneurship/innovation, communication/writing, and critical thinking and problem-solving skills. How can colleges reform to cultivate these skills? First, they should transition from a zero-sum mentality to a win-win mentality. This is not easy, because the system has been built on making students compete against each other and stack-ranking them. I don't know what kind of structural changes and scaffolding can help for this. I have some practical advice…

Characterizing Microservice Dependency and Performance: Alibaba Trace Analysis

This paper got the best paper award at SOCC 2021. The paper conducts a comprehensive study of large-scale microservices deployed in Alibaba clusters. They analyze the behavior of more than 20,000 microservices over a 7-day period and profile their characteristics based on the 10 billion call traces collected. They find that:

- microservice call graphs are dynamic at runtime
- most graphs are scattered and grow like trees
- the size of call graphs follows a heavy-tail distribution

Based on their findings, they offer some practical tips about improving microservice runtime performance. They also develop a stochastic model to simulate microservice call graph dependencies and show that it approximates the dataset they collected (which is available at https://github.com/alibaba/clusterdata ).

What are microservices?

Microservices is a software development approach that divides an application into independently deployable services, owned by small teams organized around business capabilities. Each service…
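The tree-like growth and heavy-tailed sizes are easy to reproduce with a toy generative model. The sketch below is my own illustration, not the paper's fitted stochastic model: each service calls a Pareto-distributed number of downstream services, so most generated call graphs stay tiny while a few grow very large.

import random

def generate_call_graph(rng, max_depth=6, alpha=2.5):
    """Toy tree-like microservice call graph generator (illustrative).
    Fan-out at each service is drawn from a heavy-tailed (Pareto-like)
    distribution; a depth cap stands in for bounded call chains."""
    next_id = [0]
    edges = []
    def expand(node, depth):
        if depth >= max_depth:
            return
        fanout = int(rng.paretovariate(alpha)) - 1   # heavy-tailed, often 0
        for _ in range(fanout):
            next_id[0] += 1
            child = next_id[0]
            edges.append((node, child))              # caller -> callee
            expand(child, depth + 1)
    expand(0, 0)
    return edges

rng = random.Random(42)
sizes = sorted(len(generate_call_graph(rng)) + 1 for _ in range(10_000))
print("median size:", sizes[len(sizes) // 2])        # most graphs are tiny...
print("p99 size:   ", sizes[int(len(sizes) * 0.99)])
print("max size:   ", sizes[-1])                     # ...but the tail is long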

Aria: A Fast and Practical Deterministic OLTP Database

This paper is from VLDB'20. Aria is an OLTP database that does epoch-based commits, similar to the Silo paper we discussed last week. Unlike Silo, which was a single-node database, Aria is a distributed and deterministic database. Aria's biggest contribution is that it improves on Calvin by being able to run transactions without prior knowledge of their read and write sets. Another nice thing in Aria is its deterministic re-ordering mechanism to commit transactions in an order that reduces the number of conflicts. Evaluation results on YCSB and TPC-C show that Aria outperforms other protocols by a large margin on a single node and by up to a factor of two on a cluster of eight nodes.

Aria versus Calvin

Recall that Calvin uses locks (here is a summary of the Calvin paper). The key idea in Calvin is that read/write locks for a transaction are acquired according to the ordering of input transactions, and the transaction is assigned to a worker thread for execution once all the needed locks are granted…
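Here is a simplified single-node Python sketch of Aria-style batch execution, in my own words and without the re-ordering optimization or distribution: every transaction in a batch executes against the same snapshot and buffers its writes, written keys are reserved by the lowest transaction id, and a transaction commits only if no earlier transaction in the batch conflicts with it. Because the decision depends only on the batch contents, every node reaches it deterministically, with no locks and no prior knowledge of read/write sets.

def run_batch(db, batch):
    """One epoch of deterministic batch execution (simplified sketch).
    Transactions are functions that take a read callback and return a
    write set; position in the batch fixes priority."""
    # Execution phase: everyone reads the same snapshot and buffers writes.
    read_sets, write_sets = [], []
    for txn in batch:
        reads = set()
        def read(key, _reads=reads):
            _reads.add(key)
            return db.get(key)
        write_sets.append(txn(read))
        read_sets.append(reads)

    # Reservation phase: the lowest transaction id reserves each written key.
    reservations = {}
    for tid, ws in enumerate(write_sets):
        for key in ws:
            reservations.setdefault(key, tid)

    # Commit phase: abort on write-after-write or read-after-write conflicts
    # with an earlier transaction in the batch; the decision is deterministic.
    committed = []
    for tid, (rs, ws) in enumerate(zip(read_sets, write_sets)):
        waw = any(reservations[k] < tid for k in ws)
        raw = any(reservations.get(k, tid) < tid for k in rs)
        if not (waw or raw):
            for key, val in ws.items():
                db[key] = val
            committed.append(tid)
    return committed   # aborted transactions are re-run in a later batch

db = {"x": 0, "y": 0}
t0 = lambda read: {"x": read("x") + 1}   # writes x
t1 = lambda read: {"y": read("x") + 1}   # reads x: RAW conflict with t0
print(run_batch(db, [t0, t1]))           # [0]; t1 aborts and is retried later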
