Posts

Showing posts from May, 2023

New directions in cloud programming

This paper appeared in CIDR'21. It is also about operationalizing the CALM theorem, and is a good companion to the CALM-CRDT paper we covered yesterday. The paper starts by pointing out the challenges of cloud programmability. It says that most developers find it hard to harness the enormous potential of the cloud, and that the cloud has yet to provide a programming environment that exposes the inherent power of the platform. The paper then lays out an agenda for providing a new generation of cloud programming environments to programmers in an evolutionary fashion. I would like to start by challenging the claim that the cloud has not yet provided a noteworthy programming environment. I think there are many examples of successful cloud programming paradigms and frameworks that have emerged in the past decade, such as MapReduce, Resilient Distributed Datasets, the Hadoop and Spark ecosystems, real-time data processing and streaming systems, and distributed machine learning syst

Keep CALM and CRDT On

This paper is from VLDB'22. It focuses on the read/querying problem of conflict-free replicated data types (CRDTs). To solve this problem, it proposes extending CRDTs with a SQL API query model, applying the CALM theorem to identify which queries are safe to execute locally on any replica. The answer is no surprise: monotonic queries can provide consistent observations without coordination.

CRDTs

To ensure replica consistency in distributed systems, a common method is to enforce strong consistency at the storage layer using traditional distributed coordination techniques such as consensus or transactions. However, for some applications this may create concerns about latency and availability (especially when a quorum is not readily available). Alternatively, developers can use weakly consistent storage models that don't require coordination, but they must then ensure consistency at the application level. This is where CRDTs enter the picture, as they can provide a straightforward
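To make the monotonicity point concrete, here is a minimal sketch of a grow-only counter (G-Counter), a classic CRDT, with a monotonic query. The class and method names are illustrative, not from the paper: the merge is an element-wise max (the lattice join), and the `value()` query only ever grows, so any replica can answer it locally and the answer remains a consistent lower bound on the global total.

```python
class GCounter:
    """Grow-only counter CRDT: one increment slot per replica."""

    def __init__(self, replica_id, n_replicas):
        self.replica_id = replica_id
        self.counts = [0] * n_replicas  # per-replica increment counts

    def increment(self):
        self.counts[self.replica_id] += 1

    def merge(self, other):
        # Join = element-wise max; commutative, associative, idempotent,
        # so replicas converge regardless of merge order or duplication.
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

    def value(self):
        # Monotonic query: the sum only grows as merges arrive, so a
        # local read on any replica is safe without coordination.
        return sum(self.counts)


a = GCounter(0, 2)
b = GCounter(1, 2)
a.increment(); a.increment()
b.increment()
a.merge(b)
print(a.value())  # 3: replica a has seen all updates
print(b.value())  # 1: replica b's local, still-consistent lower bound
```

A non-monotonic query (say, "is the count exactly 2?") would not enjoy this guarantee: its answer can flip from true to false as merges arrive, which is exactly the class of queries the CALM analysis flags as needing coordination.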

Open Versus Closed: A Cautionary Tale

This paper appeared in NSDI'06. It explores the behavior of open and closed system models. The two models differ in how they handle new job arrivals. In an open system model, new jobs arrive independently of job completions. In contrast, in a closed system model, new jobs are only triggered by job completions, followed by think time. The paper makes a basic but important point that was missed by many people who were building workload generators and making system design decisions. In this sense it is a great demonstration of the thesis that "The function of academic writing is NOT to communicate your ideas, but to change the ideas of an existing community." The paper shows that while most workload generators model systems as closed systems, real systems are closer to open systems, and conclusions drawn from the closed-model behavior of a system do not translate to its performance in real-world settings. Here is the gist of the idea. For
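The arrival-model difference is easy to see in a toy simulation. The sketch below (illustrative, not the paper's workload generator) runs a single exponential server two ways: under open arrivals, jobs keep arriving whether or not the server is busy, so queueing delay inflates response time as load grows; under closed arrivals with a single user, a new job is only issued after the previous one completes plus think time, so no queue can ever build.

```python
import random

random.seed(0)
SERVICE = 1.0  # mean service time of the single server


def open_model(arrival_rate, n_jobs):
    """Open system: Poisson arrivals, independent of completions."""
    t, server_free, total_resp = 0.0, 0.0, 0.0
    for _ in range(n_jobs):
        t += random.expovariate(arrival_rate)   # next arrival instant
        start = max(t, server_free)             # wait if server is busy
        server_free = start + random.expovariate(1 / SERVICE)
        total_resp += server_free - t           # response = wait + service
    return total_resp / n_jobs                  # mean response time


def closed_model(think_time, n_jobs):
    """Closed system, one user: next job only after completion + think."""
    total_resp = 0.0
    for _ in range(n_jobs):
        random.expovariate(1 / think_time)      # think time (no queue forms)
        total_resp += random.expovariate(1 / SERVICE)
    return total_resp / n_jobs


# At 90% load the open model's mean response time is dominated by
# queueing delay; the closed model stays near the bare service time.
print(open_model(arrival_rate=0.9, n_jobs=5000))
print(closed_model(think_time=1.0, n_jobs=5000))
```

Benchmarking the server with the closed loop and then deploying it under open traffic is precisely the mistake the paper warns against: the closed generator can never expose the queueing blow-up that dominates at high load.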
