Posts

Showing posts from August, 2024

Taming Consensus in the Wild (with the Shared Log Abstraction)

This paper recently appeared in ACM SIGOPS Operating Systems Review. It provides an overview of the shared log abstraction in distributed systems, particularly focusing on its application in State Machine Replication (SMR) and consensus protocols. The paper argues that this abstraction can simplify the design and implementation of distributed systems, and can make them more reliable and easier to maintain. What is the shared log abstraction? The shared log abstraction proposes to separate the system into two layers: the database layer (proposers/learners) and the log layer (acceptors). The shared log provides a simple API for appending entries, checking the tail, reading entries, and trimming the log. This separation allows the SMR layer to focus on higher-level concerns without dealing with the complexities of consensus protocols. This is a wisdom-packed paper. It approaches the problem more from software engineering and systems/operations perspectives. (Previous work, the Delos OSD
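To make that layering concrete, here is a minimal Python sketch of what a shared log API along those lines could look like. The method names, signatures, and in-memory implementation are my own illustrative assumptions, not the paper's (or Delos's) actual interface.

```python
# Hypothetical sketch of a shared log API, based on the operations listed
# above (append, check tail, read, trim). Names and types are illustrative.

class SharedLog:
    def __init__(self):
        self._entries = {}      # position -> payload
        self._tail = 0          # next position to be assigned
        self._trim_point = 0    # positions below this have been trimmed

    def append(self, payload: bytes) -> int:
        """Append an entry; return the log position assigned to it."""
        pos = self._tail
        self._entries[pos] = payload
        self._tail += 1
        return pos

    def check_tail(self) -> int:
        """Return the current tail (the next position to be written)."""
        return self._tail

    def read(self, pos: int) -> bytes:
        """Read the entry at a position, if it has not been trimmed."""
        if pos < self._trim_point:
            raise KeyError(f"position {pos} has been trimmed")
        return self._entries[pos]

    def trim(self, pos: int) -> None:
        """Mark all entries below pos as reclaimable (prefix trim)."""
        for p in range(self._trim_point, pos):
            self._entries.pop(p, None)
        self._trim_point = pos
```

An SMR layer built on top of such an API would append its commands and replay the log in position order, leaving the complexities of the consensus protocol to the log layer underneath.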

DDIA: Chp 4. Encoding and Evolution (Part 1)

This first part of Chapter 4 of the Designing Data Intensive Applications (DDIA) book discusses the concepts of data encoding and evolution in data-intensive applications. As applications inevitably change over time, it's important to build systems that can adapt to these changes easily, a property referred to as evolvability (under maintainability) in Chapter 1 of the book. Different data models handle change differently. Relational databases typically enforce a single schema at any given time, with changes implemented through schema migrations. In contrast, schema-on-read databases allow for a mix of older and newer data formats. A data format or schema change often necessitates corresponding application code changes. However, in large applications, code changes can't always happen instantaneously. This leads to situations where old and new versions of code and data formats coexist in the system. To handle this, we need to maintain both backward compatibility (newer code can
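As a small, made-up illustration of backward compatibility (not an example from the book), consider a JSON record format that gains a new optional field; newer code supplies a default when the field is absent, so it can still read records written by older code.

```python
import json

# Illustrative only: field names and defaults are invented for this sketch.

def decode_user_v2(raw: bytes) -> dict:
    doc = json.loads(raw)
    return {
        "name": doc["name"],
        # Field added in v2; old writers never set it, so default it here.
        "favorite_color": doc.get("favorite_color", "unknown"),
    }

old_record = b'{"name": "Ada"}'                             # written by v1 code
new_record = b'{"name": "Ada", "favorite_color": "green"}'  # written by v2 code

assert decode_user_v2(old_record)["favorite_color"] == "unknown"
assert decode_user_v2(new_record)["favorite_color"] == "green"
```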

Looming Liability Machines (LLMs)

As part of our Zoom reading group (wow, 4.5 years old now), we discussed a paper that uses LLMs for automatic root cause analysis (RCA) of cloud incidents. This was a pretty straightforward application of LLMs. The proposed system employs an LLM to match incoming incidents to incident handlers based on their alert types, predict the incident's root cause category, and provide an explanatory narrative. The only customization is through prompt engineering. Since this is a custom domain, I think a more principled, custom-designed machine learning system would be more appropriate than adopting LLMs. Anyway, the use of LLMs for RCA spooked me viscerally. I couldn't find the exact words during the paper discussion, but I can articulate this better now. Let me explain. RCA is serious business: Root cause analysis (RCA) is the process of identifying the underlying causes of a problem or incident, rather than just addressing its symptoms. One RCA heuristic is asking the 5 Whys

DDIA: Chp 3. Storage and Retrieval (Part 2)

This is Chapter 3, part 2 for the Designing Data Intensive Applications (DDIA) book. This part focuses on storage and retrieval for OLAP databases. Analytics, data warehousing, star and snowflake schemas: A data warehouse is a dedicated database designed for analytical queries. It houses a read-only replica of data from transactional systems within the organization. This separation ensures that analytics (OLAP) operations do not interfere with OLTP operations, which are critical to the operation of the business. Data is extracted from OLTP databases, either through periodic snapshots or a continuous stream of updates. It's then transformed into a format optimized for analysis, cleaned to remove inconsistencies, and loaded into the data warehouse. This process is known as Extract-Transform-Load (ETL). Large enterprises often have massive data warehouses containing petabytes of transaction history stored in a "fact table". The "fact table" represents individual events
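To picture the star schema, here is a toy Python sketch (not the book's example; all names and numbers are invented): a fact table of sales events pointing into dimension tables, and a small aggregation over it.

```python
# Toy star schema: dimension tables describe the who/what/when of each event.
dim_product = {
    101: {"name": "bananas", "category": "fruit"},
    102: {"name": "oat milk", "category": "dairy-alternative"},
}
dim_date = {
    20240805: {"year": 2024, "month": 8, "weekday": "Mon"},
}

# Each fact-table row records one event and consists mostly of foreign keys
# into the dimension tables, plus a few numeric measures.
fact_sales = [
    {"date_key": 20240805, "product_key": 101, "quantity": 3, "unit_price_cents": 105},
    {"date_key": 20240805, "product_key": 102, "quantity": 1, "unit_price_cents": 349},
]

# A typical analytical query: revenue per product category on that day.
revenue_cents = {}
for row in fact_sales:
    category = dim_product[row["product_key"]]["category"]
    revenue_cents[category] = (revenue_cents.get(category, 0)
                               + row["quantity"] * row["unit_price_cents"])

print(revenue_cents)  # {'fruit': 315, 'dairy-alternative': 349}
```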

DDIA: Chp 3. Storage and Retrieval (Part 1)

This is Chapter 3, part 1 for the Designing Data Intensive Applications (DDIA) book. This part focuses on storage and retrieval for OLTP databases. Even if you won't be implementing a storage engine from scratch, it is still important to understand how databases handle storage and retrieval internally. This knowledge allows you to select and tune a storage engine that best fits your application's needs. The performance of your application often hinges on the efficiency of the storage engine, especially when dealing with different types of workloads, such as transactional (involving many small, frequent read and write operations) or analytical (involving block reads for executing complex queries over large datasets). There are two main families of storage engines: log-structured storage engines (e.g., LevelDB, RocksDB) and page-oriented storage engines like B-trees (e.g., Postgres, InnoDB/MySQL, WiredTiger/MongoDB). Log-Structured Storage Engines: These engines use an append-onl
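In the spirit of the chapter's simplest example, here is a bare-bones, assumption-laden Python sketch of a log-structured key-value store: writes are appended to a file, and an in-memory hash index maps each key to the byte offset of its latest record. It is not a real engine; it skips compaction, crash recovery, and concurrency.

```python
import os

class AppendOnlyKV:
    """Toy log-structured store: append-only file + in-memory hash index."""

    def __init__(self, path: str):
        self._f = open(path, "a+b")
        self._index = {}  # key -> byte offset of the latest record for that key

    def put(self, key: str, value: str) -> None:
        # Keys must not contain ',' or newlines in this toy record format.
        self._f.seek(0, os.SEEK_END)
        offset = self._f.tell()
        self._f.write(f"{key},{value}\n".encode())  # append-only, never overwrite
        self._f.flush()
        self._index[key] = offset                   # index points at newest record

    def get(self, key: str) -> str:
        offset = self._index[key]                   # KeyError if never written
        self._f.seek(offset)
        line = self._f.readline().decode().rstrip("\n")
        _, value = line.split(",", 1)
        return value

db = AppendOnlyKV("/tmp/toy.log")                   # hypothetical path
db.put("user:1", "alice")
db.put("user:1", "alice-renamed")   # old record stays in the log; index moves on
print(db.get("user:1"))             # alice-renamed
```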

Making database systems usable

C. J. Date's Sigmod 1983 keynote, "Database Usability", was prescient. Usability is the most important thing to customers. They care less about impressive benchmarks or clever algorithms, and more about whether they can operate and use a database efficiently to query, update, analyze, and persist their data with minimal headache. (BTW, does anyone have a link to the contents of this Sigmod'83 talk? There is no transcript around, except for this short abstract.) The paper we cover today is from Sigmod 2007. It takes on the database usability problem raised in that 1983 keynote head-on, and calls out that the king is still naked. Let's give some context for the year 2007. Yes, the XML format was still popular then. The use case in the paper is XQuery. The paper does not contain any reference to JSON. MongoDB would be released in 2009 with the document model, and that seems like great timing for some of the usability pains mentioned in the paper! Web 2.0 was in

Linearizability: A Correctness Condition for Concurrent Objects

This paper, by Herlihy and Wing, appeared in ACM Transactions on Programming Languages and Systems in 1990. It is the canonical reference for the linearizability definition. I had not read this paper in detail before, so I thought it would be good to go to the source to see if there are additional delightful surprises in the original text. Hence, this post. I will dive into a technical analysis of the paper first, and then discuss some of my takes toward the end. I had written an accessible explanation of linearizability earlier; you may want to read that first. I will assume an understanding of linearizability to keep this review at a reasonable length. Introduction: I love how the old papers just barge in with the model, without being bothered by pleasantries such as motivating the problem. These are the first two sentences of the introduction: "A concurrent system consists of a collection of sequential processes that communicate through shared typed objects. This model encompass

Designing Data Intensive Applications (DDIA) Book

We started reading this book as part of Alex Petrov's book club. We just got started, so you can join us by joining the Discord channel above. We meet Wednesdays at 11am Eastern Time. Previously we had read the Transaction Processing book by Gray and Reuter. This page links to my summaries of that book. Chp 1. Reliable, Scalable, and Maintainable Applications: I love the diagrams opening each chapter. Beautiful! The first chapter consists of warm-up stuff. It talks about the definitions of reliability, scalability, and maintainability. It is still engaging, because it is written with an educator and technical blogger voice, rather than a dry academic voice. This book came out in 2017. Martin is working on the new version. So if you have comments on things to focus on for the new version, it would be helpful to collect them in a document and email it to Martin. For example, I am curious about how the below paragraph from the Preface will get revised with 8 more years of hindsight:

Index for the Transaction Processing Book

We finished covering this book in the reading group, so I am creating this post as an index to the chapters I summarized. In the future, I would like to revisit this page and write an overall evaluation and lessons learned, to get some more closure. Foreword; Chp1: Introduction; Chp2: Basic Computer Terms; Chp3: Fault-tolerance; Chp4: Transaction models; Chp5/6: Transaction processing monitors; Chp7: Isolation concepts; Chp7: Isolation concepts (Part 2); A critique of ANSI SQL isolation levels (followup). I wanted to write a summary on the lock implementation, and especially about how they found a way to approximate/implement multiple-granularity predicate locking through hierarchical locking. Alas, I got distracted and moved on. I didn't write about the Recovery sections, but my review of WAL provides a good high-level summary. Metadata: Here is some fun reading about how this book came to be written. I was impressed by the quantitative approach in the book, and it turns out Jim was inspire
