Posts

DDIA: Chp 5. Replication (Part 2)

Chapter 5 of the Designing Data Intensive Applications (DDIA) book discusses strategies and challenges in replicating data across distributed systems. I covered the first part last week; here is the second part, on leaderless replication. Leaderless replication abandons the concept of a leader node and allows any replica to accept writes directly from clients. This method, while used in some of the earliest replicated data systems, fell out of favor during the dominance of relational databases. However, it regained popularity after Amazon implemented it in their in-house Dynamo system (not to be confused with DynamoDB). This inspired several open-source datastores like Riak, Cassandra, and Voldemort, which are often referred to as Dynamo-style databases. In leaderless replication systems, the concepts of quorums and quorum consistency are crucial. These systems use three key parameters: n, the total number of replicas; w, the write quorum (the number of nodes that must acknowledge a...
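
To make the quorum parameters concrete, here is a quick sketch (mine, not from the book or the post) of the standard Dynamo-style condition w + r > n, which guarantees that every read quorum overlaps with every write quorum. The read quorum r is assumed from the usual formulation, since the excerpt cuts off before reaching it.

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Dynamo-style quorum condition: every read quorum of size r
    must intersect every write quorum of size w among n replicas."""
    return w + r > n

# Typical configuration: n=3, w=2, r=2 tolerates one unavailable replica
# while still guaranteeing that reads see the latest acknowledged write.
assert quorums_overlap(3, 2, 2)
assert not quorums_overlap(3, 1, 1)  # reads may miss the latest write
```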

DDIA: Chp 5. Replication (Part 1)

Chapter 5 of the Designing Data Intensive Applications (DDIA) book discusses strategies and challenges in replicating data across distributed systems. Replication is critical for ensuring data availability, fault tolerance, and scalability. One of the key challenges in replication is maintaining consistency across multiple replicas. Leader-based replication, the most common method, involves a single leader node accepting all writes, while follower nodes replicate the leader's changes. While single-leader replication does not readily guarantee consistency (due to leader failover corner cases), it gives us a fighting chance. One primary advantage of leader-based replication is its simplicity: all write requests are serialized through the leader, ensuring a consistent order of operations. Followers receive a stream of changes (i.e., the replication log) from the leader and apply them in sequence, ensuring eventual consistency. However, replication lag, the delay between when a write i...
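
As a toy illustration of the mechanism described above (my sketch, not code from the book), a leader serializes writes into a replication log and a follower applies the log entries in order; a follower that has not yet caught up exhibits replication lag.

```python
class Leader:
    """Serializes all writes into an ordered replication log."""
    def __init__(self):
        self.log = []          # the replication log: list of (key, value)
        self.data = {}

    def write(self, key, value):
        self.log.append((key, value))
        self.data[key] = value

class Follower:
    """Applies the leader's log in order; may lag behind the leader."""
    def __init__(self):
        self.applied = 0       # index of the next log entry to apply
        self.data = {}

    def pull(self, leader, max_entries=1):
        for key, value in leader.log[self.applied:self.applied + max_entries]:
            self.data[key] = value
            self.applied += 1

leader, follower = Leader(), Follower()
leader.write("x", 1)
leader.write("x", 2)
follower.pull(leader)
print(follower.data)   # {'x': 1} -> stale read due to replication lag
follower.pull(leader)
print(follower.data)   # {'x': 2} -> eventually consistent
```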

FlexiRaft: Flexible Quorums with Raft

This paper appeared in CIDR'23 and is from Meta (wow, this is the first time I used the new name without needing to mention it is in fact Facebook... wait... goddammit). The paper talks about how they applied Raft to MySQL replication and used flexible quorums in the process. This is not a technically deep paper, but it was interesting to see a practical application of the flexible quorums idea to Raft rather than Paxos, and that adoption is also the most technically interesting part. What is the difference? The flexible quorums idea was developed for Paxos's two phases, but Raft needs to impose an extra requirement on quorums in order to guarantee Leader Completeness: "the new leader must already have all log entries replicated by a majority of nodes in the previous term." The paper does not call this out explicitly. This is what the paper says: "For state machine safety, every data commit quorum needs to intersect with every leader elec...
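
To illustrate the intersection condition quoted above, here is a small sketch of my own (not FlexiRaft's actual quorum machinery), under the simplifying assumption that quorums are just sets of node ids: every data-commit quorum must intersect every leader-election quorum.

```python
from itertools import combinations

def quorums(nodes, size):
    """All subsets of the given size, treated as candidate quorums."""
    return [set(q) for q in combinations(nodes, size)]

def safe(commit_quorums, election_quorums):
    """Every data-commit quorum must intersect every leader-election quorum."""
    return all(c & e for c in commit_quorums for e in election_quorums)

nodes = {1, 2, 3, 4, 5}
# Flexible trade-off: small commit quorums (fast writes) require large
# election quorums, so that any new leader sees every committed entry.
assert safe(quorums(nodes, 2), quorums(nodes, 4))
assert not safe(quorums(nodes, 2), quorums(nodes, 2))  # 2+2 <= 5: can miss commits
```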

DDIA: Chp 4. Encoding and Evolution (Part 2)

This second part of Chapter 4 of the Designing Data Intensive Applications (DDIA) book discusses methods of data flow in distributed systems, covering dataflow through databases, service calls, and asynchronous message passing. For databases, the process writing to the database encodes the data, and the reading process decodes it. We need both backward and forward compatibility, as older and newer versions of code may coexist during rolling upgrades. The book emphasizes that *data often outlives code*, which makes schema evolution crucial. The techniques we discussed in the first part of the chapter for encoding/decoding and backward/forward compatibility of schemas apply here. Most databases avoid rewriting large datasets when schema changes occur, instead opting for simple changes like adding nullable columns. For service calls, the chapter primarily discusses web services, which use HTTP as the underlying protocol. Web services are used not only for client-server communicatio...
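
As a small illustration of the "add a nullable column" pattern (my sketch, with a hypothetical 'email' field, not an example from the book): rows written before the schema change simply decode with a default for the new field, so old and new code can coexist during a rolling upgrade.

```python
import json

def decode_user(raw: str) -> dict:
    """Newer reading code tolerates rows written before the 'email'
    column existed by defaulting it to None (a nullable field)."""
    row = json.loads(raw)
    row.setdefault("email", None)   # backward compatibility with old rows
    return row

old_row = '{"id": 1, "name": "Ada"}'                      # written by old code
new_row = '{"id": 2, "name": "Lin", "email": "l@x.org"}'  # written by new code
print(decode_user(old_row))   # {'id': 1, 'name': 'Ada', 'email': None}
print(decode_user(new_row))   # {'id': 2, 'name': 'Lin', 'email': 'l@x.org'}
```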

Taming Consensus in the Wild (with the Shared Log Abstraction)

This paper recently appeared at ACM SIGOPS Operating Systems Review. It provides an overview of the shared log abstraction in distributed systems, particularly focusing on its application in State Machine Replication (SMR) and consensus protocols. The paper argues that this abstraction can simplify the design and implementation of distributed systems and make them more reliable and easier to maintain. What is the shared log abstraction? It proposes to separate the system into two layers: the database layer (proposers/learners) and the log layer (acceptors). The shared log provides a simple API for appending entries, checking the tail, reading entries, and trimming the log. This separation allows the SMR layer to focus on higher-level concerns without dealing with the complexities of consensus protocols. This is a wisdom-packed paper. It approaches the problem more from software engineering and systems/operations perspectives. (Previous work, the Delos OSD...
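
Here is a minimal in-memory sketch of the shared log interface the excerpt describes (append, check tail, read, trim). The method names and shapes are my own guesses for illustration, not the paper's exact API.

```python
class SharedLog:
    """In-memory stand-in for the shared log (acceptor) layer."""
    def __init__(self):
        self._entries = []
        self._trimmed = 0                 # positions before this are reclaimed

    def append(self, entry) -> int:
        """Append an entry and return its log position."""
        self._entries.append(entry)
        return len(self._entries) - 1

    def check_tail(self) -> int:
        """Return the position just past the last appended entry."""
        return len(self._entries)

    def read(self, pos):
        """Read the entry at pos; trimmed positions are unavailable."""
        if pos < self._trimmed:
            raise KeyError(f"position {pos} has been trimmed")
        return self._entries[pos]

    def trim(self, upto: int):
        """Mark a prefix of the log as reclaimable."""
        self._trimmed = max(self._trimmed, upto)

# The SMR/database layer just replays the log in order to stay consistent.
log = SharedLog()
log.append({"op": "set", "key": "x", "val": 1})
state = {}
for pos in range(log.check_tail()):
    entry = log.read(pos)
    state[entry["key"]] = entry["val"]
```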

DDIA: Chp 4. Encoding and Evolution (Part 1)

This first part of Chapter 4 of the Designing Data Intensive Applications (DDIA) book discusses the concepts of data encoding and evolution in data-intensive applications. As applications inevitably change over time, it's important to build systems that can adapt to these changes easily, a property referred to as evolvability (under maintainability) in Chapter 1 of the book. Different data models handle change differently. Relational databases typically enforce a single schema at any given time, with changes implemented through schema migrations. In contrast, schema-on-read databases allow for a mix of older and newer data formats. A data format or schema change often necessitates corresponding application code changes. However, in large applications, code changes can't always happen instantaneously. This leads to situations where old and new versions of code and data formats coexist in the system. To handle this, we need to maintain both backward compatibility (newer code can ...

Looming Liability Machines (LLMs)

As part of our Zoom reading group (wow, 4.5 years old now), we discussed a paper that uses LLMs for automatic root cause analysis (RCA) for cloud incidents. This was a pretty straightforward application of LLMs. The proposed system employs an LLM to match incoming incidents to incident handlers based on their alert types, predicts the incident's root cause category, and provides an explanatory narrative. The only customization is through prompt engineering. Since this is a custom domain, I think a more principled and custom-designed machine learning system would be more appropriate than adopting LLMs. Anyways, the use of LLMs for RCAs spooked me viscerally. I couldn't find the exact words during the paper discussion, but I can articulate this better now. Let me explain. RCA is serious business. Root cause analysis is the process of identifying the underlying causes of a problem or incident, rather than just addressing its symptoms. One RCA heuristic is asking 5 Why...

DDIA: Chp 3. Storage and Retrieval (Part 2)

This is Chapter 3, part 2, of the Designing Data Intensive Applications (DDIA) book. This part focuses on storage and retrieval for OLAP databases, covering analytics, data warehousing, and star and snowflake schemas. A data warehouse is a dedicated database designed for analytical queries. It houses a read-only replica of data from transactional systems within the organization. This separation ensures that analytics (OLAP) operations do not interfere with OLTP operations, which are critical to the business. Data is extracted from OLTP databases, either through periodic snapshots or a continuous stream of updates. It is then transformed into a format optimized for analysis, cleaned to remove inconsistencies, and loaded into the data warehouse. This process is known as Extract-Transform-Load (ETL). Large enterprises often have massive data warehouses containing petabytes of transaction history stored in a "fact table". The "fact table" represents individual events ...
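
To make the fact-table idea concrete, here is a toy star-schema join and aggregation in plain Python (my sketch, with made-up table contents): each fact row records one event and references a dimension table by a foreign key.

```python
# Dimension table: one row per product, keyed by product_id.
dim_product = {
    1: {"name": "espresso", "category": "coffee"},
    2: {"name": "bagel",    "category": "bakery"},
}

# Fact table: one row per sales event, referencing the dimension by key.
fact_sales = [
    {"product_id": 1, "quantity": 2, "price": 3.0},
    {"product_id": 2, "quantity": 1, "price": 2.5},
    {"product_id": 1, "quantity": 1, "price": 3.0},
]

# Typical OLAP query: revenue per product category.
revenue = {}
for f in fact_sales:
    category = dim_product[f["product_id"]]["category"]
    revenue[category] = revenue.get(category, 0.0) + f["quantity"] * f["price"]

print(revenue)   # {'coffee': 9.0, 'bakery': 2.5}
```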

DDIA: Chp 3. Storage and Retrieval (Part 1)

This is Chapter 3, part 1, of the Designing Data Intensive Applications (DDIA) book. This part focuses on storage and retrieval for OLTP databases. Even if you won't be implementing a storage engine from scratch, it is still important to understand how databases handle storage and retrieval internally. This knowledge allows you to select and tune a storage engine that best fits your application's needs. The performance of your application often hinges on the efficiency of the storage engine, especially when dealing with different types of workloads, such as transactional (involving many small, frequent read and write operations) or analytical (involving block reads for executing complex queries over large datasets). There are two main families of storage engines: log-structured storage engines (e.g., LevelDB, RocksDB) and page-oriented storage engines like B-trees (e.g., Postgres, InnoDB/MySQL, WiredTiger/MongoDB). Log-structured storage engines use an append-onl...
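
Here is a minimal sketch of the append-only idea behind log-structured engines (my own toy version, loosely in the spirit of the chapter's simplest key-value store example): writes go to the end of a file, and an in-memory hash index records each key's latest byte offset.

```python
import os

class AppendOnlyStore:
    """Toy log-structured store: append-only data file + in-memory hash index."""
    def __init__(self, path):
        self.path = path
        self.index = {}          # key -> byte offset of the latest value
        self.offset = 0
        open(path, "wb").close() # start with an empty log file

    def put(self, key: str, value: str):
        record = f"{key},{value}\n".encode()
        with open(self.path, "ab") as f:
            f.write(record)      # always append; never overwrite in place
        self.index[key] = self.offset
        self.offset += len(record)

    def get(self, key: str) -> str:
        with open(self.path, "rb") as f:
            f.seek(self.index[key])
            _, value = f.readline().decode().rstrip("\n").split(",", 1)
        return value

store = AppendOnlyStore("toy.log")
store.put("x", "1")
store.put("x", "2")              # the old record stays in the log
print(store.get("x"))            # "2" -- the index points at the latest write
os.remove("toy.log")
```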

Making database systems usable

C. J. Date's SIGMOD 1983 keynote, "Database Usability", was prescient. Usability is the most important thing to customers. They care less about impressive benchmarks or clever algorithms, and more about whether they can operate and use a database efficiently to query, update, analyze, and persist their data with minimal headache. (BTW, does anyone have a link to the contents of this SIGMOD'83 talk? There is no transcript around, except for this short abstract.) The paper we cover today is from SIGMOD 2007. It takes on the database usability problem raised in that 1983 keynote head-on, and calls out that the king is still naked. Let's give some context for the year 2007. Yes, the XML format was still popular then; the use case in the paper is XQuery. The paper does not contain any reference to JSON. MongoDB would be released in 2009 with the document model, and that seems to be great timing for some of the usability pains mentioned in the paper! Web 2.0 was in ...
