Posts

Showing posts from December, 2024

Use of Time in Distributed Databases (part 3): Synchronized clocks in databases

Image
This is part 3 of our "Use of Time in Distributed Databases" series . In this post, we explore how synchronized physical clocks enhance database systems, focusing on research and prototype databases. Discussion of time's role in production databases will follow in our next post. To begin, let's revisit the utility of synchronized clocks in distributed systems. As highlighted in Part 1 , synchronized clocks provide a shared time reference across distributed nodes and partitions. For simple, single-key replication tasks, such precision is often unnecessary and leader-based approaches such as MultiPaxos or Raft is much more appropriate. Even WPaxos might be considered if you need a WAN deployment. Of course, if you want to go very fancy by using a leaderless designs, such as those in the EPaxos family/Tempo/Accord,  then dependency graphs and time synchronization re-enter the picture. The true value of synchronized clocks becomes apparent in distributed multi-key operat...

Use of Time in Distributed Databases (part 2): Use of logical clocks in databases

Image
This is part 2 of our "Use of Time in Distributed Databases" series . We talk about the use of logical clocks in databases in this post. We consider three different approaches: vector clocks dependency graph maintenance epoch service  In the upcoming posts we will allow in physical clocks for timestamping, so there is no (almost no) physical clocks involved in the systems in part 2.    1. Vector clocks Dynamo: Amazon's highly available key-value store (SOSP'07) Dynamo employs sloppy quorums and hinted hand-off and uses version vector (a special case of vector clocks) to track causal dependencies within the replication group of each key. A version vector contains one entry for each replica (thus the size of clocks grows linearly with the number of replicas). The purpose of this metadata is to detect conflicting updates and to be used in the conflict reconciliation function. Dynamo provides eventual consistency thanks to this reconciliation function and conflict detect...

Use of Time in Distributed Databases (part 1)

Image
Distributed systems are characterized by nodes executing concurrently with no shared state and no common clock. Coordination between nodes are needed to satisfy some correctness properties, but since coordination requires message communication there is a performance tradeoff preventing nodes from frequently communicating/coordinating. Timestamping and ordering Why do the nodes, in particular database nodes, need coordinating anyway? The goal of coordination among nodes is to perform event ordering and align independent timelines of nodes with respect to each other. This is why timestamping events and ordering them is very important. Nodes run concurrently without knowing what the other nodes are doing at the moment. They learn about each other's states/events only by sending and receiving messages and this information by definition come from the past state of the nodes. Each node needs to compose a coherent view of the system from these messages and all the while the system is movi...

Utilizing highly synchronized clocks in distributed databases

Image
This master's thesis at Lund University Sweden  explores how CockroachDB 's transactional performance can be improved by using tightly synchronized clocks. The paper addresses two questions: how to integrate high-precision clock synchronization into CockroachDB and the resulting impact on performance. Given the publicly available clock synchronization technologies like Amazon Time Sync Service and ClockBound, the researchers (Jacob and Fabian) argue that the traditional skepticism around the reliability of clocks in distributed systems is outdated.  CockroachDB vs Spanner approaches CockroachDB uses loosely-synchronized NTP clocks, and achieves linearizability by using Hybrid Logical Clocks (HLC) and relying on a static maximum offset (max_offset=500milliseconds) to account for clock skew. However, this approach has limitations, particularly in handling transactional conflicts within a predefined uncertainty interval. When a transaction reads a value with a timestamp falling ...

Best of Metadata in 2024

Image
I can't believe we wasted another good year. It is time to reflect back on the best posts at Metadata blog in 2024. (I think you guys should tip me just because I didn't call this post "Metadata wrapped".) Distributed systems posts Transactional storage for geo-replicated systems(SOSP11):  I like this paper because it asked the right questions, and introduced parallel snapshot isolation. No individual part is novel (vector clocks, csets) but their composition together and application to WAN web applications have been novel. Walter showed how to limit WAN coordination, while still developing useful applications. An Hourglass Architecture for Distributed Systems (SRDS24 Keynote):  This work successfully bridges theoretical research and practical implementation in large-scale distributed systems in Facebook/Meta control plane. The shared log abstraction proposes to separate the system into two layers: the database layer (proposers/learners) and the log layer (acceptors)....

Index for Designing Data Intensive Applications (DDIA) book

The DDIA book is a great textbook, because it is not written as a textbook, but more of a guidebook.  Textbooks are generally bland and boring. Textbooks that are written by professors even more so, because thoser are often written to impress other professors and to flaunt academic flair. Few textbooks take teaching as the primary goal. DDIA book has clear writing, and it is pragmatic. It is as if your smart colleague is filling you in about the fundamentals as well as intricacies (dirty secrets) of their field. It is genuine and authentic. Kudos Kleppmann! Here are my summaries of the DDIA book chapters. A new version of the book is coming soon, and I look forward to seeing the updated content. Designing Data Intensive Applications (DDIA) Book, Chp 1. Intro and Chp2. Data Models and Query Languages DDIA: Chp 3. Storage and Retrieval (Part 1) DDIA: Chp 3. Storage and Retrieval (Part 2) DDIA: Chp 4. Encoding and Evolution (Part 1) DDIA: Chp 4. Encoding and Evolution (Part 2) DDIA: C...

DDIA: Chapter 11 - Stream Processing

Image
Daily batch processes introduce significant latency, since input changes reflected in the output only after a day. For fast paced business, this is too slow. To reduce delays, stream processing occurs more frequently (e.g., every second) or continuously, where events are handled as they happen.  In stream processing, a record is typically called an event—a small, immutable object containing details of an occurrence, often with a timestamp. Polling for new events becomes costly when striving for low-latency continuous processing. Frequent polling increases overhead as most requests return no new data. Instead, systems should notify consumers when new events are available. Messaging systems handle this by pushing events from producers to consumers. Direct messaging systems require application code to handle message loss and assume producers and consumers are always online, limiting fault tolerance. Message brokers (or message queues) improve reliability by acting as intermediaries. P...

Exploring the NaiadClock TLA+ model in TLA-Web

Image
I have been impressed by the usability of TLA-Web from Will Schultz . Recently I have been using it for my TLA+ modeling of MongoDB catalog protocols internally, and found it very useful to explore and understand behavior. This got me thinking that TLA-Web would be really useful when exploring and understanding an unfamiliar spec I picked up on the web. To test my hunch, I browsed through the TLA+ spec examples here ,  and I came across this spec about the Naiad Clock . Since I had read DBSP paper recently , this was all the more interesting to me. I had written about Naiad in 2014 , and about dataflow systems more broadly in 2017 . Getting to the ASCII version of the spec Unfortunately, I would not be able to play with the spec, because it only came in paper form: "The Naiad Clock Protocol: Specification, Model Checking, and Correctness Proof. "  The spec was available only as 13 pages of latex symbols in the Appendix A of this paper. I did briefly consider manually transfor...

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Making database systems usable

Looming Liability Machines (LLMs)

Advice to the young

Foundational distributed systems papers

Linearizability: A Correctness Condition for Concurrent Objects

Understanding the Performance Implications of Storage-Disaggregated Databases

Designing Data Intensive Applications (DDIA) Book

Use of Time in Distributed Databases (part 2): Use of logical clocks in databases