
Best of Metadata in 2024

I can't believe we wasted another good year. It is time to reflect back on the best posts at the Metadata blog in 2024. (I think you guys should tip me just because I didn't call this post "Metadata wrapped".)

Distributed systems posts

Transactional storage for geo-replicated systems (SOSP'11): I like this paper because it asked the right questions and introduced parallel snapshot isolation. No individual part is novel (vector clocks, csets), but their composition and their application to WAN web applications was novel. Walter showed how to limit WAN coordination while still building useful applications.

An Hourglass Architecture for Distributed Systems (SRDS'24 Keynote): This work successfully bridges theoretical research and practical implementation in large-scale distributed systems in the Facebook/Meta control plane. The shared log abstraction separates the system into two layers: the database layer (proposers/learners) and the log layer (acceptors)....
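To make the hourglass split concrete, here is a minimal Python sketch (my illustration, not Meta's implementation): the log layer only orders and stores entries, while database replicas append commands and learn state by replaying the log.

# A toy shared log: the log layer orders entries (acceptors),
# the database layer appends and replays them (proposers/learners).

class SharedLog:
    """Log layer: a totally ordered, append-only sequence of entries."""
    def __init__(self):
        self._entries = []

    def append(self, entry) -> int:
        self._entries.append(entry)
        return len(self._entries) - 1   # position in the total order

    def read(self, from_pos: int):
        return list(enumerate(self._entries[from_pos:], start=from_pos))

class KVDatabase:
    """Database layer: proposes by appending, learns by replaying."""
    def __init__(self, log: SharedLog):
        self.log = log
        self.state = {}
        self.applied = 0            # next log position to apply

    def put(self, key, value):
        self.log.append(("put", key, value))

    def catch_up(self):
        for pos, (op, key, value) in self.log.read(self.applied):
            if op == "put":
                self.state[key] = value
            self.applied = pos + 1

log = SharedLog()
db1, db2 = KVDatabase(log), KVDatabase(log)
db1.put("x", 1)
db2.catch_up()
assert db2.state["x"] == 1   # replicas converge by replaying the log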

Index for Designing Data Intensive Applications (DDIA) book

The DDIA book is a great textbook because it is not written as a textbook, but more as a guidebook. Textbooks are generally bland and boring. Textbooks written by professors are even more so, because those are often written to impress other professors and to flaunt academic flair. Few textbooks take teaching as the primary goal. The DDIA book has clear writing, and it is pragmatic. It is as if your smart colleague is filling you in on the fundamentals as well as the intricacies (dirty secrets) of their field. It is genuine and authentic. Kudos, Kleppmann! Here are my summaries of the DDIA book chapters. A new version of the book is coming soon, and I look forward to seeing the updated content.

Designing Data Intensive Applications (DDIA) Book, Chp 1. Intro and Chp 2. Data Models and Query Languages
DDIA: Chp 3. Storage and Retrieval (Part 1)
DDIA: Chp 3. Storage and Retrieval (Part 2)
DDIA: Chp 4. Encoding and Evolution (Part 1)
DDIA: Chp 4. Encoding and Evolution (Part 2)
DDIA: C...

DDIA: Chapter 11 - Stream Processing

Daily batch processes introduce significant latency, since input changes are reflected in the output only after a day. For a fast-paced business, this is too slow. To reduce delays, stream processing runs more frequently (e.g., every second) or continuously, where events are handled as they happen. In stream processing, a record is typically called an event: a small, immutable object containing the details of an occurrence, often with a timestamp. Polling for new events becomes costly when striving for low-latency continuous processing. Frequent polling increases overhead because most requests return no new data. Instead, systems should notify consumers when new events are available. Messaging systems handle this by pushing events from producers to consumers. Direct messaging systems require application code to handle message loss and assume producers and consumers are always online, limiting fault tolerance. Message brokers (or message queues) improve reliability by acting as intermediaries. P...
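To make the push-versus-poll distinction concrete, here is a toy broker sketch in Python (my illustration, not from the book): consumers register callbacks and the broker pushes each event to them as it arrives, so nobody busy-polls for new data.

import queue
import threading

class Broker:
    """A toy message broker: producers publish, the broker pushes
    each event to consumer callbacks as it arrives."""
    def __init__(self):
        self._queue = queue.Queue()
        self._consumers = []

    def subscribe(self, callback):
        self._consumers.append(callback)

    def publish(self, event):
        self._queue.put(event)

    def run(self):
        while True:
            event = self._queue.get()   # blocks; no busy polling
            if event is None:           # sentinel to shut down
                return
            for deliver in self._consumers:
                deliver(event)          # push to every consumer

broker = Broker()
broker.subscribe(lambda e: print("consumer got:", e))
t = threading.Thread(target=broker.run)
t.start()
broker.publish({"type": "page_view", "ts": 1700000000})
broker.publish(None)   # stop the broker loop
t.join()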

Exploring the NaiadClock TLA+ model in TLA-Web

I have been impressed by the usability of TLA-Web from Will Schultz. Recently I have been using it for my TLA+ modeling of MongoDB catalog protocols internally, and found it very useful for exploring and understanding behavior. This got me thinking that TLA-Web would be really useful when exploring and understanding an unfamiliar spec I picked up on the web. To test my hunch, I browsed through the TLA+ spec examples here, and I came across this spec about the Naiad Clock. Since I had read the DBSP paper recently, this was all the more interesting to me. I had written about Naiad in 2014, and about dataflow systems more broadly in 2017.

Getting to the ASCII version of the spec

Unfortunately, I would not be able to play with the spec, because it only came in paper form: "The Naiad Clock Protocol: Specification, Model Checking, and Correctness Proof." The spec was available only as 13 pages of LaTeX symbols in Appendix A of this paper. I did briefly consider manually transfor...

Blood draw

[trigger warning: blood] I had my first blood draw in 13 years yesterday. The lengthy gap is not random. My last blood draw had gone horribly wrong.

The last time

That previous visit had been for a fasting blood draw. Until then, I'd never had issues with blood draws before. Nurses always complimented me on my veins. One of them said that I had "veins like a garden hose, they are impossible to miss." So I felt zero fear or anxiety about the procedure. But I think I made the mistake of looking at the syringe. My blood flows fast, and a blood draw goes twice as quickly for me as for my wife. I saw my bright red blood enthusiastically gushing into the syringe. That was the last thing I remember. Then came nothing. It was the void, and after an indeterminate time, came the reboot. I literally experienced my brain booting up like an old Intel 486 computer booting up DOS. First the BIOS kicked in; it checked for available memory, scanned the disk to locate the operating system, and starte...

Everything is a Transaction: Unifying Logical Concurrency Control and Physical Data Structure Maintenance in Database Management Systems

This paper from CIDR'21 introduces the Deferred Action Framework (DAF). The framework aims to unify transaction control and data structure maintenance in multi-version concurrency control (MVCC) database systems, particularly for complex maintenance tasks like garbage collection and index cleanup. In MVCC systems, transactions and data maintenance tasks (like garbage collection) often interfere, requiring complex coordination. This can hurt system performance and code maintainability, since physical structures (e.g., B+ trees) aren't inherently designed to handle multi-versioning. The paper proposes DAF to handle deferred actions: maintenance tasks that are queued to execute only when they no longer interfere with active transactions. DAF relies on timestamp-based ordering and executes tasks only when their actions won't affect concurrent transactions. DAF guarantees to process actions deferred at some timestamp t only after all transactions started before t have ...
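Here is a minimal sketch of the deferred-action idea in Python (my simplification, not the paper's code): an action deferred at timestamp t runs only after every transaction that started before t has finished.

import heapq

class DeferredActionQueue:
    def __init__(self):
        self._clock = 0
        self._active = set()    # start timestamps of live transactions
        self._deferred = []     # min-heap of (defer_ts, action)

    def begin(self):
        self._clock += 1
        self._active.add(self._clock)
        return self._clock

    def commit(self, start_ts):
        self._active.discard(start_ts)
        self._drain()

    def defer(self, action):
        self._clock += 1
        heapq.heappush(self._deferred, (self._clock, action))

    def _drain(self):
        # The oldest running transaction bounds what is safe to run.
        low_watermark = min(self._active, default=self._clock + 1)
        while self._deferred and self._deferred[0][0] < low_watermark:
            _, action = heapq.heappop(self._deferred)
            action()

daf = DeferredActionQueue()
t1 = daf.begin()
daf.defer(lambda: print("GC old version of row 42"))
t2 = daf.begin()
daf.commit(t2)   # t1 is still running: GC stays queued
daf.commit(t1)   # now safe: GC runs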

DDIA: Chp 10. Batch Processing

Batch processing allows large-scale data transformations through bulk-synchronous processing. The simplicity of this approach allowed building reliable, scalable, maintainable applications with it. If you recall, "reliable-scalable-maintainable" was what we set out to learn when we began the DDIA book. The story of MapReduce starts when Google engineers realized there were a lot of repetitive tasks involved in computing over large data. These tasks often involved individually processing elements and then gathering and fusing their output. Interestingly, this bears a striking resemblance to the electromechanical IBM card-sorting machines of the 1940s-50s. MapReduce also got some inspiration from the map and reduce operations in Lisp: (map square '(1 2 3 4)) gives us (1 4 9 16), and (reduce + '(1 4 9 16)) gives us 30. The key innovation of Google's MapReduce framework was its ability to simplify parallel processing by abstracting away complex network communication and ...
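The Lisp example generalizes to the word-count staple. Here is a minimal single-machine sketch of the map, shuffle, and reduce phases in Python (my illustration, not Google's code):

from collections import defaultdict

def map_phase(doc_id, text):
    for word in text.split():
        yield (word.lower(), 1)      # emit (key, value) pairs

def shuffle(pairs):
    groups = defaultdict(list)       # group values by key,
    for key, value in pairs:         # like the card-sorting step
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    return (word, sum(counts))

docs = {1: "the quick fox", 2: "the lazy dog"}
pairs = [kv for doc_id, text in docs.items()
         for kv in map_phase(doc_id, text)]
result = dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
print(result)   # {'the': 2, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1}

In the real framework the map and reduce calls run on many machines, and the shuffle is done by the network; the programmer only supplies the two functions.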

DBSP: Automatic Incremental View Maintenance for Rich Query Languages

Incremental computation represents a transformative (!) approach to data processing. Instead of recomputing everything when your input changes slightly, incremental computation aims to reuse the original output and efficiently update the results. Efficiently means performing work proportional only to the input and output changes. This paper introduces DBSP, a programming language inspired by signal processing (hence the name DB-SP). DBSP is simple, yet it offers extensive computational capabilities. With just four operators, it covers complex database queries, including the entire relational algebra, set and multiset computations, nested relations, aggregations, recursive queries, and streaming computations.

Basic DBSP operators

The language is designed to capture computation on streams. Streams are represented as infinite vectors indexed by consecutive time steps. Each stream can represent values of any type and incorporates basic mathematical operations like addition, subtraction, and ...
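Here is a small Python sketch of DBSP-style operators on finite stream prefixes (my illustration based on the paper's definitions): delay (z^-1), pointwise lifting, and the integration/differentiation pair, which invert each other.

def delay(stream, zero=0):
    """z^-1: shift the stream by one step, padding with zero."""
    return [zero] + stream[:-1]

def lift(f, *streams):
    """Apply a scalar function pointwise to one or more streams."""
    return [f(*vals) for vals in zip(*streams)]

def integrate(stream):
    """I: running sum of the stream's values."""
    out, acc = [], 0
    for v in stream:
        acc += v
        out.append(acc)
    return out

def differentiate(stream):
    """D: consecutive differences, D(s) = s - z^-1(s)."""
    return lift(lambda a, b: a - b, stream, delay(stream))

s = [1, 3, 6, 10]
assert integrate(differentiate(s)) == s          # I and D are inverses
assert differentiate(integrate([1, 2, 3])) == [1, 2, 3]

The inverse relationship between I and D is what makes incremental view maintenance fall out mechanically: differentiate the input, run the (lifted) query on changes, and integrate the output.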

UB Hacking 2024

I attended the University at Buffalo Hacking event over the weekend. It was fun. There were 90+ projects, and I judged 15 of them. There were some interesting talks as well. It was good to see youthful energy. It feels good to teach the next generation something. Another thing: GeoGuessr played as a group game under time pressure is a lot of fun. This may be a great family activity.

Why should you care about proofs, if all you want to do is coding?

Atri and Andrew decided this would be a good talk to give at a hackathon. Daring! They did a good job imparting their passion for the joys and benefits of mathematical thinking. They talked about Paul Erdos's concept of "The Book" of proofs, and the difference between a correct proof and a great proof from "The Book". They talked about the deep insight that you can achieve through abstract mathematical thinking. They also mentioned that if you don't have good insight into the problem or your program, you will have a hard time de...

DDIA: Chp 9. Consistency and Consensus

Chapter 9 of the Designing Data Intensive Applications (DDIA) book has the following 4 sections (which contain a total of 10 subsections):

Consistency guarantees
Linearizability
Ordering guarantees
Distributed transactions and consensus

TMI (Too much info)

The chapter tries to do too much. Almost an entire semester of distributed systems content is force-jammed into this chapter. I don't think the discord reading group will be able to complete this chapter and make sense of what is going on. For a more manageable load, much of this chapter should be moved to other chapters. The chapter starts by explaining linearizability. It would have been better to explain this in Chapter 5, Replication. Linearizability sounds easy, but it takes time to internalize. I had discussed linearizability here, and later revisited it with this paper. And of course it was mentioned in many other posts. The chapter continues with the CAP tradeoff, rightly calling it "the unhelpful CAP theorem...

Auto-WLM: machine learning enhanced workload management in Amazon Redshift

This paper appeared in Sigmod'23.

What?

Auto-WLM is a machine learning based *automatic workload manager* currently used in production in Amazon Redshift. I thought this would be a machine learning paper, you know, deep learning and stuff. But this paper turned out to be a practical/applied data systems paper. At its core, this paper is about improving query performance and resource utilization in data warehouses, possibly the first for a database system in production at scale. They are not using deep learning, and rightly so! The main takeaway from the paper is that locally-trained simple models (using XGBoost, a decision-tree-based model built from the query plan trees) outperformed globally trained models, likely due to their ability to "instance optimize" to specific databases and workloads. They are using simple ML. And it works.

Why?

This is an important problem. If tuning is done prematurely, resources are unnecessarily wasted, and if it is done too late, overall...
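As a rough illustration of the approach (not Auto-WLM's actual code; the plan features, numbers, and hyperparameters below are made up, and this assumes the xgboost library is installed), one could train a small gradient-boosted-tree regressor on features extracted from query plans to predict runtimes:

import numpy as np
from xgboost import XGBRegressor

# Hypothetical per-query features derived from the plan tree:
# [num_joins, num_scans, estimated_rows, plan_depth]
X = np.array([
    [0, 1, 1e3, 2],
    [2, 3, 1e6, 5],
    [1, 2, 5e4, 3],
    [4, 5, 1e7, 8],
])
y = np.array([0.05, 4.2, 0.6, 30.0])   # observed runtimes in seconds

model = XGBRegressor(n_estimators=50, max_depth=3)
model.fit(X, y)

# Predict the cost of a new query's plan before scheduling it.
print(model.predict(np.array([[1, 2, 2e5, 4]])))

Such a model is cheap to retrain per cluster, which is what lets it "instance optimize" to a specific database and workload instead of relying on one global model.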
