Posts

Showing posts from October, 2023

Cabbage, Goat, and Wolf Puzzle in TLA+

Image
Over the past couple of months, I have been harping on the value of TLA+ for teaching engineers the art of abstraction . It is important to emphasize that this is an art, not a science, and it is best learned through studying examples and practicing hands-on with modeling. TLA+ excels in providing rapid feedback on your modeling and designs, which facilitates this learning process significantly. Modeling the "cabbage, goat, and wolf" puzzle taught me that tackling real/physical-world scenarios is a great way to practice abstraction and design -- cutting out the clutter and focusing on the core challenge. How will you represent this world? What will be your actors, objects, and the relations between them? Will one approach result in a cleaner protocol model than another? Or, as I like to ask, does the protocol flow cleanly from the model? The puzzle description Here is the puzzle. A farmer with a wolf, a goat, and a cabbage must cross a river by boat. The boat can carry only t

Kora: A Cloud-Native Event Streaming Platform For Kafka

Image
This paper from VLDB'23 (awarded the Best Industry Paper) describes how Confluent built Kora, to provide Kafka as a managed cloud event streaming platform. Kora combines best practices to deliver cloud features such as high availability, durability, scalability, elasticity, cost efficiency, performance, multi-tenancy. For example, the Kora architecture decouples its storage and compute tiers to facilitate elasticity, performance, and cost efficiency. As another example, Kora defines a Logical Kafka Cluster (LKC) abstraction to serve as the user-visible unit of provisioning, so it can help customers distance themselves from the underlying hardware and think in terms of application requirements. The writing of the paper could be much better. I think the paper fails to symphatize with the reader, who lacks the context about Kafka in the first place, and rushes in to explaining the mechanics how Kora makes Kafka a cloud managed offering. The motivation and use cases of Kafka could hav

TiDB: A Raft-based HTAP Database

Image
This paper is from VLDB 2020. TiDB is an opensource Hybrid Transactional and Analytical Processing (HTAP) database, developed by PingCap. The TiDB server, written in Go, is the query/transaction processing component; it is stateless, in the sense that  it does not store data and it is for computing only. The underlying key-value store, TiKV, is written in Rust, and it uses RocksDB as the storage engine. They add a columnar store called TiFlash, which gets most of the coverage in this paper. In this figure PD stands for Placement Driver (PD), which is responsible for managing Raft ranges, and automatically moving ranges to balance workloads. PD also hosts the timestamp oracle (TSO), which provides strictly increasing and globally unique timestamps to serve as transaction IDs. Each timestamp includes the physical time and logical time. The physical time refers to the current time with millisecond accuracy, and the logical time takes 18 bits. If you know about CockroachDB/CRDB ( here i

Hints for Distributed Systems Design

This is with apologies to Butler Lampson, who published the " Hints for computer system design " paper 40 years ago in SOSP'83. I don't claim to match that work of course. I just thought I could draft this post to organize my thinking about designing distributed systems and get feedback from others. I start with the same  disclaimer Lampson gave. These hints are not novel, not foolproof recipes, not laws of design, not precisely formulated, and not always appropriate. They are just hints.  They are context dependent, and some of them may be controversial. That being said, I have seen these hints successfully applied in distributed systems design throughout my 25 years in the field, starting from the theory of distributed systems (98-01), immersing into the practice of wireless sensor networks (01-11), and working on cloud computing systems both in the academia and industry ever since. These heuristic principles have been applied knowingly or unknowingly and has proven

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Making database systems usable

Looming Liability Machines (LLMs)

Foundational distributed systems papers

Advice to the young

Linearizability: A Correctness Condition for Concurrent Objects

Scalable OLTP in the Cloud: What’s the BIG DEAL?

Understanding the Performance Implications of Storage-Disaggregated Databases

Designing Data Intensive Applications (DDIA) Book