Posts

Showing posts from February, 2023

Speedy Transactions in Multicore In-Memory Databases

Image
This paper is from SOSP'13. The Polyjuice paper, which we studied last week, built on the Silo codebase and commit protocol, which led me to read the Silo paper. Silo is a single-node multi-core in-memory database. It avoids all centralized contention points, including centralized transaction ID (TID) assignment. Silo's key contribution is a commit protocol based on optimistic concurrency control (OCC) that provides serializability while avoiding all shared-memory writes for records that were only read. Logging and recovery is provided by linking periodically-updated epochs with the commit protocol. Silo achieves almost 700,000 transactions per second on a standard TPC-C workload mix on a 32-core machine, as well as near-linear scalability. That is pretty impressive over 2013 hardware. The Silo paper got 445 citations in 10 years. That is also impressive. So let's dive in. Architecture Silo's organization is typical of databases. Tables are implemented as collections...

Polyjuice: High-Performance Transactions via Learned Concurrency Control (OSDI'21)

Image
This paper appeared in OSDI 2021 . I really like this paper. It is informative, it taught me new things about concurrency control techniques. It is novel, it shows a practical application of simple machine learning to an important systems problem, concurrency control. It shows significant benefits for a limited but reasonable setup. It has good evaluation coverage and explanation. It is a good followup paper to the benchmarking papers we have been looking at recently. So let's go on learning about how Polyjuice was brewed. Problem and motivation There is no supreme concurrency control (CC) algorithm for all conditions. Different CC algorithms return the best outcome under different conditions.  Consider two extreme CC algorithms. Two phase locking (2PL) waits-for every dependent transactions to finish. Optimistic concurrency control (OCC) don't track or wait for any dependent transaction but validate at the end. We find that OCC is better with less contention, because it avoids...

Books I read recently

I realize I haven't done a book review post recently. The last one was from September 2021 . But I have been reading (and listening to) books. Well, recently mostly unsuccessfully. Here are some of them in reverse chronological order. (I think there must be some other non-fiction books I read, but I can't remember now.) It (Stephen King, 1986) The horror. Oh my God, it doesn't end... The book doesn't end. I won't spoil the book for you, because I couldn't get to the ending. I borrowed the book in audiobook, and extended twice. It was good listening at the car, while driving for chores. But I only have so much stamina. The book keeps dragging on. I gave up. I will just watch the movie with my son this weekend. Build (Tony Fadell, 2022) This is another book that won't end. The book felt massive. I extended the renewal for 2-3 times, but I couldn't finish the book. This book is actually multiple books bundled together. You can learn about product managemen...

Designing Access Methods: The RUM Conjecture

Image
This paper is from EDBT 2016 . Database veterans would know of the tradeoffs mentioned here, but it is very useful to have this "systematization of knowledge" paper, because it gives a good mental framework for new-comers to the field, and seeing practical/implicit tradeoffs explicitly stated can create an epiphany for even the veterans. The problem Algorithms and data structures for organizing and accessing data are called access methods. Database research/development community has been playing catchup redesigning and tuning access methods to accommodate changes to hardware and workload. As data generation and workload diversification grow exponentially, and hardware advances introduce increased complexity, the effort for the redesigning and tuning have been accumulating as well. The paper suggests it is important to solve this problem once and for all by identifying tradeoffs access methods face, and designing access methods that can adapt and autotune the structures and te...

A Study of Database Performance Sensitivity to Experiment Settings

Image
This paper appeared in VLDB2022 and is a good followup to our TPC-E vs TPC-C post . The paper investigates the following question: Many articles compare to prior works under certain settings, but how much of their conclusions hold under other settings? Their methodology is as follows. They used TPC-C and YCSB benchmarks as they are most widely used. They reproduced and evaluated 11 work (see Table 1). They then tuned benchmark parameters and system features to study effects on the performance of these work. They find that the evaluations of these work (and conclusions drawn from them) are sensitive to experiment settings. They make some recommendations as to how to proceed for evaluation of future systems work. The paper is well written and is an important contribution to systems work. In my summary below, I use many sentences from the paper verbatim. Analysis of TPC-C results TPC-C simulates the database of a wholesale company. It includes a number of warehouses, each maintaining sto...

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Making database systems usable

Looming Liability Machines (LLMs)

Advice to the young

Foundational distributed systems papers

Distributed Transactions at Scale in Amazon DynamoDB

Linearizability: A Correctness Condition for Concurrent Objects

Understanding the Performance Implications of Storage-Disaggregated Databases

Designing Data Intensive Applications (DDIA) Book