DDIA: Chp 7. Transactions (Part 1)

Chapter 7 of the Designing Data Intensive Applications (DDIA) book discusses transactions, yay! Transactions in database systems group multiple operations into a logical unit, a box of operations if you will. They simplify error handling and help manage concurrency issues. See the introduction and fault-tolerance sections of the Gray-Reuter book for the motivation on this.

The ACID properties (Atomicity, Consistency, Isolation, Durability) are often used to describe transaction characteristics, but implementations have left their precise meaning fuzzy. Atomicity ensures that all operations in a transaction either complete fully or not at all; it is not about concurrency, but about handling faults during transaction processing. Consistency is an application-level concern: transactions must preserve application-defined integrity rules. This consistency is not the same as the C in CAP, and it is best to set aside the C in ACID and focus on understanding the other three properties. Isolation deals with how concurrent transactions interact. Durability guarantees that committed changes persist even in the case of system failures.

Different isolation levels (Read committed, Snapshot isolation, Serializability) offer varying degrees of protection against concurrency anomalies. Common anomalies include the following. (I am copying these verbatim from the summary section of DDIA Chapter 7, because it is good.)

Dirty reads: One client reads another client’s writes before they have been committed. The read committed isolation level and stronger levels prevent dirty reads.

Dirty writes: One client overwrites data that another client has written, but not yet committed. Almost all transaction implementations prevent dirty writes.

Read skew: A client sees different parts of the database at different points in time. Some cases of read skew are also known as nonrepeatable reads. This issue is most commonly prevented with snapshot isolation, which allows a transaction to read from a consistent snapshot corresponding to one particular point in time. It is usually implemented with multi-version concurrency control (MVCC).

Lost updates: Two clients concurrently perform a read-modify-write cycle. One overwrites the other’s write without incorporating its changes, so data is lost. Some implementations of snapshot isolation prevent this anomaly automatically, while others require a manual lock (SELECT FOR UPDATE).

Write skew: A transaction reads something, makes a decision based on the value it saw, and writes the decision to the database. However, by the time the write is made, the premise of the decision is no longer true. Only serializable isolation prevents this anomaly.

Phantom reads: A transaction reads objects that match some search condition. Another client makes a write that affects the results of that search. Snapshot isolation prevents straightforward phantom reads, but phantoms in the context of write skew require special treatment, such as index-range locks.


It is not productive to describe isolation levels through the anomalies they guard against, but we are stuck with that legacy from the ANSI SQL standard. In 1995, the "A Critique of ANSI SQL Isolation Levels" paper pointed this out and showed the problems with that approach. Unfortunately, that paper was so invested in locking-based implementations that it committed the mistake of defining the isolation levels in terms of read/write locking. Double whammy! (Side note: The reason locking was so popular, and optimistic concurrency control was considered impractical back then, was the high-contention OLTP workloads of the time, which operated over a small number of keys. Things changed.) In 1999, Adya's thesis "Weak Consistency: A Generalized Theory and Optimistic Implementations for Distributed Transactions" provided an operational definition of isolation levels and was cited widely. But it is more productive to define isolation levels with a state-centric approach, which the 2017 paper "Seeing is Believing: A Client-Centric Specification of Database Isolation" provided. I used this approach in my TLA+ modeling of database concurrency control protocols.


It is important to note that different databases implement and name isolation levels differently. What one system (*cough* Oracle *cough*) calls "serializable" may in fact be snapshot isolation. It is crucial to understand the actual guarantees a particular system provides rather than relying on isolation level names alone.

Higher isolation levels generally provide stronger guarantees, but they come with performance trade-offs. Snapshot isolation, while powerful, doesn't prevent all anomalies: write skew and phantoms can still occur under it. Serializability provides the strongest guarantees, but at a higher performance cost. While serializability is popular in academia, snapshot isolation is popular in industry (Postgres, MySQL/InnoDB, MongoDB, Oracle, SQL Server), and practitioners make do with better data modeling and manual locking through SELECT FOR UPDATE when needed.

For example, the doctor-on-call write-skew bug from Fig 7-8 can be fixed by explicitly locking the rows that the decision depends on, as sketched below.
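Here is a minimal sketch of the SELECT FOR UPDATE fix, assuming a Postgres-style database accessed through psycopg2. The doctors table, its on_call/shift_id/name columns, and the "at least two doctors on call" rule follow the book's example; the connection string and the function name are just illustrative.

import psycopg2

conn = psycopg2.connect("dbname=hospital")  # hypothetical connection string

def request_leave(doctor_name, shift_id):
    with conn:  # psycopg2 commits the transaction on success, rolls back on exception
        with conn.cursor() as cur:
            # Lock all on-call rows for this shift. A concurrent transaction
            # running the same code blocks here until we commit, so the
            # check-then-update below always sees up-to-date data and the
            # write-skew race disappears.
            cur.execute(
                "SELECT * FROM doctors"
                " WHERE on_call = true AND shift_id = %s"
                " FOR UPDATE",
                (shift_id,))
            currently_on_call = cur.rowcount
            if currently_on_call >= 2:
                cur.execute(
                    "UPDATE doctors SET on_call = false"
                    " WHERE name = %s AND shift_id = %s",
                    (doctor_name, shift_id))
                return True
            return False  # refuse: someone has to stay on call

This works because the SELECT returns the very rows the decision depends on, so there is something to lock. When the premise is about rows that don't exist yet (the phantom case), row locks have nothing to attach to, and you need to materialize the conflict or use serializable isolation.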


Mechanical sympathy

As a user of a database you won't need to understand exactly how the database is implemented, but you need to have some mechanical sympathy, dammit. So here is that piece.

To implement snapshot isolation, the database must potentially keep several different committed versions of an object, because various in-progress transactions may need to see the state of the database at different points in time. To this end, each transaction is assigned a transaction ID, and the database keeps track of multiple versions of a row in a table using annotations. Each row in a table has a created_by field, containing the ID of the transaction that inserted this row into the table. Moreover, each row has a deleted_by field, which is initially empty. If a transaction deletes a row, the row isn’t actually deleted from the database, but it is marked for deletion by setting the deleted_by field to the ID of the transaction that requested the deletion. At some later time, when it is certain that no transaction can any longer access the deleted data, a garbage collection process in the database removes any rows marked for deletion and frees their space. An update is internally translated into a delete and a create as shown in Fig 7-7.
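This bookkeeping can be sketched in a few lines of Python. The sketch is illustrative only: the RowVersion class and mvcc_update function are made-up names, not any particular engine's internals.

from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    key: str
    value: dict
    created_by: int                   # ID of the transaction that inserted this version
    deleted_by: Optional[int] = None  # ID of the transaction that deleted it, if any

def mvcc_update(versions, key, new_value, txid):
    # An update is internally a delete plus a create (Fig 7-7): mark the live
    # version as deleted by this transaction, then append a new version created
    # by it. Nothing is overwritten in place; garbage collection reclaims old
    # versions once no transaction can see them anymore.
    for v in versions:
        if v.key == key and v.deleted_by is None:
            v.deleted_by = txid
    versions.append(RowVersion(key=key, value=new_value, created_by=txid))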

The implementation uses visibility rules to determine which data versions a transaction can see, and presents a consistent snapshot of the database this way. The rules are as follows (a code sketch of them comes after the list).

  • At the start of each transaction, the database makes a list of all the other transactions that are in progress (not yet committed or aborted) at that time.
  • Any writes that those transactions have made are ignored, even if the transactions subsequently commit.
  • Any writes made by aborted transactions are ignored.
  • Any writes made by transactions with a later transaction ID (i.e., which started after the current transaction started) are ignored, regardless of whether those transactions have committed.
  • All other writes are visible to the application’s queries.
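Continuing the sketch above, these rules boil down to a small visibility predicate. Here in_progress_at_start is the set of transaction IDs that were in progress when the reading transaction began, aborted is the set of transaction IDs known to have aborted, and transaction IDs are assumed to be assigned in start order (all names are illustrative).

def writer_visible(writer_txid, reader_txid, in_progress_at_start, aborted):
    # A write is visible only if its transaction had already committed when
    # the reader started.
    if writer_txid is None:
        return False
    if writer_txid in in_progress_at_start:  # in progress at reader's start: ignore
        return False
    if writer_txid in aborted:               # aborted: ignore
        return False
    if writer_txid > reader_txid:            # started after the reader: ignore
        return False
    return True                              # everything else is visible

def version_visible(version, reader_txid, in_progress_at_start, aborted):
    # A row version is in the reader's snapshot if its creator is visible
    # and its deleter (if any) is not.
    created = writer_visible(version.created_by, reader_txid,
                             in_progress_at_start, aborted)
    deleted = writer_visible(version.deleted_by, reader_txid,
                             in_progress_at_start, aborted)
    return created and not deleted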

How do indexes work in a multi-version database? The basic idea is to have the index simply point to all versions of an object and require an index query to filter out any object versions that are not visible to the current transaction. When garbage collection removes old object versions that are no longer visible to any transaction, the corresponding index entries can also be removed. Implementations employ many optimizations over this basic idea. Doesn't sound much fun. 
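Still, the basic idea fits in a couple of lines when expressed against the sketch above, with the index assumed to be a plain mapping from an indexed key to the list of RowVersion objects for that key.

def index_lookup(index, key, reader_txid, in_progress_at_start, aborted):
    # The index entry points at every version of the object; the reader keeps
    # only the versions its snapshot can see. Real engines layer many
    # optimizations on top of this.
    return [v for v in index.get(key, [])
            if version_visible(v, reader_txid, in_progress_at_start, aborted)]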

Ok, here is an unrelated but still useful piece of information. I didn't really understand the "cursor stability" definition back when I was reading the Gray/Reuter book. The following passage in this book (DDIA) helped me understand it: "Atomic operations are usually implemented by taking an exclusive lock on the object when it is read so that no other transaction can read it until the update has been applied. This technique is sometimes known as cursor stability."
