Chapter 4: Non-Locking Schedulers (Concurrency Control Book)
Chapter 4 of the Concurrency Control and Recovery in Database Systems book (1987) opens with a sentence that doesn't quite pass the grammar test: "In this chapter we will examine two scheduling techniques that do not use locks, timestamp ordering (TO) and serialization graph testing (SGT)." That comma is trying to do the job of a colon and failing at it. Precision matters, more so in technical writing.
The writing is otherwise clear and careful. And as par the book, it is ahead of its time. The chapter presents a spectrum of non-locking schedulers, starting from Basic TO, expanding into certifiers (which basically stands for optimistic concurrency control), and ending with modular, composable scheduler designs that separate synchronization concerns cleanly between read-write and write-write synchronization.
Let's dig into the details.
Timestamp Ordering (TO)
Timestamp Ordering (TO) uses transaction start timestamps to impose a serial order on conflicting operations. It maintains for every data item x the maximum timestamps of Reads and Writes on x that it has sent to the DM, denoted max-r-scheduled[x] and coordinates with the data manager (DM) to preserve ordering constraints.
Timestamp management in TO is a disguised form of watermarking and garbage collection. The system purges "stale" max-r/max-w entries using a time threshold ts_min, and rejects operations too old to be safely scheduled.
Strict TO extends Basic TO with delayed visibility to ensure recoverability, cascading abort avoidance, and strictness. Reads and writes are withheld until prior conflicting writes are resolved via commit or abort acknowledgment. I was happy to predict this modification while reading. It’s the same trick repeated from the 2PL chapter: keep the locks all the way and delay read/write visibility until commit/abort. This technique also shows up in Conservative SGT and certifiers later in this chapter. It's a useful pattern.
Despite being aggressive (rejecting operations that arrive "late"), Strict TO is still more permissive than two-phase locking (2PL). It accepts more interleavings as serializable (see the example below), and reads never wait. This is great for read-heavy workloads.
The section on distributed TO is particularly interesting. TO is trivially parallelizable: each scheduler can operate independently using local metadata. Unlike distributed 2PL, it avoids inter-scheduler communication and deadlocks entirely. The global timestamp order serves as a natural tiebreaker. This is simple and powerful. With loosely synchronized clocks/hybrid logical clocks, you get a deadlock-free system that preserves serializability with minimal coordination.
For more on this topic, see my review of the TO paper from VLDB'90 by Bernstein and Goodman here. I also wrote about how Amazon DynamoDB implements Basic TO before. This illustrates how Basic TO made it into production systems. It is a sweet deal: Synchronization helps performance, but safety does not depend on it.
It might be illustrative to contrast TO with 2PL here.
- TO avoids locking overhead and deadlocks.
- TO doesn't serialize unnecessarily,
- TO is a natural fit for distributed systems, especially with loosely synchronized clocks. And it is great when conflicts are rare.
Serialization Graph Testing (SGT)
SGT builds the serialization graph (SG) incrementally and aborts transactions to prevent cycles. This can, in theory, yield maximum concurrency under SER constraints, but it comes at a cost. It requires maintaining the SG, detecting cycles, and carefully pruning old transactions. These incur space and computation overhead.
Maybe the quickest way to introduce SGT is again to compare it with 2PL.
- 2PL prevents cycles via locking. It blocks and delays operations, can deadlock, and is conservative. SGT detects cycles dynamically. It allows speculative progress, aborts instead of blocking, and is more permissive.
- SGT is more general and precise than 2PL but costs more in metadata and coordination. 2PL, on the other hand, is simpler, and naturally enforces recoverability and strictness.
The Basic SGT approach has no space bounds unless you prune carefully. Even committed transactions can't be garbage-collected immediately.
Conservative SGT can avoid aborts by requiring predeclared read/write sets and using readiness rules to delay operations. But this only works in restricted environments where access sets can be declared in advance.
Trying to distribute SGT hits a wall. Global cycle detection is hard. Unlike deadlock detection (which can take its sweet time because deadlocked transactions are blocked) SGT must check cycles before committing. Since that's a high-frequency, high-cost operation, it doesn't scale.
Certifiers
Certifiers are optimistic schedulers. They execute operations without delay and validate serializability only at commit time. In other words, transactions proceed speculatively/optimistically, and the scheduler check for conflicts at commit time. This model avoids blocking under light contention but wastes work under heavy contention.
The section presents three certifier styles: 2PL-based, TO-based, and SGT-based.
- The 2PL certifier is textbook OCC. It tracks read/write sets and validates at commit. There's no locking, just late conflict detection.
- TO certification is also straightforward, but redundant: it applies the same checks as Basic TO but defers them till the end. So Basic TO is better.
- SGT certification builds the SG dynamically and checks for cycles at commit. It mirrors SGT scheduling but pushes all cycle detection to the end.
Certifiers seem to help for distributed transactions, as you can delay the cost of validation until the transaction decides to commit. Distributed certifiers would then engage in coordination, and this brings us straight into two-phase commit (2PC) territory. Each certifier votes on whether to commit or abort. The TM collects votes, makes a global decision, and informs all participants.
Integrated Schedulers
This is conceptually the most exciting part of the chapter. This section suggests decomposing concurrency control into two independent subproblems: read-write (rw) synchronization and write-write (ww) synchronization. Each subproblem can then be solved with any of the core techniques (2PL, TO, or SGT), and the solutions can be composed into a composite scheduler.
Thomas Write Rule (TWR) makes a cameo appearance here as a ww synchronizer. TWR simply discards late writes if they've been overwritten by a newer write. No reordering, no rejection is needed --just drop the write and move on because it is overwritten anyway. But TWR doesn't provide serializability on its own. It must be paired with a proper rw synchronizer to ensure proper transaction isolation. The section discusses how to combine TWR with TO-based rw synchronization to build an integrated scheduler.
The mixed scheduler (2PL for rw + TWR for ww) is also interesting. Reads are protected by locks, ensuring strictness and recoverability, while writes proceed optimistically and avoid unnecessary aborts. This enables non-blocking writes without sacrificing the guarantees of strict 2PL for reads.
The chapter claims that hundreds of correct schedulers can be built from these compositions. But it stops here, only two examples are given. I think, in practice, many modern systems (especially MVCC-based ones) implicitly implement composite schedulers. Snapshot reads impose TO-like constraints, while writes are often validated with locking.
Comments