Chapter 6: Centralized Recovery (Concurrency Control Book)
With Chapter 6, the Concurrency Control and Recovery in Database Systems book shifts focus from concurrency control to recovery! This chapter addresses how to preserve the atomicity and durability of transactions in the presence of failures, and how to restore the system to a consistent state afterward.
The book offers a remarkably complete foundation for transactional recovery, covering undo/redo logging, checkpointing, and crash recovery. While it doesn't use the phrase "write-ahead logging", the basic concepts are there, including log-before-data and dual-pass recovery. When the book was written, the full WAL abstraction of ARIES was still five years away, arriving in 1992 (see my review here). I revisit this comparison at the end of the post.
System Model and Architecture
In earlier chapters, we reviewed the architecture: transactions pass through a transaction manager (TM), which sends operations to a scheduler and then to a data manager (DM). For recovery, the DM is split into:
- A Cache Manager (CM), which handles reading/writing between memory and disk, with a volatile cache and a stable storage backend.
- A Recovery Manager (RM), which interprets high-level operations (Read, Write, Commit, Abort, Restart) and ensures durability and atomicity. What is Restart? Well, we will discuss it in the recovery algorithm section.
The model distinguishes between volatile storage (lost on crash) and stable storage (assumed to survive crashes). Cache slots have a "dirty" bit to track whether their content has diverged from stable storage. The focus of the chapter is primarily on system failures that involve a crash or reboot that wipes out volatile memory but leaves the disk intact.
Stable Storage and Shadowing
There are two basic approaches for maintaining data in stable storage. With in-place updating, overwrites destroy prior values. With shadowing, new versions are written to different locations (which also enables fast copy), and a directory atomically switches the system's view to the new state.
Shadowing avoids UNDO, but complicates consistency. In-place updating is simpler and more compatible with buffering, but requires careful logging. I saw a shadowing solution being developed in S3 for a "fast copy" feature, and I was amazed at how much complexity it introduces for the correctness of deletion/garbage collection. It is surprisingly tricky. When describing the recovery algorithms, the book goes into detail discussing how each algorithm would handle shadowing.
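To make the shadowing idea concrete, here is a minimal sketch (all names hypothetical, not from the book): updates always go to fresh locations, and commit is a single atomic swap of the directory.

```python
# Minimal shadow-paging sketch (hypothetical names; not the book's code).
# Updates never overwrite live pages; commit is one atomic directory swap.

class ShadowStore:
    def __init__(self):
        self.pages = {}       # location -> value (simulated stable storage)
        self.directory = {}   # item -> location (the committed version)
        self.next_loc = 0

    def begin(self):
        # The shadow directory starts as a copy; the old one stays intact.
        self.shadow = dict(self.directory)

    def write(self, item, value):
        loc = self.next_loc          # always a fresh location
        self.next_loc += 1
        self.pages[loc] = value
        self.shadow[item] = loc      # only the shadow sees the new version

    def commit(self):
        # The single atomic step: switch the directory to the shadow.
        self.directory = self.shadow

    def read(self, item):
        return self.pages[self.directory[item]]

store = ShadowStore()
store.begin(); store.write("x", 1); store.commit()
store.begin(); store.write("x", 2)   # a crash here simply loses the shadow
print(store.read("x"))               # -> 1: the committed state is untouched
store.commit()
print(store.read("x"))               # -> 2
```

Note that the superseded location holding x = 1 is now garbage; deciding when it is safe to reclaim such locations is exactly the deletion/garbage-collection trickiness mentioned above.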
Logging and Recovery
This section lays the groundwork for WAL in modern systems. The log contains entries of the form [T_i, x, v], which record that transaction T_i wrote value v to data item x.
There are two key rules to ensure safe recovery:
- Undo Rule: Before a transaction overwrites a committed value in the database, the old value must be saved to stable storage.
- Redo Rule: A transaction's writes must be logged before the transaction is considered committed.
These collectively ensure that the system can UNDO the effects of uncommitted transactions and REDO the effects of committed ones.
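As a concrete illustration of the ordering these two rules impose, here is a hedged sketch (the record format and names are mine, not the book's; records carry both before- and after-images):

```python
# Hedged sketch of the Undo and Redo rules (illustrative names/format).
log = []          # stands in for the log on stable storage
db = {"x": 0}     # stands in for the stable database

def stable_append(record):
    log.append(record)   # assume each append is forced to stable storage

def write(tid, item, new_value):
    old_value = db[item]
    # Undo Rule: save the old value durably BEFORE overwriting it, so an
    # uncommitted transaction's write can always be rolled back.
    stable_append((tid, item, old_value, new_value))
    db[item] = new_value

def commit(tid):
    # Redo Rule: the transaction's write records are already in the log;
    # force the commit record before acknowledging the commit, so the
    # writes can be reapplied even if they never reached the database.
    stable_append((tid, "COMMIT"))

write("T1", "x", 42)
commit("T1")
```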
Recovery Algorithm Types
The chapter categorizes recovery algorithms by what work Restart must perform:
- Undo/Redo: Restart may need both to redo missing committed updates and to undo uncommitted ones. This is the most flexible category, as it allows steal/no-force as in ARIES. The chapter focuses on Undo/Redo algorithms as the most general and most efficient in runtime operation.
- Undo/No-Redo: This requires forcing all updates to disk before commit, eliminating redo. This is steal/force.
- No-Undo/Redo: This is no-steal and no-force. So it disallows overwriting before commit, eliminating undo.
- No-Undo/No-Redo: This is the least flexible one: no-steal, force. It requires full data persistence at commit.
What are steal and force, you ask? Steal says: a transaction can steal buffer pages from another, meaning an uncommitted transaction's dirty pages may be written to disk before it completes, so that there's room in the buffer for another transaction to do work. No-force says: dirty pages of committed transactions can be flushed to disk whenever convenient; they don't have to be flushed before we can commit the transaction.
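A small sketch of how a cache manager might enforce these two policies (interfaces hypothetical, not the book's): under steal, a dirty page of an uncommitted transaction may be evicted, provided its log records go out first; under no-force, commit only forces the log, not the pages.

```python
# Illustrative steal / no-force policy checks (hypothetical names).

class Page:
    def __init__(self, pid):
        self.pid, self.dirty, self.last_lsn = pid, False, -1

class CacheManager:
    def __init__(self):
        self.flushed_lsn = -1            # log prefix already on stable storage

    def force_log_up_to(self, lsn):
        self.flushed_lsn = max(self.flushed_lsn, lsn)   # simulate a log force

    def evict(self, page):
        # Steal: a dirty page of an *uncommitted* transaction may be written
        # out, as long as its log records (the undo info) are forced first.
        if page.dirty:
            self.force_log_up_to(page.last_lsn)
            page.dirty = False           # simulate writing the page to disk

    def commit(self, commit_lsn):
        # No-force: commit forces only the log up to the commit record; the
        # transaction's dirty pages may be flushed later, at leisure.
        self.force_log_up_to(commit_lsn)

cm, p = CacheManager(), Page(7)
p.dirty, p.last_lsn = True, 10
cm.evict(p)              # steal: fine, because log records up to LSN 10 went first
cm.commit(11)            # no-force: page flushes can happen whenever convenient
print(cm.flushed_lsn)    # -> 11
```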
Remember the Restart operation, which we said the RM is responsible for. Restart is invoked after a crash to restore the database to the last consistent committed state. For the most flexible category (Undo/Redo), this involves undo followed by redo:
- Scanning the log backward to undo changes from uncommitted transactions.
- Scanning forward to redo changes from committed ones.
The book discusses how this process should be performed in an idempotent manner, so that it can be safely retried if another crash interrupts it.
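Here is a minimal, hedged sketch of such a Restart (record format illustrative, as above): the backward pass restores before-images for uncommitted transactions, the forward pass reapplies after-images for committed ones, and because both passes just overwrite values, running Restart twice yields the same state, which is what makes it safely retryable.

```python
# Hedged sketch of undo-then-redo Restart (illustrative, not the book's code).
# Records: (tid, item, before, after) for writes, (tid, "COMMIT") for commits.

def restart(log, db):
    committed = {r[0] for r in log if r[1] == "COMMIT"}

    # Backward pass: undo every write by an uncommitted transaction.
    for rec in reversed(log):
        if rec[1] != "COMMIT" and rec[0] not in committed:
            tid, item, before, after = rec
            db[item] = before

    # Forward pass: redo every write by a committed transaction.
    for rec in log:
        if rec[1] != "COMMIT" and rec[0] in committed:
            tid, item, before, after = rec
            db[item] = after

    # Overwriting is idempotent: a crash during Restart just means we run
    # both passes again and end in the same state.
    return db

log = [("T1", "x", 0, 1), ("T2", "y", 0, 9), ("T1", "COMMIT")]
print(restart(log, {"x": 5, "y": 9}))   # -> {'x': 1, 'y': 0}: T1 redone, T2 undone
```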
Checkpointing
Without optimization, Restart must scan the entire log. Checkpointing minimizes recovery time by flushing the current database state to disk and marking that point in the log. Checkpointing approaches trade off between runtime efficiency and recovery time. Types include:
- Commit-consistent: Waits for all transactions to finish, then flushes all dirty pages.
- Cache-consistent: Flushes all dirty cache slots but doesn't wait for transactions to finish (sketched after this list).
- Fuzzy checkpointing: Flushes only old dirty pages, reducing disruption.
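Here is a hedged sketch of the cache-consistent variant (hypothetical names): flush every dirty slot, then append a checkpoint record listing the active transactions; Restart's redo scan can then begin at the last checkpoint rather than at the start of the log.

```python
# Illustrative cache-consistent checkpoint (not the book's code).

def checkpoint(cache, log, active_tids):
    for page in cache:
        if page.get("dirty"):
            page["dirty"] = False        # simulate flushing the slot to disk
    # Record which transactions are in flight, so Restart knows how far
    # back the undo scan must still reach.
    log.append(("CHECKPOINT", tuple(active_tids)))

def redo_start_index(log):
    # Everything before the last checkpoint is already on disk, so the
    # redo scan can start here instead of at the beginning of the log.
    for i in range(len(log) - 1, -1, -1):
        if log[i][0] == "CHECKPOINT":
            return i
    return 0

log = [("T1", "x", 0, 1), ("T1", "COMMIT")]
checkpoint([{"dirty": True}], log, active_tids=["T2"])
log.append(("T2", "y", 0, 9))
print(redo_start_index(log))   # -> 2: redo skips the pre-checkpoint records
```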
Implementation of Undo/Redo with Optimizations
The chapter discusses several performance optimizations for implementing the Undo/Redo algorithm.
- Partial data item logging: Log only modified bytes/fields within a page.
- Buffering the log: To reduce I/O, log entries are staged in memory and flushed in batches.
- Log Sequence Numbers (LSNs): Pages and log entries are tagged with sequence numbers to track which updates have been applied.
- Delayed (group) commit: Batches multiple commit records into a single log flush to amortize I/O (sketched after this list).
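To illustrate log buffering and group commit together, here is a hedged sketch (hypothetical names): records accumulate in a volatile buffer, and one flush hardens every buffered record at once, so commits that arrive close together share a single log I/O.

```python
# Hedged sketch of log buffering + group commit (illustrative, not the book's).

class BufferedLog:
    def __init__(self):
        self.buffer = []    # volatile log tail, staged in memory
        self.stable = []    # records already forced to stable storage

    def append(self, record):
        self.buffer.append(record)          # no I/O yet

    def commit(self, tid):
        # Group commit: a real system would wait briefly here so that other
        # transactions' commit records can join the same flush.
        self.append((tid, "COMMIT"))
        self.flush()

    def flush(self):
        self.stable.extend(self.buffer)     # one batched I/O hardens everything
        self.buffer.clear()

log = BufferedLog()
log.append(("T1", "x", 0, 1))
log.append(("T2", "y", 0, 9))
log.append(("T2", "COMMIT"))
log.commit("T1")        # this single flush also hardens T2's records
print(len(log.stable))  # -> 4
```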
These optimization suggestions resemble those that the ARIES (1992) paper formalized and implemented efficiently and correctly. More on this below.
Logical Logging
Instead of logging low-level data changes, the system may log high-level operations like "insert record" or "update field". Logical logging is more compact but harder to undo or redo, since it may depend on the current state.
Logical logging contrasts with ARIES-style logging, which uses physiological logging: a hybrid of logical and physical approaches. While logical logging records abstract operations without referencing specific data layouts, ARIES logs changes to specific pages along with logical intent (e.g., "insert at offset X in page P") and includes both before- and after-images to enable precise undo and redo. ARIES also relies on Log Sequence Numbers (LSNs) embedded in pages and Compensation Log Records (CLRs) to make recovery idempotent and efficient. This makes ARIES more robust under the steal/no-force buffer policy, whereas logical logging is lighter but more fragile.
ARIES critiques earlier approaches, like the one in this book, for performing undo before redo. It shows that this ordering becomes incorrect when combined with optimizations like LSNs and selective redo. In such hybrids, undo can write CLRs that raise page LSNs and cause the redo phase to mistakenly skip necessary committed updates.
ARIES reverses the order. It does redo first, repeating history by reapplying all logged changes that might be missing from disk, even those of uncommitted transactions. Then it does undo and erases the changes of uncommitted transactions. This avoids the problem above. The redo decision is based on comparing page LSNs to log record LSNs: since undo hasn't happened yet, page LSNs still reflect what's truly on disk. With this approach, you only advance a page's LSN after you know the change is present. Tricky stuff.
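A hedged sketch of the LSN comparison at the heart of this redo pass (structures and names are illustrative, not from the ARIES paper):

```python
# Illustrative ARIES-style redo check (hypothetical structures).

class Page:
    def __init__(self, lsn, data):
        self.lsn, self.data = lsn, data     # page LSN = last applied update

def redo(log, pages):
    for lsn, pid, item, after in log:
        page = pages[pid]
        if page.lsn < lsn:        # page LSN proves this update is missing
            page.data[item] = after
            page.lsn = lsn        # advance only after the change is applied
        # else: the update reached disk before the crash; skip it

# This page made it to disk through LSN 5, so it already holds x = 1.
pages = {1: Page(lsn=5, data={"x": 1})}
log = [(5, 1, "x", 1), (6, 1, "x", 2)]
redo(log, pages)
print(pages[1].lsn, pages[1].data)   # -> 6 {'x': 2}: only LSN 6 is reapplied
```

Undo then runs over this redone state, writing CLRs as it goes, so a crash during recovery never re-undoes the same work.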
Comments