Chapter 7: Distributed Recovery (Concurrency Control Book)
Chapter 7 of the Concurrency Control and Recovery in Database Systems book by Bernstein and Hadzilacos (1987) tackles the distributed commit problem: ensuring atomic commit across a set of distributed sites that may fail independently.
The chapter covers these concepts:
- The challenges of transaction processing in distributed database systems (which wasn't around in 1987)
- Failure models (site and communication) and timeout-based detection
- The definition and guarantees of Atomic Commitment Protocols (ACPs)
- The Two-Phase Commit (2PC) protocol (and its cooperative termination variant)
- The limitations of 2PC (especially blocking)
- Introduction and advantages of the Three-Phase Commit (3PC) protocol
Despite its rigor and methodical development, the chapter feels like a suspense movie today. We, the readers, equipped with modern tools like FLP impossibility result and Paxos protocol watch as the authors try to navigate a minefield, unaware of the lurking impossibility results that were published a couple years earlier and the robust consensus frameworks (Viewstamped replication and Paxos) that would emerge just a few years later.
Ok, let's dive in.
Atomic Commitment Protocol (ACP) problem
The problem is to ensure that in the presence of partial failures (individual site failures), a distributed transaction either commits at all sites or aborts at all sites, and never splits the decision. The authors define the desired properties of ACPs through a formal list of conditions (AC1–AC5).
We know that achieving these in an asynchronous setting with even one faulty process is impossible as FLP impossibility result established in 1985. Unfortunately, this impossibility result is entirely absent from the chapter’s framework. The authors implicitly assume bounded (and with known bounds) message delays and processing times, effectively assuming a synchronous system. That is an unrealistic portrayal of real-world distributed systems, even today in the data-centers.
A more realistic framework for distributed systems is the partially asynchronous model. Rather than assuming known and fixed bounds on message delays and processing times, the partially asynchronous model allows for periods of unpredictable latency, with the only guarantee being that bounds exist, just not that we know them. This model captures the reality of modern data centers, where systems often operate efficiently but can occasionally experience transient slowdowns or outages where fixed bounds would be violated and maybe higher bounds might be established for some duration before convergence to stable. This also motivates the use of weak failure detectors, which cannot definitively distinguish between a crashed node and a slow one.
This is where Paxos enters the picture. Conceived just a few years after this chapter, Paxos provides a consensus protocol that is safe under all conditions, including arbitrary message delays, losses, and reordering. It guarantees progress only during periods of partial synchrony, when the system behaves reliably enough for long enough, but it never violates safety even when conditions degrade. This doesn't conflict with what the FLP impossibility result of 1985 proves: you cannot simultaneously guarantee both safety and liveness in an asynchronous system with even one crash failure. But that doesn't mean you must give up on safety. In fact, the brilliance of Paxos lies in this separation: it preserves correctness unconditionally and defers liveness until the network cooperates. This resilience is exactly what's missing in the ACP designs of Bernstein and Hadzilacos even when using 3PC protocols.
If you like a quick intro to the FLP and earlier Coordinated Attack impossibility results, these three posts would help.
2PC and 3PC protocols
The authors first present the now-classic Two-Phase Commit (2PC) protocol, where the coordinator collects YES/NO votes from participants (the voting phase) and then broadcasts a COMMIT or ABORT (the decision phase). While 2PC satisfies AC1–AC4 in failure-free cases, it fails AC5 under partial failures. If a participant votes YES and then loses contact with the coordinator, it is stuck in an uncertainty period, unable to decide unilaterally whether to commit or abort. The authors provide a cooperative termination protocol, where uncertain participants consult peers to try to determine the outcome. It reduces, but does not eliminate, blocking.
Thus comes the Three-Phase Commit (3PC) protocol, which attempts to address 2PC's blocking flaw by introducing an intermediate state: PRE-COMMIT. The idea is that before actually committing, the coordinator ensures all participants are "prepared" and acknowledges that they can commit. Only once everyone has acknowledged this state does the coordinator send the final COMMIT. If a participant times out during this phase, it engages in a distributed election protocol and uses a termination rule to reach a decision.
Indeed, in synchronous systems, 3PC is non-blocking, and provides an improvement over 2PC. The problem is that 3PC relies critically on timing assumptions, always requiring bounded message and processing delays. The protocol's reliance on perfect timeout detection and a perfect failure detector makes it fragile. As another secondary problem, the 3PC protocol discussed in the book (Skeen 1982) has also been shown to contain some subtle bugs as well even in the synchronous model.
In retrospect
Reading this chapter today feels like watching a group of mountaineers scale a cliff without realizing they’re missing key gear. I spurted out my tea when I read these lines in the 3PC discussion. "To complete our discussion of this protocol we must address the issue of elections and what to do with blocked processes." Oh, no, don't go up that path without Paxos and distributed consensus formalization!! But the book predates Paxos (1989, though published later), Viewstamped Replication (1988), and the crystallization of the consensus problem. It also seems to be completely unaware of the FLP impossibility result (1985), which should have stopped them in their tracks.
This chapter is an earnest and technically careful work, but it's flying blind without the consensus theory that would soon reframe the problem. The chapter is an important historical artifact. It captures the state of the art before consensus theory illuminated the terrain. The authors were unable to realize that the distributed commit problem includes in it the distributed consensus problem, and that all the impossibility, safety, and liveness tradeoffs that apply to consensus apply here too.
Modern distributed database systems use Paxos-based commit. This is often via 2PC over Paxos/Raft groups for participant-sites. See for example our discussion and TLA+ modeling of distributed transactions in MongoDB.
Miscellaneous
This is funny. Someone is trolling on Wikipedia, trying to introduce Tupac as an alternative way to refer to 2PC.
Comments