TLA+/Pluscal solution for modeling of 2-phase commit transactions
I had posted about the second TLA+ project I assigned for the distributed systems class the other day. Here is the solution to the project. The solution is pretty simple and straightforward. (This was built by modifying the /P2TCommit.tla/ at Lamport's site to add a TM and a BTM agent).
Lines 12-13 set the initial RMstates for each replica and the initial TMstate.
Lines 15-18 define the canCommit and canAbort conditions. If all RMs are "prepared" (they are ready for a commit decision), the TM uses this to set tmState="commit". CanCommit is also true if there is an RM that already has "committed" state. This is possible if the TM already made the "commit" decision and an RM went ahead with it and transitioned to "committed" from "prepared". Then if the TM failed, and BTM is to make a decision, the BTM sets tmState="commit" in order to keep Consistency.
If there exists an RM with "abort" state or an RM with a "failed" state, canAbort becomes truthified, provided that there does not exist an RM with "committed" state. This Line 18 is there to prevent inconsistency with the Line 16 condition. If there is a failed node, it is possible to Abort, but not if a previous Commit was signaled and made an RM="committed".
Lines 37-38 show the actions RM take. Since "working" and "prepared" states are nonterminal states, the RM keeps trying an action until a terminal state is reached. The RM may call one of the macros Prepare, Decide, and Fail.
Prepare first checks that the RM is in "working" state, and if so transitions it to the "prepared" state. (In the 2-phase commit implementations, generally this happens upon a prepare-request sent by the TM, but this modeling keeps things simple and abstracts that away.)
Decide changes the RM state to "committed" if tmState="commit", and changes it to "aborted" if tmState="abort". Note that, when Decide is called RM is either in "working" or "prepared" state. As the canCommit condition shows above tmState goes to "commit" only if all RMs are in "prepared" state, so the allowed transitions for RM is prepared->committed, prepared->aborted, and working->aborted.
Fail changes the RM state to "failed".
Don't worry about all RMs go to failed, the model checker is going to check all possible scenarios including all up, one down (at any of the dozens of possible points in computation), two down (at any of the dozens of possible points in computation).
The TM model is simple. The TM checks if it canCommit or canAbort and updates tmState accordingly. TM can also fail which makes the tmState="hidden". To keep things simple yet interesting, I only made the TM fail after it makes a decision. But if you notice, I put a label before the fail action. That makes these two updates nonatomic: That is, the tmState will be available for RMs to read for a duration, before the TM fails.
The BTM model is very similar to the TM model. The BTM checks for tmState="hidden" before it interjects, but otherwise it makes the same decision. Of course there will be some complications, maybe an RM changed state seeing TMs decision, or maybe an RM failed, since the TMs decision.
For simplicity we assume the BTM does not fail.
Termination checks if all processes stop executing eventually. TypeOK is just a sanity check.
Consistency checks that there are no 2 RMs such that one says "committed" and other says "aborted". That was what the 2-phase commit was trying to prevent in the first place.
When we uncomment the BTM code, the BTM is able to take over, and the Termination is also satisfied. Of course we didn't get to this flawless model all at once. The model checker found some counterexamples to my flawed models, and I fixed them. I guess this post would be more instructive if I had documented some of the mistakes I made instead of showing only the final state of the code. Well, next time.
The reason we succeeded easily is because we cheated: we assumed clean failures and perfect failure detectors. If we didn't have perfect failure detectors, this would be a mess: A node may act as if another is dead, when the other node is alive and act as if this one is dead. This asymmetry of information is the root of all evil in distributed systems.
In a future post, I plan to make the failure detectors more fuzzy in this 2PC with backup TM code and show what type of problems can arise.
Before I end this post, let me answer this hanging question: How do we deal with partial failures and imperfect failure detectors? We use Paxos to cut the Gordian Knot and act as a definitive judge to get the processes agree on the current state/configuration of the system. This is exactly what Lamport and Gray show in their Consensus on Transaction Commit to make 2-phase commit fault-tolerant. I will discuss that work also in a future blog post.
Initial definitions
Lines 12-13 set the initial RMstates for each replica and the initial TMstate.
Lines 15-18 define the canCommit and canAbort conditions. If all RMs are "prepared" (they are ready for a commit decision), the TM uses this to set tmState="commit". CanCommit is also true if there is an RM that already has "committed" state. This is possible if the TM already made the "commit" decision and an RM went ahead with it and transitioned to "committed" from "prepared". Then if the TM failed, and BTM is to make a decision, the BTM sets tmState="commit" in order to keep Consistency.
If there exists an RM with "abort" state or an RM with a "failed" state, canAbort becomes truthified, provided that there does not exist an RM with "committed" state. This Line 18 is there to prevent inconsistency with the Line 16 condition. If there is a failed node, it is possible to Abort, but not if a previous Commit was signaled and made an RM="committed".
RM modeling
Lines 37-38 show the actions RM take. Since "working" and "prepared" states are nonterminal states, the RM keeps trying an action until a terminal state is reached. The RM may call one of the macros Prepare, Decide, and Fail.
Prepare first checks that the RM is in "working" state, and if so transitions it to the "prepared" state. (In the 2-phase commit implementations, generally this happens upon a prepare-request sent by the TM, but this modeling keeps things simple and abstracts that away.)
Decide changes the RM state to "committed" if tmState="commit", and changes it to "aborted" if tmState="abort". Note that, when Decide is called RM is either in "working" or "prepared" state. As the canCommit condition shows above tmState goes to "commit" only if all RMs are in "prepared" state, so the allowed transitions for RM is prepared->committed, prepared->aborted, and working->aborted.
Fail changes the RM state to "failed".
Don't worry about all RMs go to failed, the model checker is going to check all possible scenarios including all up, one down (at any of the dozens of possible points in computation), two down (at any of the dozens of possible points in computation).
TM modeling
The TM model is simple. The TM checks if it canCommit or canAbort and updates tmState accordingly. TM can also fail which makes the tmState="hidden". To keep things simple yet interesting, I only made the TM fail after it makes a decision. But if you notice, I put a label before the fail action. That makes these two updates nonatomic: That is, the tmState will be available for RMs to read for a duration, before the TM fails.
BTM modeling
The BTM model is very similar to the TM model. The BTM checks for tmState="hidden" before it interjects, but otherwise it makes the same decision. Of course there will be some complications, maybe an RM changed state seeing TMs decision, or maybe an RM failed, since the TMs decision.
For simplicity we assume the BTM does not fail.
Properties
Termination checks if all processes stop executing eventually. TypeOK is just a sanity check.
Consistency checks that there are no 2 RMs such that one says "committed" and other says "aborted". That was what the 2-phase commit was trying to prevent in the first place.
Model checking
Model checking with TM failure (with the BTM code commented out) led to a Termination violation as expected. That is if TM fails before the tmState is read by all RMs, the prepared RMs will block and won't be able to terminate.When we uncomment the BTM code, the BTM is able to take over, and the Termination is also satisfied. Of course we didn't get to this flawless model all at once. The model checker found some counterexamples to my flawed models, and I fixed them. I guess this post would be more instructive if I had documented some of the mistakes I made instead of showing only the final state of the code. Well, next time.
But that was easy
How were we able to solve this problem? Isn't this supposed to be a complicated problem in the database literature? There were a lot of attempts to make 2-phase commit fault-tolerant, fix it with a backup TM. Lamport and Gray paper on Transaction Commit says that all of those attempts had bugs in corner cases:A non-blocking commit protocol is one in which the failure of a single process does not prevent the other processes from deciding if the transaction is committed or aborted. They are often called Three-Phase Commit protocols. Several have been proposed, and a few have been implemented [3, 4, 19]. They have usually attempted to “fix” the Two-Phase Commit protocol by choosing another TM if the first TM fails. However, we know of none that provides a complete algorithm proven to satisfy a clearly stated correctness condition. For example, the discussion of non-blocking commit in the classic text of Bernstein, Hadzilacos, and Goodman [3] fails to explain what a process should do if it receives messages from two different processes, both claiming to be the current TM. Guaranteeing that this situation cannot arise is a problem that is as difficult as implementing a transaction commit protocol.
The reason we succeeded easily is because we cheated: we assumed clean failures and perfect failure detectors. If we didn't have perfect failure detectors, this would be a mess: A node may act as if another is dead, when the other node is alive and act as if this one is dead. This asymmetry of information is the root of all evil in distributed systems.
In a future post, I plan to make the failure detectors more fuzzy in this 2PC with backup TM code and show what type of problems can arise.
Before I end this post, let me answer this hanging question: How do we deal with partial failures and imperfect failure detectors? We use Paxos to cut the Gordian Knot and act as a definitive judge to get the processes agree on the current state/configuration of the system. This is exactly what Lamport and Gray show in their Consensus on Transaction Commit to make 2-phase commit fault-tolerant. I will discuss that work also in a future blog post.
Comments