Do leases buy us anything?

Consider the atomic storage problem, which is a simpler problem than consensus. CAP theorem says that when there is a partition, you will need to sacrifice strong-consistency or high-availability.

Using leases, it is possible to sacrifice availability for sometime (until lease expires), and reconfigure the storage system (often by shrinking the quorum) to keep providing availability. Consistency is preserved throughout, and availability is sacrificed only during lease expiration time. This is a good tradeoff. (I am going to bring up the question of whether this may help circumvent CAP at the end of the post.)

But is this power unique to leases? Is there a way to explain this in an alternate way using only failure detectors instead of leases? This is the question I want to explore here.

Many distributed systems implement leases with just countdown timers and without NTP timestamps. This is because in the short-term the rate of clocks at processes don't drift too much.

So maybe, we can simulate a lease expiration by a node suspecting itself as failed. If the other nodes have knowledge of the timeout on the failure detector of this expired node, they can wait that time out, and start reconfiguration of the storage system after that. While a failure detector requires only unilateral decision, this explanation requires that other nodes know about the bound on the failure detector at the expired node. Let's see if we can do without that requirement.

For reconfiguration, two options are possible. One is decentralized reconfiguration implemented over the nodes themselves, and the other is by using a reconfiguration box (often implemented by Paxos) as the arbiter.

An example of the former is the dynamic atomic storage without consensus (2011) setup. There, majority is needed to pass reconfiguration. But there, even for a consistent read, majority is needed. So we don't really need a failure-detector at the partitioned node to drop itself of the system. It won't be able to serve reads or writes by itself anyways.

An example of the latter is chain replication. Consider that the tail node is partitioned. The original chain cannot complete serving writes anymore, because the tail is partitioned. The tail can still serve reads though for the clients that contact it directly for some time.

Here the reconfiguration box can reconfigure the chain to remove the partitioned tail. To explain this with failure detector terminology, let's say the reconfiguration box has its failure detector itself. The failure detector suspects the tail, and passes a reconfiguration with a higher epoch and takes the tail off. But before reconfiguring the chain to remove the original tail, the reconfiguration box should make sure that the tail stops serving reads to client. How can this be accomplished? The reconfiguration message will not reach the partitioned old tail. So the tail should know about the reconfiguration box's failure detector timeout duration. Without this knowledge, without tying the reconfiguration server's failure detector about the tail to the tail's failure detector about itself, we wouldn't know when it is safe to switch from the old configuration and start the new configuration. (The alternative is that the tail checks with the reconfiguration box for each operation, so it confirms its status in the configuration. Even with this one, due to asymmetric message delay, the reconfiguration box may need to wait some duration before reconfiguring.)

Leases buy us time and optionality

Using leases, a node does not have to check its status in a configuration for each operation. Provided that the lease holds, the nodes status in the configuration is unchanged. Using leases on the acceptors, a Paxos leader can serve reads locally, without checking with a quorum. And using leases, a tail in chain replication can serve reads without checking if it is still the tail. This translates to efficiency because checking your status in the configuration is not done for each operation, but rather batched and done once for each lease renewal.

In terms of trading off availability to get availability, it seems like leases provides more information than a unilateral failure detector and can buy you consistency in the presence of a partitioned node. This comes with the loss of some availability because reconfiguration for quorum shrinking needs to wait the lease time to expire.

Leases also provide an advantage for reconfiguration in the presence of partitions. Leases sacrifice availability to restore availability (while preserving safety) for the storage system. This functionality requires more than the unilateral decision taken by failure detectors, but rather a bilateral information on the expiration of the timeouts.

MAD questions

1. Did we circumvent the CAP result?
CAP defines availability as "Each request for read or write eventually receives a response." Even with leases and reconfiguration, there will be a request that does not receive a response. In my example, the old tail will not respond to read request after the expiration of the lease. But since at that point the old tail is not part of the system anymore, why does that count against the availability of the system? But the formulation of CAP is too strict and defines the system as the initial set of nodes in the system. That formulation prohibits any reconfiguration of the system even when there are no partitions.

I think we need more refined versions of CAP. It has a very rough granularity formulation.


I can't help to see the heavy reliance on the assumption that time flows at the same rate in each system. So, a MAD question ;) ... would leases work for nodes moving at relativistic speeds with respect to each other?
Murat said…
Yes, the assumption is each node has access to timers that tick about the same rate. With cheap oscillator clocks found in computers this is implemented fine because the clock drifts in a couple seconds is negligible.

If the timers tick with different speed, it is still possible to implement leases, if the nodes learn how to convert each others tick rates to their terms. This is pretty much how NTP implements clock drift prevention, and how very precise time synchronization services were developed for wireless sensor networks.

Popular posts from this blog

Learning about distributed systems: where to start?

Hints for Distributed Systems Design

Foundational distributed systems papers

Metastable failures in the wild

Scalable OLTP in the Cloud: What’s the BIG DEAL?

The end of a myth: Distributed transactions can scale

Always Measure One Level Deeper

Dude, where's my Emacs?

There is plenty of room at the bottom

Know Yourself