I didn't read this paper before class, because I thought I had read it earlier. It turns out I hadn't and I was confusing it with the "Paxos Made Live" PODC 2007 paper. I realized this only towards the middle of the class :-) The "Paxos Made Live" paper focused on the Paxos consensus implementation that makes the state replication and availability work in Chubby. In contrast, this Chubby Lock Service paper focuses only on the lock service part and is very client-centric. It is the need and uses cases of the clients (i.e., the programmers in Google) that has dictated the design of the service and the interfaces, and even the presentation of the paper.
Chubby serves course grain locks rather than fine-grained locks to keep the load light. The paper mentions that the primary goal is to provide reliability/availability and thruput/storage are secondary goals.
Chubby uses Paxos for consistent state replication among the five machines in the Chubby cell. The details of this process is best left to the "Paxos Made Live" paper. Paxos is simple yet could be hard to internalize. When I am teaching the distributed systems class, I dedicate one week solid to the Paxos algorithm so that students have time to understand and internalize the algorithm. As far as this paper is concerned, the thing to keep in mind from the Paxos algorithm is that the master performs a write (for installing a lock) by contacting all the replicas and getting an acknowledgement from at least the majority of the replicas that write is performed. So, writing takes time, the master has to clear everything with the other replicas first before responding to the client that the write is performed. The read is quicker, the master can serve the reads locally. This is achieved by using a lease on the replicas; while the lease holds the replicas promise they will not choose a new master, so the master is guaranteed that it has the latest information and can serve reads locally. (This lease should not be confused with the lease to clients that will be mentioned below.)
OK, let's proceed to discuss the meat of the paper. I have to confess that there are several things I could not understand in this paper. Maybe this is because I have more of an academic background and little background as a programmer, the intended audience of the paper. The students in the seminar also were not any successful than me in answering the questions I raise below.
Chubby provides an interface much like a distributed file system with advisory locks. A typical name is: "/ls/foo/wombat/pouch" where ls stands for lock service, and foo is the name of a Chubby cell, the remaining path is the file that holds the metadata for the lock. "Locks are advisory" means that locks conflict only with other attempts to acquire the same lock: holding a lock called F neither is necessary to access the file F, nor prevents other clients from doing so. The author states the reason for this decision as "Chubby didn't want to break other clients", "We rejected mandatory locks, which make locked objects inaccessible to clients not holding their locks." But I don't understand how you can have an application that uses locks and one that ignores locks run at the same datacenter without giving up consistency and sanity perhaps. Is there an assumption about the filesystem that it will check with Chubby if a client without a lock is trying to acccess a file with a lock?
A lot of the paper is devoted to explaining the lease to clients and client-side caching of the locks to reduce the traffic to the master. Caching is all about improving performance via leases, so that the client doesn't need to read-request the lock from the master everytime it wants to check it. And it turns out the programmers were really in the habit of busy-waiting on the locks. This is from page 13 of the paper: "Originally, we did not appreciate the critical need to cache the absence of files, nor to reuse open file handles. Despite attempts at education, our developers regularly write loops that retry indefinitely when a file is not present, or poll a file by opening it and closing it repeatedly when one might expect they would open the file just once."
So, what the Chubby system does is to solve this problem for the clients. Chubby master makes the clients cache the locks, and promises to contact the client when that cache needs to be invalidated. So, the client does the busy-waiting on the lock in its cache without overwhelming the master with read-requests. This is how the paper puts it: "Chubby clients cache file data and node meta-data in a consistent write-through cache held in memory. The cache is maintained by a lease, and kept consistent by invalidations sent by the master, which keeps a list of what each client may be caching."
Keep alive messages are employed to extend the client leases with the master. I really like this idea. The client asks for a lease extension via an RPC, and the master, upon receiving this rpc, blocks the RPC and answers it (granting the lease) only close to the end of the lease. This allows the master to extend lease with max time and little overlap. I guess this process could have been at the master immediately on receving the request if the reply referenced a physical time rather than a period. But then replying early is a disadvantage for the master: What if it grants the lease too early (then it is promising for a long interval in the future), and it dies right after this reply. Then availability suffers, because first the gap and then the lease need to be waited. There is a reason the client is not sending the request toward the end of the lease and i rather sending it early on. This is because if the master is down, the client can find this out early on with this check to the master and plan accordingly. Another advantage of this RPC mechanism is that the notifications from the master to the client is piggybacked to the callback on the keep alive RPC. This way firewalls are not an issue for the master.
I found the Section 2.9 on failovers very complicated. This section talks about how the caches and leases can survive a master failover (new master selection in Chubby) with least interruption as possible. The paper also acknowledges this: "Readers will be unsurprised to learn that the fail-over code, which is exercised far less often than other parts of the system, has been a rich source of interesting bugs." There are all sorts of things to deal with in a failover. The new master tries to construct the leases the old master has distributed to the clients by observing the old handles the clients use to refer the files (handles include a sequence-number that allows a master to tell whether a handle was generated by it or by a previous master), and tries to keep these sessions alive seamlessly as much as possible.