Centrifuge: Integrated Lease Management and Partitioning for Cloud Services

- April 19, 2011

For performance reasons many large-scale sites (LinkedIn, Digg, Facebook, etc.) employ a pool of backend servers that operate purely on in-memory state. The frontend servers that forward the requests to the backend servers then should be very carefully implemented to ensure that they forward the requests to the correct backends. The frontend should ensure that the selected backend has the requested object in its memory and also possesses the lease to update the object. Unfortunately, things may change quickly at the backend; backend servers may fail, new ones may join, backend servers may start to cache different objects due to load-balancing requirements, and leases may exchange hands. All of these make the task of programming the frontend very challenging and frustrating.

This work (from NSDI'10) proposes Centrifuge, a datacenter lease manager that addresses this problem. The Centrifuge system has been deployed as part of Microsoft live mesh service, a large scale commercial cloud service that finds use for file sharing, notifications of activity on shared files, virtual desktop, etc.

There are 3 parts to the Centrifuge. The frontend servers use lookup library to learn about the leases and partitioning from the manager and accordingly forward the requests to the backends. The manager is centralized (and Paxos replicated for fault-tolerance) and decides on leases and partitioning for the objects. The backend servers use owner library to coordinate with the manager about the leases and partitioning.

The manager consists of one leader and two standbys. A Paxos group is also used as a persistent/consistent state storage and as the leader elector. The leader makes all the decisions for partitioning and leases. If the leader dies, one of the standbys become a leader by contacting the Paxos group, and then learn the state of leases and partitioning from the Paxos group to start serving as the new leader.

The leader partitions the key space to 64 ranges using consistent hashing and handles the leases. The leader performs load-balancing by rearranging and reassigning these lease ranges to the backends while accounting for lightly/heavily used ranges and failed backends. In contrast to the traditional model where backends would request for the leases, in Centrifuge manager assigns leases to the backends unilaterally and this simplifies a lot of things such as enabling the assignment of leases as ranges rather than per object basis.

The lookup library at the frontend maintains a complete copy of the lease table (200KB constant since leases are per range not per object basis). The lookup returns "hints", these are checked at the backend owner library again. If the lookup table was wrong, the backend informs the corresponding frontend, and this triggers the lookup library to get the new table from the manager. Otherwise, the lookup table is renewed by contacting the manager every 30 seconds. Since leases are granted by the manager to the backends for 60 second periods (and most likely are renewed by the backend), the 30 second period for lookup table renewal is reasonable.

The paper provides extensive performance evaluations both from the MS live mesh system and from controlled examples. The Centrifuge system would come as handy for many cloud deployments.