Online migration for geodistributed storage systems, Usenix ATC 2011

This paper investigates the problem of migrating data between data centers. Data needs to be moved from one center to another based on the access patterns, for example, the user may have moved from East to West coast. The problem is complicated by the large size of data that needs to be moved, the requirement to perform the migration online without blocking access to any part of the data from anywhere, and finally that the data can be accessed/modified concurrently in different locations.

To address this problem, the paper proposes an overlay abstraction. The goal of the abstraction is to implement migration as a service, so that the developer does not have to deal with the race conditions that may result while migrating data in ad hoc ways. The analogy of overlay is a sheet of transparencies. Remember the old days before powerpoint? The presenters used to print the slides on transparencies, and do animation by overlaying one transparency over another. The overlay idea is similar. "Where it is clear, the overlay reveals the contents underneath; where it is written, the overlay overrides those contents." Basically, the idea is to represent data as stacked layers in different places. This enables migration of data in smaller units, and the capability of having part of the data in one location and the other parts in other locations.

Overlay is implemented much like the (doubly) linked-list. Each overlay has two pointers, one pointing to the overlay below, and one pointing to the overlay above. Overlay insertion and deletion are similar to those one would expect from linked-list implementations. The overlay is designed such that every operation is linearized by the overlay structure even when the operations are submitted from any data center. Moreover, read and write operations can be executed concurrently with the overlay structure operations and with each other, at many clients without blocking.

To write data to an object the client first finds the highest overlay by following the above pointers starting from the based location. (Base location is learned from the directory service.) The data is written to this highest level overlay. To read an object, again the highest overlay is found as the first step. If the data to be read is not there, then below pointers are followed until the data is reached.

The contribution of the paper is the abstraction that nicely separates policy level from the concurrency-safe execution of the actual migration operations. The paper presents several optimizations and more use-cases for the overlay structure (such as exploiting in-built replication for migration, multiway caching, and split overlays).


Popular posts from this blog

I have seen things

SOSP19 File Systems Unfit as Distributed Storage Backends: Lessons from 10 Years of Ceph Evolution

PigPaxos: Devouring the communication bottlenecks in distributed consensus

Frugal computing

Learning about distributed systems: where to start?

Fine-Grained Replicated State Machines for a Cluster Storage System

My Distributed Systems Seminar's reading list for Spring 2020

Cross-chain Deals and Adversarial Commerce

Book review. Tiny Habits (2020)

Zoom Distributed Systems Reading Group