Online migration for geodistributed storage systems, Usenix ATC 2011

This paper investigates the problem of migrating data between data centers. Data may need to move from one data center to another as access patterns change; for example, a user may have moved from the East Coast to the West Coast. The problem is complicated by three factors: the large volume of data that must be moved, the requirement to perform the migration online without blocking access to any part of the data from any location, and the fact that the data can be accessed and modified concurrently at different locations.

To address this problem, the paper proposes the overlay abstraction. The goal of the abstraction is to provide migration as a service, so that developers do not have to deal with the race conditions that can arise when data is migrated in ad hoc ways. The analogy for an overlay is a stack of transparencies. Remember the old days before PowerPoint? Presenters used to print their slides on transparencies and create animations by laying one transparency over another. The overlay idea is similar: "Where it is clear, the overlay reveals the contents underneath; where it is written, the overlay overrides those contents." Basically, data is represented as stacked layers residing in different places. This enables migrating data in smaller units, and it allows part of the data to reside in one location while other parts reside elsewhere.
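Within a single process, Python's standard `collections.ChainMap` happens to have the same "transparency" semantics: lookups fall through the mappings in order until a key is found, and writes always go to the first (topmost) mapping. The snippet below uses it purely as an analogy, assuming object data modeled as a dict of named parts (the keys and values are hypothetical). It is not the paper's implementation, which is distributed and concurrency-safe.

```python
from collections import ChainMap

# Hypothetical data: the base layer holds the original object contents,
# and an overlay on top holds only the parts that have been rewritten.
base = {"profile": "v1", "photos": "album-1", "settings": "default"}
overlay = {}

# ChainMap searches its maps left to right, so the topmost layer goes first.
view = ChainMap(overlay, base)

# Writes through a ChainMap always land in the first (top) map.
view["settings"] = "dark-mode"

print(view["settings"])  # dark-mode: the "written" part of the overlay overrides the base
print(view["photos"])    # album-1: the "clear" part of the overlay reveals the base
print(overlay)           # {'settings': 'dark-mode'}
print(base["settings"])  # default: the layer underneath is left untouched
```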

An overlay is implemented much like a node in a doubly linked list: each overlay has two pointers, one to the overlay below it and one to the overlay above it. Inserting and deleting overlays works much as one would expect from a linked-list implementation. The overlay structure is designed so that every operation is linearized, even when operations are submitted from different data centers. Moreover, read and write operations can execute concurrently with the overlay structure operations and with each other, issued by many clients, without blocking.
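The pointer manipulation for inserting and removing overlays can be sketched as follows. This is a minimal single-process model under stated assumptions: the names `Overlay`, `insert_above`, `remove`, and `stack_from` and the location strings are hypothetical, and the sketch deliberately omits the concurrency control and cross-data-center linearization that the paper's design provides.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Overlay:
    """One layer of an object, stored at some data center (hypothetical model)."""
    location: str
    data: dict = field(default_factory=dict)
    above: Optional[Overlay] = None
    below: Optional[Overlay] = None


def insert_above(existing: Overlay, new: Overlay) -> None:
    """Splice `new` directly above `existing`, as in doubly linked list insertion."""
    new.below = existing
    new.above = existing.above
    if existing.above is not None:
        existing.above.below = new
    existing.above = new


def remove(node: Overlay) -> None:
    """Unlink `node` and reconnect its neighbors to each other."""
    if node.below is not None:
        node.below.above = node.above
    if node.above is not None:
        node.above.below = node.below
    node.above = node.below = None


def stack_from(base: Overlay) -> list[str]:
    """List overlay locations bottom to top by following `above` pointers."""
    out, cur = [], base
    while cur is not None:
        out.append(cur.location)
        cur = cur.above
    return out


base = Overlay("east")
insert_above(base, Overlay("west"))  # start migrating east -> west
print(stack_from(base))              # ['east', 'west']
middle = Overlay("central")
insert_above(base, middle)
print(stack_from(base))              # ['east', 'central', 'west']
remove(middle)
print(stack_from(base))              # ['east', 'west']
```

In the real system these splices must be made atomic with respect to concurrent readers and writers at other data centers; the sketch only shows the shape of the structure.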

To write data to an object, the client first finds the highest overlay by following the above pointers starting from the base location. (The base location is learned from the directory service.) The data is written to this highest overlay. To read an object, the client again first finds the highest overlay. If the data to be read is not there, the client follows the below pointers until it reaches the data.
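The read and write paths described above can be sketched like this. It is a hypothetical single-process model: each overlay is assumed to hold only the keys written at its level, the directory lookup is modeled simply as holding a reference to the base overlay, and the names `highest`, `write`, and `read` are illustrative, not the paper's API. Concurrency is not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Overlay:
    # Hypothetical model: each overlay holds only the keys written at its level.
    location: str
    data: dict = field(default_factory=dict)
    above: Optional["Overlay"] = None
    below: Optional["Overlay"] = None


def highest(base: Overlay) -> Overlay:
    """Follow `above` pointers from the base overlay to the topmost one."""
    cur = base
    while cur.above is not None:
        cur = cur.above
    return cur


def write(base: Overlay, key: str, value) -> None:
    """Writes always go to the highest overlay."""
    highest(base).data[key] = value


def read(base: Overlay, key: str):
    """Start at the highest overlay and follow `below` pointers until the key is found."""
    cur: Optional[Overlay] = highest(base)
    while cur is not None:
        if key in cur.data:
            return cur.data[key]
        cur = cur.below
    raise KeyError(key)


# The base overlay, as the directory service might report it (hypothetical data).
east = Overlay("east", {"inbox": ["msg1"], "contacts": ["alice"]})
west = Overlay("west", below=east)
east.above = west  # migration to "west" has started

write(east, "inbox", ["msg1", "msg2"])  # lands in the west overlay
print(read(east, "inbox"))              # ['msg1', 'msg2'], served from west
print(read(east, "contacts"))           # ['alice'], found by falling through to east
print(east.data["inbox"])               # ['msg1'], the base copy is untouched
```

Note that a client always starts from the base location and climbs; this is what lets the top of the stack move to a new data center without clients needing to know about it in advance.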

The paper's contribution is an abstraction that cleanly separates the migration policy from the concurrency-safe execution of the actual migration operations. The paper also presents several optimizations and additional use cases for the overlay structure, such as exploiting built-in replication for migration, multiway caching, and split overlays.
