Wednesday, February 12, 2014

Consistency-Based Service Level Agreements for Cloud Storage

This paper is from Microsoft Research and appeared in SOSP'13. This paper is more of a position and vision paper. The paper introduces a consistency-based SLA concept, which can provide a win-win arrangement for cloud providers and developers.


For performing reads from key-value stores, currently you have two options. You can do strongly-consistent reads by increasing the size of your read replica quorum, but this will increase latency of the responses, and you don't get the flexibility to revert to a quick dirty (eventually-consistent) read if a strong-consistent read would take a long time to respond. Or you go with best effort reads (which are eventually-consistent) from the key value store because you insist on low-latency answers.

(Actually there is another option: You can use a strongly consistent multiversion data store like BigTable or Spanner, and relax it by reading slightly stale data and get flexibility. I will revisit this option in the discussion.)

| Rank | Consistency | Latency | Utility |
|   1.    | strong         | 150 ms  |     1.0  |
|   2.    | eventual      | 150 ms  |     0.5  |
|   3.    | strong         | 500 ms  |    0.25 |

Enter the consistency-based SLA concept. The SLA acts as an interface between the application developer and the inners of the cloud. The developer provides a wishlist for their get (i.e., read) operations from the key-value store as above. Here the developer says "I want a reply in under 150 ms and prefer strongly consistent data but will accept any data; if no data can be obtained quickly then I am willing to wait up to 500ms for up-to-date data." The cloud-provider backend is structured such that it keeps track of which of these reads is feasible currently for that location and it satisfies the highest ranked one it can in order to give the best utility to the developer.

Using such an SLA makes good business sense. With this SLA the developers put their money where their mouth is. They agree to pay more for better utility provided to them. The cloud-providers can use the SLAs to prioritize the read requests: they can give more priority to consistency requiring higher paying (higher utility) customers.

To illustrate, Figure 3 shows some read latencies at a given point from given locations. The developer does not have access to all per region or per client latencies like this, but in the SLA she can state her ranked preferences for latency and consistency of the reads she thinks would make most sense for her application, and through this interface she has access to dynamic tuning of performance of her application.

Pileus Architecture:

To showcase the SLA, the authors developed a replicated key-value store called Pileus. Pileus is a type of cloud formation, it is a cap cloud. (Get it? A "CAP" cloud.) Pileus dynamically selects which servers to access in order to deliver the best service given the current configuration and system conditions.

Some storage nodes are designated as primary nodes, which hold the master data, while others are secondary nodes. All Puts (i.e., writes) in Pileus are performed and strictly ordered at a primary site. Secondary nodes eventually receive from the primary core all the updated objects along with their update timestamps. Since all Put operations are assigned increasing update timestamps from the primary site and the asyncronous replication protocol transfers updated objects in timestamp order, at any point in time, each secondary node has received a prefix of the overall sequence of Put operations.

When selecting the node to which a Get operation should be sent, the desired consistency guarantee, along with the previous object versions that have been read or written in the current session and the key being read, determines the minimum acceptable read timestamp. The minimum acceptable read timestamp indicates how far a secondary node can lag behind the primary and still provide an answer to the given Get operation with the desired consistency. This is being decided by the client library of Pileus.

This architecture forces all the writes to be performed on a single primary core limits the problem space, and simplifies things for ensuring consistency for the reads in the consistency-spectrum. But this also limits the performance on reads (except for eventual-consistency reads). Moreover, with this setup you don't get to specify latency bounds for writes.

Evaluation results show that consistency-based SLAs can indeed improve application-specific levels of service (i.e., utility).


Q: How rich is the class of applications that benefit from this SLA?

A: I am still confused about this. It sounds like this can be applicable to a large class of applications, but sometimes I revert to thinking maybe not that big.

For latency-favoring (eventual-consistency happy) applications there are existing solutions: DynamoDB, and several key-value stores. And the target applications are those that tolerate relaxed consistency but, nevertheless, benefit from improved consistency. It may seem that these are already served to some extent by the eventual-consistent key-value stores. They are just best effort. You don't know what you get, but fresher more consistent data improves service the same as in Pileus. Pileus gives you tuned performance, but maybe you could have gotten that performance by probabilistic means also. (Peter Bailis has a very nice work on probabilistically bounded staleness, which is also a related approach here.)

For consistency-favoring applications, there are existing solutions like Bigtable, Spanner. And you can still do a quick dirty read from Spanner, by giving a slightly past read timestamp. This works because Spanner is a multiversion key-value store. But I guess you still need to manage when you would want to revert to the quick dirty reads.

Q: How does Pileus change the application code?

A: Yes we learn from API when we get back a consistent read and when not, but reacting on the type of reads may lead to polluting my program with a lot of branches and checks. Maybe programming languages people may have an answer to that. I guess, this way is still better than monitoring for latencies and implement these tuning in your application.

No comments: