Scalable distributed data structures for internet service construction
I think this 2000 paper (by Gribble, Brewer, Hellerstein, and Culler) may as well be the original NoSQL paper. The paper starts off by identifying the problems with RDBMSs that prohibit scalability. (This is how you would motivate a NoSQL key-value store system even today :-)
DDS is basically a key-value store as we understand it today. As the paper puts it, a DDS provides a persistent data management layer designed to simplify cluster-based Internet service construction. A distributed hash table underlies this data management layer; it simplifies Internet service construction by decoupling service-specific logic from the complexities of persistent, consistent state management. This allows services to inherit the necessary service properties from the DDS rather than having to implement them themselves. The DDS presents a conventional single-host data structure interface to service authors, but in fact it partitions and replicates the data across a cluster. A DDS is a cluster-level data structure and is not designed for the WAN.
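To make the "single-host interface, distributed backend" idea concrete, here is a minimal sketch (not the paper's actual API; all names are illustrative): the service author sees an ordinary put/get interface, while the library routes each key to one of several cluster "bricks".

```python
# Hypothetical sketch of a DDS-style hash table: a plain dict-like
# interface in front of a partitioned set of storage nodes ("bricks").

class Brick:
    """One storage node; holds a slice of the key space."""
    def __init__(self):
        self.store = {}

class DistributedHashTable:
    def __init__(self, bricks):
        self.bricks = bricks

    def _brick_for(self, key):
        # Simple hash partitioning for illustration; the real DDS
        # uses a trie-based data partitioning map (see below).
        return self.bricks[hash(key) % len(self.bricks)]

    def put(self, key, value):
        self._brick_for(key).store[key] = value

    def get(self, key):
        return self._brick_for(key).store.get(key)

dht = DistributedHashTable([Brick() for _ in range(4)])
dht.put("user:42", {"name": "alice"})
assert dht.get("user:42") == {"name": "alice"}
```

The point of the abstraction is that the service code above never names a brick; partitioning (and, in the real system, replication) stays inside the library.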
The paper's complaints about RDBMSs:
- RDBMSs have not been designed with Internet service workloads, service properties, and cluster environments in mind, and as a result they fail to provide the right scaling, consistency, or availability guarantees.
- RDBMSs permit users to decouple the logical structure of their data from its physical layout, which is a good thing, but excessive data independence (isolating the application from the layout and organization of its data) can make parallelization, and therefore scaling, hard.
- RDBMSs always choose consistency over availability.
DDS and key-value stores
Figure: High-level view of a DDS: a DDS is a self-managing, cluster-based data repository. All service instances (S) in the cluster see the same consistent image of the DDS; as a result, any WAN client (C) can communicate with any service instance.
The novel aspects of a DDS are the level of abstraction it presents to service authors (a data structure at the programming-language level), the consistency model it supports, the access behavior (concurrency and throughput demands) it presupposes, and the design and implementation choices made for its expected runtime environment and the types of failures it should withstand. SEDA (the staged event-driven architecture) is employed in implementing the DDS to achieve high throughput and high concurrency.
DDS architecture
Figure: Distributed hash table architecture: each box in the diagram represents a software process.
Services using a DDS may keep soft state, but they rely on the hash table to manage all persistent state. The DDS library itself contains only soft state, including metadata about the cluster's current configuration and the partitioning of data in the distributed hash tables across the bricks. The DDS library acts as the two-phase commit coordinator for update operations on the distributed hash tables. (Dynamo forgoes this consistency step and avoids the complications discussed next.) The paper explains recovery mechanisms for when the coordinator fails during the two-phase commit. However, this unavoidably leads to many corner cases, is complicated to manage, and may cause recovery-induced inconsistencies. The two-phase commit also slows down write operations and limits scalability.
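The coordinator role can be sketched as follows. This is a hypothetical illustration of two-phase commit over a key's replicas, not the paper's implementation; the `Replica` and `two_phase_put` names are mine, and real recovery (coordinator or replica failure between the two phases) is exactly what this sketch omits.

```python
# Hypothetical sketch: the DDS library as a 2-phase commit coordinator.

class Replica:
    def __init__(self):
        self.committed = {}   # durable state
        self.staged = {}      # prepared-but-uncommitted updates
        self.up = True

    def prepare(self, key, value):   # phase 1: stage and vote
        if not self.up:
            return False
        self.staged[key] = value
        return True

    def commit(self, key):           # phase 2: apply the staged update
        self.committed[key] = self.staged.pop(key)

    def abort(self, key):
        self.staged.pop(key, None)

def two_phase_put(replicas, key, value):
    # Phase 1: every replica must vote yes; otherwise abort everywhere.
    if all(r.prepare(key, value) for r in replicas):
        for r in replicas:
            r.commit(key)            # phase 2
        return True
    for r in replicas:
        r.abort(key)
    return False

reps = [Replica() for _ in range(3)]
assert two_phase_put(reps, "k", 1)
reps[1].up = False
assert not two_phase_put(reps, "k", 2)   # write refused, old value survives
assert all(r.committed["k"] == 1 for r in reps)
```

Note how a single down replica blocks the write entirely: that is the consistency-over-availability trade the paper accepts and Dynamo later rejects, and handling a crash between `prepare` and `commit` is where the corner cases live.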
Figure: Distributed hash table metadata maps: the key is used to traverse the DP map trie and retrieve the name of the key's replica group. The replica group name is then looked up in the RG map to find the group's current membership.
The DDS key lookup uses a trie-based mapping that can deal nicely with overloaded and hot keys. (Dynamo instead employs ring-based consistent hashing for this.) To find the partition that manages a particular hash table key, and to determine the list of replicas in that partition's replica group, the DDS library consults two metadata maps that are replicated on each node of the cluster. The first is the data partitioning (DP) map, maintained as a trie; the second is the replica group (RG) membership map. Both maps are soft state and self-cleaning: instead of being kept consistent synchronously, they can drift out of date and are lazily updated when they are used to perform operations on the bricks.
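The two-map lookup can be sketched like this. A hypothetical illustration, not the paper's data layout: the bit-widths, group names, and brick names are invented, but the flow matches the figure above: key bits walk the DP trie to a replica group name, which the RG map resolves to the group's current brick membership. A hot prefix can be split simply by deepening the trie under it.

```python
# Hypothetical sketch of the DP (trie) and RG (membership) maps.

def key_bits(key, n=8):
    """Low-order hash bits of the key, least-significant first."""
    h = hash(key) & 0xFF
    return [(h >> i) & 1 for i in range(n)]

# DP map: a binary trie whose leaves name replica groups. The subtree
# under bit 1 is split one level deeper, e.g. because it got hot.
dp_map = {0: "rg-a",
          1: {0: "rg-b", 1: "rg-c"}}

# RG map: current brick membership of each replica group.
rg_map = {"rg-a": ["brick1", "brick2"],
          "rg-b": ["brick2", "brick3"],
          "rg-c": ["brick1", "brick3"]}

def lookup(key):
    """Return the list of bricks replicating this key."""
    node = dp_map
    for bit in key_bits(key):
        node = node[bit]
        if isinstance(node, str):   # reached a leaf: a group name
            return rg_map[node]
    raise KeyError("trie has no leaf for this key")

assert lookup("user:42") in rg_map.values()
```

Because both structures are plain soft state, a stale copy is harmless: a brick contacted via an out-of-date map rejects the operation, and the library refreshes its maps lazily.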
Comments
http://en.wikipedia.org/wiki/Shard_(database_architecture)
Since databases provide too much isolation, you cannot have control over sharding.
- Gary