- RDBMSs have not been designed with Internet service workloads, service properties, and cluster environments in mind, and as a result they fail to provide the right scaling, consistency, or availability guarantees.
- RDBMSs permit users to decouple the logical structure of their data from its physical layout, which is a good thing, but excessive data independence (isolating the application from the layout, definition, and organization of its data) can make parallelization, and therefore scaling, hard.
- RDBMSs always choose consistency over availability.
DDS and key-value stores
Figure: High-level view of a DDS: a DDS is a self-managing, cluster-based data repository. All service instances (S) in the cluster see the same consistent image of the DDS; as a result, any WAN client (C) can communicate with any service instance.
The novel aspects of a DDS are the level of abstraction it presents to service authors (a data structure at the programming-language level), the consistency model it supports, the access behavior (concurrency and throughput demands) it presupposes, and the design and implementation choices made for its expected runtime environment and the types of failures it should withstand. SEDA (staged event-driven architecture) is employed in implementing the DDS to achieve high throughput and high concurrency.
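The SEDA idea is to decompose request processing into stages connected by event queues, each stage serviced by its own worker pool. A minimal sketch, assuming a toy two-stage pipeline (the stage names and handlers here are hypothetical, not the paper's actual stages):

```python
import queue
import threading

class Stage:
    """A SEDA stage = an event queue plus a pool of worker threads running a handler."""
    def __init__(self, handler, workers=2):
        self.inbox = queue.Queue()
        self.handler = handler
        for _ in range(workers):
            threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            event = self.inbox.get()
            self.handler(event)
            self.inbox.task_done()

results = []
# Downstream stage: record the processed request.
commit = Stage(lambda ev: results.append(("committed", ev)))
# Upstream stage: parse, then hand off to the next stage's queue.
parse = Stage(lambda ev: commit.inbox.put(ev.upper()))

for req in ["get:k1", "put:k2"]:
    parse.inbox.put(req)
parse.inbox.join()
commit.inbox.join()
```

Because each stage has its own queue, an overloaded stage backs up locally and can be provisioned with more workers independently, which is how SEDA sustains high concurrency without one thread per request.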
Figure: Distributed hash table architecture: each box in the diagram represents a software process.
Services using a DDS may keep soft state, but they rely on the hash table to manage all persistent state. The DDS library itself contains only soft state, including metadata about the cluster's current configuration and the partitioning of data in the distributed hash tables across the bricks. The DDS library acts as the 2-phase commit coordinator for update operations on the distributed hash tables. (Dynamo forgoes this consistency step, and avoids the complications discussed next.) The paper explains recovery mechanisms for what happens when the coordinator fails during this 2-phase commit. However, this unavoidably leads to many corner cases, is complicated to manage, and may lead to recovery-induced inconsistencies. The 2-phase commit also slows down write operations and limits scalability.
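The coordinator role can be sketched as follows, assuming hypothetical `Brick` objects with `prepare`/`commit`/`abort` methods (real bricks are remote processes reached over the network, and votes can be lost or delayed):

```python
class Brick:
    """A toy replica brick; names and behavior are illustrative, not the paper's API."""
    def __init__(self):
        self.store = {}   # visible, persistent state (simplified as a dict)
        self.staged = {}  # updates prepared but not yet committed

    def prepare(self, key, value):
        # Phase 1: stage the update and vote yes. A real brick may vote no
        # on a lock conflict, disk failure, or timeout.
        self.staged[key] = value
        return True

    def commit(self, key):
        # Phase 2: make the staged update visible.
        self.store[key] = self.staged.pop(key)

    def abort(self, key):
        self.staged.pop(key, None)

def coordinated_put(replicas, key, value):
    # Phase 1: every replica must vote yes before anything becomes visible.
    if all(brick.prepare(key, value) for brick in replicas):
        for brick in replicas:
            brick.commit(key)  # Phase 2: apply on all replicas
        return True
    for brick in replicas:
        brick.abort(key)       # any "no" vote aborts the whole update
    return False

group = [Brick(), Brick(), Brick()]
coordinated_put(group, "user:42", "alice")
```

The trouble spot is visible in the gap between the two phases: if the coordinator crashes after some bricks have prepared but before all have committed, the bricks are stuck with staged updates they cannot unilaterally resolve. These are the corner cases the paper's recovery mechanisms must handle, and the step Dynamo chooses to skip.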
Figure: Distributed hash table metadata maps: The key is used to traverse the DP map trie and retrieve the name of the key's replica group. The replica group name is then looked up in the RG map to find the group's current membership.
The DDS key lookup uses a trie-based mapping that can deal nicely with overloaded and hot keys. (For this, Dynamo instead employs ring-based consistent hashing.) To find the partition that manages a particular hash table key, and to determine the list of replicas in that partition's replica group, the DDS library consults two metadata maps that are replicated on each node of the cluster. The first is the data partitioning (DP) map, maintained as a trie. The second is the replica group (RG) membership map. These two maps are soft state and self-cleaning: instead of being kept consistent synchronously, they are allowed to drift out of date and are lazily updated when they are used to perform operations on the bricks.
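The two-map lookup can be sketched as below, assuming the DP trie is keyed on bits of the hashed key (the bit order, map contents, and brick addresses here are illustrative, not the paper's exact encoding):

```python
import hashlib

def key_bits(key):
    """Hash the key and return its bits as a string (toy stand-in for the paper's hashing)."""
    h = int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")
    return bin(h)[2:]

# DP map: a trie over hash bits, flattened here to prefix -> replica group name.
# Uneven prefix lengths are the point: a hot or overloaded partition can be
# split more finely than a cold one.
dp_map = {"0": "rg_a", "10": "rg_b", "11": "rg_c"}

# RG map: replica group name -> current member bricks (addresses are made up).
rg_map = {"rg_a": ["brick1:80", "brick2:80"],
          "rg_b": ["brick3:80", "brick4:80"],
          "rg_c": ["brick5:80", "brick1:80"]}

def lookup(key):
    bits = key_bits(key)
    # Walk the trie: extend the prefix one bit at a time until it names a group.
    for depth in range(1, len(bits) + 1):
        prefix = bits[:depth]
        if prefix in dp_map:
            return rg_map[dp_map[prefix]]
    raise KeyError("incomplete DP map")

replicas = lookup("user:42")
```

Laziness fits naturally here: a stale local copy of either map just sends an operation to the wrong brick, which rejects it, and the rejection carries the information needed to refresh the map entry before retrying.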