Posts

Showing posts from November, 2010

Dynamo: Amazon's highly available key-value store

This paper, which appeared in SOSP'07, describes Dynamo, the underlying storage technology for several core services in Amazon's e-commerce platform. Dynamo is a NoSQL system and provides a single key-value store. The commonly accepted (yet still disputed) wisdom is that RDBMS are overkill for simple key-value stores and are unsuitable for large-scale multi datacenter systems.

The goal in Dynamo is providing reliability at large scale. As the paper says in the introduction "The reliability and scalability of a system is dependent on how its application state is managed". This was a key lesson from the Ousterhout'90 paper: "The role of distributed state".

Dynamo is an optimistic replication system. According to the optimistic survey taxonomy, Dynamo is a multi-master (multiple coordinators can update the data) system that employs state transfer and asynchronous propagation of updates. Hence, Dynamo allows conflicting updates in to the system. Dynamo uses …

Crowdsourced sensing and collaboration using Twitter

First, a brief prelude about my research interests. I started my PhD on distributed algorithms and self-stabilizing systems in 1998. But then in 2000, after my advisor received a DARPA grant on networked embedded sensor technologies, I have started working on wireless sensor networks. After 10 years of working on wireless sensor networks, I am now in the process of switching topics. I am doing a lot of reading on cloud computing. This is a topic I enjoy, and I think I can contribute here due to my background in distributed systems and theory.

I am also doing a lot of reading on smartphones, as they provide a good alternative/complement to wireless sensor network systems. The main appeal of smartphones is that they have solved the market penetration problem, which the wireless sensor networks have been perpetually struggling with. The killer applications for smartphones are communication and offering a lightweight ubiquitous PC replacement. Smartphones are incidentally better sensors t…

VL2: A Scalable and Flexible Data Center Network

Image
This paper is by MS Research and appeared in Sigcomm 2009. The paper investigates data center networking, the same problem as the Portland paper (which also appeared in Sigcomm 2009!). Naturally there are similarities in the approaches recommended by the two papers.

The motivation for this paper is from a slightly different angle than the Portland paper. This paper puts more emphasis on the network capacity problem in the data centers. The paper argues that the network is the bottleneck of the computation, since the switches at the higher levels (i.e., aggregation and core switches) are oversubscribed heavily. Servers typically have 1:1 over-subscription to other servers in the same rack --that is, they can communicate at the full rate of their interfaces (e.g., 1 Gbps). However, up-links from top of the rack (ToR) switches are typically 1:20 oversubscribed (i.e., 1 Gbps of up-link for 20 servers), and paths through the highest layer of the tree can be 1:240 oversubscribed.

The paper fo…

PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric

Image
Last week we covered the Portland paper by UC San Diego, which appeared at Sigcomm'09. This paper is well written and I really appreciate its clarity and simplicity.

The motivation for the work is the need to scale at the data center networks (DCNs). Data centers today may have around 100K machines, with 32 virtual machines at each, which yield a total of 3 million hosts. Assuming one switch is required for every 25 physical hosts and accounting for interior nodes, the topology would consist of 8,000 switches. Current network protocols impose significant management overhead at this scale let alone supporting seamless VM migration across machines. For example, broadcasting ARPs and updates to every swithch is out of the question for such a network.

The desiderata for the DCN is given as:
R1. Any VM may migrate to any physical machine. Migrating VMs should not have to change their IP addresses as doing so will break pre-existing TCP connections and application-level state.
R2. An admini…

Availability in Globally Distributed Storage Systems

Image
This paper by Google Research provides a report on the availability behavior of large cloud storage systems by studying up to 7K nodes at Google over a year. The paper does not propose any new protocols, but provides sufficiently accurate models of system behavior/performance based on the study. These models are important since they enable us to correctly design and optimize these multi layered systems for data availability.
The work is divided into two parts. The first is the analysis of the component availability (disks, machines, racks), and the second is the analysis of the data availability, as inferred from the component availability results and design decisions of the distributed storage system.
Preliminaries A storage node is defined as unavailable when it fails to respond positively to periodic health checking pings sent by the monitoring system. The paper does not investigate about network errors specifically; those are also swept under unavailability with software & har…

My iPhone 4 has a transparent screen

Image
I just took a picture of my palm, and set it as the wallpaper to give the transparency effect. The trick worked well on some unsuspecting friends.

Optimistic Replication

Image
This 2005 paper by Saito and Shapiro provides a comprehensive survey of the optimistic replication area and is a must-read for distributed services designers. Below, I am providing some snippets from the paper to give a brief overview.
Data replication improves both availability and performance for distributed services. Availability is improved by allowing access to the data even when some of the replicas are unavailable. Performance improvements concern reduced latency (by enabling local access from replica instead of remote access) and increased throughput (by letting multiple computers serve the data).
Pessimistic techniques block access to a replica unless it is provably up to date. Such pessimistic techniques perform well in local-area networks, in which latencies are small and failures uncommon, but is unsuitable for wide-area networks, because the Internet remains slow and unreliable. Moreover, pessimistic algorithms scale poorly, because its throughput and availability suffer a…

A presentation tip (brought to you by a weary audience)

A common and serious flaw I see in student presentations is that the presenter rushes through the first 5-10 slides. This is the worst thing to do in a presentation, these introduction slides to the problem and solution are the most important ones. If the presenter loses the audience in these beginning slides, this wastes the entire hour for the audience. In comparison, losing the audience in the middle of the presentation is less critical, less time is wasted, and the audience can even use this time to think about alternative solutions to the problem (which was presented clearly in the introduction).
You would think this is common sense, but most of the students still commit this mistake. I guess the reason is since the presenter had read and understood the relatively easy introduction part of the paper, he thinks the audience can also understand and follow it with ease. And armed with the powerpoint slides, the presenter can finish the first 10 slides in less than 10 minutes in bulle…

The role of distributed state

This is a 1990 technical report by John Ousterhout, who has been a very influential and pioneer figure in systems research. I had written a summary of his log-structured file system (LFS) paper earlier in this blog. It seems like his ideas on LFS and RAMCloud are becoming important in cloud computing these days. Ousterhout is also the creator of TCL/TK, so he has a lot to say in programming languages for web applications domain as well. The miscellaneous articles on his non-academic homepage include a lot of wisdom from his experience in the software development industry, and are also worth reading carefully.
This paper titled "Role of distributed state" describes the advantages and disadvantages of distributed state and suggests that the act of building a distributed system consists of making tradeoffs among various alternatives for managing the distributed state. Illustrating this point, the two case studies included in the paper, NFS and Sprite file systems, end up with a…

Popular posts from this blog

I have seen things

SOSP19 File Systems Unfit as Distributed Storage Backends: Lessons from 10 Years of Ceph Evolution

PigPaxos: Devouring the communication bottlenecks in distributed consensus

Frugal computing

Learning about distributed systems: where to start?

Fine-Grained Replicated State Machines for a Cluster Storage System

My Distributed Systems Seminar's reading list for Spring 2020

Cross-chain Deals and Adversarial Commerce

Book review. Tiny Habits (2020)

Zoom Distributed Systems Reading Group