Showing posts from October, 2010

Paxos taught

In my previous posts I have alluded a couple of times to how I teach Paxos. Here I will explain how I go about teaching Paxos. These are the slides I use in class. (In my slides I reuse a lot of material from the lecture slides of Jeff Chase (Duke) . He shared his slides with me under the Creative Commons license, where I get to remix/change the contents with proper credit to him.) Paxos is a protocol for solving distributed consensus. The problem is easy to state. Consensus requires the following three properties. The first two are safety properties, the last one is a liveness property. Agreement: No two process can commit different decisions. Validity (Non-triviality): If all initial values are same, nodes must commit that value. Termination: Nodes commit eventually. I first start with a discussion of impossibility results in consensus. A major impossibility result is the coordinated attack (aka two generals) paradox . The result states that there is no determi

Speculative Execution in a Distributed File System

This SOSP'05 paper is about optimization techniques for client server distributed computing model, especially focusing on distributed file systems. In distributed filesystems, providing strong cash-consistency among concurrently editing clients require several synchronization messages to be exhanged between the clients and the server. Network delays, especially in wide area networks, make this unfeasibly slow. In fact AFS and NFS sacrifice consistency guarantees for speed, and provide weaker consistency guarantees such as close-to-open consistency where a client that opens the file see modifications by clients that have previously closed the file. Even these weaker consistency guarantees require synchronizing with the server and introduce latencies. Can distributed file systems be safe, consistent, yet fast? The paper states that it is possible to achieve this by using operating system support for lightweight checkpointing and speculative execution. The idea is simple. The main rea

Live migration of virtual machines

This is a 2005 paper by the same group at Cambridge that developed the Xen virtualization system discussed in the previous post . This review will be short since it is only based on the notes I took while listening to the paper presentation in the seminar. Live virtual machine (VM) migration is the relocation of a VM from one machine to another while its applications continue to execute. The motivation for live migration is to perform load management across machines and even across data centers. Another motivation is fault-tolerance, if the original host needs to go down due to faults, maintenance, or power outage, migration can provide availability of the applications running on the hosted VMs. The challenges with live migration of VMs is to minimize the downtime and to provide a seamless relocation of the VM so the VM can continue to operate normally in its new address. One strawman for migrating the contents of VM memory is the stop-and-copy approach. This approach leads to a

Xen and the art of virtualization

This week in the seminar class we discussed the Xen virtualization paper from SOSP 2003. Xen is the first open source virtualization solution, however, Vmware was already available in 2000 as a commercial solution for virtualization. A virtual machine (VM) is a software implementation of a machine (i.e. a computer) that executes instructions like a physical machine. The biggest benefit of virtualization is in server consolidation: enabling efficient usage of computer server resources in order to reduce the total number of servers that an organization requires. Thanks to the virtualization's ability to separate the OS and application from the hardware, it becomes possible to run multiple applications (in complete isolation from each other) on each server instead of just one application per server. This increases the utilization rate of servers and prevents the problem of "server sprawl", a situation in which multiple, under-utilized servers take up more space and resou

Mencius: building efficient replicated state machines for Wide-Area-Networks (WANs)

I will write my short summary of the Mencius paper from OSDI08 here. Writing this helped me understand a couple subtle points better, so I hope it will also help others. Replicated state machines is a common approach for achieving fault-tolerance. State machine replication works by replicating the state and function across multiple servers (replicas) and using consensus to agree on the sequence of the commands the replicas execute. Assuming deterministic service, since all replicas execute the same sequence of commands, they all keep the same state, and strong consistency guarantees are provided to the face of server failures. Paxos is a fault-tolerant implementation of replicated state machines. Paxos has become hugely popular because it is the only formally proven protocol that works in the face of asynchronous model. Paxos preserves safety (no incorrect decisions are taken by replicas) to the face of violations of timing assumptions on the channels and servers. Paxos satisfies prog

Future of powerpoint presentations

This Ted talk reminded me of a dream I had. In that dream, hundreds of soldiers are being shipped into a big empty convention center room, and they are being deployed into lines that are only a meter across from other enemy combatant soldiers. Between these two combatant lines there are some soil bags lined up, but the height of these bag blocks is only waist level. Everyone is so serious, this is war, so they take these deployment instructions very seriously. Nobody wants to make a mistake. After the deployment is finished, the commander is supposed to announce start, and soldiers are supposed to start firing at each other. I wake up before this happens. This is pure madness and terror. I felt disturbed by this very visual image for many days. But if you think about it, this was pretty much how WW1 and WW2 were in abstract. I think that deployment would make a really good performance art project. It is both abstract and concrete(visual) at the same time. It drives the point home. It

Rethinking Enterprise Network Control

This post will cover three closely related papers together: Rethinking Enterprise Network Control NOX: Towards an Operating System for Networks OpenFlow: Enabling Innovation in Campus Networks These papers are all about simplifying the job of the network administrators, that is, that of monitoring and enforcing policies over a large network. Existing approaches for network administration are hard to implement and maintain. One of these approaches is using middleboxes at network chokepoints (e.g., firewalls). Another is adding functionality to networks (e.g., using diagnostic tools, controls for VLANs, access-control lists, filters). These papers advocate the use of an omniscient central controller to simplify the implementation and enforcing of policies and simplifying network equipment. The papers argue that despite its centralized nature, the controller has good scalability in this approach. The three objectives of this approach are: 1) Network should be governed by policies declar

The Chubby Lock Service for Loosely-Coupled Distributed Systems, Burrows, OSDI 2006

I didn't read this paper before class, because I thought I had read it earlier. It turns out I hadn't and I was confusing it with the "Paxos Made Live" PODC 2007 paper. I realized this only towards the middle of the class :-) The "Paxos Made Live" paper focused on the Paxos consensus implementation that makes the state replication and availability work in Chubby. In contrast, this Chubby Lock Service paper focuses only on the lock service part and is very client-centric. It is the need and uses cases of the clients (i.e., the programmers in Google) that has dictated the design of the service and the interfaces, and even the presentation of the paper. Chubby serves course grain locks rather than fine-grained locks to keep the load light. The paper mentions that the primary goal is to provide reliability/availability and thruput/storage are secondary goals. Chubby uses Paxos for consistent state replication among the five machines in the Chubby cell. The det

The Google File System, Ghemawat et al., SOSP 2003

The motivation for the GFS arised because the traditional filesystems didn't fit the bill for Google's use cases: Firstly, failures are continual (i.e., always happening) since Google has thousands of machines. Secondly, multi-GB files are common in Google due to mapreduce jobs. (Though, the paper does not mention mapreduce at all, the mapreduce paper appeared the next year in 2004.) And finally, again due to mapreduce, in Google's use cases file access is read/append mostly, random writes are very rare. (For mapreduce jobs, it was more important to have a high sustained bandwidth than low latency atomic appends.) The GFS architecture is very simple (to the credit of the designers). Clients talk to a single master to get a handle on the chunk to read/write, and then using the handle clients pull/push the data to the chunkservers. Chunks are replicated over (by default) three chunkservers, one of which is designated as a primary replica by the master. So the master in fact

Popular posts from this blog

Learning about distributed systems: where to start?

Hints for Distributed Systems Design

Foundational distributed systems papers

Metastable failures in the wild

The demise of coding is greatly exaggerated

Scalable OLTP in the Cloud: What’s the BIG DEAL?

The end of a myth: Distributed transactions can scale

SIGMOD panel: Future of Database System Architectures

Why I blog

There is plenty of room at the bottom