Showing posts from September, 2010

The Design and Implementation of a Log-Structured File System

by Rosenblum, Mendel and Ousterhout, John K. SOSP 1991. PDF link for the paper The motivation for the log-structured filesystem (LFS) are threefold. 1) Traditional file systems (say UNIX FAST filesystem) perform poorly for write. FAST requires 6 writes to create a new one block file. 2) Main memory caching capacity is increasing, which takes care of reads, so the workloads to disks are dominated by writes. And this is bad, see above. 3) There is a large gap between random and sequential i/o performance. Random is dominated by seeking time (which is very costly), whereas you avoid the seek time with sequential i/o. The LFS idea is simple: focus on write performance and utilize sequential bandwidth of disks. Here is how it works. LFS buffers all writes into a segment in main memory. (Even though the writes may be for different files, they are appended into the segment.) When the segment is full, LFS writes the segment to unused place on disk sequentially again. Since LFS writes sequen

Pig Latin: A Not-So-Foreign Language for Data Processing

I will be honest, I haven't read the week-3 papers. But this does not stop me from writing a short summary of the Pig Latin paper based on the notes I took during its class discussion. So I apologize if there is a mistake in my review. (The second week-3 paper was the Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. Unfortunately we had little time to discuss this paper, and I also had to leave early. So no review for that paper. I didn't understand much, so I will just paraphrase from its abstract "A Dryad application combines computational vertices with communication channels to form a dataflow graph. Dryad runs the application by executing the vertices of this graph on a set of available computers, communicating as appropriate through files, TCP pipes, and shared-memory FIFOs.") Pig Latin: A Not-So-Foreign Language for Data Processing Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins (Yahoo Research) Pig

MapReduce Online

I am running behind my blogging the seminar. Here are some short notes on the papers we have been discussing in the Data Center Computing seminar. The " MapReduce Online" paper by "Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, Khaled Elmeleegy and Russell Sears" was a reading assignment for the first week, but we didn't have the time to discuss it. This paper is written very simple and provides a clean discussion of Hadoop's inner workings that I felt I should write about the paper just for this reason. The paper presents a modified Hadoop mapreduce framework that allows data to be pipelined between operators, and supports online aggregation, which allows users to see "early returns" from a job as it is being computed. In Hadoop, a downstream worker is blocked until the previous worker process the entire data and make it available in entirety. In contrast, pipelining delivers data to downstream operators more promptly. Not onl

Twitter, open the firehose!

Twitter has 30 millions of users just in US. The users tweet about everything that interest or bother them. These tweets are public and over 50 million tweets are added to this pool everyday. It is no surprise that data mining researchers have been swarming to Twitter data to use it for mining public opinion and trends. Data mining researchers have a problem, though. In this year's KDD conference (arguably the top conference on data mining), the researchers were unanimously vocal about the difficulties of acquiring the data they need from Twitter. The researchers need to learn Twitter API and other networking tools to acquire the data. And Twitter imposes a strict and very low-quota on the data it serves per IP. Of course Twitter does this to ensure that their system is not overwhelmed by these third party requests and can continue to serve its users. To get access to more data there is a lengthy process of whitelisting from Twitter. But, I guess, Twitter is unlikely to whitelis

Intermediate storage support for mapreduce for improved fault-tolerance

The second paper we discussed in the first week of the data center computing seminar is Steve Ko 's work on improving fault-tolerance in mapreduce jobs. The notes below are just the edited version of the notes I took during Steve's presentation, so mistakes and omissions are most probably mine. Making Cloud-Intermediate-Data Fault-Tolerant Steven Y. Ko, Imranul Hoque, Brian Cho, and Indranil Gupta Proceedings of the ACM Symposium on Cloud Computing 2010 (SOCC), 2010 Problem: Yes, mapreduce is fault-tolerant in the sense that the master will reassign a failed map or reduce task to another worker to get the job done. But, this approach to fault-tolerance introduces latency to the jobs. Steve's experiments showed that 1 failure in 20 machines led to 33% increase in runtime in a single mapreduce task. We have a worse problem in our hands when we consider mapreduce job chains. A worker failure in the middle of the chain necessitates a cascaded re-execution till the top of the c

MapReduce: Simplified Data Processing on Large Clusters

Steve Ko joined our department this semester from UIUC by way of Princeton. He is offering a seminar course on data center computing . I am sitting in his seminar to catch-up on these topics. I thought I might as well write some posts about the papers we discuss in that seminar. The first post is on the "MapReduce: Simplefied Data Processing on Large Clusters , J. Dean et al., OSDI 2004" paper. Several companies use mapreduce. Mapreduce helps companies process large data. Mapreduce is popular because it provides a simple abstraction for parallelizing, and hide the meticuluous tasks involved in parallelizing in the datacenter under the hood. Mapreduce has been inspired by the map reduce operations in Lisp. (map square '(1 2 3 4)) output is (1 4 9 16) (reduce + '(1 4 9 16)) output is 30 because of (+ 16 (+ 9 (+ 4 1))) Google noticed that they keep doing tasks that follow the same pattern: they individually process web pages (map), and then gather-reprocess-fuse (

Writing advice

Elements of Style gives very good writing advice. use the active voice put statements in positive form omit needless words use definite, specific, concrete language keep related words together work from a suitable design express coordinate ideas in similar form be concise be precise be clear revise and rewrite But, I think this has got to be the best writing advice. Originally due to Rory Marinich . It's easy to make something incredible. All you do is, don't let what you're doing be shit. If something is shitty, you make it better. There are usually a few hundred thousand things in anything that are shitty, but they're all easily fixable. You just have to dedicate yourself to not being shit. Anything can be shit. An idea can be shit. Don't execute a shitty idea. A design or a lyric or a guitar part or a photo can be shit. If it is, do it better. It baffles me that people think making really, really brilliant, stupendous, worldshocking pieces of work is a part

Popular posts from this blog

Foundational distributed systems papers

Your attitude determines your success

My Distributed Systems Seminar's reading list for Fall 2020

I have seen things

Learning about distributed systems: where to start?

PigPaxos: Devouring the communication bottlenecks in distributed consensus

Read papers, Not too much, Mostly foundational ones

Sundial: Fault-tolerant Clock Synchronization for Datacenters

Facebook's software architecture

Paxos unpacked