Posts

Showing posts from February, 2017

1 million pageviews

- February 21, 2017

My blog recently reached 1 million pageviews. This warrants a short retrospective. I started posting regularly in September 2010. I wanted to get into the cloud computing domain, so I needed to build up background on cloud computing work. I decided that as I read papers on cloud computing, I would post a summary to this blog. I thought that if I could explain what I learned from the papers in my own words, I would internalize those lessons better. And if others read those summaries and benefit, that is an extra plus. "Writing is nature's way of telling you how sloppy your thinking is." In fact, I learned a lot writing those paper reviews. Writing the reviews gave me a deeper understanding of the work done, beyond what I could achieve by passively reading the papers. Putting them on the web was also a nice choice, because I could refer my students to some of these summaries when needed. And it turned out that I referred to those summaries myself very frequently to jog ...

Bowling your way to the top

- February 18, 2017

"Oh, this is very American!" I said, when I finally understood how bowling scoring works.

Bowling scoring is nonlinear

In a bowling game, there are 10 rounds. There are 10 pins, and you get 2 shots in each round to knock down as many as you can. Even if you are a novice, if you are eager and put effort into it, you can knock down 6 pins in each round. So that gives you a score of 6*10=60. If you knock down 7 pins in each round, you get a score of 70. 8 pins, you get a score of 80. 9 pins, you get a score of 90. Here is where things start to go nonlinear and you get accelerated returns. If you knock down all 10 pins with your two shots, this is called a spare. Your score for that round is not just 10; the points you get from the next round are also added to it. So if you had a spare in round k, and got 7 in the next round k+1, you get 10+7 for round k, and 7 for round k+1, for a total of 17+7=24 points from these two rounds. If we were scoring this linearly, you wou...

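The spare arithmetic in the excerpt above can be checked with a small Python sketch. It assumes only the simplified rule as the excerpt states it (a spare in round k earns 10 plus the points from round k+1), which is not the full official scoring; strikes are not modeled because the excerpt is cut off before reaching them. The function name `simplified_score` and its list-of-round-totals input are hypothetical, chosen only for illustration.

```python
def simplified_score(rounds):
    """Score a game under the simplified spare rule from the excerpt.

    `rounds` holds the pins knocked down in each round (0-10), counted
    over both shots. Assumption: a round total of 10 is treated as a
    spare and earns the next round's points as a bonus. Strikes and the
    final-round bonus are not modeled.
    """
    total = 0
    for k, pins in enumerate(rounds):
        total += pins
        # Spare: add the next round's points, if there is a next round.
        if pins == 10 and k + 1 < len(rounds):
            total += rounds[k + 1]
    return total


print(simplified_score([6] * 10))  # linear case: 6*10 = 60
print(simplified_score([10, 7]))   # spare then 7: (10+7) + 7 = 24
```

Scored linearly, the same two rounds would sum to 10+7=17, so the bonus from the spare is what pushes the total to 24.
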
Mesos: A platform for fine-grained resource sharing in the data center

- February 16, 2017

This paper appeared in NSDI'11 and introduced the Mesos job management and scheduling platform, which proved to be very influential in the big data processing ecosystem. Mesos has seen a large following because it is simple and minimalist. This reminds me of the "worse is better" approach to system design. This is an important point, and I will ruminate on it after I explain the Mesos platform.

The problem

We need to make multiple frameworks coexist and share the computing resources in a cluster. Yes, we have submachine scheduling abstractions: first virtual machines and then containers. But we still need a coordinator/arbiter to manage and schedule the jobs submitted from these frameworks, to make sure that we neither underutilize nor overload the resources in the cluster.

Offer-based scheduling

Earlier, I talked about Borg, which addressed this cluster management problem. While Borg (and later Kubernetes) takes a request-based scheduling approac...

Large-scale cluster management at Google with Borg

- February 11, 2017

This paper from Google appeared in EuroSys'15. The paper presents Borg, the cluster management system Google has used since 2005. The paper includes a section at the end about the good and bad lessons learned from using Borg, and how these led to the development of the Kubernetes container-management system, which powers the Google Cloud Platform and App Engine.

Borg architecture

This is the Borg. Resistance is futile. A median Borg cell is 10K machines. And all those machines in a cell are served by a logically centralized controller: the Borgmaster. Where is the bottleneck in the centralized Borg architecture? The paper says it is still unclear whether this architecture would hit a practical scalability limit. Anytime Borg was given a scalability target, they managed to achieve it by applying basic techniques: caching, loose synchronization, and aggregation. What helped the most for achieving scalability was decoupling the scheduler component from the Borgmaster. The scheduler...