Posts

Showing posts from 2012

Our brains are radios

Ok, our brains are not actually radios, sorry for the provocative title. But as I will explain shortly, there are some interesting parallels. When I teach about wireless radios in my networking classes, I talk about this connection, and this always perks my students up. The connection stems from the fact that both radios and brains are signal processing systems. The concepts behind how radio receivers work explain some of the mysterious phenomena in our brains, particularly the hearing of phantom tunes.

This is how radios work Most people have an incorrect model of how radios and wireless radio communication work. The layman thinks radios are simple. The transmitter does all the heavy lifting, and puts the signal on the air. Before the signal fades away completely, if the receiver device is in communication range, it will pick up the signal by listening. This naive understanding suggests that the receiver is a passive device, as listening is perceived mostly as a passive activity. Bu…

From Clarity to Efficiency for Distributed Algorithms

Image
Y. A. Liu, S. D. Stoller, B. Lin, M. Gorbovitski, Appeared in OOPSLA'12.

This paper describes a high-level language, DistAlgo, for clear description of distributed algorithms (e.g., Paxos, mutual execution). In contrast to embarrassingly parallel tasks which involve very regular and simple synchronization patterns (e.g., MapReduce tasks), distributed algorithms generally involve complex and tight synchronization patterns. DistAlgo specifies these complex synchronization conditions via using logical quantifications over message histories of processes involved in the algorithm. But, unfortunately, this turns out to be extremely inefficient if executed straightforwardly: each quantifier will cause a linear factor in running time, and any use of the history of messages sent and received will cause space usage to be unbounded.

To address this inefficiency, this paper focuses on optimizing implementations of DistAlgo programs by incrementalization of these expensive synchronization conditi…

Making Geo-Replicated Systems Fast as Possible, Consistent when Necessary

Image
Appeared in OSDI '12, Cheng Li, Daniel Porto, Allen Clement, Johannes Gehrke, Nuno Preguica, and Rodrigo Rodrigues.

In order to reduce latencies to geographically distributed users, big webservices companies (Google, Yahoo, Facebook, Twitter) replicate data across geographical regions. But replication across datacenters, create cross-site consistency problems, which is further complicated with the huge WAN latency delays. If you want to have strong consistent updates across sites, you have to pay the price in terms of latency (basically you revert to doing synchronous replication via say Paxos as in MDCC). This is the ELC part in the PACELC.

To alleviate this latency versus consistency tension, this paper proposes RedBlue consistency, which enables blue operations to be fast/asynchronous (and eventually consistent) while the remaining red operations are strongly-consistent/synchronous (and slow). So a program is partitioned into red and blue operations, which run with different co…

Sensys'12 (day2)

Here are my much anticipated (at least by Ted :-) notes from the second day of Sensys.

Kyun queue: a sensor network system to monitor road traffic queues I missed this talk, but this sounds like an interesting paper. The idea seems to be to deploy a transmitter and a receiver node on the opposite sides of a road. Depending on the density and the speed of the cars passing between the transmitter and the receiver, the received radio signal has different signatures. The work studies these signals and figures out the road conditions based on these.

Low cost crowdcounting using audio tones Crowd counting can be useful for event planning, learning popularity of aisles, etc. This work assumes that everyone in the crowd has a smartphone and this counting app running on their phone. (This is the biggest limitation of the work.) This work uses the audio tones (again ultrasound tones) emitted by the phones to count the people. The app employs the smartphone mic and speaker, no additional hardwa…

Sensys'12 (day 1)

I am attending Sensys'12  at Toronto, which is a short distance from Buffalo. Sensys is a premiere conference for wireless sensor network and, more recently, smartphone sensing research. Here are my rough notes from the first day of conference.

Energy-efficient GPS sensing with cloud offloading This paper is really good, it blew me away. Later at the conference banquet Wednesday night, the paper got selected as the Best Paper. This work aims to break up the GPS as a blackbox and see how we can make it more energy-efficient. The paper goes into fairly technical detail about how GPS works, which I cannot replicate here. Some basics are: GPS uses CDMA to transmit 50bps information over a carrier signal of 1.575Ghz. Seeing more satellites improves accuracy. GPS localization requires the following phases on the device: acquisition, tracking, decoding, code-phase, least-square calculation.

The insight of the paper is to offload some of these phases to the cloud, so the device does less…

An abundance of solutions

My son started the Kindergarten this year. I try to be a good parent and work with him closely on his sight-words and handwriting. Today we were reviewing his weekly reader magazine, and I read to him the following question from the magazine.
There is only one pumpkin left at the pumpkin patch. Mingo and Zip both want the pumpkin. What should they do? I guess the easy answer (maybe, the answer the magazine tries to teach) is that they share the pumpkin. But my son thought about a little bit, and said they should ask the farmer if the farmer has more pumpkins somewhere. I went along, and together we thought of several more solutions to the problem.

Mingo and Zip carve the pumpkin nicely, and sell the carved pumpkin at a good price at the bazaar and buy two pumpkins with that money.They divide the pumpkin into half. For dividing fairly, you know the trick right. One of them gets to cut in two pieces, and the other gets to choose. (However, my wife argues this is not fair if the person t…

MDCC: Multi-Data Center Consistency

Image
In order to reduce latencies to geographically distributed users, Google, Yahoo, Facebook, Twitter, etc., replicate data across geographical regions. But replication across datacenters, over the WAN, is expensive. WAN delays are in the hundreds of milliseconds and vary significantly, so common wisdom is that synchronous wide-area replication is unfeasible, which means strong consistency should be diluted/relaxed. (See COPS paper for an example.)

In the MDCC work (Tim Kraska, Gene Pang, Michael J. Franklin, Samuel Madden, March 2012), the authors describe an "optimistic commit protocol, that does not require a master or partitioning, and is strongly consistent at a cost similar to eventually consistent protocols". Optimistic and strongly-consistent is an odd/unlikely couple. The way MDCC achieves this is by starting with Paxos, which is a strongly consistent protocol. MDCC then adds optimizations to Paxos to obtain an optimistic commit protocol that does not require a master…

Auto-dropboxify all web-downloaded documents

I often download pdf presentations from the web, but then when I reopen them in my laptop later, I wonder if they got updated/changed in the meanwhile. It would be nice to have a service where my local copy also gets automatically updated/synced (a la dropbox) when the authoritative source copy gets changed. (The difference from the dropbox model is here the source does not need to cooperate by deploying a sync software, e.g. dropbox, or even be aware that its document is being synced.)

So here is my project idea: auto-sync all web/url downloaded documents, so the downloaded local copy of a file is kept up-to-date with the authoritative copy on the web. I think many people will find this useful. Some other examples of documents downloaded from the web and would benefit from auto-updating include: city documents (e.g., street-parking rules, garbage collection rules, etc.), event calendars, tax documents, CVs. (Do you have any other examples?)

This shouldn't be too hard to build. A…

Building fault-tolerant applications on AWS

Image
Below is my summary of the Amazon Web Services (AWS) whitepaper released on Oct 2011 by Jeff Barr, Attila Narin, and Jinesh Varia.

AWS aims to simplify the task of building and maintaining fault-tolerant distributed systems/services for its customers. For this, AWS prescribes the customers to embrace the following philosophy when they are building their applications on AWS.

1. Make computing nodes disposable/easily replaceable AWS (as with any cloud provider actually) employs a level of indirection over the physical computer, called virtual machine (VM), to make computing nodes easily replaceable. You then need a template to define your service instance over a VM, and this is called Amazon Machine Image (AMI). The first step towards building fault-tolerant applications on AWS is to create your own AMIs. Starting your application then is simply a matter of launching VM instances on Amazon Elastic Compute Cloud (EC2) using your AMI. Once you have created an AMI, replacing a failing ins…

Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS

Image
Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, David G. Andersen
Proc. 23rd ACM Symposium on Operating Systems Principles
(SOSP ’11) Cascais, Portugal, October 2011.

Geo-replicated, distributed data stores are all the rage now. These stores provide wide area network (WAN) replication of all data across geographic regions in all the datacenters. Geo-replication is very useful for global-scale web-applications, especially social networking applications. In social networking, your updates to your account (tweets, posts, etc) are replicated across several regions because you may have followers there, and they should get low-latency access to your updates. The paper refers to properties desired of such a geo-replication service with the acronym ALPS (Availability, low Latency, Partition-tolerance, and high Scalability).

Of course, in addition to ALPS, we need to add some consistency requirement to the geo-replication service, because otherwise the updates to a record can arrive at repli…

Scalable distributed data structures for internet service construction

Image
I think this 2000 paper (by Gribble, Brewer, Hellerstein, and Culler) may as well be the original NoSQL paper. The paper starts off by identifying the problems with RDBMS that prohibit scalability. (This is how you would motivate a NoSQL key-value store system even today :-)
RDBMSs have not been designed with Internet service workloads, service properties, and cluster environments in mind, and as a result they fail to provide the right scaling, consistency, or availability guarantees. RDBMSs permit users to decouple the logical structure of their data from its physical layout, which is a good thing, but excessive data independence (isolation of application from modifying the layout of data definition and organization) can make parallelization and therefore scaling hard. RDBMSs always choose consistency over availability. The paper then advocates a design sweet-point for achieving scalable and consistent data management for web services. RDBMS is out because it provides a too-high-a-lev…

Crowdsourced line wait-time forecasting using smartphones

Have you ever been frustrated with the unpredictability of the coffee shop line waiting times? I have, and far too many times. (I am fast growing into a grumpy old man.) I frequent the Tim Hortons coffee shop at our Student Union at University at Buffalo. Sometimes I would walk there to grab a quick mocha only to find a long waiting line and I would have to walk all the way back with empty hands. One day it hit me that this is a perfect opportunity to put our research on crowdsourced coordination and collaboration using smartphones to good use. We proceeded to develop Android and iPhone apps that forecasts the current (and near future) line waiting times at this coffee shop.

In the first version of our app, we asked users to manually provide line wait-times when they are waiting in line and tried to serve other users with the data input by these. Our model was: "if you used our app 5 times to get wait-time forecasted, before you can use it again, we ask you to report the wait ti…

My advice to the 2012 class

I feel sad at graduations. When I graduated college in 1997, all I felt was sadness and melancholy, rather than joy. I chose not to attend my MS and PhD graduation ceremonies, but I remember I felt sad after both defenses. It turns out, I also feel sad at my students' graduations. I graduated 3 PhD students and a couple MS students with the thesis option. It was always sad to see them depart.

This semester, I have been teaching distributed systems to the graduating class (seniors) at my sabbatical institute, Bilkent University. They are bright and talented (for example, Ollaa was developed by 3 of them). For the last class of their semester and their college lives as well, instead of discussing yet more about distributed systems, we walked out of the classroom, sat on the grass outside, and had a nice chat together.

I gave my students some parting advice. I tried to convey what I thought would be most useful to them as they pursue careers as knowledge/IT workers and researchers. …

Replicated/Fault-tolerant atomic storage

The cloud computing community has been showing a lot of love for replicated/fault-tolerant storage these days. Examples of replicated storage at the datacenter level are GFS, Dynamo, Cassandra, and at the WAN level PNUTS, COPS, and Walter. I was searching for foundational distributed algorithms on this topic, and  found this nice tutorial paper on replicated atomic storage: Reconfiguring Replicated Atomic Storage: A Tutorial, M. K. Aguilera, I. Keidar, D. Malkhi, J-P Martin, and A. Shraer, 2010.

Replication provides masking fault-tolerance to crash failures. However, this would be a limited/transient fault-tolerance unless you reconfigure your storage service to add a new replica to replace the crashed node. It turns out, this on-the-fly reconfiguration of a replicated storage service is a subtle/complicated issue due to the concurrency and fault-tolerance issues involved. This team at MSR @ Slicon Valley has been working on reconfiguration issues in replicated storage for some time.…

Popular posts from this blog

I have seen things

SOSP19 File Systems Unfit as Distributed Storage Backends: Lessons from 10 Years of Ceph Evolution

PigPaxos: Devouring the communication bottlenecks in distributed consensus

Frugal computing

Learning about distributed systems: where to start?

Fine-Grained Replicated State Machines for a Cluster Storage System

My Distributed Systems Seminar's reading list for Spring 2020

My Distributed Systems Seminar's reading list for Fall 2020

Cross-chain Deals and Adversarial Commerce

Book review. Tiny Habits (2020)