Posts

Showing posts from May, 2012

My advice to the 2012 class

I feel sad at graduations. When I graduated college in 1997, all I felt was sadness and melancholy, rather than joy. I chose not to attend my MS and PhD graduation ceremonies, but I remember I felt sad after both defenses. It turns out, I also feel sad at my students' graduations. I graduated 3 PhD students and a couple MS students with the thesis option. It was always sad to see them depart. This semester, I have been teaching distributed systems to the graduating class (seniors) at my sabbatical institute, Bilkent University. They are bright and talented (for example, Ollaa was developed by 3 of them). For the last class of their semester and their college lives as well, instead of discussing yet more about distributed systems, we walked out of the classroom, sat on the grass outside, and had a nice chat together. I gave my students some parting advice. I tried to convey what I thought would be most useful to them as they pursue careers as knowledge/IT workers and research

Replicated/Fault-tolerant atomic storage

The cloud computing community has been showing a lot of love for replicated/fault-tolerant storage these days. Examples of replicated storage at the datacenter level are GFS, Dynamo, Cassandra, and at the WAN level PNUTS, COPS, and Walter. I was searching for foundational distributed algorithms on this topic, and  found this nice tutorial paper on replicated atomic storage: Reconfiguring Replicated Atomic Storage: A Tutorial , M. K. Aguilera, I. Keidar, D. Malkhi, J-P Martin, and A. Shraer, 2010. Replication provides masking fault-tolerance to crash failures. However, this would be a limited/transient fault-tolerance unless you reconfigure your storage service to add a new replica to replace the crashed node. It turns out, this on-the-fly reconfiguration  of a replicated storage service is a subtle/complicated issue due to the concurrency and fault-tolerance issues involved. This team at MSR @ Slicon Valley has been working on reconfiguration issues in replicated storage for some ti

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Making database systems usable

Looming Liability Machines (LLMs)

Foundational distributed systems papers

Advice to the young

Linearizability: A Correctness Condition for Concurrent Objects

Scalable OLTP in the Cloud: What’s the BIG DEAL?

Understanding the Performance Implications of Storage-Disaggregated Databases

Designing Data Intensive Applications (DDIA) Book