Decade in Review

We are entering 2020, and this is a good time to look back and review the past decade through the lens of this blog, which started in late 2010.
I started posting regularly in September 2010. I wanted to get into the cloud computing domain, so I needed to accumulate background on cloud computing work. I decided that as I read papers on cloud computing, I would post a summary to this blog. I thought that if I could explain what I learned from the papers in my own words, I would internalize those lessons better. And if others read those summaries and benefit, that is an extra plus.
Initially I reviewed and posted paper summaries on big data processing systems and NoSQL distributed databases to catch up on these areas. Around 2010, misrepresentation of the CAP theorem by NoSQL proponents was a real problem. I covered some papers that tried to clarify this issue. Throughout the years, I covered many papers that discussed the different consistency guarantees offered by distributed databases. Transactions in distributed databases also became a favorite topic in the blog. After reading papers on this topic, I dipped my toes into the distributed databases research area. Distributed databases are a very important and practical topic in the industry. A good geo-distributed database solves a lot of problems for big companies. Last year I was at Microsoft Azure Cosmos DB for my sabbatical and got to learn more about this domain.

Spanner's use of synchronized clocks in distributed databases was a big milestone. This work led us to think about hybrid vector clocks, and later hybrid logical clocks. That in turn led us to develop Retroscope for querying consistent cuts in distributed systems. Nowadays we are thinking about timely protocols, in continuation of this line of work.

Cloud computing has always been a big part of this blog. I discussed some datacenter networking papers, but I did not really get into that field. On the cloud computing topic, there was a lot of excitement about containers and microservices. Based on the problems discussed in that domain, I have written an exploratory technical paper called stabilization in the cloud. I think that is still an open and interesting problem. Recently, serverless (function as a service) has been all the rage in cloud environments, and there have been several interesting papers on the topic.

I really got into Paxos in this last decade. I didn't expect to fall for Paxos protocols this hard. I had first encountered Paxos around 1999 in the distributed systems reading seminars I attended as a PhD student. I guess I had liked it back then, but it didn't make a big impression on me. From 2007 to 2015, Paxos gradually became more and more popular and important in cloud computing. This blog had more than 20 Paxos posts in the last two years alone.

My students and I have a love-hate relationship with Paxos. We understand it very well and are among the top experts on the topic. But, unfortunately, academia lost some of its excitement about Paxos and a so-called "Paxos fatigue" developed. Because of this cold shoulder, we tell ourselves that perhaps we should be working on other things. On the other hand, Paxos is still transforming and ruling over large-scale distributed systems deployments in the industry (one recent example is Facebook's use of Paxos to build a scalable control plane). Paxos is one of the most impactful ideas in cloud computing stacks. And despite the reduced interest from academia, there is still a vast unexplored algorithm design space, and more work is needed to tailor Paxos for specific distributed systems deployments, topologies, and workloads. Striking evidence of this arrived in 2017 with the flexible quorum breakthrough, which came unanticipated almost 30 years after the Paxos protocol was first proposed. This further opened up the design space for customizing Paxos to different environments and workloads, a potential that is yet to be fully realized. We hope we will be able to convince more researchers to care about and work on these problems. In any case, we can't seem to pull ourselves away from working on Paxos variants. They are a lot of fun. I am working on Paxos Unpacked now.
Ok, enough about Paxos...

In 2016, the machine learning field exploded and went mainstream. Distributed systems support for machine learning became a hot topic. I learned some machine learning by following online courses and reading papers. I really appreciated the neat mathematics and differential computation employed here. As for developing distributed systems support for machine learning, we performed some surveys and thought about the topic for a while. But I gave up after some time, thinking that it would be hard to do principled algorithmic work here because the ML field works very close to the application, and the solutions are pretty application-specific. It turns out that I was too quick to judge, as several nice pieces of algorithmic and distributed systems work have started coming out recently.

Blockchains have gotten a lot of hype recently. It took me a long time to get into blockchains, even though the topic is very closely related to distributed consensus. When I finally offered a seminar on blockchains in Spring 2018, I started appreciating some of the good work done in this domain, and certain parts of the vision for decentralized computing. I love the premise of ICOs for democratizing the stock market and of smart contracts for enabling decentralized e-trade without any middleman (please hurry up, we need a decentralized search engine and big data analytics for this to work). Unfortunately the blockchain field has been perpetually overhyped, and this damages progress in the field. I think we have finally started to see the hype dying down and more solid work emerging in the field.

MAD questions

What trends are brewing that we are missing?
Bitcoin was released in 2009 and most of us missed it as we entered 2010. What are some trends that are brewing silently that we are missing now?

The IoT and 3D printing areas have been seeing increasing interest. But there has not been any revolutionary breakthrough yet as far as I can see.

Differentiable programming seems interesting to me.

Quantum computing is also seeing some nice progress, but I don't know much about the area.

Cloud/datacenter computing has come a long way in this last decade. I think the serverless model, dataflow architectures for analytics/transactions, and the use of RDMA are some promising trends. I am also happy to see increased adoption of formal methods and verification for distributed systems. But, maybe since I am too embedded in the field, I am unable to clearly identify a big hit for the coming decade.
