Posts

Showing posts from 2019

HotStuff: BFT Consensus in the Lens of Blockchain

Image
This paper appeared in PODC 2019, and is by Maofan Yin, Dahlia Malkhi, Michael K. Reiter, Guy Golan Gueta, and Ittai Abraham.  The paper presents HotStuff, a leader-based Byzantine fault-tolerant consensus protocol. HotStuff forms the basis of LibraBFT which is used in Facebook's Libra project. HotStuff uses the partially synchronous model for liveness, but it is safe even under an asynchronous model. (This is also how Paxos operates, as a crash tolerant consensus protocol.) Once network communication becomes synchronous, HotStuff enables a correct leader to drive the protocol to consensus at the pace of actual (vs. maximum) round duration --a property called responsiveness . Another innovation in HotStuff is that it provides communication complexity that is linear  (rather than quadratic) in the number of replicas. In other words, all-to-all broadcasts are replaced with only participant-to-leader and leader-to-participant communication, with rotating leaders. HotStuff is the fir

Cross-chain Deals and Adversarial Commerce

Image
This paper appeared in VLDB'19 and is authored by Maurice Herlihy, Barbara Liskov, and Liuba Shrira. How can autonomous, mutually-distrusting parties cooperate safely and effectively? This question is very important for enabling commerce. The trading world answered this question so far by relying on a trusted third party, and in the worst case, on the government/rule-of-law to litigate parties deviating from their contracts. With prevalence of e-commerce and decentralization, this question is recently  considered in *trustless* settings by modern distributed data management systems. Solving the trustless multi-party cooperation when all the parties use the same blockchain is achievable via smartcontracts, but solving the problem where the parties use different blockchains bring many additional challenges. Some of these challenges are familiar to us from the classical distributed systems research on distributed transactions, such as how to combine multiple steps into a single

Decade in Review

We are entering 2020, and this is a good time to retrospect and review the past decade from the lens of this blog, which started late 2010. I started posting regularly on September 2010. I wanted to get into the cloud computing domain, so I needed to accumulate background on cloud computing work. I decided that as I read papers on cloud computing, I will post a summary to this blog. I thought if I could explain what I learned from the papers in my own words, I would internalize those lessons better. And if others read those summaries and benefit, that is an extra plus. Initially I reviewed and posted paper summaries on big data processing systems and NoSQL distributed databases to catch up on these areas. Around 2010s, misrepresentation of the CAP theorem by NoSQL proponents was a problematic issue. I covered some papers that try to clarify this issue. Throughout the years, I covered many papers that discussed the different consistency guarantees offered by distributed databases.

My Distributed Systems Seminar's reading list for Spring 2020

Below is the first draft list of papers I plan to discuss in my distributed systems seminar in the Spring semester. If you have some suggestions on some good/recent papers to cover, please let me know. Paxos Canopus: A Scalable and Massively Parallel Consensus Protocol  (CoNext17)  Consus taming the Paxi   Stable and consistent membership at scale with rapid  (ATC18) Unifying consensus and atomic commit  (VLDB19)  Wormspace: A modular foundation for simple, verifiable distributed systems  (SOCC19)  Replication Mergeable replicated data types  (OOPSLA19)  Exploiting Commutativity For Practical Fast Replication  (NSDI19)  Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes  (SIGMOD18)  Dynamic atomic storage without consensus (JACM 2011)  PaxosStore:  High-availability Storage Made Practical in WeChat  (VLDB17) Transactions/consistency Interactive checks for coordination avoidance  (VLDB19) SLOG: serializable, low-late

The Ben-Or decentralized consensus algorithm

Image
In PODC 1983, Michael Ben-Or published a randomized distributed asynchronous consensus algorithm in a paper titled "Another advantage of free choice (Extended Abstract): Completely asynchronous agreement protocols" . After 15 years, a paper in 1998 provided a correctness proof of Ben-Or's algorithm . Above is the pseudocode for Ben-Or from that paper. From the pseudocode it looks like Ben-Or is a very simple algorithm, but in reality its behavior is not easy to understand and appreciate. But fret not, TLA+ modeling and model checking helps a lot for understanding the behavior of the Ben-Or algorithm. It is fulfilling when you finally understand how the algorithm works, and how safety is always preserved and progress is achieved eventually probabilistically. I had assigned modeling of Ben-Or as the TLA+ project for my distributed systems class . Here I share my PlusCal modeling of Ben-Or, and discuss how this model helps us to understand the algorithm better. Here is

How to ask better questions

This is a followup to my "Master Your Questioning Skills" post. There I gave some suggestions about how to ask better questions, but didn't engage with the topic directly. Coming up with the questions is not difficult, once you get out of your default mode and give yourself the license to indulge in the questioning frame of mind.  By calling these questions MAD questions, I gave myself the license/protection to go wild, uninhibited by traditional assumptions/constraints. This gave rise to reducing the bar, and approaching the topic with a beginner mind, as an outsider. This made it easier to ask more original questions. By detaching and discarding the baggage of tradition, you can start working from the first principles.  To get to the good questions, get over with the bad questions. Bad questions lead to good questions. So roll with them and exhaust them to get to the good questions.  A good question gives you a perspective change that is productive. It opens a

Moneyball, but for academia

"Moneyball: The Art of Winning an Unfair Game" is a book by Michael Lewis, published in 2003, about the Oakland Athletics baseball team and its general manager Billy Beane. Its focus is the team's analytical, evidence-based, sabermetric approach to assembling a competitive baseball team despite Oakland's small budget.  I had zero knowledge and interest about baseball, but the book was very engaging, and I could stop reading. Michael Lewis is one of my favorite writers. I had read Flash Boys, Big Short , and Next by him, and all of them were very good. Fundamentally, Moneyball is about making radical but rational choices to the face of flawed ways of the tradition. Where there is a tradition-ridden unoptimized market, there is potential for disruption: if you are brave enough to do things differently, you can benefit a lot from doing so.  Initially only a few people are daring enough to see this and ignore the tradition and status-quo to start doing things differe

I am looking for PhD students

I am trying to recruit PhD students and since this is the application season for graduate school, I thought I should also advertise here. If you are interested, please contact me via email. Below are the two projects I plan to get new students started on. Yes, it snows a lot in Buffalo. But the summer is beautiful, and Niagara Falls region is 20 minute drive from the university. Paxos Unpacked Due to their excellent fault-tolerance and consistency benefits, Paxos protocols are employed at the core of many distributed systems infrastructures at Google, Facebook, Amazon, and Microsoft. When properly tailored and optimized, Paxos family of protocols can deliver efficiency, performance, and scalability at par with weakly consistent protocols, while providing a stable and strong foundation to build services and applications on top. In this project, we will explore performant, scalable, practical and usable variants of Paxos. We already implemented the Paxi framework to facilit

On the advisor-mentee relationship

I had previously given advice about how to find your advisor . But I realize I haven't talked about the advisor-mentee relationship directly before. The advisor is the window through which the student gets to perceive research and academia from. Advisors imprint the students with their approach to research, taste in research problems, perspective on solutions. Advisors also act as role models to the students in a more general sense. I see that the advisor's qualities often live on the student for a long time --at least through the assistant professorship period. For example, if the advisor is kind and generous to students, the mentee is nice to students, etc. This is a big responsibility for the advisors, and they should be aware of their roles in their interactions with their mentees. The advisor's main job is to ask questions and guide strategically where and when it matters. Acting as the leader, the advisor should provide purpose, give direction, and occasionally mo

Year in Review: Best of Metadata

This year I bring the end of the year a bit early. Because this year, in addition to a "year in review", I will also get to post a "decade in review", and I should make space for it. Ok, here are some highlights among my 2019 posts. Distributed systems Reading list for our distributed systems class Cloud programming simplified: A Berkeley view on Serverless Computing Do leases buy us anything? Timely Algorithms Two-phase commit and beyond SOSP19 Day 1, Debugging session Sharding the Shards: Managing Datastore Locality at Scale with Akkio Paxos A generalized solution to distributed consensus Linearizable Quorum Reads in Paxos Dissecting performance bottlenecks of strongly-consistent replication protocols Is this consensus? Teaching Paxi Paxos jokes Blockchains Avalanche is a descendent of Texel SOSP19 Day 1, Blockchain session Threshold Logical Clocks for Asynchronous Distributed Coordination and Consensus Book review Range

Book Review. The Dark Forest (2008)

This book is the second book in the trilogy titled "Remembrance of Earth's Past" by Liu Cixin. I had reviewed the first book in the series, "The Three Body Problem", earlier.  I had listened to this book as audiobook. The reader of the audiobook, the voice-actor, was very competent, and also remarkably good in speaking in accents. This made the already engaging story more captivating. As in the first book, this book also introduces new concepts. The Dark Forest theory is one. The theory posits that the universe is like a dark forest, where everybody is out there to hunt anybody. Due to chain-of-suspicion, a civilization can never be certain of an alien civilization's true intentions. The extreme distances between stars creates an insurmountable "chain of suspicion", where any two civilizations cannot communicate well enough to dissipate mistrust, making conflict inevitable. Leaving a primitive civilization alone is not an option due to the e

Paper review. Threshold Logical Clocks for Asynchronous Distributed Coordination and Consensus

Image
This is a recent arxiv paper by Bryan Ford, EPFL. The figures I use are from Bryan's presentation . The paper introduces a threshold logical clock (TLC) abstraction and uses it to implement decentralized asynchronous consensus on top. In contrast to Ben-Or which implements decentralized asynchronous binary consensus, TLC based Que-Sera-Consensus (QSC) achieves consensus for arbitrary values proposed. After I summarize the paper, I will compare/contrast QSC with Paxos, Texel/Avalanche, and Ben-Or. Threshold Logical Clocks TLC ensures that a number of nodes progress through logical time in a lock-step fashion. On reaching logical time-step s, each node waits for a threshold tm of broadcasts received from s before it can proceed to step s+1. Different nodes may see different subsets of the tm messages. The adversarial network schedule ultimately determines this, but can we at least measure a-posteriori the success/failure of a given message's propagation to other nodes?

Popular posts from this blog

The end of a myth: Distributed transactions can scale

Hints for Distributed Systems Design

Foundational distributed systems papers

Learning about distributed systems: where to start?

Metastable failures in the wild

Scalable OLTP in the Cloud: What’s the BIG DEAL?

SIGMOD panel: Future of Database System Architectures

The demise of coding is greatly exaggerated

Dude, where's my Emacs?

There is plenty of room at the bottom