Showing posts from November, 2019

Book Review. The Dark Forest (2008)

This book is the second in the "Remembrance of Earth's Past" trilogy by Liu Cixin. I reviewed the first book in the series, "The Three-Body Problem", earlier. I listened to this book as an audiobook. The reader, a voice actor, was very competent and remarkably good at accents, which made the already engaging story even more captivating. As in the first book, this book introduces new concepts. The Dark Forest theory is one. The theory posits that the universe is like a dark forest, where every civilization is out to hunt every other. Because of the "chain of suspicion", a civilization can never be certain of an alien civilization's true intentions: the extreme distances between stars make the chain of suspicion insurmountable, since no two civilizations can communicate well enough to dispel mistrust, and this makes conflict inevitable. Leaving a primitive civilization alone is not an option due to the e

Paper review. Threshold Logical Clocks for Asynchronous Distributed Coordination and Consensus

This is a recent arXiv paper by Bryan Ford (EPFL). The figures I use are from Bryan's presentation. The paper introduces a threshold logical clock (TLC) abstraction and uses it to implement decentralized asynchronous consensus on top. In contrast to Ben-Or, which implements decentralized asynchronous binary consensus, the TLC-based Que Sera Consensus (QSC) achieves consensus on arbitrary proposed values. After I summarize the paper, I will compare and contrast QSC with Paxos, Texel/Avalanche, and Ben-Or.

Threshold Logical Clocks

TLC ensures that a number of nodes progress through logical time in lock-step fashion. On reaching logical time-step s, each node waits until it has received a threshold t_m of messages broadcast at step s before it can proceed to step s+1. Different nodes may see different subsets of the t_m messages. The adversarial network schedule ultimately determines this, but can we at least measure a posteriori the success or failure of a given message's propagation to other nodes?
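The lock-step rule can be illustrated with a small simulation. This is a minimal Python sketch of only the basic TLC advancement rule, under assumptions of my own: all nodes are honest, every message is eventually delivered, a node counts its own broadcast toward the threshold, and a random delivery order stands in for the adversarial network schedule. The names TLCNode and simulate and the defaults n=4, t_m=3 are hypothetical, and the sketch leaves out the a-posteriori witnessing question raised above.

```python
import random
from collections import defaultdict


class TLCNode:
    """Hypothetical model of one node's threshold logical clock (no witnessing)."""

    def __init__(self, node_id: int, t_m: int):
        self.node_id = node_id
        self.t_m = t_m                  # message threshold t_m
        self.step = 0                   # current logical time-step s
        self.heard = defaultdict(set)   # step -> distinct senders heard at that step

    def on_message(self, sender: int, step: int) -> list:
        """Record a message broadcast at `step`; return the new steps this node enters."""
        self.heard[step].add(sender)
        entered = []
        # Advance while t_m distinct senders have been heard at the current step.
        while len(self.heard[self.step]) >= self.t_m:
            self.step += 1
            entered.append(self.step)
        return entered


def simulate(n: int = 4, t_m: int = 3, target_step: int = 5, seed: int = 1) -> list:
    """Deliver messages in random order until every node reaches target_step."""
    rng = random.Random(seed)
    nodes = [TLCNode(i, t_m) for i in range(n)]
    # Each node broadcasts at step 0 to every node, itself included.
    in_flight = [(receiver, sender, 0) for sender in range(n) for receiver in range(n)]
    while min(node.step for node in nodes) < target_step:
        # The random scheduler stands in for the adversarial network.
        receiver, sender, step = in_flight.pop(rng.randrange(len(in_flight)))
        for new_step in nodes[receiver].on_message(sender, step):
            # Entering a step means broadcasting a message for that step.
            in_flight.extend((r, receiver, new_step) for r in range(n))
    return [node.step for node in nodes]


print(simulate())  # every node ends at step 5 or later
```

A node can skip through several steps in one delivery if messages for later steps from faster nodes have already arrived; it still broadcasts for every step it enters, which is what keeps the slowest node from stalling.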

SOSP19 File Systems Unfit as Distributed Storage Backends: Lessons from 10 Years of Ceph Evolution

This paper is by Abutalib Aghayev (Carnegie Mellon University), Sage Weil (Red Hat Inc.), Michael Kuchnik (Carnegie Mellon University), Mark Nelson (Red Hat Inc.), Gregory R. Ganger (Carnegie Mellon University), and George Amvrosiadis (Carnegie Mellon University). Ceph started as a research project at UCSC in 2004. At the core of Ceph is a distributed object store called RADOS. Its storage backend was implemented over an already mature filesystem, which handles block allocation, metadata management, and crash recovery. The Ceph team chose this route because they didn't want to write a storage layer from scratch: a complete filesystem takes a long time (about 10 years) to develop, stabilize, optimize, and mature. However, having a filesystem in the path to storage adds a lot of overhead. It makes efficient transactions hard to implement, and it introduces bottlenecks for metadata operations. A filesystem directory with millions of

Book diet Oct-Nov 2019

Here are some books I listened to over the last couple of months. These were all audiobooks that the Libby app let me borrow online from my public library. Listening to books is more convenient than reading them. On the other hand, I don't think I retain as much information when I listen. It feels like I learn better visually than by listening. Feelings aside, one concrete difference is that I can't take notes when I listen. When I read a physical book, I use a blank letter-size sheet as a page separator, and on it I note down important concepts and ideas as I encounter them. When I read an ebook on Kindle, I can highlight paragraphs even more conveniently and have them all available as highlights.

The Kingdom of Speech (Tom Wolfe)

Tom Wolfe wrote this book in 2016 and died in 2018 at age 88. He was a master storyteller and journalist associated with the New Journalism style. (New Journalism is a literary style reminiscent of long-form non-fiction and emphasizing "truth"

Seventh grade open house

Recently, I went to my son's 7th grade open house. It was a nice setup. They had me follow my son's daily routine, visiting the same classrooms in the same order, but with 10-minute class periods in which the teachers explained what they would be doing this year. Overall, I am impressed by what I saw and heard. My main takeaways were that the school has embraced a hands-on learning curriculum and is making very good and wise use of technology. My impression from the Math, Science, and ELA classes was that the middle school uses the flipped-classroom model extensively. There is little lecturing and a lot of group work. The science class presentation was exciting. Instead of memorizing, the class emphasizes hypothesizing, testing, and reasoning. Art class also sounded exciting: they added new media modules. This is not surprising given that half of 12-year-olds list YouTuber as their career choice. The art teacher said that their goal is to get t

Book Review. Digital Minimalism: Choosing a Focused Life in a Noisy World

"Digital Minimalism: Choosing a Focused Life in a Noisy World" is Cal Newport's new book. The topic of the book is clear from the title. You should quit your Facebook checking, Twitter chatting, Imgur browsing, and phone fiddling habits. No more monkey business. It is time to get to work. Deep work. In Chapter 4 of the book, Cal coins the term "solitude deprivation". Solitude deprivation sits at the opposite end of the spectrum from solitary confinement, but over a long duration it can also be bad for you. The book argues that today we all experience solitude deprivation. Smartphones, laptops, and screens do not give us time to be alone with our thoughts and process things at our own pace. I had heard a nice story in which Amazon natives recruited for an expedition into the jungle would take long breaks after some walking. They would say they were waiting for their souls to catch up with their bodies. Today we don't give time for our souls to catch

SOSP19. I4: Incremental Inference of Inductive Invariants for Verification of Distributed Protocols

This paper is by Haojun Ma (University of Michigan), Aman Goel (University of Michigan), Jean-Baptiste Jeannin (University of Michigan), Manos Kapritsos (University of Michigan), Baris Kasikci (University of Michigan), and Karem A. Sakallah (University of Michigan). This paper is about formal verification of distributed systems. Writing proofs manually is cumbersome, and existing formal verification tools all require a human to find the inductive invariant. I4 combines the power of Ivy (a tool for interactive verification of infinite-state systems) and model checking in order to find an inductive invariant without relying on human intuition. Ivy takes as input a protocol description and a safety property, and guides the user interactively to discover an inductive invariant. The point of finding an inductive invariant is to prove that the safety property always holds. An inductive proof has a base case, which proves that the initial state is safe, and an inductive step, which proves that if state k is sa
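To make the base case and inductive step concrete, here is a minimal Python sketch that checks candidate invariants for a hypothetical two-client lock protocol by explicit enumeration. This is not how I4 or Ivy work; enumeration is only possible because the toy has 8 states. It shows the typical situation such tools face: the safety property by itself is not inductive (a safe but unreachable state can step to an unsafe one), so it has to be strengthened before the inductive step goes through.

```python
from itertools import product

# Hypothetical toy protocol: a lock server and two clients.
# A state is (server_free, client0_holds, client1_holds).
STATES = list(product([False, True], repeat=3))
INIT = [(True, False, False)]


def successors(state):
    """Yield every state reachable in one protocol step."""
    free, h0, h1 = state
    holds = [h0, h1]
    for i in range(2):
        if free:  # client i acquires the lock from the server
            new = holds.copy()
            new[i] = True
            yield (False, new[0], new[1])
        if holds[i]:  # client i releases the lock back to the server
            new = holds.copy()
            new[i] = False
            yield (True, new[0], new[1])


def safety(state):
    """Safety property: the two clients never hold the lock at the same time."""
    _, h0, h1 = state
    return not (h0 and h1)


def strengthened(state):
    """Candidate invariant: safety, plus 'if the server is free, nobody holds the lock'."""
    free, h0, h1 = state
    return safety(state) and (not free or not (h0 or h1))


def is_inductive_invariant(inv):
    base = all(inv(s) for s in INIT)                                     # base case
    step = all(inv(t) for s in STATES if inv(s) for t in successors(s))  # inductive step
    implies_safety = all(safety(s) for s in STATES if inv(s))            # inv implies safety
    return base and step and implies_safety


print(is_inductive_invariant(safety))        # False
print(is_inductive_invariant(strengthened))  # True
```

The safe state (True, True, False), where the server thinks the lock is free while client 0 holds it, is what breaks the first check: client 1 can then acquire the lock too. The extra conjunct in strengthened rules that state out.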

SOSP19 Lineage Stash: Fault Tolerance Off the Critical Path

This paper is by Stephanie Wang (UC Berkeley), John Liagouris (ETH Zurich), Robert Nishihara (UC Berkeley), Philipp Moritz (UC Berkeley), Ujval Misra (UC Berkeley), Alexey Tumanov (UC Berkeley), and Ion Stoica (UC Berkeley). I really liked this paper. It has a simple idea that has a good chance of being adopted by real-world systems. The presentation was very well done and very informative, and the video is available online. Low latency is very important for data processing, stream processing, graph processing, and control systems. Recovering after failures is also important for them, because in systems composed of hundreds of nodes, node failures are part of daily operation. There seems to be a tradeoff between low latency and recovery time: existing recovery methods have either low runtime overhead or low recovery overhead, but not both. The global checkpoint approach to recovery achieves low runtime overhead, because a checkpoint/snapshot can be ta

SOSP19 Verifying Concurrent, Crash-safe Systems with Perennial

This paper is by Tej Chajed (MIT CSAIL), Joseph Tassarotti (MIT CSAIL), Frans Kaashoek (MIT CSAIL), and Nickolai Zeldovich (MIT CSAIL). Storage systems, such as file systems, databases, and key-value stores, need both concurrency (to provide high performance) and crash safety (to keep your data safe). A replicated disk library is subtle, but the paper shows how to systematically reason about all of its possible executions using verification. (This work considers verification of a single-computer storage system with multiple disks, not a distributed storage system.) Existing verification frameworks support either concurrency (CertiKOS [OSDI ’16], CSPEC [OSDI ’18], AtomFS [SOSP ’19]) or crash safety (FSCQ [SOSP ’15], Yggdrasil [OSDI ’16], DFSCQ [SOSP ’17]). Combining verified crash safety and concurrency is challenging because crash and recovery can interrupt a critical section, a crash can wipe in-memory state, and recovery logically completes crashed threads' operations.
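The three challenges can be seen in a toy replicated disk. This is a minimal Python sketch, assuming a design where each write updates disk 1 and then disk 2 inside a per-address lock, and recovery copies disk 1 over disk 2. The class and method names are hypothetical, and this is an executable model, not a Perennial proof. A crash between the two disk writes interrupts the critical section, the locks do not survive it, and recovery finishes the interrupted write on the crashed thread's behalf.

```python
import threading


class Crash(Exception):
    """Simulated crash: in-memory state is lost, disk contents survive."""


class ReplicatedDisk:
    """Hypothetical model: each logical block is stored on two physical disks."""

    def __init__(self, size: int):
        self.d1 = [0] * size   # disk 1 (survives crashes)
        self.d2 = [0] * size   # disk 2 (survives crashes)
        self.locks = [threading.Lock() for _ in range(size)]  # in-memory only

    def write(self, addr: int, value: int, crash_after_d1: bool = False) -> None:
        with self.locks[addr]:          # per-address critical section
            self.d1[addr] = value
            if crash_after_d1:
                raise Crash()           # crash interrupts the critical section
            self.d2[addr] = value

    def read(self, addr: int) -> int:
        with self.locks[addr]:
            return self.d1[addr]

    def recover(self) -> None:
        # A real crash loses all lock state, so recovery starts with fresh locks.
        self.locks = [threading.Lock() for _ in self.d1]
        # Copying disk 1 over disk 2 logically completes any interrupted write.
        for addr, value in enumerate(self.d1):
            self.d2[addr] = value


disk = ReplicatedDisk(4)
disk.write(0, 7)
try:
    disk.write(1, 42, crash_after_d1=True)
except Crash:
    disk.recover()
print(disk.d1, disk.d2)  # [7, 42, 0, 0] [7, 42, 0, 0]
```

Without the recovery step, the two disks would disagree on address 1, and a later read that fell back to disk 2 would return the stale value 0.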

SOSP19 Day 2, Scaling Symbolic Evaluation for Automated Verification of Systems Code with Serval

The verification session was the first session of Day 2. I like formal methods, and I did enjoy these papers. In this post I will only talk about the first paper in the session, the Serval paper. (You can read about SOSP19 Day 1 here.) This paper is by Luke Nelson (University of Washington), James Bornholt (University of Washington), Ronghui Gu (Columbia University), Andrew Baumann (Microsoft Research), Emina Torlak (University of Washington), and Xi Wang (University of Washington). This paper received a best paper award at SOSP19, and the software is publicly available. SOSP and other top venues have a tradition of publishing systems verification papers, such as seL4 (SOSP’09), Ironclad Apps (OSDI’14), FSCQ (SOSP’15), CertiKOS (PLDI’16), and Komodo (SOSP’17). A downside of systems verification is that it is very effort-intensive: the CertiKOS manual proof consisted of more than 200K lines. To help address this problem, this paper introduces Serval, a framewo

SOSP19 Day 1 wrap up

It was only three sessions into Day 1 (the machine learning, blockchain, and debugging sessions), and my brain was fried. Conferences are tiring because you are exposed to so many new ideas in a short time. It was clear I would not be able to pay attention to the papers in the last session, so I skipped that session (the privacy session, which included the following three papers) to go for a walk at the golf park behind the conference center.

Privacy Accounting and Quality Control in the Sage Differentially Private ML Platform by Mathias Lecuyer (Columbia University), Riley Spahn (Columbia University), Kiran Vodrahalli (Columbia University), Roxana Geambasu (Columbia University), and Daniel Hsu (Columbia University).

Honeycrisp: Large-scale Differentially Private Aggregation Without a Trusted Core by Edo Roth (University of Pennsylvania), Daniel Noble (University of Pennsylvania), Brett Hemenway Falk (University of Pennsylvania), and Andreas Haeberlen (University of Pennsylvania).

SOSP19 Day 1, Debugging session

This session was the first one after lunch and had four papers on debugging large-scale systems.

CrashTuner: Detecting Crash Recovery Bugs in Cloud Systems via Meta-info Analysis

This paper is by Jie Lu (The Institute of Computing Technology of the Chinese Academy of Sciences), Chen Liu (The Institute of Computing Technology of the Chinese Academy of Sciences), Lian Li (The Institute of Computing Technology of the Chinese Academy of Sciences), Xiaobing Feng (The Institute of Computing Technology of the Chinese Academy of Sciences), Feng Tan (Alibaba Group), Jun Yang (Alibaba Group), and Liang You (Alibaba Group). Crash recovery code can be buggy, and its bugs often result in catastrophic failures. Random fault injection is ineffective at detecting these bugs because recovery code is rarely exercised. Model checking at the code level is not feasible due to the state space explosion problem. As a result, crash-recovery bugs are still widely prevalent. Note that the paper does not talk about "crush" b

SOSP19 Day 1, Blockchain session

The second session on Day 1 was on blockchains. There were three papers in this session.

Teechain: A Secure Payment Network with Asynchronous Blockchain Access

This paper is by Joshua Lind (Imperial College London), Oded Naor (Technion), Ittay Eyal (Technion), Florian Kelbert (Imperial College London), Emin Gun Sirer (Cornell University), and Peter Pietzuch (Imperial College London). Bitcoin's throughput is a measly 4 transactions per second, as limited by Nakamoto consensus. This has led to work on off-chain scaling via payment networks, such as the Lightning Network. A payment network consists of nodes that establish pairwise connections. To establish a multihop connection, these pairwise connections are chained through intermediary nodes. A payment network has three phases: set up a multihop connection, process payments for some time, and perform a settlement where the final balance is written back to the blockchain. To guard against a roll-back attack and ensure that the correct bala
