Posts

Showing posts with the label reading-group

Deep reading

How many hours does it take you to read a ~10 page research paper? — Murat Demirbas (Distributolog) (@muratdemirbas) February 8, 2022 Twenty years ago, a well-known professor in the computer networking field told me that he reviews any paper in 30 minutes. Not just read the paper, but also write the review, mind you. All in 30 minutes! I said, "I am slow; it takes me 4 hours to read a paper." I lied. It actually took me 8+ hours to read papers, because I was a graduate student and didn't have much background or paper-reading experience. Things have improved, but it still takes me 4-8 hours to read a paper and understand everything so I can write an informative conference review or blog post about the paper. (Of course, I am talking about good research papers, not content-free, publication-for-the-sake-of-publication papers.) Maybe because of that early encounter with that flamboyant networking professor, I always felt I was very slow in reading a paper. I checked with fellow ...

Sundial: Fault-tolerant Clock Synchronization for Datacenters

This paper appeared recently in OSDI 2020. This paper is about clock synchronization in the data center. I presented this paper for our distributed systems Zoom meeting group. I took a wider view of the problem by explaining time-synchronization challenges and the fundamental techniques for achieving precise time synchronization. I will take the same path in this post as well. It is a bit of a circuitous road, but it makes for a scenic, pleasurable journey. So let's get going. The benefits of better time synchronization: For any distributed system, timestamping and ordering of events is very important. Processes in a distributed system run concurrently without knowing what the other processes are doing at the moment. Processes learn about each other's states only by sending and receiving messages, and this information by definition comes from the past states of the nodes. A process needs to compose a coherent view of the system from these messages, all the while the system is movi...
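Sundial itself synchronizes physical clocks in hardware, which a short snippet cannot capture. As a contrast to that approach, here is a minimal Lamport logical-clock sketch (my illustration, not anything from the paper) showing the baseline way processes order events using only message passing:

```python
# Minimal Lamport logical clock sketch (illustrative only; Sundial
# synchronizes physical clocks in hardware, which this does not model).
class Process:
    def __init__(self):
        self.clock = 0

    def local_event(self):
        self.clock += 1
        return self.clock

    def send(self):
        # Tick, then attach the timestamp to the outgoing message.
        self.clock += 1
        return self.clock

    def receive(self, msg_ts):
        # Advance past the sender's timestamp so causality is respected.
        self.clock = max(self.clock, msg_ts) + 1
        return self.clock

p, q = Process(), Process()
p.local_event()   # p's clock: 1
ts = p.send()     # p's clock: 2, message carries timestamp 2
q.receive(ts)     # q's clock: max(0, 2) + 1 = 3
```

The point of the sketch is that logical clocks only give causal ordering; precise physical time synchronization, which Sundial targets, enables much stronger ordering guarantees without message exchanges.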

Read papers, Not too much, Mostly foundational ones

Here is my advice to people who want to develop competence and expertise in their fields. Read papers. By papers, I mean technical research papers, not white papers or blog posts. By read, I mean read rigorously and critically. Not too much. If you read rigorously and critically, you cannot read too many papers. Moreover, learning by doing is the only way to internalize and grok a concept. If you read papers all day, you don't have time to try things yourself. If you are a PhD student, maybe read two or three papers a week (but, remember, rigorously and actively). If you are not in academia, maybe read one paper every week or two. Mostly foundational ones. While there are exceptions, it is better to prioritize: seminal work over incremental work, general principled work over point solutions, and work introducing techniques/tools over work applying techniques. A big exception is good expository papers. Unfortunately, academia treats them as something the cat...

High availability in cheap distributed key value storage

This paper is authored by Thomas Kim, Daniel Lin Kit, Gregory Ganger, Michael Kaminsky, and David Andersen. It appeared in SOCC 2020. The paper talks about using NVMM for building a distributed key-value store. NVMMs are a new technology. They are slowly rolling into the datacenters, but there are still questions about their performance and how many writes they can handle before write wear sets in. NVMMs have very low latency, comparable to DRAM, yet they are 10 times cheaper. Awesome, right? Unfortunately, they don't have the same high bandwidth as DRAM or SSDs. Also, they are still not anywhere near as cheap as SSDs, and it may not be affordable to build an all-NVMM key-value store. Before reading this paper, it is important to understand that this is all about money, money, money. The cost of NVMM and SSDs influences the design decisions, and leads to this non-straightforward heterogeneous design. If cost were not an issue, we could even have DRAMs for all replicas in K-V ...

Ocean Vista: Gossip-Based Visibility Control for Speedy Geo-Distributed Transactions

This paper appeared in VLDB'19 and is authored by Hua Fan and Wojciech Golab. The paper is about providing strict serializability in geo-replicated databases. The technique it uses can be summarized as "geo-replicate-ahead transactions". First the transaction T is replicated to all the parties across the datacenters. The parties then check that the watermark rises above T, ensuring that all transactions including and preceding T have been replicated successfully to all the parties. Then, the execution of T can be done asynchronously at each party. This should remind you of the SLOG paper: SLOG uses a deterministic architecture to move most of this communication outside of conflict boundaries, enabling these transactions to be processed at high throughput. More specifically, SLOG relies on deterministic processing to avoid two-phase commit. Once all parties agree to the plan, processing occurs (mostly) independently on each node, with the system relying on the pl...
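The watermark idea above can be sketched in a few lines. This is my toy simplification, not the Ocean Vista protocol itself; all names are illustrative, and it assumes integer transaction timestamps assigned densely from 1:

```python
# Toy sketch of watermark-gated visibility (names and structure are my
# illustration, not the paper's). Each party tracks, per transaction
# timestamp, which parties have replicated it; a transaction T becomes
# executable only once the watermark rises to cover T's timestamp.
class Party:
    def __init__(self, n_parties):
        self.n_parties = n_parties
        self.acks = {}  # txn timestamp -> set of party ids that replicated it

    def record_ack(self, ts, party_id):
        self.acks.setdefault(ts, set()).add(party_id)

    def watermark(self):
        # Longest prefix of timestamps 1, 2, ... fully replicated everywhere.
        wm = 0
        while len(self.acks.get(wm + 1, ())) == self.n_parties:
            wm += 1
        return wm

    def visible(self, ts):
        # T may execute only when all transactions up to T are replicated.
        return ts <= self.watermark()

party = Party(n_parties=3)
for pid in range(3):
    party.record_ack(1, pid)   # txn 1 replicated at all three parties
party.record_ack(2, 0)         # txn 2 replicated at only one party

party.visible(1)   # True: watermark has risen past txn 1
party.visible(2)   # False: watermark stays at 1, txn 2 must wait
```

Once the watermark covers T, each party can execute T asynchronously, which is what makes the asynchronous-execution step of the protocol safe.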

My Distributed Systems Seminar's reading list for Fall 2020

For the Fall semester distributed systems seminar, we will discuss these papers:
Bipartisan Paxos: A Family of Fast, Leaderless, Modular State Machine Replication Protocols
eXtreme Modelling in Practice
Starling: A Scalable Query Engine on Cloud Function Services
Lambada: Interactive Data Analytics on Cold Data using Serverless Cloud Infrastructure
Tiered Replication: A Cost-effective Alternative to Full Cluster Geo-replication
Scalable State-Machine Replication
Designing Distributed Systems Using Approximate Synchrony in Data Center Networks
Armada: Low-Effort Verification of High-Performance Concurrent Programs
Ocean Vista: Gossip-Based Visibility Control for Speedy Geo-Distributed Transactions
Consolidating Concurrency Control and Consensus for Commits under Conflicts
Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency
Near-Optimal Latency Versus Cost Tradeoffs in Geo-Distributed St...

The Impact of RDMA on Agreement

This paper appeared in PODC 2019. PODC stands for Principles of Distributed Computing. It is a theoretical distributed systems conference, first established in 1982. I published my first-ever paper, "Resettable Vector Clocks", at PODC 2000. That conference was held in Portland and, as a fresh graduate student attending it, I was awe-struck. On day one of the conference, I saw Leslie Lamport interrupt a talk to ask a question, protesting loudly. Then Keith Marzullo in the audience (I think) pointed out the misunderstanding, and Leslie Lamport said "Nevermind" and calmed down. I also noticed that the person sitting next to me was, oh my God, Nancy Lynch! I couldn't believe my luck, seeing all this distributed systems royalty in person. Also at this conference, on day three, Eric Brewer gave his CAP theorem talk. Good times! Anyways, back to the paper. Contributions of the paper: Under the message-passing model, BFT consensus requires ...

Streamlet: textbook streamlined blockchains

In this 2020 technical report, the authors claim to present a very simple BFT consensus protocol. I take it that "textbook" in the title is used as praise for the simplicity of the algorithm. Nakamoto consensus is definitely simpler than Streamlet, but in contrast to Nakamoto consensus, Streamlet provides deterministic finalization and does not suffer from costly proof-of-work (POW) and the low throughput POW induces. And, among the protocols that provide deterministic finalization, including PBFT, Tendermint, and HotStuff, yes, Streamlet is the simplest protocol. So, yes, Streamlet deserves your attention. The Streamlet protocol: Streamlet is an amalgamation of existing techniques. It borrows from Casper, HotStuff, and Elaine Shi's earlier work. But that's OK. The goal is to show the simplest protocol, not necessarily the most novel or the most efficient protocol. Here is the protocol, verbatim from the introduction of the paper. In Streamlet, each epoch has a de...

The FuzzyLog: A partially ordered shared log (OSDI'18)

This paper appeared in OSDI'18, and is authored by Joshua Lockerman, Jose M. Faleiro, Juno Kim, Soham Sankaran, Daniel J. Abadi, James Aspnes, Siddhartha Sen, and Mahesh Balakrishnan. The paper does not suggest many novel ideas or algorithms, but rather shows a nice amalgamation of existing ideas and algorithms for providing a more scalable version of a shared log across multiple datacenters. Having a shared log and funneling all updates through it provides simplicity for building fault-tolerant, consistent distributed systems. The Tango paper from SOSP'13, on which this work builds, argued for the simplicity benefits of such a system. Recently Jay Kreps wrote a nice and accessible explanation of the benefits of using globally shared logs for building distributed systems. Despite the simplicity it provides, maintaining a shared log with a system-wide total order is impractical, because it is expensive: at large scale, the single sequencer will be a bottleneck for the...
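To make the bottleneck concrete, here is a toy sketch (my illustration, not the paper's design) of a totally ordered shared log with a single sequencer; FuzzyLog's whole point is to relax exactly this single serialization point:

```python
# Toy totally-ordered shared log (illustrative only; FuzzyLog
# replaces this total order with a partial order to avoid the
# single-sequencer bottleneck described in the post).
import itertools

class SharedLog:
    def __init__(self):
        self._seq = itertools.count()  # the single global sequencer
        self._entries = {}

    def append(self, payload):
        # Every append from every client serializes through this one
        # counter: simple to reason about, but a throughput bottleneck.
        pos = next(self._seq)
        self._entries[pos] = payload
        return pos

    def read(self, pos):
        return self._entries[pos]

log = SharedLog()
log.append("put x=1")   # assigned position 0
log.append("put y=2")   # assigned position 1
```

Because every update, regardless of which shard or datacenter it touches, must obtain a position from the one sequencer, throughput is capped by that component; this is the expense the paper's partially ordered design avoids.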

State-Machine Replication for Planet-Scale Systems (Eurosys 2020)

This paper appeared at Eurosys 2020, and is authored by Vitor Enes, Carlos Baquero, Tuanir França Rezende, Alexey Gotsman, Matthieu Perrin, and Pierre Sutra. The paper introduces Atlas, a consensus protocol similar to EPaxos that builds on the Fast Paxos idea. Most people call these protocols leaderless, but I think that is a misnomer. I call these protocols opportunistic/any-leader protocols. In these protocols, any node can propose a command by becoming an opportunistic leader and using a fast path. If there is no other simultaneously proposed command by another opportunistic leader, or if this command commutes with all simultaneously proposed commands, the fast path is enough for achieving consensus. In the presence of concurrent non-commuting commands, the protocol has to take a slow path, the purpose of which is to establish (teach the other nodes) the dependencies learned for this command in the fast-path attempt. This opportunistic/any-leader family of protocols...
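The fast-path condition above can be sketched as a commutativity check. This is my simplification for illustration, not the Atlas protocol itself; it assumes commands are single-key writes, where two writes commute iff they touch different keys:

```python
# Illustrative sketch of the fast-path condition (my simplification,
# not the Atlas protocol): a command can commit on the fast path only
# if it commutes with every concurrently proposed command.
def commutes(cmd_a, cmd_b):
    # Single-key writes commute iff they touch different keys.
    return cmd_a["key"] != cmd_b["key"]

def fast_path_ok(cmd, concurrent_cmds):
    return all(commutes(cmd, other) for other in concurrent_cmds)

w_x  = {"op": "write", "key": "x"}
w_y  = {"op": "write", "key": "y"}
w_x2 = {"op": "write", "key": "x"}

fast_path_ok(w_x, [w_y])         # True: disjoint keys, fast path suffices
fast_path_ok(w_x, [w_y, w_x2])   # False: conflicting write to x, slow path
```

When the check fails, the slow path must run to agree on the dependency ordering between the conflicting commands before they execute.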
