Posts

Showing posts from December, 2023

Nezha: Deployable and High-Performance Consensus Using Synchronized Clocks

Nezha (VLDB'23) is a consensus protocol that leverages synchronized clocks to decrease latency and increase throughput. There is also a GitHub repo with the implementation of Nezha and the TLA+ model associated with the protocol. Nezha's approach is to offload the traditional leader- or sequencer-based ordering to synchronized clocks, achieving decentralized coordination without relying on network routers or sequencers. Here, time synchronization is leveraged on a best-effort basis, with no impact on correctness. You guessed it right: there is a fast path where the best-effort message ordering works, and the client waits for a super-majority quorum of replies ordered consistently. And then there is a slow path that covers the case where that fails. The evaluation suggests that Nezha outperforms previous protocols significantly, including an order-of-magnitude improvement in throughput. But the evaluations are performed under ideal conditions, and overlook the metastab...
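To make the fast-path/slow-path split concrete, here is a minimal client-side sketch. It is not taken from the Nezha repo: the function names, the use of a per-reply ordering hash, and the super-majority size of f + ceil(f/2) + 1 out of 2f + 1 replicas are all assumptions made for illustration, and the sketch ignores details such as whether the leader's reply must be part of the quorum.

```python
from collections import Counter
from math import ceil


def fast_quorum_size(f: int) -> int:
    """Assumed super-majority size for a cluster of 2f + 1 replicas."""
    return f + ceil(f / 2) + 1


def decide_path(reply_hashes, f):
    """Pick the commit path from the replies a client has collected.

    reply_hashes: one (hypothetical) ordering digest per replica reply.
    Returns ("fast", digest) when a super-majority of replies agree on
    the ordering, otherwise ("slow", None) to signal a fallback to the
    slow path.
    """
    if not reply_hashes:
        return ("slow", None)
    digest, count = Counter(reply_hashes).most_common(1)[0]
    if count >= fast_quorum_size(f):
        return ("fast", digest)
    return ("slow", None)
```

With f = 1 (three replicas), the assumed super-majority is all three replies, so a single replica that saw a different order is enough to push the request onto the slow path.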

Best of Metadata in 2023

It is that most wonderful time of the year again. Time to reflect on the best posts on the Metadata blog in 2023.

Distributed systems

Hints for Distributed Systems Design: I have seen these hints successfully applied in distributed systems design throughout my 25 years in the field, starting from the theory of distributed systems (98-01), immersing into the practice of wireless sensor networks (01-11), and working on cloud computing systems in both academia and industry ever since.

Metastable failures in the wild: Metastable failure is defined as permanent overload with low throughput even after the fault trigger is removed. It is an emergent behavior of a system, and it naturally arises from the optimizations for the common case that lead to sustained work amplification.

Towards Modern Development of Cloud Applications: This is an easy-to-read paper, but it is not an easy-to-agree-with paper. The message is controversial: Don't do microservices, write a monolith,...

Lifting the veil on Meta’s microservice architecture: Analyses of topology and request workflows

This paper appeared in USENIX ATC'23. It is a survey of microservices at Meta (née Facebook). We had previously reviewed a microservices survey paper from Alibaba. Motivated maybe by the desire for differentiation, the Meta paper spends the first two sections justifying why we need yet another microservices survey paper. I didn't mind reading this paper at all; it is an easy read. The paper gives another design point/view from industry on microservice topologies, call graphs, and how they evolve over time. It argues that this information will help build more accurate microservices benchmarks and artificial microservice topology/workflow generators, and also help future microservices research and development. I did learn some interesting information and statistics about microservices use at Meta from the paper. But I didn't find any immediately applicable insights/takeaways to improve the quality and reliability of the services we build in the cloud. The con...

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Making database systems usable

Looming Liability Machines (LLMs)

Advice to the young

Foundational distributed systems papers

Distributed Transactions at Scale in Amazon DynamoDB

Linearizability: A Correctness Condition for Concurrent Objects

Understanding the Performance Implications of Storage-Disaggregated Databases

Designing Data Intensive Applications (DDIA) Book