Posts

Our K-Drama Journey

Finding a show that suits everyone in our family has always been a challenge. Intense action/stress is too much for our youngest daughter, and we prefer to keep things PG-13. We’ve finally found a great solution: Korean dramas. We’ve been watching them back to back recently, and they’ve been a hit across all ages. Hometown Cha-Cha-Cha (2021): This was our gateway into K-dramas. Since it’s dubbed in English, it was easy to start. Set in a small seaside town, it’s a wholesome, family-friendly show with both comedic moments and touching scenes. It also gave us a glimpse into everyday Korean life. It's a solid feel-good watch. Queen of Tears (2024): We picked this next because it was also dubbed. This one was a commitment (16 episodes, each 1.5 hours long), but it was worth it. It had more romance, more drama, better cinematography, and a stellar cast. The intense drama in some scenes had our youngest bawling. We got so hooked that we ended up binge-watching the l...

Smart Casual Verification of the Confidential Consortium Framework

This paper (NSDI'25) applies lightweight formal methods (hence the pun "smart casual" in contrast to formal attire) to the Confidential Consortium Framework (CCF). CCF is an open-source platform for trustworthy cloud applications, used in Microsoft's Azure Confidential Ledger service. The authors combine formal specification, model checking, and automated testing to validate CCF's distributed protocols, specifically its custom consensus protocol. Background: CCF uses trusted execution environments (TEEs) and state machine replication. Its consensus protocol is based on Raft but has major modifications: signature transactions (a Merkle tree root over the whole log, signed by the leader), optimistic acknowledgments, and partition leader step-down. CCF also avoids RPCs, using a uni-directional messaging layer. These changes add complexity, requiring formal verification. At the time of writing, CCF's implementation spans 63 kLoC in C++. The authors define five requi...
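To make the signature-transaction idea a bit more concrete, here is a minimal sketch of computing a Merkle root over a log. The hash function (SHA-256), the leaf/node prefixes, the rule of duplicating the last node on odd levels, and the name `merkle_root` are all my assumptions for illustration, not CCF's actual construction. The leader's signature over the root is left out.

```python
import hashlib


def merkle_root(entries: list[bytes]) -> bytes:
    """Return a Merkle root over log entries (illustrative sketch only)."""
    if not entries:
        # Assumption: an empty log hashes to the digest of the empty string.
        return hashlib.sha256(b"").digest()

    # Hash each entry into a leaf; the 0x00 prefix separates leaves from inner nodes.
    level = [hashlib.sha256(b"\x00" + e).digest() for e in entries]

    # Pair up nodes level by level until a single root remains.
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            left = level[i]
            # Assumption: an unpaired last node is combined with itself.
            right = level[i + 1] if i + 1 < len(level) else left
            next_level.append(hashlib.sha256(b"\x01" + left + right).digest())
        level = next_level
    return level[0]


log = [b"tx1", b"tx2", b"tx3"]
root = merkle_root(log)
print(root.hex())
```

Because the root depends on every entry, a signature over it commits the leader to the whole log prefix: changing any single entry changes the root.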

What makes entrepreneurs entrepreneurial?

Entrepreneurs think and act differently from managers and strategists. This 2008 paper argues that entrepreneurs use effectual reasoning, the polar opposite of causal reasoning taught in business schools. Causal reasoning starts with a goal and finds the best way to achieve it. Effectual reasoning starts with available resources and lets goals emerge along the way. Entrepreneurs are explorers, not generals. Instead of following fixed plans, they experiment and adapt to seize whatever opportunities the world throws at them. Consider this example from the paper. A causal thinker starts an Indian restaurant by following a fixed plan: researching the market, choosing a prime location, targeting the right customers, securing funding, and executing a well-designed strategy. The effectual entrepreneur doesn’t start with a set goal. She starts with what she has (her skills, knowledge, and network), and she experiments. She might begin by selling homemade lunches to friends’ coworkers. If that...

My Time at MIT

Twenty years ago, in 2004-2005, I spent a year at MIT’s Computer Science department as a postdoc working with Professor Nancy Lynch. It was an extraordinary experience. Life at MIT felt like paradise, and leaving felt like being cast out. MIT Culture: MIT’s Stata Center was the best CS building in the world at the time. Designed by Frank Gehry, it was a striking abstract architecture masterpiece (although, like all abstractions, it was a bit leaky). Furniture from Herman Miller complemented this design. I remember seeing price tags of $400 on simple yellow chairs. The building buzzed with activity. Every two weeks, postdocs were invited to the faculty lunch on Thursdays, and on alternating weeks we had group lunches. Free food seemed to materialize somewhere in the building almost daily, and the food trucks outside were also good. MIT thrived on constant research discussions, collaborations, and talks. Research talks were advertised on posters at the urinals, as a practical touch of M...

Hanging in there

I have been reviewing papers for USENIX ATC and handling work stuff at MongoDB Research. I cannot blog about either yet. So, instead of a paper review or technical blog, I share some random observations and my current mood. Bear with me as I vent here. You may disagree with some of my takes. Use the comments to share your thoughts. Damn, Buffalo. Your winter is brutal and depressing. (Others on r/Buffalo also suffer; many suggest video games, drugs, or drinking.) After 20 years of Buffalo winters, I am fed up with the snow and cold. When I taught the distributed systems course in fall semesters, I would ask the new batch of Indian students how many had never seen snow, and all hands would shoot up. I would warn them that by winter's end they would despise that magic fairy dust. Ugh, sidewalks are already piled high with snow that freezes, muddies, and decays into holes. Forgive the gloomy start. I had a bad flu ten days ago. Just as I began to recover, another milder bout struck. My joint...

Intelligence wants to be everywhere

Imagine a world where intelligence permeates every corner of existence, from the devices in your home to the trees in your backyard. This is a world where everything is alive with contemplation, purpose, and the ability to learn, adapt, and grow. A world where intelligence radiates from everywhere. Ubiquitous AI: In mathematics, one way to understand a concept is to push it to its extremes. Let's apply that to AI. Enabled by the rapid advancements in LLMs, inference capabilities, chip efficiency, and energy availability, imagine a future where AGI is embedded in the fabric of our lives, radiating from everyday objects. Technology has always moved toward the ethereal. We went from horses to cars powered by liquid fuel, and then to electric vehicles that run on invisible currents of energy. Electricity is easier to transmit, store, and harness than gasoline. I was struck by this recently when I saw an electric car charging in a remote state park. No gas stations and no pipelines aroun...

GaussDB-Global: A Geographically Distributed Database System

This paper, presented in the industry track of ICDE 2024, introduces GaussDB-Global (GlobalDB), Huawei's geographically distributed database system. GlobalDB replaces the centralized transaction management (GTM) of GaussDB with a decentralized system based on synchronized global clocks (GClock). This approach mirrors Google Spanner's TrueTime approach and its commit-wait technique, which provides externally serializable transactions by waiting out the uncertainty interval. However, GlobalDB claims compatibility with commodity hardware, avoiding the need for specialized networking infrastructure for synchronized clock distribution. The GClock system uses GPS receivers and atomic clocks as the global time source device at each regional cluster. Each node synchronizes its clock with the global time source over TCP every 1 millisecond. Clock deviation is kept low because synchronization is achieved within 60 microseconds as a TCP round trip, and the CPU’s clock drift is bound...
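As a back-of-the-envelope illustration of how these numbers combine, here is a rough sketch of one common way to bound clock uncertainty: the synchronization round trip plus the drift accumulated until the next sync. The formula, the function name `clock_uncertainty_us`, and especially the 200 ppm drift rate are my assumptions for illustration. The excerpt above does not give the drift bound, so treat that value as a placeholder, not as the paper's number.

```python
def clock_uncertainty_us(rtt_us: float, sync_interval_us: float, drift_ppm: float) -> float:
    """Conservative uncertainty bound in microseconds (illustrative formula).

    rtt_us:           round-trip time of one sync exchange
    sync_interval_us: time between consecutive syncs
    drift_ppm:        assumed worst-case local clock drift, in parts per million
    """
    # Drift accumulated between two syncs.
    drift_us = sync_interval_us * drift_ppm / 1_000_000
    # Use the full round trip as a conservative bound on the sync error.
    return rtt_us + drift_us


# 60 us round trip and a 1 ms sync interval come from the text above;
# 200 ppm is a hypothetical drift rate chosen only for this example.
epsilon = clock_uncertainty_us(rtt_us=60, sync_interval_us=1_000, drift_ppm=200)
print(f"uncertainty bound: about {epsilon:.1f} microseconds")
```

With a sync interval this short, the drift term is tiny next to the round-trip term under these assumptions, which is why frequent synchronization keeps the bound close to the network latency.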

Use of Time in Distributed Databases (part 5): Lessons learned

This concludes our series on the use of time in distributed databases, where we explored how the use of time in distributed systems evolved from a simple ordering mechanism to a sophisticated tool for coordination and performance optimization. A key takeaway is that time serves as a shared reference frame that enables nodes to make consistent decisions without constant communication. While the AI community grapples with alignment challenges, in distributed systems we have long confronted our own fundamental alignment problem. When nodes operate independently, they essentially exist in their own temporal universes. Synchronized time provides the global reference frame that bridges these isolated worlds, allowing nodes to align their events and states coherently. At its core, synchronized time serves as an alignment mechanism in distributed systems. As explored in Part 1, synchronized clocks enable nodes to establish "common knowledge" through a shared time reference, which is pow...

Use of Time in Distributed Databases (part 4): Synchronized clocks in production databases

This is part 4 of our "Use of Time in Distributed Databases" series. In this post, we explore how synchronized physical clocks enhance production database systems. Spanner: Google's Spanner (OSDI'12) implemented a novel approach to handling time in distributed database systems through its TrueTime API. The TrueTime API provides time as an interval that is guaranteed to contain the actual time, maintained within about 6ms of uncertainty (this is the number published in 2012, which has improved significantly since then) using GPS receivers and atomic clocks. This explicit handling of time uncertainty allows Spanner to provide strong consistency guarantees while operating at a global scale. Spanner uses multi-version concurrency control (MVCC) and achieves external consistency (linearizability) for current transactions through techniques like "commit wait," where transactions wait out the uncertainty in their commit timestamps before making their writes visible. Spanner uses ...
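The commit-wait idea described above can be sketched in a few lines. This is a toy model under stated assumptions, not Spanner's implementation: `TrueTimeSketch`, `ManualClock`, and `commit` are hypothetical names, the uncertainty is a fixed epsilon, and a manually advanced clock stands in for real time so the example runs instantly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float  # actual time is guaranteed to be >= earliest
    latest: float    # actual time is guaranteed to be <= latest


class ManualClock:
    """Hypothetical stand-in for a real clock; time moves only when we sleep."""

    def __init__(self, start: float) -> None:
        self.t = start

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


class TrueTimeSketch:
    """Toy TrueTime-like API with a fixed uncertainty epsilon (an assumption)."""

    def __init__(self, clock: ManualClock, epsilon: float) -> None:
        self.clock = clock
        self.epsilon = epsilon

    def now(self) -> TTInterval:
        t = self.clock.now()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, ts: float) -> bool:
        # True only once ts is definitely in the past.
        return self.now().earliest > ts


def commit(tt: TrueTimeSketch, apply_writes) -> float:
    # Pick the commit timestamp at the upper end of the uncertainty interval.
    commit_ts = tt.now().latest
    # Commit wait: block until the timestamp has definitely passed everywhere.
    while not tt.after(commit_ts):
        tt.clock.sleep(tt.epsilon / 10)
    # Only now make the writes visible.
    apply_writes(commit_ts)
    return commit_ts


clock = ManualClock(start=100.0)
tt = TrueTimeSketch(clock, epsilon=0.006)  # about 6 ms, the 2012 figure
visible = []
ts = commit(tt, visible.append)
waited = clock.now() - 100.0
print(f"commit_ts={ts:.3f}, waited about {waited * 1000:.1f} ms")
```

In this model the wait is roughly twice epsilon, since the timestamp is chosen epsilon ahead of the local clock and must then fall epsilon behind it. That is why shrinking clock uncertainty directly shortens commit latency.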

I Can’t Believe It’s Not Causal! Scalable Causal Consistency with No Slowdown Cascades

I recently came across the Occult paper (NSDI'17) during my series on "The Use of Time in Distributed Databases." I had high expectations, but my in-depth reading surfaced significant concerns about its contributions and claims. Let me share my analysis, as there are still many valuable lessons to learn from Occult about causality maintenance and distributed systems design. The Core Value Proposition: Occult (Observable Causal Consistency Using Lossy Timestamps) positions itself as a breakthrough in handling causal consistency at scale. The paper's key claim is that it's "the first scalable, geo-replicated data store that provides causal consistency without slowdown cascades." The problem they address is illustrated in Figure 1, where a slow/failed shard A (with delayed replication from master to secondary) can create cascading delays across other shards (B and C) due to dependency-waiting during write replication. This is what the paper means by "...
