Posts

TLA+ modeling of a single replicaset transaction modeling

Image
For some time I had been playing with transaction modeling and most recently with replicaset modeling by way of a single log. While playing with these, I realized I can build something cool on top of these building blocks. I just finished building snapshot isolated transaction modeling that sit on top of a replicaset log. This is also a high level approximation of MongoDB-style snapshot isolated transactions on a single replicaset. I put this up on github at  https://github.com/muratdem/MDBTLA/tree/main/SingleShardTxn Next, I will work on extending this to modeling MongoDB multishard transactions. That will be a whole lot of more work, but I think I have a good foundation to take on the challenge.   Modeling process My starting point was these components. ClientCentricIsolation: This is a module for checking isolation levels on transactions using an ops-log per transaction. I discussed this here two years ago .   KVSnap: This is a high level snapshot isolated singlenode key-value store

Know Yourself

MongoDB has a nice leadership development program internally. They suggested that filling/sharing this questionnaire would be useful to get you acquainted with the people you work with daily. I am not super into being in-touch with your feelings stuff, so I thought I probably won't do this. But then, the questions were interesting, and I jot down my answers to them quickly. I don't think this took me more than 5 minutes or so, and I felt good after having answered the questions. Maybe this gave me a little bit of self-acceptance, and being kind to my innate nature rather than judging it all the time. The questionnaire description was spot on: "Increase awareness of your own preferences by answering the questions below." Here are my answers to these questions. What would your answers be? I think it is good practice to do this exercise once in a while to know yourself a bit better. What are you AMAZING at? Where do you want to improve? I am good at research. I enjoy bei

Understanding Inconsistency in Azure Cosmos DB with TLA+

Image
This paper, by Finn Hackett, Joshua Rowe, Markus Kuppe, appeared in International Conference on Software Engineering 2023. It presents a specification of Azure Cosmos DB consistency behavior as exposed to the clients.  During my sabbatical at CosmosDB in 2018, I was involved in a specification of CosmosDB as exposed to the clients .  The nice thing about these specs is that they didn't need to model internal implementation but just captured the consistency semantics for clients precisely, rather than ambiguously like English explanations do. They aimed to answer the question of what kind of behavior should a client be able to witness while interacting with the service? The feedback at that time was that customers found this very useful. This 2023 paper improves on our preliminary specs from 2018. It has this great opening paragraph, which echoes the experience of everyone that has painstakingly specified a distributed/concurrent system behavior. Consistency guarantees for distribu

Amazon MemoryDB: A Fast and Durable Memory-First Cloud Database

Image
Key-value stores have simple semantics which make it cumbersome to perform complex operations involving multiple keys. Redis addresses this by providing low latency access to complex data structures like lists, sets, and hashes . Unfortunately, Redis lacks strong consistency guarantees and durability in the face of node failures, which limits its applicability beyond caching use cases. Amazon MemoryDB is a fully managed in-memory database service that leverages Redis's performance strengths while overcoming its durability limitations. It uses Redis as the in-memory data processing engine but offloads persistence to an AWS-internal transaction log service (internally known as the journal). This decoupled architecture provides in-memory performance with microsecond reads and single-digit millisecond writes, while ensuring across availability zone (AZ) durability, 99.99% availability, and strong consistency in the face of failures. This Sigmod 2024 paper describes MemoryDB's arch

Recent reads (Mar-Apr 2024)

I am late for my book reporting. These are the books I read mostly in March.  Going infinite (Michael Lewis, 2024) I had the highest of praise for Michael Lewis several times , and called him one of my favorite authors. I feel sorry to write a bad review for one of my favorite authors. Lewis faced significant backlash from people who questioned his impartiality in portraying Sam Bankman-Fried (SBF), raising concerns about the fairness of his depiction. My complaint is not that. I came to this book for captivating story-telling, which, unfortunately, falls short. This book pales in comparison to Lewis's past works. We are talking about the guy who made me read thick books about finance and stock trading worlds which I had zero interest in as if they were fast-paced thriller novels. So I was really looking forward to reading this book to get that reader's high, but I was disappointed. I am familiar with the blockchain and cryptocurrency world through my research and my internal

Linux OSSNA'24

Image
Last week I attended the Linux Foundation Open Software Summit North America (OSSNA). The TLA+ Conference (which I wrote about here) was colocated with Linux OSSNA as a pre-conference event, and having made the continental cross, it made sense to stay for the first two days of OSSNA. Linus Torvalds would speak on day 2, and there were several sessions about different aspects of Linux  ecosystem. I had read that in one of the previous years 3000 people attended OSSNA, so it seemed like a good opportunity to learn about current state of the opensource ecosystem. The embedded opensource summit was also colocated with this event as well. So how bad could that be? It was pretty bad. I regretted staying for the conference. It was nice seeing Linus speak live, and I checked that out of my nerd bucket list, but otherwise this was an underwhelming and boring two days.  I will be whining a bit in the next couple pages, before I share you my notes from the Linus session. But you may learn some

TLA+ conference 2024

Image
The TLA+ foundation got established last year in April as an independent, non-profit organization under the umbrella of the Linux Foundation. This April 15th, we colocated our annual TLA+ conference with the Linux OSSNA event to reach out to developers attending this event. Being under Linux Foundation, we didn't have to pay for the venue, but we paid $3K for professional recording of the sessions, so we can share the talks with the world at large. The videos will be available within a couple weeks.  Marc Brooker: 15 years of TLA+ Marc Brooker gave the keynote this Monday in TLA+ conference 2024. His keynote was a good mix of inspiration for formal methods and challenging formal methods to do better. Brooker is a VP/distinguished engineer at AWS. He has been on-call for 15 years, and were involved with many popular AWS services, including Aurora, Lambda, EC2, EBS, API gateway, iot, bedrock. He defines himself as mostly a practitioner, building large scale distributed systems. Br

OLTP Through the Looking Glass, and What We Found There

Image
This paper appeared in Sigmod 2008.  The goal of the paper is to rethink the Online Transaction Processing (OLTP) database architecture (which remained largely unchanged from late 1970s) and to test out a streamlined/stripped-out in-memory database architecture. To this end, the paper dissects where time is spent inside of a single node OLTP database system, and carefully measures the performance of various stripped down variants of that system. Motivation OLTP databases were optimized for the computer technology of the 1970s-80s.  Multi-threaded execution was adopted to allow other transactions to make progress while one transaction waits for data to be fetched from the slow, rotating disk drive. Locking-based concurrency control was adopted since conflicts were inevitable in these disk-bound transactions, and since the small dataset sizes and the workloads of the time resulted in extremely "hot" keys. The computing world now is quite a different environment. Fast SSDs and a

Chardonnay: Fast and General Datacenter Transactions for On-Disk Databases (OSDI'23)

Image
This paper appeared in OSDI'23. I haven't read the paper, and I am going by the two presentations I have seen of the paper, and by a quick skim of the paper. The paper has a cool idea, and I decided I should not procrastinate more waiting to read the paper in detail, and instead just capture and communicate this idea. Datacenter networking got very fast, much faster than SSD/disk I/O. That means, rather than two-phase commit (2PC), cold reads from SSD/disk is the dominant latency. Chardonnay banks on this trend in an interesting manner. It executes read/write transactions in "dry run" mode first, just to discover the readset and writeset and pin them to main memory. Then in "definitive" mode, it executes these transactions using two-phase-locking (2PL) and 2PC swiftly because everything (mostly) is in-memory. Having learned the readset and writeset during dry run also allows definitive execution achieve deadlock-freedom (and prevent aborts due to deadlock a

Popular posts from this blog

Learning about distributed systems: where to start?

Hints for Distributed Systems Design

Foundational distributed systems papers

Metastable failures in the wild

Scalable OLTP in the Cloud: What’s the BIG DEAL?

SIGMOD panel: Future of Database System Architectures

The end of a myth: Distributed transactions can scale

There is plenty of room at the bottom

Distributed Transactions at Scale in Amazon DynamoDB

Dude, where's my Emacs?