Posts

TLA+ Conference and StrangeLoop 2022

Last week I traveled to the Strange Loop conference. I chaired the TLA+ conference held as part of the pre-conference events. I will write about that below. Strange Loop was my first developers' conference. I have mixed feelings about it, and I will write about that as well. Travel to the conference was uneventful, via a Delta connection through Atlanta. Atlanta to Saint Louis gave me an hour back on the clock because I moved from the Eastern to the Central time zone. I had gotten exit row seats, so I had ample legroom. I watched Top Gun on the first flight (which was good), and half of Jurassic World Dominion (which was very lame; I thought it was a parody). I arrived around 3pm-ish and dropped my backpack (I travel light) at my hotel room. I stayed at the Drury Inn, which is right next to the conference hotel, the Union Station Hotel. The hotel was decent, and they had a decent hot breakfast. I walked to the Gateway Arch and wandered around until the TLA+ Conference dinner. It was very hot at 95 F, but the temperature dro…

Amazon Redshift Re-invented

This paper (SIGMOD'22) discusses the evolution of Amazon Redshift since its launch in 2015. Redshift is a cloud data warehouse. Data warehouse basically means a place where analysis/querying/reporting is done for a shitload of data coming from multiple sources. Tens of thousands of customers use Redshift to process exabytes of data daily. Redshift is fully managed to make it simple and cost-effective to efficiently analyze BIG data. The concept art pieces in this blog post are creations of Stable Diffusion. Since its launch in 2015, the use cases for Redshift have evolved, and the teams focused on meeting the following customer needs:
- High-performance execution of complex analytical queries using innovative query execution via C++ code generation
- Enabling fast scalability in response to changing workloads by disaggregating the storage and compute layers
- Ease of use via incorporating machine-learning-based autonomics
- Seamless integration with the AWS ecosystem and other AWS purpose-built serv…
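To give a flavor of the query-compilation idea mentioned in the first item, here is a toy sketch in Python (standing in for Redshift's C++ code generation; the function names and query shape are made up): instead of interpreting a query plan, we emit source code specialized for one particular query and execute that.

```python
# Toy illustration of compiling a query into specialized code instead of interpreting it.
# (A Python stand-in for C++ code generation; not Redshift's actual mechanism.)

def compile_filter_sum(column: str, threshold: float):
    # Emit source code specialized for this particular filter-and-sum query...
    src = (
        "def q(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        f"        if row['{column}'] > {threshold}:\n"
        f"            total += row['{column}']\n"
        "    return total\n"
    )
    namespace = {}
    exec(src, namespace)          # ...and turn it into an executable function
    return namespace["q"]

# Usage: roughly SELECT SUM(price) FROM rows WHERE price > 10
q = compile_filter_sum("price", 10.0)
print(q([{"price": 5.0}, {"price": 12.5}, {"price": 30.0}]))  # 42.5
```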

SQLite: Past, Present, and Future

SQLite is the most widely deployed database engine (or likely even software of any type) in existence. It is found in nearly every smartphone (iOS and Android), computer, web browser, television, and automobile. There are likely over one trillion SQLite databases in active use. (If you are on a Mac laptop, you can open a terminal, type "sqlite3", and start conversing with the SQLite database engine using SQL.) SQLite is a single-node and (mostly) single-threaded online transaction processing (OLTP) database. It has an in-process/embedded design and a standalone (no dependencies) codebase: a single C library consisting of 150K lines of code. With all features enabled, the compiled library size can be less than 750 KiB. Yet, SQLite can support tens of thousands of transactions per second. Due to its reliability, SQLite is used in mission-critical applications such as flight software. There are over 600 lines of test code for every line of code in SQLite. SQLite is truly t…
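To make the "start conversing with SQLite" point concrete, here is a minimal sketch using Python's standard-library sqlite3 module, which embeds the same engine in-process (the file and table names below are made up for illustration):

```python
import sqlite3

# Open (or create) a database file; ":memory:" would give a throwaway in-memory DB.
conn = sqlite3.connect("example.db")

# SQLite speaks plain SQL: create a table, insert a row, and query it back.
conn.execute("CREATE TABLE IF NOT EXISTS papers (title TEXT, venue TEXT)")
conn.execute("INSERT INTO papers VALUES (?, ?)",
             ("SQLite: Past, Present, and Future", "VLDB 2022"))
conn.commit()

for title, venue in conn.execute("SELECT title, venue FROM papers"):
    print(title, "-", venue)

conn.close()
```

The same session could be run interactively with the sqlite3 command-line shell; the library is the database, with no server process involved.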

Memory-Optimized Multi-Version Concurrency Control for Disk-Based Database Systems

This paper is from VLDB 2022. It takes the MVCC system developed in Hyper and adapts it to a memory-optimized disk-based (not the spinning type; SSDs, really) database system, so that the system can handle a very large database that won't fit in RAM while achieving very high performance. Motivation and introduction: As in-memory databases were conceived, it was assumed that main memory sizes would rise in accord with the amount of data in need of processing. But the increase in RAM sizes leveled off, reaching a plateau of at most a few TB. (This is likely due to the attractive price/performance benefits of SSDs.) Pure in-memory database systems offer outstanding performance but degrade heavily if the working set does not fit into DRAM. On the other hand, traditional disk-based database systems, being designed for a different era, fail to take advantage of modern hardware with plenty of RAM (though still not enough to hold everything) and almost limitless SSDs (which are awesome for random…

Socrates: The New SQL Server in the Cloud (Sigmod 2019)

This paper (SIGMOD 2019) presents Socrates, the database-as-a-service (DBaaS) architecture of Azure SQL DB Hyperscale. Deploying a DBaaS in the cloud requires an architecture that is cost-effective yet performant. An idea that works well is to decompose/disaggregate the functionality of a database into two parts: compute services (e.g., transaction processing) and storage services (e.g., checkpointing and recovery). The first commercial system that adopted this idea is Amazon Aurora. The Socrates design adopts the separation of compute from storage, as it has proven useful. In addition, Socrates separates the database log from storage and treats the log as a first-class citizen. Separating the log and storage tiers disentangles durability (implemented by the log) and availability (implemented by the storage tier). This separation yields significant benefits: in contrast to availability, durability does not require copies in fast storage; in contrast to durability, availability does n…
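As a rough sketch of the durability/availability split described above (hypothetical class and method names, not the Socrates implementation): a commit is durable as soon as the log tier hardens it, while storage replicas trail behind and catch up from the log to serve reads.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class LogTier:
    """Durability: a write is durable once its log record is appended here."""
    records: List[Tuple[int, str, str]] = field(default_factory=list)  # (lsn, key, value)

    def append(self, key: str, value: str) -> int:
        lsn = len(self.records) + 1
        self.records.append((lsn, key, value))
        return lsn  # once this returns, the write is durable

@dataclass
class StorageReplica:
    """Availability: serves reads; catches up from the log, needs no fast/expensive media."""
    pages: Dict[str, str] = field(default_factory=dict)
    applied: int = 0

    def apply_log(self, log: LogTier) -> None:
        for lsn, key, value in log.records[self.applied:]:
            self.pages[key] = value
            self.applied = lsn

class ComputeNode:
    """Transaction processing only; keeps no long-term state of its own."""
    def __init__(self, log: LogTier):
        self.log = log

    def commit_write(self, key: str, value: str) -> int:
        return self.log.append(key, value)  # durability is decided by the log tier alone

# Usage: the commit path goes through the log; replicas catch up independently.
log = LogTier()
compute = ComputeNode(log)
replica = StorageReplica()
compute.commit_write("x", "1")   # durable now
replica.apply_log(log)           # available for reads after catch-up
print(replica.pages)             # {'x': '1'}
```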

Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems

This paper from SIGMOD 2015 describes the addition of Multi-Version Concurrency Control (MVCC) to the Hyper database, and discusses at length how they implement serializability efficiently, with little overhead compared to snapshot isolation. Coming only two years later, this paper reads like a response/one-upmanship to the Hekaton paper. In many places in the paper you can see statements like "in contrast to / unlike / as in Hekaton". In a way, that is the biggest compliment a paper can pay to another paper. Hyper, like Hekaton, is an in-memory database. The paper says that Hyper is like Hekaton in that, in order to preserve scan performance, new transactions are not allocated a separate place. The most up-to-date version of the database is kept all in the same place. But you can reach older versions of a record (that are still in use by active transactions) by working backwards from the record using the delta versions maintained. The biggest improvement in Hyp…
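As a rough illustration of that version-chain idea (a sketch with made-up names, not Hyper's actual data structures): the in-place record holds the newest value, and a reader with an older snapshot walks backwards through the maintained before-image deltas until it reaches a version visible to it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Delta:
    """Before-image recorded when a transaction overwrote the record."""
    created_by_ts: int          # commit timestamp of the transaction that installed the newer version
    old_value: str
    prev: Optional["Delta"]     # next-older delta, or None at the oldest retained version

@dataclass
class Record:
    value: str                  # the in-place, most up-to-date version
    newest_delta: Optional[Delta] = None

def update(rec: Record, new_value: str, commit_ts: int) -> None:
    # Keep the new value in place; push the before-image onto the delta chain.
    rec.newest_delta = Delta(commit_ts, rec.value, rec.newest_delta)
    rec.value = new_value

def read(rec: Record, snapshot_ts: int) -> str:
    # Walk backwards: undo every change committed after this reader's snapshot.
    value, delta = rec.value, rec.newest_delta
    while delta is not None and delta.created_by_ts > snapshot_ts:
        value, delta = delta.old_value, delta.prev
    return value

# Usage: an old reader (snapshot 5) still sees "v1" after later updates.
rec = Record("v1")
update(rec, "v2", commit_ts=10)
update(rec, "v3", commit_ts=20)
print(read(rec, snapshot_ts=5))    # v1
print(read(rec, snapshot_ts=15))   # v2
print(read(rec, snapshot_ts=25))   # v3 (the in-place newest version)
```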

Popular posts from this blog

Graviton2 and Graviton3

Foundational distributed systems papers

Learning a technical subject

SQLite: Past, Present, and Future

Learning about distributed systems: where to start?

Strict-serializability, but at what cost, for what purpose?

CockroachDB: The Resilient Geo-Distributed SQL Database

The Seattle Report on Database Research (2022)

Amazon Aurora: Design Considerations + On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes

Warp: Lightweight Multi-Key Transactions for Key-Value Stores