Posts

Showing posts from September, 2022

TLA+ Conference and StrangeLoop 2022

Last week I traveled to the Strange Loop conference. I chaired the TLA+ conference held as part of the pre-conference events. I will write about that below. Strange Loop was my first developer conference. I have mixed feelings. I will write about that as well. Travel to the conference was uneventful, via a Delta connection through Atlanta. The Atlanta to Saint Louis leg gave me back one hour on the clock because I moved from the Eastern to the Central time zone. I had gotten exit row seats, so I had ample legroom. I watched Top Gun on the first flight (which was good), and watched half of Jurassic World Dominion (which was very lame; I thought it was a parody). I arrived around 3pm and dropped my backpack (I travel light) at my hotel room. I stayed at Drury Inn, which is right next to the conference hotel, Union Inn Station Hotel. The hotel was decent, and they had a decent hot breakfast. I walked to the Gateway Arch, and wandered around till the TLA+ Conference dinner. It was very hot at 95 F, but the temperature dro...

Amazon Redshift Re-invented

This paper (SIGMOD'22) discusses the evolution of Amazon Redshift since its launch in 2015. Redshift is a cloud data warehouse. A data warehouse basically means a place where analysis/querying/reporting is done for a shitload of data coming from multiple sources. Tens of thousands of customers use Redshift to process exabytes of data daily. Redshift is fully managed, to make it simple and cost-effective to analyze BIG data efficiently. The concept art in this blog post was created with Stable Diffusion. Since its launch in 2015, the use cases for Redshift have evolved, and the teams focused on meeting the following customer needs: high-performance execution of complex analytical queries using innovative query execution via C++ code generation; fast scalability in response to changing workloads by disaggregating storage and compute layers; ease of use via incorporating machine learning based autonomics; seamless integration with the AWS ecosystem and other AWS purpose built serv...

SQLite: Past, Present, and Future

SQLite is the most widely deployed database engine (or likely even software of any type) in existence. It is found in nearly every smartphone (iOS and Android), computer, web browser, television, and automobile. There are likely over one trillion SQLite databases in active use. (If you are on a Mac laptop, you can open a terminal, type "sqlite3", and start conversing with the SQLite database engine using SQL.) SQLite is a single-node and (mostly) single-threaded online transaction processing (OLTP) database. It has an in-process/embedded design, and a standalone (no dependencies) codebase: a single C library consisting of 150K lines of code. With all features enabled, the compiled library size can be less than 750 KiB. Yet, SQLite can support tens of thousands of transactions per second. Due to its reliability, SQLite is used in mission-critical applications such as flight software. There are over 600 lines of test code for every line of code in SQLite. SQLite is truly t...
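To make the in-process/embedded point concrete, here is a minimal sketch using Python's standard-library `sqlite3` module. The `posts` table and its rows are hypothetical, made up for illustration. The engine runs inside the calling process, so there is no server to start or connect to.

```python
import sqlite3

# In-process: the SQLite engine is linked into this Python process.
# ":memory:" gives a throwaway database; a file path would persist it instead.
conn = sqlite3.connect(":memory:")

# Hypothetical table, for illustration only.
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")

# Using the connection as a context manager wraps the inserts in one
# transaction: commit on success, rollback if an exception is raised.
with conn:
    conn.executemany(
        "INSERT INTO posts (title) VALUES (?)",
        [("SQLite: Past, Present, and Future",), ("Amazon Redshift Re-invented",)],
    )

rows = conn.execute("SELECT id, title FROM posts ORDER BY id").fetchall()
print(rows)
```

The same statements can be typed interactively at the `sqlite3` command-line prompt mentioned above.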

Memory-Optimized Multi-Version Concurrency Control for Disk-Based Database Systems

This paper is from VLDB 2022. It takes the MVCC system developed in Hyper and adapts it to a memory-optimized disk-based (not the spinning type, SSDs really) database system so that the system can handle a very large database that won't fit in RAM while achieving very high performance.

Motivation and introduction

When in-memory databases were conceived, it was assumed that main memory sizes would rise in accord with the amount of data in need of processing. But the increase in RAM sizes leveled off, reaching a plateau of at most a few TB. (This is likely due to the attractive price/performance benefits of SSDs.) Pure in-memory database systems offer outstanding performance but degrade heavily if the working set does not fit into DRAM. On the other hand, traditional disk-based database systems, being designed for a different era, fail to take advantage of modern hardware with plentiful RAM (though still not enough to cover the whole database) and almost limitless SSDs (which are awesome for random ...
