Disaggregated Database Management Systems
This paper is based on a panel discussion from the TPC Technology Conference 2022. It surveys how cloud hardware and software trends are reshaping database system architecture around the idea of disaggregation.
For me, the core action is in Section 4: Disaggregated Database Management Systems. Here the paper discusses three case studies (Google AlloyDB, Rockset, and Nova-LSM) to give a taste of the software side of the movement. Of course there are many more. You can find my reviews of Aurora, Socrates, Taurus, and TaurusMM on my blog. In addition, Amazon DSQL (which I worked on) is worth discussing soon. I'll also revisit the PolarDB series of papers, which trace a fascinating arc from active log-replay storage toward simpler, compute-driven designs. Alibaba has been prolific in this space, but the direction they ultimately advocate remains muddled across publications, reflecting conflicting goals and priorities.
AlloyDB
AlloyDB extends PostgreSQL with compute–storage disaggregation and HTAP support. Figure 4 in the paper shows its layered design: the primary node (RW node) handles writes, a set of read pool replicas (RO nodes) provide scalable reads, and a shared distributed storage engine persists data in Google's Colossus file system. The read pools can be elastically scaled up or down with no data movement, because the data lives in disaggregated storage.
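To make the "no data movement" point concrete, here is a minimal Python sketch (class and method names are my own, not AlloyDB's API): since replicas hold only caches over the shared page store, resizing the read pool just means adding or removing compute endpoints.

```python
# Minimal sketch (hypothetical names, not AlloyDB's API): because replicas
# read pages from the same shared storage, scaling the read pool adds or
# removes compute endpoints without copying or rebalancing any data.
import itertools

class ReadPool:
    def __init__(self, shared_storage):
        self.shared_storage = shared_storage   # disaggregated page store
        self.replicas = []                     # RO compute nodes (caches only)

    def scale_to(self, n):
        while len(self.replicas) < n:          # grow: attach a new stateless replica
            self.replicas.append(ReadReplica(self.shared_storage))
        del self.replicas[n:]                  # shrink: drop replicas, data stays put
        self._rr = itertools.cycle(self.replicas)

    def route_read(self, page_id):
        return next(self._rr).read(page_id)    # round-robin across the pool

class ReadReplica:
    def __init__(self, shared_storage):
        self.storage, self.cache = shared_storage, {}

    def read(self, page_id):
        if page_id not in self.cache:
            self.cache[page_id] = self.storage.read_page(page_id)
        return self.cache[page_id]
```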
AlloyDB's hybrid nature enables it to combine transactional and analytical processing by maintaining both a row cache and a pluggable columnar engine. The columnar engine vectorizes execution and automatically converts hot data into columnar format when it benefits analytic queries.
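As a rough illustration of the "convert hot data" idea, here is a toy heuristic in Python. The threshold and bookkeeping are assumptions for the sketch, not AlloyDB's actual policy.

```python
# Toy heuristic (not AlloyDB's actual policy): convert a block to columnar
# format once analytic scans on it cross a threshold, so later scans can
# use vectorized execution over the columnar copy.
SCAN_THRESHOLD = 100  # assumed tuning knob

class HybridCache:
    def __init__(self):
        self.row_cache = {}      # block_id -> row-format pages
        self.columnar = {}       # block_id -> column-format copy
        self.scan_counts = {}

    def scan(self, block_id, rows):
        self.scan_counts[block_id] = self.scan_counts.get(block_id, 0) + 1
        if block_id in self.columnar:
            return self.columnar[block_id]          # vectorized path
        if self.scan_counts[block_id] >= SCAN_THRESHOLD and rows:
            # transpose rows (list of dicts) into columns (dict of lists)
            self.columnar[block_id] = {k: [r[k] for r in rows] for k in rows[0]}
            return self.columnar[block_id]
        return rows                                 # row-store path
```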
Under the covers, the database storage engine materializes pages from logs and stores blocks on Colossus. Logs are written to regional log storage; log-processing servers (LPS) continuously replay and materialize pages in the zones where compute nodes run. Durability and availability are decoupled: the logs are durable in regional log storage, while LPS workers ensure the blocks are always available near the compute.
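A minimal sketch of the log-to-page flow, with hypothetical log_store/page_store interfaces (not AlloyDB's actual internals): durability comes from the WAL record landing in regional log storage, while an LPS-style loop replays the log to keep page images fresh near the compute nodes.

```python
# Minimal sketch of log-to-page materialization (log_store, page_store, and
# record fields are hypothetical interfaces). Durability: a write is durable
# once its WAL record is in regional log storage. Availability: an LPS-style
# loop replays the log to keep pages materialized near the compute nodes.

def apply_redo(page, record):
    # Placeholder redo step: a real engine applies the logged change to the
    # page bytes; here we just take the record's after-image.
    return record.after_image

def lps_replay_loop(log_store, page_store, start_lsn):
    """Continuously turn the WAL into up-to-date page images."""
    lsn = start_lsn
    while True:
        for record in log_store.read_from(lsn):       # ordered by LSN
            page = page_store.get(record.page_id)
            page_store.put(record.page_id, apply_redo(page, record), record.lsn)
            lsn = record.lsn + 1

def read_page(page_store, log_store, page_id, read_lsn):
    """Serve a page as of read_lsn, catching up from the log if LPS is behind."""
    page, materialized_lsn = page_store.get_with_lsn(page_id)
    for record in log_store.read_range(page_id, materialized_lsn + 1, read_lsn):
        page = apply_redo(page, record)
    return page
```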
This is a nice example of disaggregation serving elasticity and performance: compute scales independently and HTAP workloads benefit from a unified, multi-format cache hierarchy.
Rockset
Rockset seems to be a poster child for disaggregation in real-time analytics. Rockset's architecture follows the Aggregator–Leaf–Tailer (ALT) pattern (Figure 6). ALT separates compute for writes (Tailers), compute for reads (Aggregators and Leaves), and storage. Tailers fetch new data from sources such as Kafka or S3. Leaves index that data into multiple index types (columnar, inverted, geo, document). Aggregators then run SQL queries on top of those indexes, scaling horizontally to serve high-concurrency, low-latency workloads.
The key insight is that real-time analytics demands strict isolation between writes and reads. Ingest bursts must not impact query latencies. Disaggregation makes that possible by letting each tier scale independently: more Tailers when ingest load spikes, more Aggregators when query demand surges, and more Leaves as data volume grows.
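A hypothetical autoscaling sketch makes that independence concrete: each tier reacts to its own signal, so an ingest burst adds Tailers without touching the Aggregators that serve queries. The metric names and thresholds below are invented for illustration.

```python
# Hypothetical autoscaling sketch for the ALT tiers: each tier scales on its
# own signal, so ingest spikes and query surges never fight over capacity.

def scale_alt(metrics, pools):
    """metrics: observed load; pools: current instance counts per tier."""
    if metrics["ingest_lag_sec"] > 30:                     # Kafka/S3 backlog growing
        pools["tailers"] += 1
    if metrics["p99_query_ms"] > 200:                      # query latency SLO at risk
        pools["aggregators"] += 1
    if metrics["index_bytes"] > pools["leaves"] * 500e9:   # ~500 GB per leaf (assumed)
        pools["leaves"] += 1
    return pools

# Example: an ingest spike scales only the write path.
print(scale_alt(
    {"ingest_lag_sec": 90, "p99_query_ms": 120, "index_bytes": 1e12},
    {"tailers": 2, "aggregators": 4, "leaves": 3},
))  # -> {'tailers': 3, 'aggregators': 4, 'leaves': 3}
```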
Rockset also shows why LSM-style storage engines (and append-only logs in general) are natural fits for disaggregation. RocksDB-Cloud never mutates SST files after creation. All SSTs are immutable and stored in cloud object stores like S3. This makes them safely shareable across servers. A compaction job can be sent from one server to another: server A hands the job to a stateless compute node B, which fetches SSTs, merges them, writes new SSTs to S3, and returns control. Storage and compaction compute are fully decoupled.
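Here is a sketch of that offloaded compaction against a hypothetical object-store interface (not RocksDB-Cloud's actual API). Because the input SSTs are immutable, the worker needs no locks and no coordination with concurrent readers.

```python
# Sketch of offloaded compaction over immutable SSTs (hypothetical object
# store interface). The owning server only sends SST names; the stateless
# worker does all the I/O and CPU work, then hands back the new SST's name.
import heapq

def remote_compact(object_store, input_sst_keys, output_key):
    """Merge sorted, immutable SSTs into one new SST and upload it."""
    # Each SST is a sorted list of (key, value) pairs. Inputs are assumed to
    # be passed newest-first, so the first occurrence of a key wins.
    runs = [object_store.get(k) for k in input_sst_keys]
    merged, last_key = [], object()
    for key, value in heapq.merge(*runs, key=lambda kv: kv[0]):  # streaming k-way merge
        if key != last_key:
            merged.append((key, value))
            last_key = key
    object_store.put(output_key, merged)      # new immutable SST in S3
    return output_key                         # server A installs it in its manifest
```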
Memory Disaggregation
The panel also discussed disaggregated memory as an emerging frontier. Today's datacenters waste over half their DRAM capacity due to static provisioning. It's shocking, no? RDMA-based systems like Redy have shown that remote memory can be used elastically to extend caches. The paper looks ahead to CXL as the next step: its coherent memory fabric can make remote memory behave like local memory, promising fine-grained sharing and coherence.
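A toy sketch of the cache-extension idea: evictions from local DRAM spill into an elastic remote-memory tier instead of being dropped. The remote_mem interface here is a stand-in of my own; in a Redy-like system it would be backed by RDMA.

```python
# Toy sketch of extending a local cache with a remote-memory tier (the
# remote_mem interface is hypothetical). Evictions spill to remote memory
# instead of falling through to slow storage.
from collections import OrderedDict

class TieredCache:
    def __init__(self, local_capacity, remote_mem):
        self.local = OrderedDict()          # LRU order: oldest first
        self.local_capacity = local_capacity
        self.remote = remote_mem            # elastic: can grow or shrink

    def get(self, key):
        if key in self.local:
            self.local.move_to_end(key)     # refresh LRU position
            return self.local[key]
        value = self.remote.get(key)        # microseconds over the fabric
        if value is not None:
            self.put(key, value)            # promote back into local DRAM
        return value

    def put(self, key, value):
        self.local[key] = value
        self.local.move_to_end(key)
        while len(self.local) > self.local_capacity:
            old_key, old_value = self.local.popitem(last=False)
            self.remote.put(old_key, old_value)   # spill, don't drop
```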
Hardware Disaggregation
On the hardware side, the paper surveys how storage, GPUs, and memory are being split from servers and accessed via high-speed fabrics. An interesting case study here is Fungible's DPU-based approach. The DPU offloads data-centric tasks (networking, storage, security) from CPUs, enabling server cores to focus solely on application logic. In a way, the DPU is a hardware embodiment of disaggregation.
Future Directions
Disaggregated databases are already here. Yet many open questions remain:
- How do we automatically assemble microservice DBMSs on demand, choosing the right compute, memory, and storage tiers for a workload?
- How do we co-design software and hardware across fabrics like CXL to avoid data movement while preserving performance isolation?
- How do we verify the correctness of such dynamic compositions?
- Can a DBMS learn to reconfigure itself (rebalancing compute and storage) to stay optimal under changing workload patterns?
- How do we deal with fault-tolerance and availability issues, and develop new distributed systems protocols that exploit the opportunities that open up in the disaggregated model?
As Swami said at the SIGMOD 2023 panel: "The customer value is here, and the technical problems will be solved in time. Thanks to the complexities of disaggregation problems, every database/systems assistant professor is going to get tenure figuring out how to solve them."

