Can a Client–Server Cache Tango Accelerate Disaggregated Storage?
This paper from HotStorage'25 presents OrcaCache, a design proposal for a coordinated caching framework tailored to disaggregated storage systems. In a disaggregated architecture, compute and storage resources are physically separated and connected via high-speed networks. Such architectures have become increasingly common in modern data centers because they enable flexible resource scaling and improved fault isolation. (Follow the money, as they say!) But accessing remote storage introduces serious latency and efficiency challenges. The paper positions OrcaCache as a solution that mitigates these challenges by orchestrating caching logic across clients and servers. Important note: in the paper's terminology, the server is the storage node and the client is the compute node.
As we did last week for another paper, Aleksey and I live-recorded our reading/discussion of this paper. We do this to teach the thought-process and mechanics of how experts read papers in real time. Check our discussion video below (please listen at 1.5x, I sound less horrible at that speed). The paper I annotated during our discussion is also available here.
The problem
Caching plays a crucial role in reducing the overheads of disaggregated storage, but the paper argues that current strategies (client-local caching, server-only caching, and independent client-server caching) fall short. Client-local caching is simple and avoids server overhead but underutilizes memory on the server. Server-only caching can reduce backend I/O pressure but comes at the cost of network round-trips and significant server CPU load. Independent client-server caching combines the two but lacks coordination between the caches, leading to data duplication, inefficient eviction and prefetching policies, and fairness issues in multi-client environments.
The proposed design
OrcaCache proposes to address these shortcomings by shifting the cache index and coordination responsibilities to the client side. Clients maintain a global view of the cache and communicate directly with the server-side cache using RDMA, which enables bypassing the server CPU in the common case. Server-side components are minimized to a daemon that tracks resource usage and allocates memory based on fairness and pressure.
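To make this division of labor concrete, here is a minimal sketch (my own illustration, not the paper's code) of the common-case read path: the client consults its locally held index, and on a server-memory hit fetches the block with a one-sided read that bypasses the server CPU. The `rdma_read` stub and the dicts standing in for server memory and backend storage are all placeholder assumptions.

```python
# Sketch of client-side cache coordination, under assumed names/structures.
from collections import OrderedDict

SERVER_MEMORY = {}   # stands in for the server-side cache (RDMA-registered region)
BACKEND = {}         # stands in for backend storage behind the server

def rdma_read(addr):
    """Placeholder for a one-sided RDMA read; no server CPU involvement."""
    return SERVER_MEMORY[addr]

class ClientCache:
    def __init__(self, local_capacity):
        self.local = OrderedDict()  # small client-local cache, LRU order
        self.capacity = local_capacity
        self.index = {}             # client-held index: block_id -> server-side address

    def get(self, block_id):
        # 1. Local hit: cheapest path, no network at all.
        if block_id in self.local:
            self.local.move_to_end(block_id)
            return self.local[block_id]
        # 2. Server-memory hit: the client's own index says where the block
        #    lives, so it issues a one-sided read without server CPU help.
        if block_id in self.index:
            value = rdma_read(self.index[block_id])
            self._admit_local(block_id, value)
            return value
        # 3. Miss: fetch from backend, populate server memory, and update the
        #    client-held index so future reads take the one-sided path.
        value = BACKEND[block_id]
        SERVER_MEMORY[block_id] = value   # using block_id itself as the "address"
        self.index[block_id] = block_id
        self._admit_local(block_id, value)
        return value

    def _admit_local(self, block_id, value):
        self.local[block_id] = value
        if len(self.local) > self.capacity:
            self.local.popitem(last=False)  # evict LRU; server copy stays indexed

# Toy usage: three misses fill the caches and evict blk0 locally, so the
# fourth read is served from "server memory" via the one-sided read path.
BACKEND.update({f"blk{i}": f"data{i}" for i in range(4)})
cache = ClientCache(local_capacity=2)
for b in ["blk0", "blk1", "blk2"]:
    cache.get(b)
print(cache.get("blk0"))  # -> data0, via rdma_read()
```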
Discussion
OrcaCache stops short of addressing the core system-level challenges in a realistic multi-client deployment. The experiments in Figure 1 use a single-server, single-client setup, as does most of the description in the paper. The paper's solution for multiple clients is to give each client a separate namespace, but this consumes a lot of server-side resources and causes duplication of cached items. There is no mutual benefit or collaboration among clients in this setup.
The paper also mentions how clients could interact with a server-side daemon, how RDMA-based lookups and cache updates would be issued, and how resources might be allocated based on monitored pressure, but many of these mechanisms remain speculative. The authors mention flexible eviction and prefetching but do not explore the complexity of maintaining consistency or fairness across diverse workloads. AI/ML workloads are mentioned/alluded to but not actually evaluated in the paper.
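To give a feel for what pressure-based allocation could look like, here is a toy policy of my own (speculation, not from the paper): the daemon grants each client a fairness floor and divides the remaining cache memory in proportion to a reported pressure signal, such as recent miss rate. The function name and the policy itself are invented for illustration.

```python
# Toy pressure/fairness allocation a server-side daemon might run (assumed policy).
def allocate_cache(total_bytes, pressures, floor_fraction=0.1):
    """pressures: dict of client_id -> nonnegative pressure signal (e.g., miss rate)."""
    n = len(pressures)
    floor = int(total_bytes * floor_fraction / n)   # guaranteed per-client share
    remaining = total_bytes - floor * n
    total_pressure = sum(pressures.values())
    shares = {}
    for client, p in pressures.items():
        if total_pressure > 0:
            # Proportional split; integer truncation may leave a few bytes unassigned.
            shares[client] = floor + int(remaining * p / total_pressure)
        else:
            shares[client] = floor + remaining // n  # idle system: split evenly
    return shares

# Example: client c1 reports twice the pressure of c2 and c3.
print(allocate_cache(1 << 30, {"c1": 50, "c2": 25, "c3": 25}))
```

Even this toy version raises the questions the paper leaves open: how pressure is measured honestly, how often shares are rebalanced, and what happens to cached items when a client's share shrinks.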
In the end, the paper's contribution lies more in reopening a line of thought from 1990s cooperative caching and global memory management research: how to make cache coherence across disaggregated compute and storage both efficient and scalable. The idea OrcaCache leans on is that, rather than burdening the server, the client takes responsibility for coordination, enabled by fast networks and abundant memory.