HPTS day 2, part 1

Continuing with our series. This is day 2, Tuesday morning. It had two sessions on hardware. I wasn't exaggerating when I said hardware/software codesign was all the buzz at HPTS this year. It looks like future databases will be more tightly integrated with hardware capabilities and more responsive to user needs.

You may have gotten a bit tired of all these technical paper summaries, so let's start with a preamble on the venue.

HPTS has always been held at Asilomar, Pacific Grove. It is very close to Monterey Bay. The conference grounds are old, the Chapel being more than a century old. The rooms are built of wood, and the walls are paper thin, with no sound insulation. What can you say, the place has a lot of character!

Joking aside, it grows on you. It is clean and the food is good. It is on the beach, which has surreal vegetation. Plants and occasionally trees grow literally on the white sands. The trees are all bent inland, tracing the strong winds that blow in from the sea.

During our stay, the skies were mostly overcast and temperatures were in the 60s. On Tuesday noon, it was fantastic and sunny for a couple of hours, and I took this picture then.

My room was very close to the workshop location, the Chapel. That is the building you see in this photo I took from my room's balcony. You can also see the ocean. I used my room only for sleeping. Each day I left at 7:30am for breakfast, and went back again at midnight. There were so many interesting discussions to be had, and the nice view from the room was wasted on me. 


Session 5: Hardware 


Keynote: In Computer Architecture, We Don’t Change the Questions, We Change the Answers - Mark Hill (University of Wisconsin-Madison and Microsoft)

Mark Hill's keynote kicked off the second day. Mark had been a professor of computer architecture at Wisconsin-Madison for more than 3 decades, and also worked for Microsoft Azure. 

Computer architects transform components into systems. This includes designing datacenters. The computing stack has traditionally been layered, and this worked well for many decades. But now layer experts need to work together, because performance scaling gains have slowed down: Dennard scaling (2x transistors for 0.5x power) started fading in the 2000-2010 period.
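To make the Dennard scaling parenthetical concrete, here is a minimal sketch of one ideal scaling generation. It assumes the textbook model, which the talk did not spell out: dynamic power per transistor is C * V^2 * f, and each generation scales linear dimensions, voltage, and capacitance by 1/k with k = sqrt(2), while frequency scales by k. The function name is made up for illustration.

```python
import math

def dennard_generation(k: float = math.sqrt(2)) -> dict:
    """Relative changes after one ideal Dennard-scaling generation.

    Assumed textbook model: capacitance and voltage scale by 1/k,
    clock frequency scales by k, transistor area scales by 1/k^2.
    """
    capacitance = 1 / k
    voltage = 1 / k
    frequency = k
    # Dynamic power per transistor: P = C * V^2 * f
    power_per_transistor = capacitance * voltage**2 * frequency
    # Each transistor takes 1/k^2 of the area, so density grows by k^2.
    transistors_per_area = k**2
    # Power per unit area is what the chip has to dissipate.
    power_density = power_per_transistor * transistors_per_area
    return {
        "transistors_per_area": transistors_per_area,
        "power_per_transistor": power_per_transistor,
        "power_density": power_density,
    }

if __name__ == "__main__":
    for name, value in dennard_generation().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions you get twice the transistors at half the power each, so power density stays flat. Once voltage could no longer be lowered at the same pace, that free ride ended, which is the slowdown Mark was pointing at.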

As the figure shows, with the early computer architecture designs of IBM and Digital, we were seeing vertical integration. Then we got comfy with using layering for many decades. And now the trend is coming back to vertical integration again.

Mark reminisced about an interaction with a fellow professor about qualifier exams for PhD students. He wanted to update them, and asked his colleague how they should go about it. His colleague said: we don't change the questions, Mark, we change the answers.

This is funny (because professors are too lazy to change qualifying exam questions) but also profound indeed. The eternal questions in the field revolve around building cost-effective, performant, and reliable systems: computers, memory, storage, networks, and distributed systems. It's the job of architects to recognize how changing applications and technologies affect these answers.

In discussing compute, Mark focused on the rise of accelerators, particularly in deep learning. He acknowledged that Gen AI is at the peak of the hype cycle; next comes the trough of disillusionment, followed by the slope of enlightenment. Yet he predicted the widespread deployment of generative AI across various devices, from wearables to the cloud. He cited the Jevons paradox, and said that if we make inference efficient, it will be used in more places. He also highlighted new opportunities in Compute Express Link (CXL) and Universal Chiplet Interconnect Express (UCIe), noting the shift from monolithic chips to chiplets. He mentioned 3D stacking in fabrication, and called 3D the new frontier.

Regarding memory, Mark discussed the potential of two-tier memory systems enabled by CXL Type 3. He discussed emerging memory technologies and the surprising longevity of memory components. He emphasized the importance of efficient memory management, referencing his upcoming OSDI 2024 paper on managing memory tiers with CXL. He also drew attention to processing in memory (PIM), and identified moving compute to vast data in memory as a high pain, high gain opportunity.

For networking, Mark touched on Ultra Ethernet, which specializes Ethernet for datacenter and ML workloads. He noted trends like packet spraying, relaxed ordering, and phase-aware congestion control. He also mentioned the movement of optics closer to hosts, now at the Top of Rack level.

He briefly addressed security concerns, mentioning confidential compute. He also discussed the challenges of cooling as datacenters evolve into gigantic supercomputers. As an upcoming challenge he highlighted making computing more sustainable in a cost-effective manner. 

During the Q&A session, Mark expressed skepticism about rack-scale extension of memory but affirmed the potential of tiered memory through CXL. He explained that while some claim CXL is meant to disaggregate memory in the cloud, he believes pooling will be slow, requiring outstanding transactions to cover latency.
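The point about outstanding transactions is Little's law at work: to sustain a given bandwidth over a link with a given latency, roughly bandwidth x latency bytes must be in flight at all times. Here is a small sketch of that arithmetic. The bandwidth and latency figures are illustrative assumptions of mine, not numbers from the talk, and the 64-byte request size assumes cache-line-sized transfers.

```python
import math

def outstanding_requests(bandwidth_gb_per_s: float,
                         latency_ns: float,
                         request_bytes: int = 64) -> int:
    """Requests that must be in flight to sustain the given bandwidth.

    Little's law: in-flight bytes = throughput * latency.
    1 GB/s is exactly 1 byte per nanosecond, so the units line up directly.
    """
    bytes_per_ns = bandwidth_gb_per_s
    in_flight_bytes = bytes_per_ns * latency_ns
    return math.ceil(in_flight_bytes / request_bytes)

if __name__ == "__main__":
    # Hypothetical latencies: local DRAM versus a slower pooled tier.
    for label, latency_ns in [("local (assumed 100 ns)", 100),
                              ("pooled (assumed 400 ns)", 400)]:
        n = outstanding_requests(32, latency_ns)
        print(f"{label}: {n} outstanding 64-byte requests for 32 GB/s")
```

With these made-up numbers, quadrupling the latency quadruples the number of requests that have to be in flight (50 versus 200), which is why slower pooled memory puts pressure on how many outstanding transactions the hardware can track.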


We really move YOUR tail for you - David Lucey (Salesforce)

David discussed the evolving needs of network infrastructure, particularly in relation to GPU workloads. He highlighted that GPUs have unique network requirements: they generate low entropy traffic (few talkers), are bursty in nature, and are both loss and latency intolerant.

David emphasized the shift towards Clos networks in data center architectures. He noted that circuit switching, once considered outdated, is regaining relevance due to its ability to provide guarantees rather than just likelihoods of performance. In modern data centers, an availability zone (AZ) typically consists of one or more Clos networks, with connections between these networks being oversubscribed and augmented based on demand signals.

A key point in David's presentation was the concept of Ultra Ethernet packet spraying. This approach prioritizes latency reduction over bandwidth efficiency, following the philosophy that abundant bandwidth can be "squandered" to save precious latency. Minimizing latency and jitter is crucial for consistent performance in large-scale AI deployments.
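To see why spraying matters for low-entropy GPU traffic, here is a toy sketch comparing two ways of placing a handful of large flows on parallel links. The baseline of pinning each flow to one link (ECMP-style per-flow hashing) is my assumption for contrast and was not described in the talk. The function names and the flow sizes are made up, and the model ignores queueing, reordering, and congestion control entirely.

```python
import random

def max_link_load_flow_hashing(flow_sizes, num_links, seed=0):
    """Pin every flow to a single pseudo-randomly chosen link (ECMP-style)."""
    rng = random.Random(seed)
    load = [0] * num_links
    for packets in flow_sizes:
        # All packets of the flow follow the same path.
        load[rng.randrange(num_links)] += packets
    return max(load)

def max_link_load_spraying(flow_sizes, num_links):
    """Spread packets over all links round-robin, regardless of flow."""
    load = [0] * num_links
    next_link = 0
    for packets in flow_sizes:
        for _ in range(packets):
            load[next_link] += 1
            next_link = (next_link + 1) % num_links
    return max(load)

if __name__ == "__main__":
    flows = [1000] * 4   # few talkers, each sending a large burst
    links = 8
    print("hottest link, per-flow hashing:",
          max_link_load_flow_hashing(flows, links))
    print("hottest link, packet spraying: ",
          max_link_load_spraying(flows, links))
```

With only four talkers, per-flow hashing can use at most four of the eight links, and the hottest link carries at least a full 1000-packet burst. Spraying caps the hottest link at 500 packets. The price is that packets of one flow arrive out of order, which is presumably why relaxed ordering came up alongside packet spraying in Mark's keynote.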


GPU Databases -- The New Modality of Data Analytics - Bobbi Yogatama (University of Wisconsin-Madison)

Bobbi presented on GPU databases as a new modality in data analytics. He argued that GPUs are superior to CPUs for SQL analytics due to their higher computational power and memory bandwidth. He noted that GPU peak performance and memory bandwidth have grown fourfold from 2020 to 2023.

He introduced Sirius, a multi-GPU execution engine for DuckDB, which achieves speeds up to 60 times faster than traditional CPU-based solutions. He provided a live demo of a single-GPU version and mentioned ongoing work on multi-GPU acceleration, to be presented at VLDB 2024. This is still preliminary work, supporting only certain SQL operations, but it is an interesting idea. It is worth watching this space.


Session 6: Hardware (cont’d) and Embedded DBs and the Edge

Smart memories for vectorized data analytics - Helena Caminal (Google)

Helena presented on smart memories for vectorized data analytics. She emphasized that while SIMD (Single Instruction, Multiple Data) is important, it's not the only game in town. She pointed out that database vectors don't always align perfectly with hardware vectors, necessitating careful mapping to vector instructions. She outlined a spectrum of computation capabilities, ranging from CPU with SIMD to compute-capable storage, highlighting the potential for optimization at various levels of the memory hierarchy.
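As a rough illustration of the mapping problem, here is a sketch of processing a database vector (a batch of column values) in fixed-width hardware-sized chunks, with a mask for the final partial chunk. This is my own reading of the alignment issue, not something shown in the talk. The lane count, the function name, and the pure-Python emulation of lanes are all assumptions for illustration.

```python
def process_in_lanes(values, lanes=8, op=lambda x: x * 2):
    """Apply op to a batch of values in chunks of `lanes` elements.

    Returns (results, num_vector_ops). The last chunk is padded and masked
    when len(values) is not a multiple of the lane count.
    """
    results = []
    num_vector_ops = 0
    for start in range(0, len(values), lanes):
        chunk = values[start:start + lanes]
        active = len(chunk)
        # Mask marks which lanes hold real data in this chunk.
        mask = [i < active for i in range(lanes)]
        padded = chunk + [0] * (lanes - active)
        # One emulated "vector instruction": every lane runs, masked lanes are ignored.
        lane_out = [op(v) if m else 0 for v, m in zip(padded, mask)]
        num_vector_ops += 1
        results.extend(r for r, m in zip(lane_out, mask) if m)
    return results, num_vector_ops

if __name__ == "__main__":
    out, ops = process_in_lanes(list(range(10)), lanes=4)
    print(out, "using", ops, "vector operations")
```

Ten values on four lanes take three operations, and the last one runs with half its lanes masked off. Real engines face the same kind of bookkeeping with selection vectors, variable-width types, and hardware vector lengths that differ across machines.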


Object Storage and In-Process Databases are Changing Distributed Systems - Colin Breck (Tesla)

Unfortunately, I missed most of this talk as I had to step out for some phone calls. I think Colin presented on the topics he outlined in his blog.


60 Frames Per Second Cloud Databases - Peter Boncz (MotherDuck)

Peter presented DuckDB WASM, a version of DuckDB running in the browser. He demonstrated its impressive interactive data analytics capabilities, achieving latencies typically impossible for cloud-based databases.

Peter proposed a futuristic scenario of DuckDB embedded in VR headsets, generating queries at 60fps. He introduced the idea of analytical SQL systems as linkable libraries, supporting data science with hybrid/dual query processing. This approach allows processing on both server and client, enabling low-latency responses even on edge devices like cars. Client-side processing with an embedded database like DuckDB is an intriguing idea.
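The 60fps target translates into a tight per-query time budget, which a quick back-of-the-envelope sketch makes clear. The query time and network round-trip values below are hypothetical placeholders, not measurements from the talk, and the function names are made up.

```python
def frame_budget_ms(fps: int = 60) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps

def fits_in_frame(query_ms: float, network_rtt_ms: float = 0.0, fps: int = 60) -> bool:
    """True if query execution plus any network round trip fits in one frame."""
    return query_ms + network_rtt_ms <= frame_budget_ms(fps)

if __name__ == "__main__":
    print(f"budget at 60 fps: {frame_budget_ms(60):.2f} ms")
    # Assumed numbers: a 5 ms query run locally versus behind a 30 ms round trip.
    print("in-process query fits:", fits_in_frame(5.0))
    print("remote query fits:    ", fits_in_frame(5.0, network_rtt_ms=30.0))
```

At 60fps there are under 17 ms per frame, so any network round trip of a few tens of milliseconds exhausts the budget before the query even starts. That is the case for running the engine in-process on the client and going to the server only when needed.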
