Two paper summaries on scheduling in heterogenous computing

- November 24, 2017

Today, I have two short paper summaries from HotCloud'17 on heterogenous computing.

Heterogeneous GPU reallocation

This paper appeared in HOTCloud'17, and the authors are James Gleeson and Eyal de Lara, University of Toronto.

It looks like they have developed a GPU virtualization tool called Crane recently. "General purpose GPU (GPGPU) computing in virtualized environments leverages PCI passthrough to achieve GPU performance comparable to bare-metal execution. However, GPU passthrough prevents service administrators from performing virtual machine migration between physical hosts. Crane is a new technique for virtualizing OpenCL-based GPGPU computing that achieves within 5.25% of passthrough GPU performance while supporting VM migration. Crane interposes a virtualization-aware OpenCL library that makes it possible to reclaim and subsequently reassign physical GPUs to a VM without terminating the guest or its applications. Crane also enables continued GPU operation while the VM is undergoing live migration by transparently switching between GPU passthrough operation and API remoting." Here is the video of conference presentation at SYSTOR'17.

The HotCloud paper is a position paper motivating the use of the Crane tool. They say GPU workloads underutilize GPU device memory (using less than 37%). So there is an opportunity to reallocate heterogenous GPUs within existing VMs, upgrading and downgrading them. They employ Crane to support reallocation of heterogenous GPU models within a VM and also by allowing live migration of VM to a new host.

This may be something that can complement the Zorua GPU virtualization approach.

The HCl scheduler: Going all-in on Heterogeneity

This paper appeared in HOTCloud'17 and is authored by Michael Kaufmann, IBM Research Zurich, Karlsruhe Institute of Technology; Kornilios Kourtis, IBM Research Zurich.

The paper builds a new scheduler HCl that tries to fully model and support resource heterogeneity. HCl operates on 2 annotated DAGs: one representing the Spark application, and one representing the available hardware resources.

HCl uses task runtime estimations for each resource to find the optimal mapping of a task, and estimate future availability of resources. It also tries to estimate the cost of data transfer and includes this as a factor in where to schedule tasks. It uses lookahead scheduling to place parent tasks closer to their children's optimal resource instead of only placing child tasks closer to their parents. Finally, it performs path balancing and tries to balance the execution times of converging paths in a DAG such that the waiting time for tasks at a convergence points is minimized.

This is a complicated scheduling algorithm, and would likely have problems scaling as the size of tasks and heterogenous resources grow. Scheduling needs to be done quickly, and these calculations will take time. The evaluations don't test at scale.

There is not much detail about HCl implementation in the paper, but it looks like HCl is not a Mesos framework, it is a separate scheduler. It must be more like Borg, where the scheduler gets to see not only resources but also all the tasks.