Live migration of virtual machines

This is a 2005 paper by the same group at Cambridge that developed the Xen virtualization system discussed in the previous post. This review will be short since it is only based on the notes I took while listening to the paper presentation in the seminar.

Live virtual machine (VM) migration is the relocation of a VM from one machine to another while its applications continue to execute. The motivation for live migration is to perform load management across machines and even across data centers. Another motivation is fault-tolerance, if the original host needs to go down due to faults, maintenance, or power outage, migration can provide availability of the applications running on the hosted VMs.

The challenges with live migration of VMs is to minimize the downtime and to provide a seamless relocation of the VM so the VM can continue to operate normally in its new address.

One strawman for migrating the contents of VM memory is the stop-and-copy approach. This approach leads to a long down-time and hence not compatible with our objective. Another strawman for migrating memory is on-demand paging approach. Here we first freeze the VM at the source machine, copy minimal execution context to the target machine, restart the VM at the target, and pull memory contents from the source as and when needed. The drawback here is the slow startup of the VM at the target.

The approach that is taken in this paper is a hybrid of the two, and is called the pre-copy migration approach. We DON'T freeze the VM at the source machine, but start copying the VM'S pseudo-physical memory contents to the target machine over multiple iterations. Then when there is little dirty memory remaining that is still not copied, we do a short stop and copy. This way we get the best of the two worlds by avoid the drawbacks of both.

As for the networking of the VM, we need to make sure that the migrating VM will include all protocol state and will carry its IP address. We assume that the source and destination exist behind a single switched LAN, so by generating an unsolicited ARP-reply from the migrated host we can advertise that the IP moved to a new location.

The paper does not address the problem of migrating local-disk storage. One approach suggested for this is to use a network attached storage that is uniformly accessible from all host machines. Another approach can be to do replication of the local disk at other machines and choosing the target among the machines where the disk is replicated.

Comments

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Making database systems usable

Looming Liability Machines (LLMs)

Foundational distributed systems papers

Advice to the young

Linearizability: A Correctness Condition for Concurrent Objects

Scalable OLTP in the Cloud: What’s the BIG DEAL?

Understanding the Performance Implications of Storage-Disaggregated Databases

Designing Data Intensive Applications (DDIA) Book