By Ron Breault
It’s safe to say that five years of working with OpenStack make a person fairly familiar with the Live Migration feature, and it’s fair to say that the feature does exactly what its name promises, and does it well. On closer reflection, though, OpenStack Live Migration turns out to be an extraordinary feature, especially with the recent enhancements discussed in this article.
Why call Live Migration extraordinary? Because of everything that happens behind the scenes to make it work, and because of what it makes possible. With just a few mouse clicks in Horizon, a VM running on one physical server can be moved automatically to another physical server. “Automatically” makes it sound simple, but a great deal of work goes into pulling it off: replicating all of the VM’s static and dynamic memory while the VM keeps running; setting up the VM’s complete network infrastructure on the target node; copying local block storage (if used) to the target node; and finally pausing the VM briefly and resuming it on the target to complete the process. Depending on the size of the VM, the overall migration takes anywhere from seconds to minutes.
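The memory-replication step above is commonly done as iterative “pre-copy”: all guest memory is copied once, then whatever the guest dirtied during the previous pass is copied again, and the VM is paused only when the leftover is small enough to send quickly. The Python sketch below is a back-of-the-envelope model under stated assumptions, not OpenStack or libvirt code: the copy bandwidth and page-dirtying rate are hypothetical constants, `estimate_precopy` and its parameters are illustrative names, and block-storage copying and network setup are ignored.

```python
GIB = 1024 ** 3  # bytes per GiB


def estimate_precopy(mem_bytes, bandwidth, dirty_rate, max_downtime_s, max_rounds=30):
    """Back-of-the-envelope pre-copy model (hypothetical, not Nova/libvirt code).

    mem_bytes      -- guest RAM to migrate
    bandwidth      -- assumed constant copy rate, bytes/s
    dirty_rate     -- assumed constant rate at which the guest dirties memory, bytes/s
    max_downtime_s -- longest pause we are willing to accept for the final copy

    Returns (live_copy_seconds, final_pause_seconds, passes), or None if the
    remainder never becomes small enough within max_rounds passes.
    """
    remaining = mem_bytes  # the first pass copies all guest memory
    live_time = 0.0
    for passes in range(1, max_rounds + 1):
        pass_time = remaining / bandwidth
        live_time += pass_time
        # Whatever the guest dirtied during this pass must be copied again
        # (it can never exceed the guest's total memory).
        remaining = min(mem_bytes, dirty_rate * pass_time)
        if remaining / bandwidth <= max_downtime_s:
            # Small enough: pause the VM, copy the rest, resume on the target.
            return live_time, remaining / bandwidth, passes
    return None


if __name__ == "__main__":
    # Hypothetical 8 GiB guest, 1 GiB/s link, guest dirtying 0.2 GiB/s.
    live, pause, passes = estimate_precopy(8 * GIB, 1 * GIB, 0.2 * GIB, max_downtime_s=0.5)
    print(f"{live:.2f} s live copy, {pause:.2f} s pause, {passes} passes")
    # prints: 9.60 s live copy, 0.32 s pause, 2 passes
```

In this model the remainder shrinks by the ratio of dirty rate to bandwidth on every pass, so a lightly loaded guest converges quickly. If the guest dirties memory faster than the link can copy it, the remainder never shrinks and the function returns None.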
Live Migration enables a number of things that are important to running an always-on production cloud. It allows physical servers to be powered off gracefully and upgraded without taking the hosted virtual servers offline. Similarly, important host security updates or bug fixes can be rolled out across servers without stopping any of the hosted VMs.
A recent report from the OpenStack Innovation Center, “High Availability of Live Migration,” details a thorough study and testing of OpenStack’s Live Migration capability. The key line in its summary was this statement: “In conclusion, we were able to prove that Live Migration works.”
We would all agree that Live Migration keeps getting better. Two improvements in particular warrant attention:
• Performance Increase: Testing shows that Live Migration throughput has increased significantly; we’ve seen as much as five times the throughput of prior releases! That kind of change makes a big difference with large VMs, substantially shortening the Live Migration interval. Faster migrations mean shorter planned maintenance windows, because the operator simply spends less time waiting for Live Migrations to complete.
• Auto-Convergence: The new Auto-Converge feature is an especially cool innovation. Some VMs take a long time to migrate because of heavy memory-write activity: as fast as OpenStack can copy the VM’s ‘dirty’ memory contents from the source to the target, the VM dirties its memory again. OpenStack might barely keep up or, in some cases, never catch up, because the VM is simply too busy writing to memory. Auto-Converge changes that by intelligently slowing down the VM’s virtual CPU so that it cannot dirty its pages as quickly. With its memory writes slowed, the Live Migration no longer stalls and can stay ahead of the VM.
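The throttling idea behind Auto-Converge can be illustrated with a small simulation. This is a simplified model, not QEMU’s or Nova’s actual algorithm: the trigger rule (throttle harder whenever a pass fails to at least halve the outstanding dirty memory), the 20% starting throttle, the 10% step and the 99% cap are illustrative assumptions, and a throttled vCPU is assumed to dirty memory in direct proportion to the CPU time it keeps. `migrate_with_autoconverge` is a hypothetical name.

```python
GIB = 1024 ** 3  # bytes per GiB


def migrate_with_autoconverge(mem_bytes, bandwidth, dirty_rate, max_downtime_s,
                              throttle_start=0.20, throttle_step=0.10,
                              throttle_max=0.99, max_rounds=50):
    """Simplified pre-copy simulation with a hypothetical auto-converge policy.

    Returns (passes, final_throttle) once the remaining dirty memory can be
    copied within max_downtime_s, or None if that never happens.
    """
    remaining = mem_bytes  # first pass copies all guest memory
    throttle = 0.0         # fraction of vCPU time withheld from the guest
    for passes in range(1, max_rounds + 1):
        pass_time = remaining / bandwidth
        # Assumption: a throttled guest dirties memory proportionally more slowly.
        effective_dirty_rate = dirty_rate * (1.0 - throttle)
        new_remaining = min(mem_bytes, effective_dirty_rate * pass_time)
        if new_remaining / bandwidth <= max_downtime_s:
            return passes, throttle  # converged: pause, copy the rest, resume
        if new_remaining > remaining / 2:
            # Not converging fast enough: start throttling, or throttle harder.
            if throttle == 0.0:
                throttle = throttle_start
            else:
                throttle = min(throttle_max, throttle + throttle_step)
        remaining = new_remaining
    return None


if __name__ == "__main__":
    # Hypothetical guest dirtying 1.5 GiB/s over a 1 GiB/s link.
    # Throttling disabled: the migration never converges.
    print(migrate_with_autoconverge(8 * GIB, 1 * GIB, 1.5 * GIB, 0.5,
                                    throttle_start=0.0, throttle_step=0.0))  # prints: None
    # Throttling enabled: converges after a few passes.
    print(migrate_with_autoconverge(8 * GIB, 1 * GIB, 1.5 * GIB, 0.5))
```

With these assumed numbers, unthrottled pre-copy never finishes, while the throttled run converges after nine passes at roughly a 70% throttle. The model also shows the trade-off: the guest gives up CPU time during the migration in exchange for the migration completing at all.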
There are other interesting changes as well, to name a few: the ability to update the maximum Live Migration interval dynamically (some VMs always take longer to migrate than others, so this helps avoid timeouts); periodic logging of Live Migration throughput and estimated downtime; and a reduction of the default maximum timeout from 800 seconds to 180 seconds.
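Throughput logging and estimated downtime go hand in hand: throughput can be measured from the bytes copied between two progress samples, and estimated downtime is the memory still to be copied divided by that throughput. A minimal monitoring loop might look like the following, assuming progress arrives as hypothetical `(elapsed_s, transferred_bytes, remaining_bytes)` samples. The function name, sample format and switch-over rule are illustrative, and the 180-second default only mirrors the figure quoted above; it does not reflect how Nova actually applies its timeout.

```python
import logging

LOG = logging.getLogger("live_migration_monitor")  # hypothetical logger name
MIB = 2 ** 20  # bytes per MiB


def monitor(samples, max_downtime_s, timeout_s=180.0):
    """Hypothetical progress monitor for a running live migration.

    samples -- iterable of (elapsed_s, transferred_bytes, remaining_bytes),
               where transferred_bytes is the running total copied so far.
    Returns "switchover", "aborted", or "running" (samples exhausted).
    """
    prev_elapsed, prev_transferred = 0.0, 0
    for elapsed, transferred, remaining in samples:
        if elapsed > timeout_s:
            LOG.warning("aborting: %.0f s exceeds the %.0f s timeout", elapsed, timeout_s)
            return "aborted"
        interval = elapsed - prev_elapsed
        # Throughput over the last sampling interval, in bytes/s.
        throughput = (transferred - prev_transferred) / interval if interval > 0 else 0.0
        # Estimated downtime: time to copy what is left if the VM were paused now.
        est_downtime = remaining / throughput if throughput > 0 else float("inf")
        LOG.info("t=%.0f s throughput=%.1f MiB/s estimated downtime=%.2f s",
                 elapsed, throughput / MIB, est_downtime)
        if est_downtime <= max_downtime_s:
            return "switchover"  # remainder is small enough for a brief pause
        prev_elapsed, prev_transferred = elapsed, transferred
    return "running"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)  # show the periodic INFO lines
    samples = [(5.0, 500 * MIB, 2000 * MIB), (10.0, 1000 * MIB, 40 * MIB)]
    print(monitor(samples, max_downtime_s=0.5))  # prints: switchover
```

Logging the estimated downtime on every sample gives an operator an early signal when a busy VM is unlikely to converge before the timeout.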
Taken together, these changes make the Live Migration available today the best delivered to date. If you manage critical infrastructure in the cloud, Live Migration is an indispensable feature.
(The author, Ron Breault, is Director of Product Management at Wind River.)