Dynamic Resource Load Balancing

I just finished writing all the content for my next book, entitled VMware ESX and ESXi in the Enterprise: Planning Deployment of Virtualization Servers (2nd Edition), which continues the discussion of Dynamic Resource Load Balancing (DRLB). DRLB is the balancing of virtualized workloads across all hosts within a cluster of virtualization hosts without human intervention. This is the ultimate goal of automation with respect to virtualization, and therefore the cloud. In effect, with DRLB the virtualization administrator's job is simplified to configuration and troubleshooting, leaving the virtual environment to load balance workloads on its own.
This is a lofty goal, and we are not quite there yet, but we are further along than when I wrote VMware ESX Server in the Enterprise: Planning and Securing Virtualization Servers, back when ESX 3.0 first shipped. Yet when I talk to people, I find that not much has really changed: much of the automation is still done by hand, coding the specifics of each environment. I think we are close, so let us take some of the real innovations as a given and move on from there.
All of the DRLB enhancements I have seen have come from VMware and its SDK environment. While many of the DRLB techniques also apply to Hyper-V and Xen, both of those competing products are far behind where VMware is today with respect to automation and self-healing.
vSphere today provides DRLB using the following tools:

  • VMware Fault Tolerance (FT) – Automated failover for a VM, limited to VMs with a single vCPU and subject to a host of other limitations.
  • VMware High Availability (HA) – Restarts VMs on other hosts in the cluster when a host fails for any reason. If VM Monitoring is enabled and a VM crashes, the VM is restarted automatically.
  • VMware Distributed Resource Scheduler (DRS) – DRS uses vMotion to automatically move VMs around a cluster based on memory or CPU contention, but only when there is contention (this is not true load balancing of workloads).
  • VMware Enhanced vMotion Compatibility (EVC) – EVC enhancements make it easier to vMotion VMs from host to host when there are disparate CPUs within the same family of CPUs.
  • VMware Distributed Power Management (DPM) – DPM puts underused hosts into standby and powers them back on as hot spares to take up the load when it is needed, including when HA requires the capacity.
  • vSwitch Load Balancing (LB) – LB spreads network traffic across different physical uplinks.
  • Plus a host of new tools coming out soon – These are cool, but I cannot discuss them here. While they WILL help with DRLB, there is still a long way to go.
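
The DRS entry above draws a line between reacting to contention and true load balancing. The following is a minimal Python sketch of that difference, not VMware's actual DRS algorithm: the host names, the 80% contention threshold, and the use of standard deviation as the imbalance measure are all illustrative assumptions.

```python
from statistics import pstdev

# Hypothetical threshold: a host at or above 80% CPU is treated as "contended".
CONTENTION_THRESHOLD = 0.80


def contention_trigger(hosts: dict[str, float]) -> bool:
    """Act only when some host is contended (the behaviour described for DRS)."""
    return any(load >= CONTENTION_THRESHOLD for load in hosts.values())


def imbalance(hosts: dict[str, float]) -> float:
    """Population standard deviation of host load; 0.0 means perfectly even."""
    return pstdev(hosts.values())


def balance_trigger(hosts: dict[str, float], tolerance: float = 0.10) -> bool:
    """Act whenever load is uneven, whether or not any host is contended."""
    return imbalance(hosts) > tolerance


# Hypothetical cluster snapshot: host name -> CPU utilization (0.0 to 1.0).
cluster = {"esx01": 0.70, "esx02": 0.15, "esx03": 0.20}
print(contention_trigger(cluster))  # False: no host has reached 80%
print(balance_trigger(cluster))     # True: the load is clearly uneven
```

In this snapshot a contention-only policy does nothing, even though one host carries most of the work, while a balancing policy would start moving VMs.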

Combinations of these tools make up a small subset of DRLB. The real issue, however, is whether we can actually code something generic enough to meet everyone’s needs. These are generic technologies, but I still see things missing from true DRLB, and this is where third parties such as vKernel, Hyper-9, and others can help to:

  • Automatically enable hotplug of memory and CPUs on VMs that can support it. Unfortunately, this still requires a reboot.
  • Automatically hot-add and hot-remove CPUs, and hot-add memory, as needed.
  • Reconfigure memory overcommitment to use the best available method, which may require setting appropriate limits.
  • Automatically adjust shares, reservations, and limits on disk, CPU, and memory resources to get the best performance from critical applications.
  • Automatically adjust network traffic shaping controls to get the best network performance for critical applications.
  • Rebalance VMs across multiple LUNs as necessary via Storage vMotion (in other words, Storage DRS).
  • Load balance VMs based not on contention but on the best use of physical resources, subject to security policies.
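
As a rough illustration of the last item, here is a minimal Python sketch of policy-aware balancing. It is not how any VMware or third-party product works; the `Host` and `VM` classes, the idea of tagging each VM with a security "zone" that a host must permit, and the greedy largest-first placement heuristic are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    zones: set[str]        # hypothetical policy: security zones this host may run
    load: float = 0.0      # accumulated normalized demand of placed VMs
    vms: list[str] = field(default_factory=list)


@dataclass
class VM:
    name: str
    zone: str              # hypothetical security zone the VM belongs to
    load: float            # normalized demand, e.g. fraction of one host's CPU


def place(vms: list[VM], hosts: list[Host]) -> dict[str, str]:
    """Greedy placement: largest VMs first, each onto the least-loaded host
    whose security policy allows the VM's zone. Mutates the Host objects."""
    placement = {}
    for vm in sorted(vms, key=lambda v: v.load, reverse=True):
        allowed = [h for h in hosts if vm.zone in h.zones]
        if not allowed:
            raise ValueError(f"no host permits zone {vm.zone!r} for {vm.name}")
        target = min(allowed, key=lambda h: h.load)
        target.load += vm.load
        target.vms.append(vm.name)
        placement[vm.name] = target.name
    return placement


hosts = [
    Host("esx01", {"dmz"}),
    Host("esx02", {"internal"}),
    Host("esx03", {"internal", "dmz"}),
]
vms = [
    VM("web1", "dmz", 0.30), VM("web2", "dmz", 0.25),
    VM("db1", "internal", 0.50), VM("app1", "internal", 0.20),
]
print(place(vms, hosts))
# {'db1': 'esx02', 'web1': 'esx01', 'web2': 'esx03', 'app1': 'esx03'}
```

The DMZ VMs never land on the internal-only host, yet the load still ends up spread across all three hosts. The heuristic deliberately ignores host capacity limits and the cost of each vMotion; a real DRLB tool would have to account for both.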

Any capacity planning tool has the potential to automate many, if not all, of these functions. This would be a big win for any virtualization administrator, but today it still requires a significant amount of human intervention. Even tools such as those provided by vKernel and Hyper-9 require a human to make the final decision on whether to apply an optimization. Automating optimizations of this type requires you to fully understand your environment and the impact the changes will have now and into the future. For now, this aspect of DRLB is still an art form and requires a certain comfort level with the technology. What VMware has done is provide the basic tools for DRLB, and more is coming.