A customer recently asked me, can we virtualize our Tier 1 application that receives 7 billion requests per day? My initial response was, on how many servers? Their answer was 15. That is quite a shocking set of numbers to consider. Add to this numbers such as 150K sessions per second, the need for a firewall, and sub-second response times, and you end up with a few more shocking numbers. So could such a workload be virtualized, or is it too big for virtualization? In effect, this becomes an architectural challenge, one worthy of those who choose to become a VCDX, but the answer is not that simple. Consider their current configuration:

  • 15 dual quad-core nodes, each with 24G of memory and 300G of SAS drive space.
  • They constantly run low on memory and drive space (such that most of their tooling is there to remove unneeded data after rolling it up)
  • Add to this 4 network adapters (all in use)
  • And an inter-machine communication profile rivaled only by High Performance Computing (HPC) systems.

So if we were to virtualize as is, we would need the following resources (a quick check of the arithmetic follows the list):

  • 15*8 or 120 vCPUs
  • 360G of Memory
  • 3.6TB of Disk
  • Extremely low latency network with 4 vNICs per VM, each using VMDirectPath

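For anyone who wants to check that arithmetic, here is a minimal sizing sketch. The per-node figures come straight from the list above; the averaged request rate assumes traffic is spread perfectly evenly across a 24-hour day, which real traffic rarely is.

```python
# Rough sizing sketch for the existing 15-node cluster (figures from the article).
NODES = 15
CORES_PER_NODE = 2 * 4            # dual quad-core
MEM_GB_PER_NODE = 24
DISK_GB_PER_NODE = 300

total_vcpus = NODES * CORES_PER_NODE              # 120 vCPUs
total_mem_gb = NODES * MEM_GB_PER_NODE            # 360G of memory
total_disk_tb = NODES * DISK_GB_PER_NODE / 1000   # 3.6TB of disk

# 7 billion requests per day, assuming a flat distribution (an optimistic assumption).
REQUESTS_PER_DAY = 7_000_000_000
avg_requests_per_sec = REQUESTS_PER_DAY / 86_400          # roughly 81,000 req/s
avg_requests_per_node = avg_requests_per_sec / NODES      # roughly 5,400 req/s per node

print(total_vcpus, total_mem_gb, total_disk_tb)
print(round(avg_requests_per_sec), round(avg_requests_per_node))
```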
The low-latency network components will limit the number of VMs per host quite a bit, as the VMs in effect need to bypass as much of the hypervisor as possible to achieve the required levels of network IO. In addition, we need to be cognizant of the security requirements of a firewall that currently handles upwards of 150K sessions per second. So we are either going to have to continue to use a single physical firewall or move to multiple virtual firewalls. Virtual firewalls are restricted to handling around 2K sessions per second (sometimes more, sometimes less), which means that, to handle the required workload, we may need 75 load-balanced virtual firewalls to keep up with the requirements of 7Bn requests per day.
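To see where the 75 comes from, here is the back-of-the-envelope calculation. The 2K sessions-per-second ceiling is the rough figure quoted above, not a benchmark of any particular virtual firewall product.

```python
import math

PEAK_SESSIONS_PER_SEC = 150_000   # current physical firewall load
SESSIONS_PER_VFW = 2_000          # assumed per-virtual-firewall ceiling

# Number of load-balanced virtual firewalls needed to absorb the peak.
vfw_needed = math.ceil(PEAK_SESSIONS_PER_SEC / SESSIONS_PER_VFW)
print(vfw_needed, "load-balanced virtual firewalls")   # 75
```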
So how could one design such a configuration today with modern hypervisor technology? Well, for starters, our limiting factors appear to be networking and the fact that we never want to overcommit memory or CPU. This is in effect an HPC application, so we need to remove any function that would impede CPU performance, which includes some of the major features of virtualization: CPU, network, storage, and memory overcommit. So what would we need to do? This type of solution is all about the workload, and for this particular workload we know that each VM needs the following resources to run:

  • 8 Cores
  • 24G of Memory
  • 4 low latency vNICs
  • 300G of low latency storage

This implies that, in order to virtualize just as we ran in the physical world, we would need a host that could handle at minimum one of these VMs, but ideally two to four of them, so hardware choices come into play. In addition, we have to consider the base requirements of running a hypervisor, namely that one pCPU is dedicated to the hypervisor. This leads us to the following conclusion about the hardware required just to match the current physical environment:

  • Dual Hex Core CPUs.
  • 32G of memory
  • 4 Intel VT-d/SR-IOV pNICs for the VM
  • 2 pNICs for the hypervisor (redundant management)

That would leave 3 cores for other uses, such as a per-hypervisor firewall. If we wanted to double up VMs per node or move to 4 VMs per node, our hardware requirements change drastically and increase in cost. The ideal consideration for this type of workload would be the following (a quick check of both configurations follows the list):

  • Quad 12-core CPUs
  • 144G of Memory
  • 20 Intel VT-d/SR-IOV pNICs for the VMs
  • 2 pNICs for the hypervisor (redundant management)

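Here is a quick feasibility check of the two host configurations. It assumes one pCPU reserved for the hypervisor, no CPU or memory overcommit, and the per-VM footprint listed earlier; it ignores hypervisor memory overhead, and the number of passthrough ports you can actually present to guests will depend on the cards and chassis chosen.

```python
# Per-VM footprint from the workload definition above.
VM_CORES, VM_MEM_GB, VM_NICS = 8, 24, 4
HYPERVISOR_CORES = 1   # pCPU reserved for the hypervisor

def fits(vm_count, host_cores, host_mem_gb, host_passthrough_nics):
    """True if vm_count of these VMs fit with no CPU or memory overcommit."""
    return (vm_count * VM_CORES + HYPERVISOR_CORES <= host_cores
            and vm_count * VM_MEM_GB <= host_mem_gb        # ignores hypervisor RAM overhead
            and vm_count * VM_NICS <= host_passthrough_nics)

# Minimal config (dual hex-core, 32G, 4 passthrough pNICs): one VM, 3 cores spare.
print(fits(1, 12, 32, 4), 12 - HYPERVISOR_CORES - 1 * VM_CORES, "cores spare")

# Ideal config (quad 12-core, 144G, 20 passthrough pNICs): four VMs with headroom
# left over for the hypervisor and a per-host firewall.
print(fits(4, 48, 144, 20))
```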
Given these numbers, it may be better to use Cisco VM-FEX technology or 4-port InfiniBand cards with EoIB configurations, which, as I discovered at the OpenStack Conference, is how researchers are approaching HPC workloads. However, this has gotten me thinking more about the workload than about the technology required to virtualize it. Would it not be better to build a SaaS solution and design the code specifically to run within a cloud? Granted, given current technology, it would require many more boxes than 15 or even 30 to run.
So the real question is: why are we shoehorning our physical definition of a workload into virtual and cloud environments? Could we not, instead, redesign the workloads for the cloud? At the same time, could we redefine the concept of HPC to be more cloud-aware? That would require us to understand how other tenants within a cloud impact latency for HPC workloads.
Even so, the limiting factor for virtualizing such workloads or placing them into the cloud is ultimately cost: anything that impacts the bottom line will impact the ability to place such workloads into the cloud.

3 replies on “What is Considered Too Big for Virtualization?”

  1. So… which decision was taken? Go with 15 existing hosts, add more hosts, upgrade existing hosts or cloud?
    And if you selected a local solution, could you please show a diagram of what it looks like? I’m excited to see the solution on paper… =)
    Thanks.

    1. Hello Alexandru,
      For this customer, they chose to continue to use physical hardware based purely on cost. It costs far less to buy pizza boxes than it does to move into the cloud. However, there is work going on to eventually go the SaaS route (either converting what they have into a SaaS or using other tools to get there). It is a very exciting time for them. Lots of choices to be made.
      Best regards,
      Edward

  2. There are real-time cluster systems out there that handle thousands of transactions per second for millions of users in wireless roaming and registration or prepaid services. These are special-purpose systems for solving specific problems. There is no need for “virtualization” in these environments.
    “Virtualization” should be considered in a heterogeneous environment.
