40,000 Firewalls! Help Please!?

While at VMworld I was suddenly hit with a blast of heat generated by the 40,000 VMs running within the VMworld Datacenter of 150 Cisco UCS blades or so. This got me thinking about how VMsafe would fit into this environment, and therefore about real virtualization security across the massive quantity of virtual machines possible within a multi-tenant cloud environment. If you used VMsafe within this environment there would be at least 40,000 VMsafe firewalls. If every VM were expanded to the full load of virtual NICs possible, there could be upwards of 400,000 virtual firewalls! At this point my head started to spin! I asked this same question on the Virtualization Security Podcast, which I host, and the panel was equally impressed with the numbers. So what is the solution?
First, a few numbers for VMware vSphere™ need to be on the table so that we can start this discussion. As we saw in VMsafe – Vendor Implementations at VMworld, a VMsafe-based firewall sits between the vSwitch and the vNIC. For the Cisco Nexus 1000V this means there is a slight bump in the wire between the Nexus 1000V and the vNIC attached to a specific port on the vSwitch. For the VMware vSwitch this slight bump in the wire sits between the portgroup port and the vNIC.
The numbers we need are these: there can be up to 10 vNICs per VM and up to 320 VMs per VMware vSphere™ host. Some simple math gives us a maximum of 3,200 vNICs per host. Granted, we will hopefully never actually see this number. Now, there were close to 40,000 VMs within the VMworld 2009 conference datacenter; some most likely had more than one vNIC, but most had only one. So there were close to 40,000 vNICs available when all VMs were turned on. If there were 10 vNICs per VM, that number becomes 400,000 vNICs, or 400,000 bumps in the wire!
Okay, so we take 512 blades, add 320 VMs to each blade with 10 vNICs each, and you get a whopping 1,638,400 vNICs possible! This means there could be a maximum of 1,638,400 VMsafe firewalls active! My head is starting to spin with the implications of managing this, and with the possible problems that arise!
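For anyone who wants to replay the arithmetic with their own limits, here is a minimal sketch. The constants are simply the figures quoted above (they are this post's numbers, not authoritative configuration maximums), and the function names are made up for illustration.

```python
# Figures quoted in the post; swap in your own environment's limits.
MAX_VNICS_PER_VM = 10
MAX_VMS_PER_HOST = 320
HOSTS = 512            # blade count used in the worst-case estimate
OBSERVED_VMS = 40_000  # approximate VM count in the conference datacenter


def vnics_per_host(vms_per_host: int = MAX_VMS_PER_HOST,
                   vnics_per_vm: int = MAX_VNICS_PER_VM) -> int:
    """Upper bound on vNICs (one 'bump in the wire' each) on a single host."""
    return vms_per_host * vnics_per_vm


def vnics_for_vms(vm_count: int, vnics_per_vm: int) -> int:
    """Total vNICs for a known number of VMs."""
    return vm_count * vnics_per_vm


def vnics_for_fleet(hosts: int = HOSTS) -> int:
    """Worst case: every host fully loaded with VMs and vNICs."""
    return hosts * vnics_per_host()


print(f"per host:           {vnics_per_host():>9,}")
print(f"40,000 VMs x 1:     {vnics_for_vms(OBSERVED_VMS, 1):>9,}")
print(f"40,000 VMs x 10:    {vnics_for_vms(OBSERVED_VMS, MAX_VNICS_PER_VM):>9,}")
print(f"512 hosts, maxed:   {vnics_for_fleet():>9,}")
```

The last line is the 1,638,400 figure: 512 hosts times 3,200 vNICs per host.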
How can anyone manage, using current tools, that many VMsafe-based firewalls? Can you actually visualize that many appliances, VMs, vApps, etc. within any one tool? 40,000 VMs is hard enough to envision managing; add VMsafe appliances to that and management issues will arise. This does not even consider the port security inherent within the Cisco Nexus 1000V or the dvFilter capability within the Virtual Distributed Switch, which add their own management layers to this problem. Whether it is 40,000 or 1,638,400 ports and vNICs, I think the security management problems are the same.
The solution is to use one of the available VMsafe management tools and to apply codified security policies hierarchically. If each of these 40,000 vNICs shared the same codified security policy, then there would be only one policy to implement 40,000 times. Well, actually once per host involved, which in the case of the VMworld conference datacenter is 512 times. That is more reasonable to me. However, if there are 10 codified policies and all 10 are needed on every host, then we need to implement them 5,120 times.
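The counts in the paragraph above come from one multiplication, sketched here under the simplifying assumption that every host has to carry every policy. The function name is hypothetical and not taken from any VMsafe management tool.

```python
def policy_deployments(hosts: int, policies: int) -> int:
    """Policy installs needed when every host must carry every policy."""
    return hosts * policies


HOSTS = 512      # hosts in the conference datacenter, as quoted in the post
VNICS = 40_000   # roughly one vNIC per VM

# Pushing one shared policy to every vNIC individually.
print("per vNIC:", VNICS)
# Pushing that one policy once per host instead.
print("per host, 1 policy:", policy_deployments(HOSTS, 1))
# Ten policies, each needed on every host.
print("per host, 10 policies:", policy_deployments(HOSTS, 10))
```

The point of the comparison is that the work scales with hosts times policies rather than with the vNIC count, so the number of distinct policies is what needs to be kept under control.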
Wait, did I not say there are 40,000 vNICs? How are we now down to just 512 VMsafe firewall instances? Because VMsafe firewalls hook directly into the hypervisor and are called for every packet transferred to a vNIC, so a single instance per host covers every vNIC on that host. The real management headaches occur when there are many codified security policies. Not only will there be management issues, but I envision severe performance issues as each of the VMsafe kernel modules tries to traverse every codified security policy to determine what to do with each packet of data.
How will this be an issue? Think of multi-tenant cloud providers that need to keep company A away from company B’s data and vice versa. Expand that to 40,000 tenants. You may then have 40,000 codified security policies that, in their entirety, may have to live on a single VMware vSphere™ host. Why is this the case? Because any one of those 40,000 tenants could easily end up on any given host, given the dynamic nature of vSphere™ servers. So now we have both management and performance issues.
Is there any help out there to solve this problem? Yes, two vendors have products now that codify security policies for use with VMsafe, and more are on their way. Those are Altor Networks’ VF3.0 and Reflex Systems’ VMC w/vTrust. While they approach management differently, they both provide the necessary framework to aid in managing a large array of systems. Altor’s is definitely not as graphical as Reflex’s VMC, but both are strong contenders and should be considered if you are moving to VMware vSphere™ and wish to use VMsafe to improve your overall virtualization security.
Even so, with 40,000 tenants it is important that you split the problem into something easily manageable that will NOT impact performance. To do this, you may need to manage the system differently and tag systems for specific tenants instead of allowing every tenant to move to any host. Such a tool exists today from HyTrust. HyTrust will allow you to create tags on all your objects, such as VMs, Networks, and Hosts, and only allow tagged VMs onto Hosts and Networks tagged with the same tag ID (which you create).
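To make the tagging idea concrete, here is a rough sketch of the matching rule described above. It is not HyTrust's API or data model: the `Tagged` class, the one-tag-per-object simplification, and both function names are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """Hypothetical stand-in for a VM, host, or network carrying one tag ID."""
    name: str
    tag: str


def placement_allowed(vm: Tagged, host: Tagged, network: Tagged) -> bool:
    """A VM may land only on a host and network carrying its own tag."""
    return vm.tag == host.tag == network.tag


def policies_needed_on(host: Tagged, vms: list[Tagged]) -> set[str]:
    """Tenant policies a host must carry once placement is limited by tag."""
    return {vm.tag for vm in vms if vm.tag == host.tag}


vms = [Tagged("vm-a1", "tenant-a"), Tagged("vm-a2", "tenant-a"),
       Tagged("vm-b1", "tenant-b")]
host_a = Tagged("host-01", "tenant-a")
net_a = Tagged("net-a", "tenant-a")

print(placement_allowed(vms[0], host_a, net_a))  # tenant-a VM on tenant-a host
print(placement_allowed(vms[2], host_a, net_a))  # tenant-b VM is kept off
print(policies_needed_on(host_a, vms))           # only tenant-a's policy
```

With placement restricted this way, a host only needs the policies for the tenants tagged onto it, rather than one for every tenant that could otherwise have migrated there.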
It may be necessary, therefore, to combine the currently available virtualization security products so that you can divide the problem into manageable pieces, each handled by a different product. One solution is to use HyTrust in conjunction with either Altor Networks VF3.0 or Reflex Systems VMC with vTrust.