They say history tends to repeat itself; I am going to take that statement in another direction and apply it to technology. Virtualization practices and tendencies tend to flip-flop over time. That is a pretty general statement in itself, but I saw a video on YouTube, “16 Core Processor: Upgrade from AMD Opteron 6100 Series to Upcoming ‘Interlagos’”, and it got me thinking about one of the very first questions presented to virtualization architects when planning and designing a new deployment, for as long as I have been working with virtualization technology. To scale up or to scale out, that is the question, and the prevailing answer has flip-flopped back and forth as the technology itself has improved and functionality has increased.
When I first started in virtualization, processors were single core and vCenter was not yet an option for managing and controlling the virtual infrastructure. At the start, any server on the HCL (Hardware Compatibility List) was a fine place to begin; then VMware came out with Symmetric Multiprocessing (SMP) virtual machines, with one or two virtual CPUs. This was great news and changed the design thought process toward getting the biggest host server with as many processors and as much memory as you could get and/or afford.
Technology then advanced with the introduction of multi-core processors, and you could now buy smaller boxes that still had the processing power of the bigger hosts, but in a much smaller and cheaper package. As the technology changed, the idea of scaling out seemed to overtake the idea of scaling up, at least until the next advancement arrived from VMware and/or the CPU manufacturers, creating a see-saw effect back and forth between the two approaches.
The see-saw has gone back and forth over the years, and if we fast-forward to today we have a lot of exciting technologies added to the mix. The introduction of blade servers a few years back was one of those key technology moments that helped redefine the future of server computing. Now blade technology has taken another big step with the release of Cisco’s Unified Computing System (UCS). UCS turns blade technology into the first completely stateless computing platform; it currently holds more memory than any other blade system and gives you the ability to run two quad-core processors in the half-height blades and four quad-core processors in the full-height blades. Intel has invested time and money in the UCS platform and remains the only processor option available in the UCS chassis, but as much as things have flip-flopped on the scale-up versus scale-out question, the competition between AMD and Intel has been an exciting race, with the lead going back and forth between the two companies. With the video of AMD’s sixteen-core processor making its way around the internet, it is a safe bet that an equivalent or even better part from Intel is not far behind.
Where do you think we are on the scale-up versus scale-out question? In my opinion, scale-out is the best way to go. As virtualization has been accepted as the way forward in the data center, and more mission-critical and beefier servers are virtualized, the need for 32 or 64 cores per host becomes more and more prevalent so that resources are available for the next advancement that comes into play. Also supporting the scale-out opinion, it is worth considering VMware High Availability (HA) when deciding the number of virtual machines per host. In my years of designing systems, and given the choice, I want HA to recover from a host failure in less than five minutes, measured from the time the host goes down until all the virtual machines that were running on it have been restarted and fully booted. When you have too many virtual machines per host, the recovery time during a host failure, and the boot storm that comes with it, can be dramatic and extreme.
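To make that five-minute target a little more concrete, here is a rough back-of-envelope sketch of how the number of virtual machines per host affects HA recovery time. The boot-time and restart-concurrency figures are hypothetical assumptions for illustration, not measurements; substitute numbers observed in your own environment.

```python
# Rough back-of-envelope estimate of HA recovery time after a host failure.
# All figures are hypothetical assumptions for illustration only -- replace
# them with values measured in your own environment.

def estimate_recovery_minutes(vms_per_host,
                              avg_boot_minutes=2.5,    # assumed average guest boot time
                              restart_concurrency=8):  # assumed VMs restarting in parallel
    """Estimate minutes from host failure until all of its VMs are booted,
    assuming HA restarts them in waves of `restart_concurrency`."""
    waves = -(-vms_per_host // restart_concurrency)  # ceiling division
    return waves * avg_boot_minutes

if __name__ == "__main__":
    for vms in (10, 20, 40, 80):
        minutes = estimate_recovery_minutes(vms)
        verdict = "within" if minutes <= 5 else "exceeds"
        print(f"{vms:3d} VMs per host -> ~{minutes:.1f} min ({verdict} the 5-minute target)")
```

Under those assumed numbers, the five-minute target caps a host at roughly 16 virtual machines, which is exactly the kind of constraint that pushes a design toward more, smaller hosts rather than a few giant ones.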
Those are my thoughts on the scale-up versus scale-out question, so now let’s hear your thoughts and ideas to share with the class.
The see-saw effect is a by-product of the cost/value equation. With each new technology boost there’s a corresponding point where the diminishing returns kick in and it becomes cheaper to scale out. Most often it’s seen in the amount of memory you add to the physical host, but that’s only one example. So while the see-saw is a great description, the key is to balance your see-saw as much as possible.
Rob,
Good point about the technology and cost/value. Very true, but one thing that has not changed for me when designing is the amount of time HA takes to recover. That is one of the top items on my list when finding my balance.
Steve
16 cores per package are needed in GRIDs and in core banking (low-latency messaging, Monte Carlo simulations) more than in VMware. Today, 64 cores on x86 is a pain to get – overpriced QPI hardlink solutions, and still only decent latency in between. For the VMware Interlagos case, we have to look at how NUMA node scheduling would be done – will they come up with a 4-core or an 8-core node for local memory? My wild guess is that 4×8 (numa4) in a full-height blade would be less efficient than half-height 2×16 (numa4), simply due to the improved locality for SMP machines in the latter, plus the higher density and lower cooling needs. I assume this only matters for workloads of 4 vCPUs and more…
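As a rough way to picture that locality question, the sketch below checks whether a VM of a given vCPU count fits within a single NUMA node for the two node sizes raised above. The node sizes and VM sizes are assumptions for illustration, not published Interlagos specifications.

```python
# Quick check of whether an SMP VM's vCPUs fit within a single NUMA node,
# for the two node sizes discussed above (4-core vs 8-core). All values are
# illustrative assumptions, not published hardware specifications.

node_sizes = (4, 8)          # hypothetical cores per NUMA node
vcpu_counts = (1, 2, 4, 8)   # hypothetical VM sizes

for node_size in node_sizes:
    print(f"NUMA node of {node_size} cores:")
    for vcpus in vcpu_counts:
        nodes_spanned = -(-vcpus // node_size)  # ceiling division
        note = "fits in one node" if nodes_spanned == 1 else f"spans {nodes_spanned} nodes"
        print(f"  {vcpus}-vCPU VM: {note}")
```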