A few weeks ago, Hany Michael released a blog post on his NSX lab network. Embedded within it is one of the most brilliantly clear diagrams of a very complex situation I've ever seen; it takes real skill to achieve that level of clarity. What struck me, though, is the sheer complexity that Hany conveys in this document and how much of that complexity is inherent to the SDDC. It's easy to argue that the diagram shows the smallest possible instance of an SDDC (although it skims over the storage), which is not too surprising, as it's an SDDC lab. It is inherently VMware-focused, but it could just as easily be applied to Hyper-V or OpenStack: each function in the diagram would still be necessary, although some would switch or merge. For that reason, this article will be quite VMware-focused too.

Ignoring for a moment the NSX aspect of the diagram, we see a multi-site network, with a single campus at each site, connected across a WAN. We have directory services (Active Directory), the compute management plane (vCenter), disaster recovery servers (SRM), monitoring servers, orchestration servers (vRealize), two separate routing protocols, multiple routers and firewalls, and multi-tenant control (vCloud Director). That's nine functions spread over twenty virtual machines at two sites before we add a single tenant system that can do any useful work. If we start to include EUC, then we get another vCenter (probably two, one at each site), as well as additional vCenters to separate management of the management domain cluster from the compute domain. We are now up to thirty virtual machines and multiple racks, still without a single "useful" VM carrying any workload.

This is a far cry from the beginnings of virtualization, when a single vCenter server would command and control all functions, and the complexity was in the application deployments. Now, we need a whole raft of infrastructure to provide the command and control functions upon which we then build out our systems. How and why have we come to this point? What do we gain from this massive increase in complexity?

The first thing to consider is how much of this is genuinely increased complexity and how much is acquired complexity, where we have taken on the complexity of other systems that would otherwise have been handled by different teams. The other aspect to examine is how much of this complexity is inherited by reusing complex protocols.

This diagram is heavily NSX-focused, so we will start with the network layer. Much of the complexity here lies in firewalls and routing protocols. What we have inherited is the complexity of routing protocols that were designed to run autonomously in hardware with relatively few resources. This makes sense, though: it means the SDDC can interface well with pretty much any network we might care to imagine. The second aspect is that much of this complexity would previously have been handled by the network team, or even a dedicated firewall team; as server or virtualization admins, we would never have encountered it. Pulling these functions into software gives us far more functionality, and much, much more agility: changes that were complex in hardware are easier and quicker in software. In the same vein, we have reduced the complexity that the hardware team deals with by moving their underlay networks to simple leaf/spine designs in an idealized data center.
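To make that agility concrete, here is a minimal sketch of what "a change that was complex in hardware" can look like in software: a single REST call that defines a distributed firewall rule. It assumes an NSX-T-style Policy API; the manager address, credentials, group paths, policy and rule names are all illustrative, and the exact endpoint and payload fields will vary by NSX version, so treat it as a sketch rather than a recipe.

```python
# Hedged sketch: push a distributed firewall rule through an NSX-T-style
# Policy REST API. Host, credentials, group paths, and IDs are illustrative.
import requests

NSX_MANAGER = "https://nsx-mgr.lab.local"   # assumption: lab NSX Manager
AUTH = ("admin", "changeme")                # assumption: local credentials

rule = {
    "display_name": "allow-web-to-app",
    "action": "ALLOW",
    "source_groups": ["/infra/domains/default/groups/web-tier"],       # illustrative
    "destination_groups": ["/infra/domains/default/groups/app-tier"],  # illustrative
    "services": ["/infra/services/HTTPS"],
    "scope": ["ANY"],
}

# PATCH creates or updates the rule inside an existing security policy.
resp = requests.patch(
    f"{NSX_MANAGER}/policy/api/v1/infra/domains/default/"
    "security-policies/app-policy/rules/allow-web-to-app",
    json=rule,
    auth=AUTH,
    verify=False,  # lab only: self-signed certificates
)
resp.raise_for_status()
```

The equivalent change on physical firewalls at two sites would typically mean a change window and a ticket to the firewall team; here it is a single, version-controllable API call.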

Looking at compute, not a lot has changed from those first days: racks of servers are still racks of servers. The complexity is layered over the top, so little has changed at this layer.

Storage, depending on how it is approached, either changes little or adds another layer of complexity. Being a lab, the diagram assumes only some simple Openfiler storage. Production workloads would demand more: either a traditional SAN and the team that goes with it, or a hyperconverged system that bakes the storage into the compute. Continuing with our VMware flavor, it wouldn't be a huge jump to include vSAN, which would require only some simple network setup and configuration. This is almost the direct opposite of the situation with networking: we inherit almost no complexity, and we remove some of the complexity of dealing with LUNs and backups. The problem is that we lock ourselves in, either to a hardware vendor or to the virtual SAN system (VMware vSAN or otherwise).
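As a rough illustration of how little is involved at this layer compared with the network, here is a hedged sketch of enabling vSAN on an existing cluster with pyVmomi, the vSphere Python SDK. The vCenter address, credentials, and cluster name are assumptions, and newer vSAN releases expose richer configuration through the dedicated vSAN management SDK, so again this is a sketch rather than a recipe.

```python
# Hedged sketch: enable vSAN on an existing cluster with pyVmomi.
# The vCenter address, credentials, and cluster name are assumptions.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()           # lab only: self-signed certs
si = SmartConnect(host="vcenter.lab.local",      # assumption: lab vCenter
                  user="administrator@vsphere.local",
                  pwd="changeme",
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Compute-Cluster")  # assumption

    # Turn on vSAN and let hosts auto-claim their local disks.
    spec = vim.cluster.ConfigSpecEx(
        vsanConfig=vim.vsan.cluster.ConfigInfo(
            enabled=True,
            defaultConfig=vim.vsan.cluster.ConfigInfo.HostDefaultInfo(
                autoClaimStorage=True)))
    task = cluster.ReconfigureComputeResource_Task(spec, modify=True)
    print("Reconfigure task started:", task.info.key)
finally:
    Disconnect(si)
```

Compared with standing up a SAN fabric, that is a very small amount of configuration; the trade-off, as noted above, is the lock-in that comes with it.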

So, what can we take away from this? The SDDC is a complex beast. Without including EUC or container systems, keeping to a single vendor, and just building out the foundation on which to build our systems, we already have something incredibly complex. For many, this will be too complex. Moving to Google Cloud, AWS, one of the many other public cloud providers, or a managed private cloud will seem an obvious choice. Those of us who need to keep systems on-site and under total control must be ready to invest heavily. Getting off the ground is harder now than it ever was, but once the SDDC is in place, we can move and expand faster than ever before.