We have all drunk the Kool-Aid. Software-defined networking (SDN), network functions virtualization (NFV), or both will save the world. They decouple us from the shackles of legacy networks to allow a utopia of business-driven requirements to freely flow, delivering value and freeing the network, application, storage, and infrastructure teams to have weekends off and time with their families.

OK, now that you’ve woken up from your dreams of this fluffy world of wonder, it’s time to go back to the real world: the one where you still have functional tier teams, and each team is eyeing the others as the harbingers of doom for their services.

We all have heard the horror stories: the internal fighting that happens on a sev-one, the we-said/they-said blame culture.

Storage blames infrastructure, infrastructure blames applications, and everybody blames the network. Then calm ensues, and you start to look at your logs. Yes, believe it or not, outside of the bubble that is Silicon Valley—or the cosseted ivory towers of large enterprise—the majority of people in tech still manually trawl through logs on disparate systems to build their picture of what is going wrong and where.

Also—shock and horror—there are still a heck of a lot of actual physical servers out in the real world where the dragons still roam. (Hint: not in California, Austin, Atlanta, or Boston.) These are not readily positioned to take advantage of the benefits of SDN or NFV without extra software, such as a VXLAN-to-VLAN bridge in NSX, so these machines do not partake of the SDN goodness. (If you are using ACI in a full Cisco environment where you have network control down to both the physical and virtual access port level, it is theoretically possible, though that is not exactly an SDN running on utility hardware.)

That said, to me the biggest area of worry is monitoring in this brave new world. The majority of the world still uses ping and traceroute to troubleshoot their networks. These are not exactly suited to an SDN/NFV world, where every server-to-server path appears as a single hop, regardless of how far away the endpoint is or how many physical devices the traffic actually traverses.
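To make that concrete, here is a minimal sketch (in Python, shelling out to the standard traceroute utility) that compares the hop count toward an overlay neighbour with the hop count toward a device on the physical underlay. The two addresses are placeholders; point them at a VM on the same VXLAN segment and at an underlay switch, and you will typically see one hop reported for the former and many for the latter.

```python
#!/usr/bin/env python3
"""Compare hop counts toward an overlay neighbour and an underlay target.

Illustrative only: the two addresses below are placeholders; substitute a
VM on the same overlay segment and a device on the physical underlay.
"""
import subprocess

TARGETS = {
    "overlay neighbour (VXLAN segment)": "10.100.0.25",   # placeholder address
    "underlay device (physical path)": "192.0.2.14",      # placeholder address
}


def hop_count(address: str) -> int:
    """Run traceroute and count the hop lines it prints."""
    out = subprocess.run(
        ["traceroute", "-n", "-q", "1", "-w", "2", address],
        capture_output=True, text=True, check=False,
    ).stdout
    # The first line of traceroute output is a header; the rest are hops.
    return max(len(out.splitlines()) - 1, 0)


if __name__ == "__main__":
    for label, address in TARGETS.items():
        print(f"{label}: {hop_count(address)} hop(s) reported")
```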

Thus, new tools will be needed to manage your new SDN network. They need to be capable of managing both the new software-defined layer and the underlying physical layer. Why? Well, in most SDNs, you have no easy way of understanding your traffic flow. I think a diagram would make this concept easier to understand:

[Figure: Logical view]

As we can see, the servers logically see only a single hop.

[Figure: Physical view]

This is a rather simplistic point to make, but it shows that what you see between services doesn’t reflect what’s actually there. Now, throw in wireless, 3G/4G, and WAN/MPLS networks, all hidden under an SDN overlay or OpenFlow environment, and performance monitoring becomes even trickier.

  • How exactly do you measure performance of your network in an environment in which you are unsure where your traffic is flowing?
  • How do you monitor and identify physical failures when you are unaware of the flow of your traffic?
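One partial answer is to stop trying to infer the path and instead measure the experience end to end, then lean on the underlay data (more on that below) to localize the fault. Here is a minimal sketch of such an active probe: it times TCP connects to a service endpoint and flags latency spikes. The endpoint and threshold are placeholders, and note that it only tells you that something has degraded, not where.

```python
#!/usr/bin/env python3
"""Active end-to-end probe for when the path is opaque.

A minimal sketch: time TCP connects to a service endpoint and flag
latency spikes. The endpoint and threshold are illustrative placeholders.
"""
import socket
import statistics
import time

ENDPOINT = ("app01.example.internal", 443)   # placeholder service endpoint
SAMPLES = 10
SPIKE_FACTOR = 3.0                           # flag samples 3x above the median


def connect_time(host: str, port: int, timeout: float = 2.0) -> float:
    """Return the TCP connect time in milliseconds."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.monotonic() - start) * 1000.0


if __name__ == "__main__":
    times = [connect_time(*ENDPOINT) for _ in range(SAMPLES)]
    median = statistics.median(times)
    spikes = [t for t in times if t > median * SPIKE_FACTOR]
    print(f"median connect: {median:.1f} ms, spikes: {len(spikes)}")
```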

Yes, SDN offers the promise of a simplified and dynamic environment: nirvana for large-scale cloud environments. However, it also brings new complications that current network management tools cannot easily handle. Traditional monitoring relies on SNMP along with flow-based and packet-based monitoring and analysis.

It could be argued that SDN diminishes the need for SNMP, as most information about individual devices is available at the central controller. However, from my perspective, there is still a valid need for SNMP. We still need to monitor the physical hardware along the traffic path, even if that path is hidden from the services. In an SDN world, the need for flow-based and packet-based monitoring increases, and SNMP will remain an integral component of the SDN deployment. There are three major reasons for this:

  1. The need to manage and monitor the SDN deployment
  2. The need to manage and monitor the dynamic and fluid IT infrastructure that SDN enables
  3. The need to manage and monitor the underlying physical devices

The current breed of network monitoring tools handles reason number three very well, but it handles reasons one and two poorly, if at all.
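For reason number three, the plumbing is at least well understood. The sketch below polls interface counters on the underlay switches with pysnmp, assuming SNMPv2c and a read community; the switch addresses, community string, and interface index are all placeholders for whatever actually sits along your physical path.

```python
#!/usr/bin/env python3
"""Poll underlay switch interface counters over SNMP (reason three above).

A minimal sketch using pysnmp; the switch addresses, community string, and
interface index are placeholders for the devices along your physical path.
"""
from pysnmp.hlapi import (
    SnmpEngine, CommunityData, UdpTransportTarget, ContextData,
    ObjectType, ObjectIdentity, getCmd,
)

UNDERLAY_SWITCHES = ["192.0.2.11", "192.0.2.12"]   # placeholder leaf switches
COMMUNITY = "public"                               # placeholder read community
IF_INDEX = 1                                       # placeholder interface index


def poll_counters(host: str) -> dict:
    """Fetch in/out octet and error counters for one interface on one switch."""
    error_indication, error_status, _, var_binds = next(
        getCmd(
            SnmpEngine(),
            CommunityData(COMMUNITY, mpModel=1),   # SNMPv2c
            UdpTransportTarget((host, 161), timeout=2, retries=1),
            ContextData(),
            ObjectType(ObjectIdentity("IF-MIB", "ifHCInOctets", IF_INDEX)),
            ObjectType(ObjectIdentity("IF-MIB", "ifHCOutOctets", IF_INDEX)),
            ObjectType(ObjectIdentity("IF-MIB", "ifInErrors", IF_INDEX)),
        )
    )
    if error_indication or error_status:
        raise RuntimeError(f"{host}: {error_indication or error_status}")
    return {name.prettyPrint(): int(value) for name, value in var_binds}


if __name__ == "__main__":
    for switch in UNDERLAY_SWITCHES:
        print(switch, poll_counters(switch))
```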

A new breed of monitoring solution exists that monitors all three layers. However, it has deployment issues. Installing a new monitoring system is not a trivial task, whether in terms of budget or reach. Legacy hardware may not have reachable APIs that the SDN overlay can address, which makes gathering performance information tricky, or even impossible. You can gather some information about the hardware via SNMP, but it may not be enough. There are also issues with visibility inside the virtual infrastructure. VMware has done a lot of work in this area and provides strong, addressable APIs; other hypervisors, such as Hyper-V and KVM, don't yet offer the same level of access. Add Docker into the mix, with its SocketPlane-based container networking, and you can see how the problem multiplies.
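To give a flavour of what that hypervisor-level visibility looks like where it does exist, here is a minimal sketch using pyVmomi (VMware's vSphere Python SDK) to map each VM back to the physical host it runs on, which is exactly the overlay-to-underlay correlation a monitoring tool needs. The vCenter address and credentials are placeholders, and certificate checking is switched off purely to keep the example short.

```python
#!/usr/bin/env python3
"""Map virtual machines to the physical hosts they run on via the vSphere API.

A minimal sketch using pyVmomi; the vCenter address and credentials are
placeholders, and certificate verification is disabled purely for brevity.
"""
import ssl

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VCENTER = "vcenter.example.internal"   # placeholder vCenter address
USER = "readonly@vsphere.local"        # placeholder account
PASSWORD = "changeme"                  # placeholder password


def vm_to_host_map() -> dict:
    """Return {VM name: physical host name} for every VM in the inventory."""
    context = ssl._create_unverified_context()   # lab use only
    si = SmartConnect(host=VCENTER, user=USER, pwd=PASSWORD, sslContext=context)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True
        )
        # Each VM's runtime.host points at the physical ESXi host it runs on.
        return {
            vm.name: vm.runtime.host.name
            for vm in view.view
            if vm.runtime.host is not None
        }
    finally:
        Disconnect(si)


if __name__ == "__main__":
    for vm, host in vm_to_host_map().items():
        print(f"{vm} -> {host}")
```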

This is a very interesting and developing space, and it’s clear that SDN/NFV is a growth area. Getting it right is essential to delivering on the promise of a software-defined data center.