In networking, as in life, we often use the same terms to mean many different things. One of the biggest culprits in networking is “edge.” An edge device is usually considered to be a device that connects into the network in only one place. Traffic can flow from an edge device, or it can flow to an edge device, but it can never, ever flow through an edge device. I say never, but that’s not entirely true; I’ll get back to that later. In a campus network, the edge devices are things like users’ computers, laptops, printers, mobile phones, and tablets.
In data centers, the edge devices are servers or, more than likely in the SDDC, virtual machines or possibly containers. The exception to the rule about traffic not flowing through an edge device is the “edge router,” which more often than not takes the form of a perimeter firewall. If we consider north/south versus east/west traffic flows, north/south traffic moves between the edge and the core, while east/west traffic moves laterally across the network, to take the globe analogy a step further. This distinction becomes important as we look at the direction that networking has taken, and the direction I believe it will continue to take.
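To make that distinction concrete, here is a minimal Python sketch that classifies a flow as east/west or north/south based purely on whether both endpoints sit inside the data center. The address range and helper names are invented for illustration, not taken from any product.

```python
# Illustrative only: classify a flow by whether it stays inside the
# data center (east/west) or crosses the perimeter (north/south).
import ipaddress

DATA_CENTER_PREFIXES = [ipaddress.ip_network("10.0.0.0/8")]  # assumed internal range

def is_internal(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATA_CENTER_PREFIXES)

def classify_flow(src_ip: str, dst_ip: str) -> str:
    # East/west: both endpoints live inside the data center.
    # North/south: the flow crosses the edge toward the core/outside world.
    if is_internal(src_ip) and is_internal(dst_ip):
        return "east/west"
    return "north/south"

print(classify_flow("10.1.1.10", "10.2.3.4"))     # east/west
print(classify_flow("10.1.1.10", "203.0.113.7"))  # north/south
```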
Let’s take a step back. In networking, switching is easy and cheap. Routing is a little more complex and a little more expensive. We can route quickly on modern hardware (be that ASIC or CPU), but we still take a performance hit compared to switching. Services such as stateless access control lists (ACLs) are the next most expensive, with stateful services such as load balancing, NAT, and stateful firewalls being the most expensive of all. The age-old three-tier network model was built around routing being expensive. With only switch interconnects at the access layer and spanning tree blocking ports, it made sense to concentrate routing in the core or aggregation layer, where a small number of expensive boxes could do the job, keeping the edge switches cheap. The physical constraints of most campus networks mean that this is still a good model for a campus in many cases.
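As a rough illustration of that cost gradient (a sketch, not any vendor’s implementation), the Python below contrasts a stateless ACL, which matches each packet against a rule list and then forgets it, with a stateful firewall, which must also create, consult, and age out a per-flow connection table. The rule format and function names are made up for the example.

```python
# Why stateful services cost more than stateless ACLs: extra per-flow state.
import ipaddress

ACL = [
    ("10.0.0.0/8", "tcp", 443, "permit"),  # allow internal clients to HTTPS
    ("any",        "any", None, "deny"),   # explicit catch-all deny
]

def acl_permits(src, proto, dport):
    # Stateless: evaluate the rule list for every packet, keep nothing.
    for rule_src, rule_proto, rule_port, action in ACL:
        src_ok = rule_src == "any" or ipaddress.ip_address(src) in ipaddress.ip_network(rule_src)
        proto_ok = rule_proto in ("any", proto)
        port_ok = rule_port is None or rule_port == dport
        if src_ok and proto_ok and port_ok:
            return action == "permit"
    return False

connections = {}  # (src, dst, proto, sport, dport) -> connection state

def stateful_permits(src, dst, proto, sport, dport, syn=False):
    # Stateful: rule matching plus a connection-table lookup on every packet.
    key = (src, dst, proto, sport, dport)
    reverse = (dst, src, proto, dport, sport)
    if key in connections or reverse in connections:
        return True                       # part of an established flow
    if syn and acl_permits(src, proto, dport):
        connections[key] = "ESTABLISHED"  # extra state to allocate and age out
        return True
    return False
```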
The data center is different, and two developments have led to the emergence of rather different designs. The first is that networking hardware has become steadily cheaper. Routing functions that were once solely the domain of large core switches are now available on even the most basic of managed desktop switches. This means that top-of-rack (ToR) switches can now route just as capably as their end-of-row (EoR) or core cousins, which has led to the proliferation of leaf-spine networks, where the middle tier is removed and routing protocols (or, in some cases, fabric protocols such as TRILL) replace spanning tree. The benefits here are many: fewer switches, lower cost, more resilience, and more bandwidth for interconnectivity. The final benefit is that all edge devices are the same “distance” from each other, taking the same number of switch hops, and likely seeing the same latency, to reach each other. It doesn’t matter which ToR switch a host connects to.
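Here is a minimal sketch of that “same distance” property, assuming a small two-spine, four-leaf fabric with made-up names: any two hosts attached to different leaves are always exactly three switch hops apart, whichever leaves they happen to sit on.

```python
# Hop counting in a toy leaf-spine fabric (names and sizes are illustrative).
SPINES = ["spine1", "spine2"]
LEAVES = ["leaf1", "leaf2", "leaf3", "leaf4"]            # ToR switches
HOSTS = {"vmhost-a": "leaf1", "vmhost-b": "leaf3", "vmhost-c": "leaf4"}

def switch_path(src_host, dst_host, spine=SPINES[0]):
    src_leaf, dst_leaf = HOSTS[src_host], HOSTS[dst_host]
    if src_leaf == dst_leaf:
        return [src_leaf]                # same ToR: one switch hop
    return [src_leaf, spine, dst_leaf]   # otherwise always leaf -> spine -> leaf

print(switch_path("vmhost-a", "vmhost-b"))  # ['leaf1', 'spine1', 'leaf3']
print(switch_path("vmhost-a", "vmhost-c"))  # ['leaf1', 'spine1', 'leaf4']
```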
On the software side, we saw first switching and then routing move into the hypervisor layer. This was, in hindsight, inevitable. Making the switching/routing decision close to the source of the packet flow can considerably reduce the distance the traffic needs to traverse. If two virtual machines are on the same vSwitch on the same host, traffic need never leave that host, and with routing in the hypervisor, even the same-vSwitch limitation goes away: traffic between VMs on the same host never touches the physical network, so it is switched much faster and puts less load on that network. So we have increased the potential of our east/west traffic no end, by removing much of it from the network at the source and reducing the work needed to move the rest. But what of north/south traffic?
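That forwarding decision can be sketched in a few lines of Python; the VM inventory and function names here are hypothetical, not an NSX API.

```python
# Where does a packet go once switching/routing lives in the hypervisor?
VM_HOST = {            # which physical host each VM runs on (made up)
    "web-01": "esx-01",
    "app-01": "esx-01",
    "db-01":  "esx-02",
}

def forward(src_vm, dst_vm):
    if VM_HOST[src_vm] == VM_HOST[dst_vm]:
        # Switched (or routed) entirely inside the hypervisor: the frame
        # never touches a physical NIC or a ToR switch.
        return "local delivery on " + VM_HOST[src_vm]
    # Different hosts: encapsulate (e.g., in an overlay) and carry it across
    # the physical leaf-spine fabric to the destination host.
    return "overlay to " + VM_HOST[dst_vm] + " via the physical fabric"

print(forward("web-01", "app-01"))  # stays on esx-01
print(forward("web-01", "db-01"))   # crosses the physical network
```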
North/south traffic, by definition, is the movement between security zones. It is the point at which ACLs, NAT, and those other, higher-cost actions take place. At this point in the game, those functions are limited to either hardware devices or virtual machines. NSX, for example, has a virtual machine called the “Edge Services Gateway,” which handles the advanced firewall, NAT, and load balancing services. Although simple ACLs can be applied at the vNIC level, these must still be generated first by the gateway. What is the effect of this? Well, it means that north/south traffic has many more hops to traverse to reach its destination. It means that we have “edge racks” through which a lot of traffic is funneled in order to gain security or “performance” benefits. A lot of what we gain by moving to leaf-spine networks and hypervisor-based switching and routing is lost, because that traffic has to be hauled out to servers or devices in a specific edge rack and back again.
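Some back-of-the-envelope hop counting makes the penalty visible. The exact numbers below are illustrative assumptions about one possible topology, not measurements.

```python
# Rough hop comparison: east/west handled in the fabric or hypervisor
# versus north/south hair-pinned through an edge rack.
def east_west_hops(same_host):
    # Same host: handled entirely in the hypervisor, zero physical switch hops.
    # Different hosts: leaf -> spine -> leaf.
    return 0 if same_host else 3

def north_south_hops():
    # Source leaf -> spine -> edge-rack leaf (3), then the edge services
    # VM/appliance itself (1), then edge-rack leaf -> spine -> upstream (3).
    return 3 + 1 + 3

print(east_west_hops(same_host=True))   # 0
print(east_west_hops(same_host=False))  # 3
print(north_south_hops())               # 7
```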
What does this mean for the network? Well, in some cases, very little. If north/south traffic is limited, and upstream bandwidth is a few percent of east/west bandwidth, then this isn’t an issue. If you run a multitenant environment, though, and most of your traffic is north/south, then you have problems. But it seems obvious to me that this problem will dissolve in the future. The trend of moving functions into the hypervisor will continue, I think, for a while yet, and those higher-level services will migrate there too. The hypervisor is becoming the point at which network services live, with only their control plane moved out to VMs.