When we think about network connections, our focus is usually on bandwidth. Bandwidth is the main metric in everyday use, as most connections are within a local site, where latency is very low. There are a few specific cases in which this is not true and latency is the main target: high-performance computing (HPC) is one, and inter-site connections are usually another. As soon as connections touch the Internet, though, most consideration of latency goes out the window. Too many factors are beyond the enterprise’s control, and latency is usually not the most important of them anyway.
Bandwidth, as it relates to network connections, is the amount of throughput a connection can sustain: the number of bits per second that can be pushed through the interface. Modern data centres work in the realm of 10 Gb, 40 Gb, and even 100 Gb links, with some legacy 1 Gb links still around. Latency, by contrast, is the time it takes data to travel across the network. It is usually measured as an RTT, or round-trip time: the time a packet takes to get from source to destination and back. Within the data centre, latency is measured in milliseconds (ms) and is generally under 5 ms. Over the Internet, a good rule of thumb is 25 ms within a given country, 100 ms within a given continent, and 150 ms intercontinental. These figures sit very close to the limit imposed by the speed of light, which no amount of engineering can remove. The final consideration, more esoteric than either of these, is the number of packets an interface can process per second (PPS). This is a figure that switches are rated on, and it tends to be in the millions of packets per second.
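To make RTT concrete, one quick way to sample it without any special tooling is to time a TCP handshake: a connect() call returns once the SYN/SYN-ACK exchange has completed, which is roughly one round trip. The following is a minimal Python sketch; the target hosts are placeholders, and in practice you would point it at machines at known distances.

```python
import socket
import time

def tcp_rtt_ms(host: str, port: int = 443, timeout: float = 2.0) -> float:
    """Approximate one RTT by timing a TCP three-way handshake."""
    start = time.perf_counter()
    # create_connection() returns once the handshake completes,
    # so the elapsed time is roughly one round trip.
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0  # convert seconds to milliseconds

# Placeholder targets: substitute hosts in-country, in-continent,
# and intercontinental to see the rules of thumb above.
for target in ("example.com", "example.org"):
    print(f"{target}: {tcp_rtt_ms(target):.1f} ms")
```

The numbers include a little kernel and connection-setup overhead on top of the pure wire time, but they land close enough to ping’s figures for rule-of-thumb checks.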
These three numbers, then, define our Internet connection, and they are closely related. If we put enough packets through an interface, we can saturate its packet-processing capacity and start buffering before we have utilised all of the bandwidth. Buffering the packets increases the latency, and our links begin to look slow even though, from a bandwidth point of view, they are barely used. Conversely, if we put enough large packets through the link, we can saturate the bandwidth, keeping the small command-and-control packets off the link and so, again, introducing latency. Either way, latency is the first indicator of problems.
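A rough worked example makes the interplay clear. The sketch below, with an assumed 10 Gb/s link and a made-up switch rating of 10 Mpps, computes how many packets per second it takes to fill the link at various frame sizes; the 20 bytes added to each frame account for the Ethernet preamble, start-of-frame delimiter, and inter-frame gap on the wire.

```python
LINK_BPS = 10e9           # assumed 10 Gb/s interface
WIRE_OVERHEAD = 20        # preamble + SFD + inter-frame gap, in bytes
SWITCH_PPS_RATING = 10e6  # hypothetical switch forwarding capacity

# On-the-wire frame sizes in bytes, headers and FCS included:
# minimum-size, mid-size, standard MTU, and jumbo frames.
for frame_bytes in (64, 512, 1518, 9018):
    wire_bits = (frame_bytes + WIRE_OVERHEAD) * 8
    line_rate_pps = LINK_BPS / wire_bits
    verdict = "PPS-bound" if line_rate_pps > SWITCH_PPS_RATING else "bandwidth-bound"
    print(f"{frame_bytes:>5} B frames: {line_rate_pps / 1e6:6.2f} Mpps to fill the link ({verdict})")
```

At minimum-size frames the link wants roughly 14.9 Mpps, beyond our hypothetical 10 Mpps rating, so the switch buffers and latency climbs while the bandwidth graphs show the link nearly idle; at standard and jumbo sizes the bandwidth runs out long before the packet budget does.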
Latency can be introduced at the routing and firewall layers as well as at the switch layer. Routing, IDS, and firewalling all involve deep inspection of packets, which adds to the latency, so monitoring latency alone doesn’t tell us the whole story. Storage connections tend to have jumbo frames enabled so that they can saturate the bandwidth of a link with relatively few packets. IoT (Internet of Things) connections sit at the other end of the scale, with many small packets. It becomes obvious, then, that whereas LAN and particularly SAN connections can be monitored easily and closely, and fixed with the addition of more bandwidth, IoT is going to be a far more complex situation to deal with.
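Running the same arithmetic the other way round shows the storage-versus-IoT contrast. Assuming a hypothetical device that can forward 2 million packets per second, the sketch below computes how much throughput that packet budget buys for jumbo-frame storage traffic versus small IoT updates; the frame sizes are illustrative.

```python
PPS_BUDGET = 2e6    # hypothetical forwarding capacity, packets/s
WIRE_OVERHEAD = 20  # preamble + SFD + inter-frame gap, in bytes

# Illustrative on-the-wire frame sizes (headers and FCS included).
workloads = {
    "SAN traffic, 9000 B jumbo payloads": 9018,
    "IoT traffic, ~100 B updates": 128,
}

for name, frame_bytes in workloads.items():
    throughput_gbps = PPS_BUDGET * (frame_bytes + WIRE_OVERHEAD) * 8 / 1e9
    print(f"{name}: {throughput_gbps:6.1f} Gb/s at {PPS_BUDGET / 1e6:.0f} Mpps")
```

The jumbo-frame workload would saturate a 10 Gb link using well under a tenth of the packet budget, so adding bandwidth fixes it; the small-packet workload exhausts the entire budget at around 2.4 Gb/s, and no amount of extra bandwidth on the same device will change that.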
In the case of IoT, the traffic always originates outside the data centre. Immediately, we are dealing with the highest latencies around, with devices likely scattered to the four corners of the globe. We have many, many small updates: almost certainly millions of very small, time-sensitive messages. We have lots of routing, and almost certainly multiple levels of inspection and multiple ISPs, even before the traffic reaches the data centre. What’s more, almost all of these factors are outside the enterprise’s control, the devices being in customer homes and on mobile networks. Finally, because the number of packets that can be switched depends more on the capacity of the switch or router CPU than on interface speed, adding links to a switch will not help when the system is saturated with small packets. Only adding more devices will, and that in turn leads to scaling issues, as more logic is required to distribute traffic between those devices.
What is the solution? Peering will be one small step, with enterprises working more closely with consumer ISPs than has been the case in the past. This can only go so far, though: it would be impossible to peer with every ISP, even if doing so would help. Wider distribution of data centres is the most likely outcome, with workloads spread across more sites to share the load and sit closer to the endpoints. Ultimately, however, this is a problem that must be solved by the applications the devices run.