In my previous article, I introduced the idea of data locality in HCI. I also explored some basic math that illustrates the impact of scaling an HCI cluster. I compared a cluster without data locality to a cluster that does have locality. Today, I want to look at what happens when we need more than two copies of data, as well as to examine the impact of IO size on the storage network. My third article in this series will discuss the causes and effects of incomplete data locality, and it will also present a special case of data locality.
Three Data Copies
With a large cluster, the probability of a host or disk failure increases, as does the possibility of a double failure, which would make two data copies unavailable. This might mean that one host was down for maintenance and another host was accidentally powered off. If the nodes don’t use local RAID, such a failure could be as simple as a hard disk failing while another node is shut down for maintenance. For most HCI products, every VM is spread over every host. Those products can tolerate at most one fewer host failure than the number of data copies they keep. To allow the cluster to operate with two hosts failed, we must keep three copies of the data. This makes the remote IO calculation look a little different. Changing the inputs to the formula in the first post, we get new numbers out. With data locality, we end up with double the remote IO, since two copies must go remote for every write. Still, there are no reads across the network, as we have data locality.
80% Read IO, 3 Data Copies: Remote IOs

| Node Count | Data Locality | Without Locality |
|---|---|---|
| 2 | N/A, insufficient nodes for 3 copies | N/A, insufficient nodes for 3 copies |
| 3 | 40% | 40% |
| 4 | 40% | 65% |
| 8 | 40% | 103% |
| 16 | 40% | 121% |
| 32 | 40% | 131% |
| 64 | 40% | 135% |
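If you want to reproduce the table, here is a rough sketch of the arithmetic as I have modelled it. It is my reconstruction, not any vendor’s published formula: it assumes copies are spread evenly across the nodes, so the chance that any given copy already sits on the VM’s own node is the number of copies divided by the node count, and it ignores metadata and rebuild traffic. The function names are mine, and the table above rounds to whole percentages.

```python
# A rough model of remote IO per VM IO; my reconstruction, not a vendor formula.

def remote_io_with_locality(write_fraction, copies):
    """Reads stay local; each write sends (copies - 1) extra copies across the network."""
    return write_fraction * (copies - 1)

def remote_io_without_locality(read_fraction, write_fraction, nodes, copies):
    """Copies land on random nodes, so only copies/nodes of the data is local by chance."""
    local_chance = copies / nodes  # coincidental locality
    remote_reads = read_fraction * (1 - local_chance)
    remote_writes = write_fraction * (copies - local_chance)
    return remote_reads + remote_writes

# 80% read, 20% write, three data copies - the mix used in the table above.
for nodes in (3, 4, 8, 16, 32, 64):
    print(nodes,
          f"{remote_io_with_locality(0.2, 3):.1%}",
          f"{remote_io_without_locality(0.8, 0.2, nodes, 3):.1%}")
```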
Without locality, the reads do get a little benefit: with more copies of the VM data, the rate of coincidental locality is higher. Even so, remote IO keeps climbing as the cluster grows past sixteen nodes. With sixty-four nodes and three data copies, each node’s remote traffic reaches 135% of its VM IO: for every 100 VM IOs, the host will need to send 135 IOs to remote nodes.
Writes Multiply
You may remember from looking at RAID that writes multiply. A single write to a RAID set causes multiple IOs to the underlying storage. This is exactly what we are seeing here. The net effect of multiplying writes on your overall storage system depends on the proportion of IOs that are writes. Most server virtualization workloads are around 80% read and only 20% write. I used this for the table above and the one in the first article. One workload that is quite different is VDI. After the initial boot, most VDI environments are 80% write. The operating system cache handles a lot of the reads inside the VM. So, the really toxic case is if we use three data copies and have a VDI workload, which is 80% write. With data locality, the storage network does 160% of the VM IO. Without data locality, a large cluster can end up doing over 250% of VM IO. I doubt that many people would choose to build large clusters for VDI and run three data copies, however, so this toxic case isn’t likely to happen in the real world.
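Plugging the VDI mix into the same sketch from the previous section (again, my approximation rather than any vendor’s formula) reproduces those numbers:

```python
# VDI mix: 20% read, 80% write, three data copies.
# Uses the helper functions from the sketch in the previous section.
print(f"{remote_io_with_locality(0.8, 3):.0%}")                # 160% with locality
print(f"{remote_io_without_locality(0.2, 0.8, 64, 3):.0%}")    # roughly 255% on 64 nodes
```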
Size Matters, Too
All my math has been in IOs, but network capacity is usually expressed in Gbps: throughput, not transactions. Converting one to the other depends on the average (mean) IO size, which in turn depends on the application inside your VMs. Small IOs don’t get you out of trouble; they just mean the CPU can become the limit before the network bandwidth does, because pushing lots of little packets through the storage network uses a lot of CPU. Either way, the result is slower storage for your VMs. Pushing fewer packets frees that CPU time for running the workload VMs, so data locality helps again when there is a lot of IO and a lot of nodes. The application’s read/write ratio also has a significant influence, so knowing your application is important when you are designing an infrastructure. Irrespective of the application, an HCI solution with data locality will have no more storage network traffic than an HCI solution without data locality.
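To turn IOs into bandwidth, multiply the remote IO rate by the IO size. The numbers below are purely illustrative (a hypothetical node doing 20,000 VM IOPS on a 64-node cluster without locality), not measurements from any particular product:

```python
def remote_gbps(vm_iops, remote_fraction, io_size_kb):
    """Convert a remote IO rate and IO size into storage network throughput in Gbps."""
    remote_bytes_per_second = vm_iops * remote_fraction * io_size_kb * 1024
    return remote_bytes_per_second * 8 / 1e9

# Hypothetical 64-node cluster without locality: 135% remote IO per VM IO.
print(f"{remote_gbps(20_000, 1.35, 8):.1f} Gbps at 8 KB IOs")    # about 1.8 Gbps
print(f"{remote_gbps(20_000, 1.35, 64):.1f} Gbps at 64 KB IOs")  # about 14 Gbps
```

In this made-up scenario, 8 KB IOs leave plenty of network headroom, but the node is still handling 27,000 remote IOs per second, which is where the CPU cost shows up; at 64 KB the same IO rate would saturate a 10 GbE storage network.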
In the third and (hopefully) final article in this series, I will look at how I have oversimplified the math. Some solutions with data locality don’t deliver 100% locality. Others may give up some VM mobility to achieve 100% locality.