VSAN Comes of Age with Version 6.2

On February 10, 2016, VMware announced VSAN v6.2. This is the fourth generation of its flagship software-defined storage (SDS) product. At the time of the release, VMware announced that it had more than 3,000 customers running the product; that is quite a number.

Now, to me, it is a misnomer for this to have been given a minor release notation, as there are a slew of new features, some of which are more than worthy of a major release cycle. I will examine the major ones in this article.

The first thing I will look at is what VMware has termed Space Efficiency. This consists of two distinct feature sets, the first being deduplication and compression. Initially, I thought that this had been thrown over the fence from EMC, but apparently VMware has written it all by itself. The deduplication process works on a fixed block length of 4 KB. It is undertaken when data is moved from the caching tier of the VSAN stack (which is implemented in a mix of main memory and flash-based PCI Express cards in servers) to the capacity tier in the VSAN. Once the data has been deduplicated, the compression algorithms kick in to further reduce storage utilization. There is a small issue with this: the process does not use any of the recent functions added to the Intel Xeon processors to aid it, so there will be additional overhead on the VMkernel. This, according to VMware, is in the region of 5% of CPU, which is not a significant amount. With today’s processors, it should not be much of an issue.
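To make the destaging flow concrete, here is a minimal Python sketch of fixed-block deduplication followed by compression. It is an illustration only, not VMware's implementation: the SHA-1 fingerprint, the zlib compressor, the in-memory `store` dictionary standing in for the capacity tier, and the function names `destage` and `rebuild` are all assumptions. Only the 4 KB block size and the dedupe-then-compress ordering come from the description above.

```python
import hashlib
import zlib
from typing import Dict, List

BLOCK_SIZE = 4096  # 4 KB fixed block length, as described above


def destage(data: bytes, store: Dict[str, bytes]) -> List[str]:
    """Move data to a (hypothetical) capacity tier.

    Splits data into 4 KB blocks, skips blocks whose fingerprint is
    already stored (deduplication), and compresses only the new ones.
    Returns the ordered list of fingerprints needed to rebuild the data.
    """
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        # Assumption: SHA-1 is used purely as an illustrative fingerprint.
        fingerprint = hashlib.sha1(block).hexdigest()
        if fingerprint not in store:
            # Compression runs after deduplication, on unique blocks only.
            store[fingerprint] = zlib.compress(block)
        recipe.append(fingerprint)
    return recipe


def rebuild(recipe: List[str], store: Dict[str, bytes]) -> bytes:
    """Reassemble the original data from its fingerprint list."""
    return b"".join(zlib.decompress(store[fp]) for fp in recipe)


# Three identical blocks plus one different block: only two are stored.
capacity_tier: Dict[str, bytes] = {}
payload = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
recipe = destage(payload, capacity_tier)
print(len(recipe), "blocks written,", len(capacity_tier), "unique blocks stored")
```

Doing the work only at destage time, as the article describes, keeps hashing and compression off the write path into the cache.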

The next feature is called Erasure Coding. This makes use of RAID 5 or RAID 6 data striping across the multiple ESXi hosts that make up the cluster. Before we move on with this feature, I need to get something off my chest: to me, RAID means “redundant array of inexpensive disks” and is not node based. Obviously, this is not the case with VSAN, as it is node based, so it really should be called RAIN (redundant array of inexpensive nodes). That is what I am going to call it.

Previously, VSAN used a mirroring technique across nodes to provide resilience against node failure. This was obviously expensive in terms of capacity. With the introduction of Erasure Coding, significant space savings can be achieved. For example, prior to version 6.2, a 40 GB VMDK would utilize 80 GB of space: 40 GB for the actual disk, and then another 40 GB for the replica. Post–version 6.2, a 40 GB disk that is stored on a RAIN 5 uses only 54 GB, and on a RAIN 6, it uses 60 GB. If you plan to move to the new architecture, then you need a minimum of four or six nodes, respectively. Otherwise, you need to continue with the standard mirroring.
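The capacity figures above follow from simple arithmetic, sketched here in Python. The fragment counts are an assumption on my part: three data plus one parity fragment for RAIN 5 and four data plus two parity fragments for RAIN 6, which is consistent with the four- and six-node minimums but is not stated in the text. The function name `raw_capacity_gb` is hypothetical.

```python
def raw_capacity_gb(vmdk_gb: float, data_fragments: int,
                    redundancy_fragments: int) -> float:
    """Raw space consumed by a VMDK for a given protection layout."""
    total = data_fragments + redundancy_fragments
    return vmdk_gb * total / data_fragments


# (data fragments, redundancy fragments); the RAIN layouts are assumed.
LAYOUTS = {
    "Mirroring": (1, 1),  # one full replica
    "RAIN 5": (3, 1),     # assumed 3 data + 1 parity
    "RAIN 6": (4, 2),     # assumed 4 data + 2 parity
}

for name, (data, redundancy) in LAYOUTS.items():
    used = raw_capacity_gb(40, data, redundancy)
    print(f"{name}: {used:.1f} GB raw for a 40 GB VMDK")
```

Under these assumptions a 40 GB VMDK consumes 80 GB when mirrored, about 53.3 GB on RAIN 5 (close to the 54 GB quoted above), and 60 GB on RAIN 6.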

The next feature is a surprise: a new Software Checksum. Why is it a surprise? Simply because I would have assumed that some sort of checksum calculations were already being done. If not, then it is a welcome addition.
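For readers who have not met the idea, a software checksum works roughly as in the Python sketch below: a checksum is stored alongside each block on write and verified on read. This shows the general technique only. The use of `zlib.crc32`, the 4 KB block, and the function names are assumptions for illustration, and nothing here describes how VSAN actually computes or stores its checksums.

```python
import zlib
from typing import Tuple


def write_block(block: bytes) -> Tuple[bytes, int]:
    """Return the block together with the checksum to store next to it."""
    return block, zlib.crc32(block)


def read_block(block: bytes, stored_checksum: int) -> bytes:
    """Verify the block against its stored checksum before returning it."""
    if zlib.crc32(block) != stored_checksum:
        raise ValueError("checksum mismatch: block is corrupt")
    return block


block, checksum = write_block(b"x" * 4096)
read_block(block, checksum)  # passes silently

# Simulate silent corruption of the first byte on the storage device.
corrupted = b"y" + block[1:]
try:
    read_block(corrupted, checksum)
except ValueError as err:
    print(err)
```

In a clustered store, a failed verification would normally trigger a repair from another copy; that step is left out of the sketch.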

Other additions include:

The ability to add QoS on a per-VM basis.
A new performance and monitoring database purely for VSAN that is integrated into the web client.
A Client Cache: a write-through read memory cache that is “local” to a VM. This brings data locality to a VM. It does add a little overhead on the host, but it will have a big impact on performance (especially for VDI).

Finally, Sparse Swap can reclaim space used by memory swap. This is an added advanced option on a host that enables setting policy for swap to “no space reservation.” Monster VMs with a lot of memory create a large swap file by default, which obviously consumes a lot of space on the VSAN datastore. By specifying the new advanced host parameter, you can make the swap file thin provisioned, thereby saving a significant amount of storage capacity. This has to be set per node/host.
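A rough Python estimate shows why this matters for monster VMs. The sketch assumes that a swap file is sized at configured memory minus any memory reservation, and that the swap object is stored as two copies on the datastore. The two-copy figure, the function name, and the parameter names are my assumptions, not something stated above.

```python
def swap_reserved_gb(vm_memory_gb: float,
                     memory_reservation_gb: float = 0.0,
                     copies: int = 2,
                     sparse_swap: bool = False) -> float:
    """Estimate datastore space reserved up front for a VM's swap file."""
    if sparse_swap:
        # "No space reservation": space is consumed only if swapping occurs.
        return 0.0
    # Assumption: the swap object is fully reserved and stored `copies` times.
    return (vm_memory_gb - memory_reservation_gb) * copies


for sparse in (False, True):
    reserved = swap_reserved_gb(256, sparse_swap=sparse)
    print(f"256 GB VM, sparse swap {sparse}: {reserved:.0f} GB reserved")
```

Under those assumptions, a single 256 GB VM with no memory reservation ties up 512 GB of datastore capacity by default and nothing up front with Sparse Swap enabled.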

The only downside is that these features are for the flash version of VSAN only. Hopefully they will filter down to the spinning rust version in time.

As I have already stated, VMware had more than 3,000 customers as of the end of 2015, and it is adding over 500 customers per quarter. This is a significant run rate for a product that only launched in March 2014. So, how does the pricing stack up?

There are three licensing tiers: Standard ($2,495), Advanced ($3,995), and Enterprise ($5,495). These costs are per processor, so a standard two-socket build node (say, a DL380 G9) will result in costs of between $4,990 and $10,990 per host. This is just for the software; the node cost and the capacity cost have to be added to that outlay. Remember, to take advantage of RAIN 5 or 6, you will need a minimum of four or six nodes, respectively. Thus, this is not a cheap alternative to SMC’s four-node VSAN Ready FatTwin model, which currently comes in at a little over $113K and includes all VSAN and vSphere licensing and four nodes. That price does compare very well against its major competitor’s most popular model, which comes in at well over $200K list, and that doesn’t include vSphere licensing. I know people do not pay list, but that is still a significant difference.
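The licensing arithmetic is easy to reproduce. The Python sketch below uses the list prices quoted above and assumes two processor sockets per host; the function name `license_cost` and the cluster sizes in the loop are illustrative choices of mine.

```python
# List prices per processor, as quoted above (USD).
PRICE_PER_CPU = {"Standard": 2495, "Advanced": 3995, "Enterprise": 5495}


def license_cost(tier: str, hosts: int = 1, sockets_per_host: int = 2) -> int:
    """VSAN software cost only; hardware and capacity are extra."""
    return PRICE_PER_CPU[tier] * sockets_per_host * hosts


for tier in PRICE_PER_CPU:
    per_host = license_cost(tier)
    four_nodes = license_cost(tier, hosts=4)
    six_nodes = license_cost(tier, hosts=6)
    print(f"{tier}: ${per_host:,} per host, "
          f"${four_nodes:,} for four nodes, ${six_nodes:,} for six nodes")
```

With two sockets per host, a single host costs $4,990 on Standard and $10,990 on Enterprise, and a six-node Enterprise cluster comes to $65,940 in VSAN licensing alone.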

VMware has made a statement with VSAN. The amount of development going into this product is significant, and the differences between VSAN v1 and VSAN v6.2 are large. With the additional features added to this release, it is getting nearer to being production ready for critical workloads. I think that VMware is in it for the long run with this product. It now has a valid solution for the HCI market space. This space is VMware’s to win or lose.