vSphere Upgrade Saga: VSAN Upgrade Woes

The upgrade to VSAN 6.2 did not go as smoothly as I had hoped. It was possible, but a rolling upgrade did not work as expected, so I had to rebuild not only VSAN but my cluster as a whole. Perhaps this is just down to the way I have my VSAN configured.

I use blades. This means that to get the storage I need for VSAN and VSAs, I need to use storage blades. These are expensive, costing almost more than the compute blades, and they also take up slots I need. The manual says you need a minimum of 3 nodes with storage to make up a VSAN cluster. I have three blades, but only 2 of them have storage. In this case, I needed to use a VSAN Witness, which works quite well for my small usage as long as the Witness is on some other non-VSAN storage. On this set of VSAN nodes I placed a small VMware Horizon View desktop pool.

Now, while I am sure this is not 100% supported, VSAN does seem to work in this mode. The data is accessible by the cluster and the Witness does its job. Once more, the Witness should be neither on one of the VSAN systems nor on VSAN storage. You can use anti-affinity rules for the Witness as well as place it on some other storage. I use an iSCSI StoreVirtual VSA presented from a different set of disks.
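To make the placement rule concrete, here is a minimal sketch in Python of how one might sanity-check a Witness placement against an inventory. Everything in it is an assumption for illustration: the host and datastore names are made up, "VSAN systems" is read as the nodes that contribute storage, and the inventory is typed in by hand rather than queried from vCenter.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class WitnessPlacement:
    host: str       # host currently running the Witness appliance
    datastore: str  # datastore holding the Witness appliance's disks


def placement_problems(placement: WitnessPlacement,
                       vsan_hosts: Set[str],
                       vsan_datastores: Set[str]) -> List[str]:
    """Return a list of rule violations; an empty list means the placement is fine."""
    problems = []
    if placement.host in vsan_hosts:
        problems.append(f"Witness runs on VSAN node {placement.host}")
    if placement.datastore in vsan_datastores:
        problems.append(f"Witness disks are on VSAN datastore {placement.datastore}")
    return problems


# Hypothetical inventory: two storage blades plus one compute blade.
vsan_hosts = {"blade-storage-1", "blade-storage-2"}
vsan_datastores = {"vsanDatastore"}

good = WitnessPlacement(host="blade-compute-1", datastore="storevirtual-iscsi")
bad = WitnessPlacement(host="blade-storage-1", datastore="vsanDatastore")

print(placement_problems(good, vsan_hosts, vsan_datastores))
print(placement_problems(bad, vsan_hosts, vsan_datastores))
```

The first placement passes both checks and the second trips both. In a real environment the anti-affinity rule and the datastore choice enforce this, and a check like this would only confirm it.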

However, when I went to upgrade from 6.1 to 6.2 I ran into a problem. The VSAN storage would just not upgrade the filesystem. It kept complaining that there was no space left on the device. That was just not true: it was empty. I had even moved the VMs to a different storage device. Perhaps this was because I had a node without any storage (a compute node) that was able to access my VSAN. This is supposed to be possible, and I did not remove that node from the cluster to try again. Granted, I thought the VSAN was empty, but further research showed that there were some files representing various temporary disks on it. Even so, the free space was 90% greater than the space those disks used.

Instead, since I considered everything migrated off, I solved the problem another way.

VSAN Upgrade Try 1

My first attempt was to disable the stretched cluster, remove the Witness, destroy the disk groups, and then disable VSAN. Once that was done I recreated everything.

Unfortunately, this method, while it should have worked, did not. I was confused at this point, but it had something to do with the VSAN data in the vCenter database. The definition of the VSAN cluster was ‘stuck’ in the database and every recreate hit the same problem. There was no way to clear this out short of editing the database directly, which is not recommended.

VSAN Upgrade Try 2

Since VSAN is tied to the cluster, the next attempt was to recreate the cluster. I did not delete my old cluster until I had created a new cluster and migrated each host into it. To do this without rebuilding, the new cluster needs to be set up identically to the original with respect to EVC mode: either disabled, or set for the processors within the systems. The mode I use is Intel™ “Ivy Bridge” Generation. Once that was set, it was just a case of moving my nodes into the new cluster.
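The precondition here is simple enough to sketch in Python: refuse to plan any host moves unless the new cluster's EVC setting matches the old one. This is only an illustration. The function name, the host names, and the EVC label strings are placeholders I made up, `None` stands in for "EVC disabled", and nothing here talks to vCenter.

```python
from typing import List, Optional


def plan_host_moves(old_evc: Optional[str],
                    new_evc: Optional[str],
                    hosts: List[str]) -> List[str]:
    """Return one move step per host, or raise if the EVC settings differ.

    None means EVC is disabled on that cluster.
    """
    if old_evc != new_evc:
        raise ValueError(
            f"EVC mismatch: old cluster uses {old_evc!r}, new cluster uses {new_evc!r}"
        )
    return [f"move {host} into the new cluster" for host in hosts]


# Placeholder values for illustration only.
hosts = ["blade-storage-1", "blade-storage-2", "blade-compute-1"]

for step in plan_host_moves("ivy-bridge", "ivy-bridge", hosts):
    print(step)

try:
    plan_host_moves("ivy-bridge", None, hosts)
except ValueError as err:
    print(err)
```

The point of failing early is that a mismatch is much cheaper to discover before the first host has left the old cluster than after.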

With that completed, it was possible to recreate VSAN, which created the new VSAN with the proper disk format. I did so by first installing and configuring the Witness, then creating the new VSAN, and finally migrating my VMware Horizon View VMs back.

VSAN Upgrade CLI Solution

This is actually a known problem documented by Cormac Hogan (which I did not know about until after I tried my upgrade). Cormac’s approach is to run the upgrade from a VSAN node, which requires logging into the console of the ESXi host directly and performing the upgrade there. While it is documented for v2, it should work for all versions.

Common Problem

There is, however, a common problem with the VSAN Witness. The password complexity rules are very hard to figure out. Actually, I think there is a bug in the complexity rules, as a 22-character password that met all of them just did not work. The error says the password does not meet the requirements. Eventually I found a shorter password that worked. I am still looking into what is causing the problem.

Conclusion

VSAN works quite well, but for blades it is not yet a cheap option and, unless you have the proper storage, not truly supported. So my solution going forward is to rethink how I have disks laid out internal to each blade so I can get the minimum number of nodes configured. This will include looking at USB, Micro SD, and other mechanisms to place more storage into my blades. Thankfully 1TB SSDs are not that expensive anymore.
