How much insight are we missing from our environments? That is a question I find myself asking after being bitten by a new “bug” found in VMware vCloud Automation Center (vCAC). It seems that many people like me had their morning wrecked when their vCloud Automation Center 6.0 tenants became inaccessible and the identity stores disappeared. This sounds pretty ominous, doesn’t it? Here is the list of symptoms you would have seen if you were affected by the bug:
- When you attempt to log into a Tenant, the page comes up blank, displaying a Submit button in the upper left corner.
- You get a System Exception error when accessing the Tenant identity store configuration page, and the identity store configuration may have disappeared.
- You cannot log into a Tenant using an LDAP account.
- You are unable to add a new identity store configuration to the affected Tenant.
- The Tenant identity store disappears from the SSO Administrator login.
What was the bug that caused this issue? VMware Engineering determined that it was the internal LDAP account (administrator) that vCAC and Single Sign-On (SSO) use to communicate with each other behind the scenes. The password for that account (a system-generated password that no one knows) expired! This is not the first time we have seen accounts expire without any notice or warning, sending our environments into a real “Danger, Will Robinson” zone. Which brings me to my point: How much insight are we missing from our environments?
The obvious fix for issues like this: if a product creates an account whose password will automatically expire, it should also provide a view that lets us see and track that expiration and act before we are blindsided. I am going to take this a step further: any and all internally created system accounts need to be fully documented. How much insight is needed into product-created internal system accounts is debatable. However, take the internal vCenter vpxuser account as an example: there is plenty of documentation on this account, covering how it works and what to do if something goes wrong or it is inadvertently changed. My guess is that the vpxuser account is so well documented because it cannot be completely hidden from view or discovery in the way the tenant administrator account can. If we can only get information on what we can see, how can we be expected to have a complete understanding of the systems that we maintain?
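To make the idea of an expiration "view" a little more concrete, here is a minimal sketch in Python. It is not how vCAC or SSO tracks anything internally. It assumes you keep your own inventory of system accounts, and every name in it (`ServiceAccount`, `expiring_soon`, the account names, the dates, and the password ages) is hypothetical and invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class ServiceAccount:
    """One entry in a hand-maintained inventory (hypothetical structure)."""
    name: str
    password_set: date            # when the password was last set
    max_age_days: Optional[int]   # None means the password never expires

    def expires_on(self) -> Optional[date]:
        """Return the expiration date, or None if the password never expires."""
        if self.max_age_days is None:
            return None
        return self.password_set + timedelta(days=self.max_age_days)


def expiring_soon(
    accounts: List[ServiceAccount], today: date, warn_days: int = 30
) -> List[Tuple[ServiceAccount, date]]:
    """List accounts that are already expired or expire within warn_days."""
    cutoff = today + timedelta(days=warn_days)
    flagged = []
    for account in accounts:
        expires = account.expires_on()
        if expires is not None and expires <= cutoff:
            flagged.append((account, expires))
    # Soonest expiration first, so the most urgent account is at the top.
    return sorted(flagged, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Made-up inventory; the real accounts and policies are product specific.
    inventory = [
        ServiceAccount("example-internal-admin", date(2014, 1, 1), 90),
        ServiceAccount("example-backup-svc", date(2014, 3, 1), 365),
        ServiceAccount("example-never-expires", date(2013, 6, 1), None),
    ]
    for account, expires in expiring_soon(inventory, today=date(2014, 3, 15)):
        print(f"{account.name}: password expires {expires.isoformat()}")
```

Of course, a report like this can only cover accounts you know exist, which is exactly the problem: the account behind this bug would never have made it into the inventory.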
Consider this thought for a moment. This issue brings to light one specific bit of functionality that is hidden behind the scenes. What else is out there that is hidden? We should have solid documentation on the procedures and processes of any applications or products we support. Although my post is focused on the cause and effect of a specific VMware bug, I would have to be completely naïve to think that something like this is not happening in products from other companies. How would we know until we get bitten? This brings me full circle to the question I started with: How much insight are we missing from our environments?
One last thing to add regarding this specific VMware-related bug: it is now a known issue with VMware. VMware support and engineering are finishing a knowledge base article with all the steps and procedures necessary to resolve it. At the time of this writing, the new KB article has not yet been posted. If you think you might be affected by this issue, I would encourage you to take a look at one of the following posts with the needed fix. There are two different sets of steps to follow, depending on whether you are using VMware SSO or the VMware Identity Server: