Cloud Dependency: Automated Upgrades

In my last cloud dependency article, I reviewed the need for ubiquitous networking. In this article, I look at the need for automated upgrades. I do not mean the need for automation in general, but specifically the need to automate any upgrade or update behavior. There are two sides to every cloud story: what the tenant does and what the cloud service provider does. In both of these stories, there is a need for well-planned, automated upgrades. Also needed is very good documentation on how to upgrade if the automation fails or if there is no easy way to automate. Upgrades should be bulletproof. We trust, but verify.
Lately, I have been moving through the time-consuming upgrade required to fully fix the Shellshock Bash bug from the cloud service provider’s side. While portions of the upgrade are automated, I would like to be able to fully automate it. However, I’m not sure this is possible without making changes to the underlying tools.

Cloud Service Provider

For most clouds, fully fixing Shellshock is not a simple application of a single patch across all systems. I wish it were, but it requires the application of the patch in a specific order across management systems, storage systems, networking systems, hypervisors, and virtual machines. It necessitates patching the infrastructure that makes the cloud work before patching everything else. For example, for vSphere-based clouds, upgrades must be made in the following order:

Figure: vSphere Update Sequence. VMware offers the helpful KB2057795, which provides the supported update sequence.

For Hyper-V, OpenStack, KVM, and Xen-based clouds, there are other such charts. Yet, most if not all of these charts are incomplete, do not cover third-party items, and do not equate to automated upgrades. I would like to have one upgrade script that determines if a particular tool (or one similar to it) is in use and then performs the full upgrade, working through each stage as required. For a cloud service provider, each of these steps will be time-consuming at best and will require a manual process to complete. Most require going to a web page to start the upgrade.
The other platforms have their own issues. OpenStack is probably the easiest to automate, although unless you are running Mirantis or Piston Cloud, the upgrade is not yet automated. For purpose-built clouds such as Amazon, there is a great deal of automated upgrading around the hypervisor, but is there enough around the management bits and pieces? Of this we are unsure. If the latest set of Xen and Shellshock upgrades is anything to go by, there may be only spots of automated upgrades, not enough to cover each zone of the cloud.
Automated upgrades should not cause outages; to me, that is automation gone wrong. Automated upgrades should be well tested and should cover every system, small or large, that sits outside the purview of the hypervisor: the bits that make things work.
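The single, staged upgrade script I describe above might be sketched as follows. This is only a minimal illustration in Python, not a working tool for any real platform: the Component type, its detect, upgrade, and verify callables, and the stage names are all hypothetical stand-ins for whatever a given cloud actually exposes. The stage order follows the one given earlier (management, storage, networking, hypervisors, then virtual machines), and the run halts at the first failed check so that a bad patch does not spread into an outage.

```python
from dataclasses import dataclass
from typing import Callable, List

# Stage order from the article: infrastructure first, virtual machines last.
STAGE_ORDER = ["management", "storage", "networking", "hypervisor", "virtual_machine"]


@dataclass
class Component:
    """Hypothetical description of one upgradable piece of the cloud."""
    name: str
    stage: str                   # one of STAGE_ORDER
    detect: Callable[[], bool]   # assumption: reports whether this tool is in use
    upgrade: Callable[[], None]  # assumption: applies the patch
    verify: Callable[[], bool]   # assumption: post-upgrade health check


def run_staged_upgrade(components: List[Component]) -> List[str]:
    """Upgrade detected components stage by stage; halt on the first failed check."""
    upgraded = []
    for stage in STAGE_ORDER:
        for comp in components:
            # Skip components that belong to another stage or are not in use.
            if comp.stage != stage or not comp.detect():
                continue
            comp.upgrade()
            # Trust, but verify: do not move on until the check passes.
            if not comp.verify():
                raise RuntimeError(
                    f"{comp.name} failed verification in stage {stage}; halting"
                )
            upgraded.append(comp.name)
    return upgraded
```

Halting is a deliberately conservative choice here; a real tool might roll back or retry instead.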

Tenant

Tenants should also have automated upgrades. If tenants are using a hybrid cloud approach, their data could be anywhere. Upgrades should protect their data regardless of where it resides and should provide a way to upgrade their tens—or tens of thousands—of systems regardless of where they live as well. Unless your cloud provider is also a managed service provider, you—the tenant—are responsible for upgrades. For this, automated upgrades are crucial. Your cloud systems should be running the same bits your on-premises systems are running. If this is the case, then automated upgrades will be a huge win.
However, if you are not using automated upgrades, or your systems run disparate operating systems and applications, then pushing out a set of patches, new security or data protection agents, or even new parts of the application will be more difficult. These tasks can still be automated, but one-off automation tends to be left behind when it comes to major upgrades. Granted, tenants do not need to worry about the bits that make the cloud run, but they should know when major updates are happening so they can plan checks and tests around such updates.
If, for example, the tenant’s security is part of a multitenant installation of a tool, then when that tool is updated, the tenant should be part of any testing. If the tool is a firewall, a rule might reset incorrectly or simply stop working, which would cause an outage. Automated upgrades should never cause outages.
Tenants should update all parts of their systems with the same set of automated tools. Doing so means they constantly use and refine those tools, whereas one-off tools tend to be forgotten. The DevOps and Agile approach to development and operations should have your developers using the same infrastructure upon which the application will be deployed. This allows a single tool to deploy and update your environment. If you have not yet embraced DevOps, then ensure your deployment and automated upgrade tools work across all types of hardware, guest operating systems, and silos of IT.
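One simple check a tenant could run around such an update is to compare a snapshot of the firewall rules taken before the upgrade with one taken after. The sketch below assumes, purely for illustration, that the rules can be exported as a mapping of rule ID to rule definition; how a real firewall exports its rules will differ, and the function names are my own.

```python
from typing import Dict, List


def diff_rules(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two rule snapshots keyed by rule ID (hypothetical export format)."""
    return {
        # Rules that disappeared during the upgrade.
        "missing": sorted(k for k in before if k not in after),
        # Rules that still exist but whose definition changed.
        "changed": sorted(k for k in before if k in after and before[k] != after[k]),
        # Rules that appeared during the upgrade.
        "added": sorted(k for k in after if k not in before),
    }


def rules_unchanged(before: Dict[str, str], after: Dict[str, str]) -> bool:
    """True only when the upgrade left every rule exactly as it was."""
    return not any(diff_rules(before, after).values())
```

A non-empty report does not prove an outage, but it tells the tenant exactly which rules to test first.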

Know Your Dependencies

The key to automated upgrades within the cloud or on-premises is to know your dependencies. For this, you need tools to help you. As our systems grow, managing, upgrading, and controlling thousands of virtual and physical systems becomes commonplace. It is very easy to stand up a new instance, yet doing so can affect your cloud service provider, which might then need more hardware. There are both hidden and open dependencies, and you need to discover them all so that you can plan appropriately for any form of upgrade.
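Once the dependencies are discovered, turning them into an upgrade plan is largely a matter of ordering. As a minimal sketch, assuming the dependencies have already been collected into a mapping of each system to the systems it depends on (the discovery itself is the hard part and is not shown), Python's standard graphlib module, available from Python 3.9, can produce an order in which every system is upgraded only after the things it relies on. The upgrade_order function name is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def upgrade_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which each system comes after everything it depends on."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # graphlib reports the detected cycle as the second element of args.
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc
```

A circular dependency is reported as an error, since it means no safe order exists and the plan needs a human decision.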
If you are a cloud service provider, you have hardware and other dependencies. If you are a tenant, your operations depend upon the cloud service provider as well as on your own systems. Automated upgrades need to account for both of these types of dependencies, perhaps even calling test harnesses automatically. It could be a big win for a cloud service provider to call a per-tenant test harness when the automated upgrades are completed.
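The per-tenant test harness idea could look something like the following. This is a sketch under stated assumptions only: I assume each tenant registers a callable that returns True when its own checks pass, and run_tenant_harnesses is a hypothetical name, not part of any provider's API. A harness that crashes is counted as a failure so that one tenant's broken test cannot hide another tenant's result.

```python
from typing import Callable, Dict


def run_tenant_harnesses(harnesses: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every registered tenant harness after an upgrade and record pass or fail."""
    results = {}
    for tenant, harness in harnesses.items():
        try:
            results[tenant] = bool(harness())
        except Exception:
            # A crashing harness is treated as a failed check, not a fatal error.
            results[tenant] = False
    return results
```

The provider could call this once its automated upgrades complete and notify any tenant whose entry is False.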