A Redundant Array of (In)expensive Clouds

The cloud is all very well—buy resources as you need them, and don’t care about the infrastructure required to deliver those resources. But there is no cloud: it’s just someone else’s computer. So you should still think about what is required and what happens when things go wrong. In particular, what happens when a cloud service shuts down? If cloud services are known to fail or be withdrawn, then a good cloud strategy should consider alternatives to each service. This is one item of the service level that should get some attention before you rely on a cloud service.

There are a few obvious examples of large-scale cloud services being shut down. Right now, HP’s Helion Public Cloud is on its way out. HP has given customers three months to evacuate the platform before it is shut down. Apparently, the service level allows HP to give customers only two months’ notice. HP has generously given customers an extra month to make alternative arrangements and migrate workloads away. An older example is the Nirvanix shutdown in 2013. Nirvanix was a cloud Storage as a Service (STaaS) company. It provided a globally distributed storage platform with the capacity to hold billions of files. Nirvanix gave customers two weeks’ notice that it was shutting down. Two weeks to migrate all data off the Nirvanix platform. Customers with terabytes of backup data literally could not evacuate everything in two weeks. Customers who had come to rely on the geo-dispersed NAS were left with little time to find an alternative. As well as full service shutdowns, it is also possible that cloud providers will simply sunset unprofitable services. Google has been known to shut down wildly popular consumer services like Google Reader and Google Code. It isn’t hard to extend that policy to unprofitable enterprise services, although I’m not aware of any paid services it has shut down.
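To put the two-week Nirvanix window in perspective, here is a rough back-of-the-envelope sketch of how long a bulk evacuation takes. The data sizes, link speeds, and the 70% link-efficiency figure are illustrative assumptions, not numbers from any provider or customer.

```python
def transfer_days(data_tb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate days needed to move data_tb terabytes over a link_mbps link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually usable for the migration (protocol overhead, shared traffic).
    """
    bits_to_move = data_tb * 1e12 * 8            # decimal terabytes -> bits
    effective_bps = link_mbps * 1e6 * efficiency  # usable bits per second
    return bits_to_move / effective_bps / 86_400  # seconds -> days


NOTICE_DAYS = 14  # the notice period described above

# Hypothetical (data size in TB, link speed in Mbit/s) scenarios.
for data_tb, link_mbps in [(10, 100), (50, 100), (50, 1000), (500, 1000)]:
    days = transfer_days(data_tb, link_mbps)
    verdict = "fits" if days <= NOTICE_DAYS else "does not fit"
    print(f"{data_tb:>4} TB over {link_mbps:>4} Mbit/s: {days:6.1f} days ({verdict})")
```

Even under these generous assumptions, the estimate ignores the time needed to find and provision somewhere to put the data, so real evacuations are likely to take longer.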

Does this mean you shouldn’t use cloud services? Should your applications be built to operate across cloud providers? Should you build a Redundant Array of (In)expensive Clouds? As with so many things, the answer is that it depends. If you are planning to use a cloud service for a transient workload, like a marketing campaign for a product launch, then the risk is low. These sorts of transient workloads are ideal for cloud services. There is little financial commitment to use the services, and the requirement is limited in scope and time. At the other end of the scale, the use of cloud storage services can have a very long life. If you use the cloud as a backup destination, then you need a service that will be around as long as the data has value, or as long as regulations require you to retain data. The other factor to consider is blast radius. If you place 10% of the business in one cloud provider and that provider disappears, then 10% of the business is gone. It’s a very different story if you go all in and place 100% of your IT in one cloud.
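The blast-radius idea can be made concrete with a small sketch. The workload names, provider names, and business-value weights below are entirely hypothetical; the point is only to show how exposure per provider might be tallied.

```python
from collections import defaultdict


def blast_radius(workloads: list[tuple[str, str, float]]) -> dict[str, float]:
    """Return, per provider, the fraction of total business value that
    would be lost if that provider disappeared.

    Each workload is (name, provider, business_value); the value units
    are arbitrary as long as they are consistent.
    """
    total = sum(value for _name, _provider, value in workloads)
    if total <= 0:
        raise ValueError("total business value must be positive")
    exposure: dict[str, float] = defaultdict(float)
    for _name, provider, value in workloads:
        exposure[provider] += value
    return {provider: value / total for provider, value in exposure.items()}


# Hypothetical placement: a transient campaign, long-lived backups, core apps.
workloads = [
    ("launch-campaign", "cloud-a", 10.0),
    ("backups", "cloud-b", 30.0),
    ("core-apps", "on-premises", 60.0),
]

result = blast_radius(workloads)
for provider, fraction in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{provider}: {fraction:.0%} of the business at risk")
```

A tally like this says nothing about how likely each provider is to fail; it only shows how much is riding on each one.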

It is also worthwhile to think about why cloud services get retired. The fundamental reason is economics. I suspect that the Helion Public Cloud failed to get enough customers to justify the huge spend required to run a public cloud. I’ve written that I think HP found it impossible to compete with AWS. That got me thinking about AWS and the issue of service retirement. I couldn’t find any sign that AWS has ever retired a service. It retires old EC2 instances when the hardware gets old, but that seems to be it: no rugs pulled out from under anyone. Another thing we now know about AWS is that it makes money, and a lot of it. That means AWS is likely to stay in business, perhaps unlike some smaller public cloud providers.

Of course, intentional service interruption is only one part of cloud service shutdowns. There is also a risk of unintentional service loss. Every cloud platform has had unplanned outages. It is a fact of life that complex computer systems have failures. Knowing your cloud provider’s failure domains and history is important. This should drive your availability design in the application.

Putting some or all of your business in the cloud is a risk. Like other risks, it should be evaluated against the potential benefit it brings. Service levels in public (and private) clouds are important, as is a profitable business for your cloud provider. Part of your discovery with a cloud provider should focus on the long term. Look beyond ease of onboarding: look for financial sustainability. Look at the provider’s past behavior when retiring services, and examine the service level closely.