There is a lot of talk about having enterprises build and operate IT infrastructure the same way hyperscalers do. AWS, Google, and Microsoft can build and operate cloud platforms that are very cost-effective. The logic is that enterprise businesses can use the same techniques to build and operate their own efficient data centers. I believe that there is some merit in large enterprises trying to follow the hyperscalers’ methods and models. I also think that the nontechnical parts are far more important than the hardware and software selection. We come back to the three parts of a solution: people, process, and technology. Most enterprises look only at the technology part of hyperscale and miss the place where the real efficiency occurs. Hyperscalers are all about minimizing the people and optimizing the processes.
I’m not sure that the technology side is even relevant. I compare cloud services like AWS to a train network. Trains are extremely efficient at moving large amounts of cargo from one place to another. The places are stations: you must bring your cargo to the station nearest your source and collect it from the station nearest the destination. The key here is that you must fit into the train system; it does not adapt to you. In the same way, cloud services provide fixed services and you fit your application to those services. Most enterprise IT is more like a courier service. Your cargo is collected from your location and delivered to the destination. It is far less efficient for bulk transfers compared to a train, but the courier pickup is fitted to your business. Enterprise IT is usually customized to suit the individual business requirements. To extend the metaphor, hybrid cloud is like calling a courier to take your freight to and from the train stations but using the train for the bulk hauling. Suggesting that enterprise customers should adopt Open Compute Project (OCP) hardware and other hyperscale technologies is like suggesting that they build their own railroad. For a very small proportion of very large enterprises, it makes sense, but the clear majority of businesses do not need that kind of bulk capability. Enterprises have their own needs. What works for Google will not necessarily work for a pharmaceutical company, a bank, or a law firm.
Returning to the hard stuff: the people and process. Enterprise IT typically expects the operations teams to keep their IT running. Operations teams look at alerts and help-desk tickets and act to resolve issues. A good operations person does write a few scripts to resolve issues and generally make their life easier, but mostly they use their judgment to decide on an action and resolve the issues at hand. Hyperscalers don’t have a lot of use for these kinds of operations people. Hyperscalers want their environments run by automation. The operations function is to create that automation. Google calls professionals in this role site reliability engineers (SREs), and it expects its SREs to be software developers for about half of their time. Crucially, the SREs are expected to build the platform they operate. There is no vendor with a shrink-wrapped platform for Google to run its search. Part of the role of the SRE is to continue to build the Google infrastructure. Any repeated human action needs to be automated. The only way to operate hyperscale infrastructure is for computers to take most of the actions. The SREs essentially need to be trying to do themselves out of a job by automating their own work.
This pattern of having operations teams develop software to automate their own work does not seem to translate into enterprise IT, yet if enterprise IT is going to keep growing, automated systems will be crucial. Most enterprise organizations expect to buy IT infrastructure software. They only expect to configure it to their needs. At the most, they expect to use a domain-specific language to customize the software. They are happy creating workflows and codifying business processes. When the software needs to be modified, it is usually the role of the software vendor to write the Java and C code that makes up the software. The operations teams then deploy the new version. To operate effectively at large scale, enterprises need to adopt software development within their IT teams. This usually also means adopting open-source software. CA, IBM, HPE Software, and the like will not let you have their source code. It is a massive change to own and develop your own IT software, but it is the only way to truly replicate how the hyperscalers operate. If enterprises want to replicate the success of hyperscalers, they need to replicate the people and processes, too. Enterprises need to stop operating IT systems and start building them.