My friend and colleague Alastair Cooke recently published an article entitled Advanced Simplification: You Want It. In it, he argues that simplification is something that “deep geeks tend to scoff at” while at the same time embracing the wizards and silent install scripts that help them manage applications and platforms and deploy them into their environments.
Alastair said they tend to think simplification is “easy to do, so anyone can do it. If anyone can do it, then it cannot be sufficiently powerful for our complex use case.” Truth be told, I have found myself uttering those very words. Why? Most large environments I have worked in have strict procedures for almost any kind of work performed in day-to-day operations, with all the work logged and tracked in the configuration management database (CMDB). It is mainly this need to integrate with the CMDB that makes creating custom workflows and automation an absolute necessity.
To achieve simplified management, Alastair said, “we need to move away from handcrafted perfection and toward policy.” I agree that policy-based automation is needed in larger environments, but I would like to enhance Alastair’s argument and add my own take to it.
In my humble opinion, three different types of automation are needed, not just for operational tasks but also for all day-two operational needs. Policy-based automation is the first pillar. I think of it as the proactive automation that kicks in when a deviation from a given baseline or known-good configuration is found. Its main purpose is to maintain and enforce standards: define your standards, scan for the values, fix when needed, rinse and repeat.
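To make the define, scan, fix loop concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the baseline keys, the dictionary standing in for a host's real settings, and the function names `find_drift` and `enforce` are all hypothetical, and a real tool would read from and write to actual systems rather than a dictionary.

```python
# Hypothetical baseline: the "known good" values every host should carry.
BASELINE = {
    "ntp_server": "ntp.example.internal",
    "ssh_root_login": "no",
}


def find_drift(baseline, actual):
    """Scan: return {setting: (expected, found)} for each deviating value."""
    return {
        key: (expected, actual.get(key))
        for key, expected in baseline.items()
        if actual.get(key) != expected
    }


def enforce(baseline, actual):
    """Fix: reset drifted settings in place and report what was changed."""
    drift = find_drift(baseline, actual)
    for key, (expected, _found) in drift.items():
        actual[key] = expected
    return drift


# Stand-in for the settings collected from one host.
host = {"ntp_server": "pool.ntp.org", "ssh_root_login": "no", "hostname": "web01"}
changed = enforce(BASELINE, host)
```

Settings that are not part of the baseline, such as `hostname` here, are left alone. Running `enforce` on a schedule is the "rinse and repeat" part.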
The next type of automation needed is what I like to refer to as reactive automation. It is mainly used with alerting systems and takes action based on an issue or problem that has already occurred. It is usually built and enhanced over time based on the number and types of alerts that are triggered: when an issue is encountered enough times, code is written to take specific action to resolve it when it happens again.
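One way to picture how reactive automation grows over time is a small dispatcher that counts every alert, runs a fix when one has been written, and escalates to a human otherwise. This is only a sketch under assumptions: the `AlertResponder` class, the alert type names and the handler are hypothetical, and a real setup would hook into whatever alerting system you run.

```python
from collections import Counter


class AlertResponder:
    """Routes alerts to known fixes and tracks which alerts keep recurring."""

    def __init__(self):
        self.handlers = {}     # alert type -> function that resolves it
        self.seen = Counter()  # how many times each alert type has fired

    def register(self, alert_type, handler):
        """Add a fix for an alert type once it has proven worth automating."""
        self.handlers[alert_type] = handler

    def handle(self, alert_type, target):
        """Run the registered fix, or escalate if none exists yet."""
        self.seen[alert_type] += 1
        handler = self.handlers.get(alert_type)
        if handler is None:
            return f"escalate: {alert_type} on {target}"
        return handler(target)

    def candidates(self, threshold):
        """Alert types seen at least `threshold` times that still lack a fix."""
        return [
            alert_type
            for alert_type, count in self.seen.most_common()
            if count >= threshold and alert_type not in self.handlers
        ]


responder = AlertResponder()
responder.register("disk_full", lambda target: f"rotated logs on {target}")
```

The `candidates` list is the backlog: alerts that have fired often enough to justify writing the next handler.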
Proactive and reactive automation have the potential to maintain an environment, but as we all know, the world of IT is not just about maintaining an environment. We get multiple requests to do something to some system sometime in the middle of the night. This brings me to the next type: request automation, sometimes called self-service automation. Request automation is built around the most-requested operations, such as increasing compute or storage resources. The name of the game is to create automation based on customer needs until you reach the point where most requests can be handled by the automation.
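The "until most requests can be handled" goal can be tracked with something as simple as a catalog that records every request and reports what share of them was served by automation. The sketch below is illustrative only: `RequestCatalog`, the `add_storage` operation and its 500 GB limit are hypothetical names and values, not taken from any particular product.

```python
from collections import Counter


class RequestCatalog:
    """Self-service catalog that also measures how much is automated."""

    def __init__(self):
        self.operations = {}       # request name -> automated operation
        self.requests = Counter()  # how often each request was submitted

    def offer(self, name, operation):
        """Publish an automated operation under a request name."""
        self.operations[name] = operation

    def submit(self, name, **params):
        """Run the request if automated; return None to signal manual work."""
        self.requests[name] += 1
        operation = self.operations.get(name)
        if operation is None:
            return None
        return operation(**params)

    def coverage(self):
        """Fraction of all submitted requests that automation could serve."""
        total = sum(self.requests.values())
        if total == 0:
            return 0.0
        automated = sum(
            count for name, count in self.requests.items()
            if name in self.operations
        )
        return automated / total


def add_storage(current_gb, extra_gb, limit_gb=500):
    """Hypothetical operation: grow a volume, refusing to exceed a limit."""
    new_size = current_gb + extra_gb
    if new_size > limit_gb:
        raise ValueError("request exceeds the self-service storage limit")
    return new_size


catalog = RequestCatalog()
catalog.offer("add_storage", add_storage)
```

Requests that come back as `None` are the ones still handled by hand, and the most frequent of them are the natural next candidates for automation.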
In closing, I would like to point out that two of the three types of automation, reactive and request automation, tend to be developed dynamically based on the needs that the environment presents in the way of alerts and requests. This is another reason I have been found guilty of thinking that if “it’s easy to do…anyone can do it. If anyone can do it, then it cannot be sufficiently powerful for our complex use case.” Each data center tends to be its own individual beast with its own unique issues and quirks, although this is not due to a lack of effort on our part. How many different data centers have you worked with that operate the exact same way? You could apply that logic to almost any type of grouping, whether at the cluster level, the app level, or something else. As a rule, no two environments are exactly the same. We, as support staff, carry mental notes acquired from experience, but the deviations between environments come from somewhere else. This is why, in my opinion, to be successful with automation, you need to build a lot of it yourself to match the needs of your environment. You may get away with some out-of-the-box automation, but it will never be enough. Customization is what you want.