Let’s start the new year right with one of my current favorite topics for discussion: automation. In this article, I concentrate on the second-day operations type of automation. Second-day operations is quite a different beast from build and decommission automation, in that it incorporates several different approaches to automation.
The first things that come to people’s minds when you mention automation are standard requests for items like changing the number of processors, changing the amount of memory for a virtual machine, adding a new virtual disk, or extending a virtual disk. I like to refer to this as “request automation.” Request automation encompasses the day-to-day tasks that end users request to support their applications and services. This is one of the first and most common workflows developed to cut back on the number of requests and tickets that are generated and have to be addressed.
Another type of automation is what I call “scanning automation,” which could also be called “find-and-fix automation.” This type of automation uses scheduled scans to find any targeted settings that do not have the expected value and then takes the proper automated action to correct the discrepancies. An endless supply of scripts and workflows is readily available on the web to perform all kinds of different scans and queries in an environment. These reporting and health scripts can become the foundation for any find-and-fix automation. Half the work is already done, in that the scripts, workflows, or both are already looking for issues to report on. All that is left is to develop the actions that correct them.
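To make the find-and-fix idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `EXPECTED` settings, the host names, and the in-memory `inventory` stand in for whatever your own reporting scripts already query. The scan half is the part you probably already have; the fix half is the piece left to write.

```python
from dataclasses import dataclass

# Hypothetical desired state: setting name -> expected value.
EXPECTED = {"ntp_server": "ntp.example.internal", "ssh_root_login": "no"}


@dataclass
class Finding:
    host: str
    setting: str
    actual: object
    expected: object


def scan(inventory, expected=EXPECTED):
    """Report every targeted setting that does not have the expected value."""
    findings = []
    for host, settings in inventory.items():
        for name, want in expected.items():
            have = settings.get(name)
            if have != want:
                findings.append(Finding(host, name, have, want))
    return findings


def fix(inventory, findings):
    """Apply the corrective action for each finding; return how many were fixed.

    Here the "action" is just an in-memory update. A real workflow would
    call whatever API or tool manages the setting.
    """
    for f in findings:
        inventory[f.host][f.setting] = f.expected
    return len(findings)


# Stand-in for data a scheduled scan would collect from the environment.
inventory = {
    "web01": {"ntp_server": "ntp.example.internal", "ssh_root_login": "yes"},
    "db01": {"ntp_server": "10.0.0.5", "ssh_root_login": "no"},
}

findings = scan(inventory)        # the existing "report" half
fixed = fix(inventory, findings)  # the new "fix" half
remaining = scan(inventory)       # rescan to confirm nothing is left
```

Rescanning after the fix is a cheap way to confirm that the corrective action actually took effect before the run is reported as clean.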
The flip side of the coin from scanning automation is what I call “alerting automation.” Alerting automation is exactly what it sounds like: the automation takes action when another system triggers an alarm or alert. In most data centers, this kind of automation has been in place for quite some time, using SNMP to feed a monitoring server that automatically opens an incident in the change management system. False alarms from the monitoring system are one of the biggest complaints when relying on SNMP. Now, before a ticket is opened, a workflow can be triggered to verify whether the alert is valid, and the automation can try to address the issue itself. That action could be restarting a service or the server itself. Only if the attempt does not resolve the issue is an incident opened.
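The verify, remediate, and escalate flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `handle_alert`, the three callbacks, and the `FakeService` class are hypothetical names, and the callbacks stand in for your monitoring check, your restart action, and your ticketing system.

```python
def handle_alert(alert, is_still_failing, remediate, open_incident, max_attempts=1):
    """Verify an alert, try to fix it, and open an incident only as a last resort."""
    # Step 1: verify. If the check passes now, treat the alert as a false alarm.
    if not is_still_failing(alert):
        return "false_alarm"
    # Step 2: attempt remediation (for example, restart a service), then re-check.
    for _ in range(max_attempts):
        remediate(alert)
        if not is_still_failing(alert):
            return "remediated"
    # Step 3: remediation did not help, so escalate.
    open_incident(alert)
    return "incident_opened"


class FakeService:
    """Hypothetical stand-in for a monitored service."""

    def __init__(self, healthy, restart_fixes):
        self.healthy = healthy
        self.restart_fixes = restart_fixes

    def is_failing(self, alert):
        return not self.healthy

    def restart(self, alert):
        if self.restart_fixes:
            self.healthy = True


incidents = []  # stand-in for the ticketing system


def run(service):
    alert = {"host": "web01", "check": "httpd"}
    return handle_alert(alert, service.is_failing, service.restart, incidents.append)


results = [
    run(FakeService(healthy=True, restart_fixes=True)),    # alert was a false alarm
    run(FakeService(healthy=False, restart_fixes=True)),   # restart resolves it
    run(FakeService(healthy=False, restart_fixes=False)),  # restart fails, escalate
]
```

In this sketch only the third case reaches the ticketing system, which is the whole point: people see the alerts that automation could not clear on its own.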
Both scanning and alerting automation are what I consider “reactionary automation.” This is automation that reacts to an event or to the output from a workflow or script scanning the environment. The real magic with automation is in what I call “analytical automation.” Analytical automation brings proactive automation to the table: data is analyzed and logic applied to determine the best possible action to take. When it comes to computer hardware, failure is expected at some point, and without data you are stuck waiting for something to break. However, if you are collecting performance data and other metrics, you can monitor and trend something like the amount of storage available. Analytics could then estimate how long it will be until the storage is full and add more storage to the systems automatically before that happens. This is just one example of analytical automation. Let your imagination run wild, and think of the possibilities.
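As a rough illustration of the storage example, the sketch below fits a straight line to daily usage samples and estimates how many days remain until the volume is full. The linear trend is my simplifying assumption, not the only way to forecast, and the function names, sample data, capacity, and 14-day lead time are all hypothetical.

```python
def days_until_full(samples, capacity_gb):
    """Estimate days until a volume fills, using a least-squares linear trend.

    samples: list of (day_number, used_gb) pairs, oldest first.
    Returns None when there is too little data or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = sum(day for day, _ in samples) / n
    mean_y = sum(used for _, used in samples) / n
    sxx = sum((day - mean_x) ** 2 for day, _ in samples)
    if sxx == 0:
        return None  # all samples taken on the same day
    slope = sum((day - mean_x) * (used - mean_y) for day, used in samples) / sxx
    if slope <= 0:
        return None  # flat or shrinking usage never fills the volume
    intercept = mean_y - slope * mean_x
    full_day = (capacity_gb - intercept) / slope
    last_day = samples[-1][0]
    return max(0.0, full_day - last_day)


def should_expand(days_left, lead_time_days=14):
    """Decide whether to add storage now (lead time is an assumed policy value)."""
    return days_left is not None and days_left <= lead_time_days


# Hypothetical metrics: usage grows 10 GB per day on a 600 GB volume.
usage = [(0, 500), (1, 510), (2, 520), (3, 530), (4, 540)]
days_left = days_until_full(usage, capacity_gb=600)
expand_now = should_expand(days_left)
```

With these made-up numbers the trend reaches 600 GB six days after the last sample, so a policy with a 14-day lead time would trigger the expansion workflow now rather than waiting for a disk-full alert.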
Performance is just one area in which analytical automation applies. Security is another. For example, what if data is collected from Internet-facing applications or servers, and while monitoring them, the analytics determine that a web server has been breached? The automation could then disable write access to that web server, power the server off, move that virtual machine to some kind of secure forensics area, and finally deploy a new web server to replace the compromised one.
Is anyone out there using analytical automation in their environment today? If so, what are you doing, and how? If you’re not doing this now, there is a good chance that you will in the future. Big data and analytics have been rapidly growing areas in information technology. I think we might start to see a lot more intelligence in different products and services in 2016. One of the companies I have my eye on for the year ahead, Primary Data, is an example of a business that is putting the logic and intelligence into the product. Goodbye 2015, and here we go, 2016!