In all of life, we try to avoid the difficult things and handle the easy things first. Sometimes, leaving the hard things is a good idea: we realize there is an easy way to deal with the hard problem, or someone else deals with it. Sometimes it’s a bad idea. Leaving a sore tooth until it needs a root canal causes lots of pain.
Generally, when you know for sure that you must do the hard stuff, it is a good idea to do it first. Dismounting a server from a rack is a good example: one of the bolts is always much harder to reach than the others. Leaving that difficult bolt until last, when you’re using one hand to stabilize the server, is a bad idea.

There are a number of places in enterprise IT where we defer the hard things, and doing them first would often prevent a lot of pain. Software upgrades tend to be like a toothache: a slow pain that builds until we are forced to deal with a big problem. The longer we leave the pain, the bigger the disruption when we finally act. If we ignore an upgrade for a while, a newer release arrives to supersede the one we are ignoring. Eventually, upgrading to the latest version isn’t supported from our current version, or the upgrade takes so long that we need a large application outage. I’ve seen a similar thing happen in VDI environments that use View Composer. The recompose process is so painful that it is deferred for months and happens only once a year, and when it does, desktop productivity drops noticeably. When a routine activity is avoided for long enough, it becomes a huge operational challenge with real business impact.
One of the principles of DevOps is to do these difficult things so often that they become routine. If you upgrade software every week or every day, then the changes in each upgrade are small. Small changes are less likely to cause big problems. They also require far less testing than large changes and are simpler to roll back if they do cause problems. As you repeatedly make small changes, you get better at managing them. The same approach applies beyond software upgrades. By committing to do the difficult thing often, we give ourselves a reason to get good at it. If we recompose the desktops every week, then we have to make sure recomposing is a low-impact operation. By applying software updates regularly, we always take the most-tested upgrade path: the step from the immediately previous version.
One often-overlooked aspect is the cultural impact of change. Humans are change-averse, or at least averse to large changes. Yet all of human life is a collection of small changes, and IT is no different. A series of small application changes is easier to accept and adopt than a single major change. This is a little like the old fable of the boiling frog. If the frog is placed in hot water, it will climb out of the pot. But if the frog is placed in cool water that is slowly heated over a fire, it will not notice the heat. In the same way, you can boil your users by continually making small changes until you have made a change they would not have accepted in one piece.
If it is so obvious that we should be making small, frequent changes, then why aren’t we doing it? Most enterprise IT organizations spend a lot of effort controlling change. Many even have change control boards that meet every week. These boards exist because of the interconnected nature of enterprise IT: changes in one area can affect systems in another. The natural reaction is to slow change, holding back the small changes and accumulating them into much larger ones. Good change governance is important, but it can cause as many issues as it seeks to avoid. A good change management system does not impede change; it makes change visible. Simply having a change diary or notification system can provide that visibility, which makes troubleshooting change-related issues much simpler.
Another factor is infrastructure design that prioritizes rapid deployment over ease of operation. Much modern IT change is project based, and projects are incentivized to deliver on time and under budget. This drives decisions that expedite the project even if they create operational challenges later. For example, a project may choose to reuse existing application or database servers for a new application. This might be quicker than requesting new servers and waiting for them to be deployed, but by putting multiple applications on the existing servers, we have made updating both the servers and the applications harder. A more operationally sound choice would be separate servers for each application, allowing updates to be scheduled independently.
There are times when ignoring difficult tasks won’t lead to big problems. At other times, it is best to attack the hard things head-on and turn them into easy things. As you approach challenges in your infrastructure, consider whether tackling them sooner would reduce their impact.