In my last article, I laid out the baseline expectations for the support model and structure at most companies. Those expectations reflect my observations over the past twenty years or so, gathered each time I started a new assignment at a new company. When starting a new position, there is a certain level of comfort, which comes from experience, in having at least a basic understanding of what to expect. There is always a technical learning curve that comes with anything new, but the transition is easier when you have a basic understanding of how things will be supported. That concept has served me well over the years. However, just as virtualization and cloud computing have changed the data center landscape, I think change in the support structure is well on its way. Now would be a good time to have a look at what works and what doesn’t.

In my baseline analysis, I established that IT support structures are made up of different support silos and that the number of silos is a direct reflection of the size and scope of the corporate IT department as a whole. In the smallest of companies, the IT department may consist of only a couple of people. In that environment, I would presume that the IT department would be based on a single-silo support model. On the flip side of that coin, larger IT departments are made up of many different support silos, usually based on the different technologies used. For example, there could be a silo for servers and another for workstations. Within those, there could very well be sub-silos for Windows and for Linux. Before you know it, you have a long list of different silos, and with that, a different team for each.

I am going to assume that most of you who are reading this article have heard the term “virtual sprawl.” Considering the speed and ease with which virtualization allows virtual instances to be deployed, it is very easy to imagine uncontrollable growth in their numbers. Following this same thought process, I propose that the same concept can be applied to what I like to call “silo sprawl.” To expand on that, at least in the larger companies, there are so many different silos and sub-silos that it can be difficult to understand where the lines are and where responsibilities are established. As a result, the speed and agility that virtualization brought to the data center are quickly negated, what with all the different support teams and the number of different requests that have to be made to get even the most basic operational needs met and incidents resolved.

Let me give an example of this in what could be considered one of the most extreme cases: how standard operational tasks get carried out. The server build process is a great example to use to make my point. A request for a new virtual server build is submitted, and the first task is for one silo to find and reserve the next available IP address and server name. Once that is done, the task is closed and the request updated with this information. The next team uses that information to deploy a server from a template, using the server name and IP address provided in the previous task. Once the cloning and configuration are complete, that task is closed and the request updated. The next task might be to install the applications and tools that are needed, based on the purpose of the request. The tools and applications teams then continue in the process. Then you could add the security team, the backup team—the list could really go on and on. What are the different steps and processes used in your environment? From that, I believe you get my point.
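To make those handoffs concrete, here is a minimal sketch of the serialized build process described above. The silo names, task descriptions, and function are all hypothetical illustrations for this article, not any particular ticketing system; the point is simply that each step blocks on the previous team closing its task.

```python
# Hypothetical sketch of a siloed server-build workflow.
# Each step is owned by a different team and can only start
# after the previous team closes its task and updates the request.

SILO_STEPS = [
    ("network team", "reserve the next available IP address and server name"),
    ("virtualization team", "clone the server from a template and configure it"),
    ("tools/applications team", "install the required applications and tools"),
    ("security team", "apply hardening and scanning"),
    ("backup team", "add the server to the backup schedule"),
]

def run_build_request(request_id):
    """Walk one build request through each silo in strict sequence."""
    completed = []
    for team, task in SILO_STEPS:
        # In a real ticketing system, each handoff is a separate
        # request with its own queue time and its own SLA clock.
        print(f"[{request_id}] {team}: {task} -> task closed, request updated")
        completed.append(team)
    return completed

run_build_request("REQ-1001")
```

Nothing here runs in parallel, which is exactly the problem: five teams each doing a few minutes of work can still stretch a single build across days of queue time.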

Each team usually has a service level agreement (SLA) that comes with each of these different tasks. When all is said and done and the request is complete, how many days did that take? What if something goes wrong along the way? Perhaps the team that builds and maintains the virtual images updates the virtual hardware and tools to the latest version for the host they are stored on, but the server build request has the server being deployed to a cluster that is not on the same version level, and as such, the image cannot be deployed. The build process must then be put on hold as another request is made for the image team to correct this, and this request has an SLA of, say, twenty-four or forty-eight hours. Request after request after request: how in the world can things get done efficiently? The truth is, they get done eventually. However, there is nothing about that process, in my opinion, that is anything close to being efficient. To add to that inefficiency, most silos and teams are measured by the number of requests and the time it takes to resolve those requests. If someone on another team tries to be more efficient by taking care of the issue themselves, this opens up a can of worms. Without receiving requests, how could the first team justify its existence? When you get to that point, you know you have achieved silo sprawl.
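A back-of-the-envelope calculation shows how quickly these SLAs stack up. Every number below is made up for illustration; the shape of the math, not the specific figures, is the point. One rework request, such as the image version mismatch above, simply adds its own SLA on top of the worst-case total.

```python
# Hypothetical per-task SLAs in hours; all numbers are illustrative.
SLA_HOURS = {
    "reserve IP and name": 24,
    "deploy from template": 48,
    "install applications": 24,
    "security review": 24,
    "backup enrollment": 24,
}

def total_turnaround(slas, rework_hours=0):
    """Worst-case elapsed time when every task runs strictly in sequence."""
    return sum(slas.values()) + rework_hours

print(total_turnaround(SLA_HOURS))      # 144 hours: six days, worst case
print(total_turnaround(SLA_HOURS, 48))  # 192 hours: one image fix adds two more days
```

Even when each team hits its own SLA, the request as a whole can sit in queues for the better part of a week, and a single hiccup pushes it past that.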

In closing, I would like to make the point that too many silos are like having too many chefs in a kitchen. It doesn’t work well and is something that needs to be reined in and controlled. What makes this difficult is that any evaluation of how silos and support teams are established usually includes a review of the headcount that might be needed going forward. This is especially true for companies that utilize managed services for their environment. The managed service team is looking to justify the headcount needed to meet the required SLAs, while the company it supports is looking for ways to do more with less. This presents conflicting agendas between the company and the service teams that will not lead to any real agility or efficiency going forward. Unfortunately for service teams that are looking to justify the current headcount, the future is going to mean automation being developed and applied to virtual environments as these environments become more like a cloud and less like a virtual infrastructure. Automation is the next topic in the continuation of these discussions, as we look to exchange what doesn’t work with what will work and examine the changes needed to get there.