DevOps and Bottlenecks - TVP Strategy

One of the main goals of DevOps is to streamline the software development lifecycle (SDLC) by removing waste from the system. Waste often takes the form of bottlenecks: things within the system that slow down forward progress and introduce unnecessary wait time or tasks. This waste can be caused by inefficient processes, technology issues, and organizational or people issues. Successful companies look at the entire value stream to identify the waste and then systematically remove it from the SDLC to continuously improve, resulting in better speed to market, improved quality, and higher reliability. Companies that can continuously improve in this fashion become high-performing companies, which often results in improved customer satisfaction, better productivity, and improved financial results. This is the ultimate dream of the C-level types who are looking to transform their companies with DevOps.

The problem I see in many organizations is that the dream described above is not shared by everyone in the organization. When DevOps is not driven from the top with that shared vision, everyone winds up with their own vision of what the future state looks like. When people are all marching to a different drumbeat, bottlenecks tend to get shifted instead of removed. In Goldratt’s classic book “The Goal”, we learned that if a company does not focus on the top bottleneck first, overall productivity does not improve. Instead, the bottleneck shifts to someplace else in the system. Too often people equate DevOps with automation, which narrows their view of the entire system. They systematically start automating systems that are not optimized for performance. In reality, they are simply automating waste.
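
To make Goldratt’s point concrete, here is a deliberately simplified Python sketch. It assumes a strictly serial value stream in which every change passes through each stage once, so overall throughput equals the capacity of the slowest stage. The stage names and capacities (changes per week) are hypothetical, chosen only for illustration.

```python
# Toy model, assuming a strictly serial value stream: every change passes
# through each stage once. Stage names and capacities (changes per week)
# are hypothetical, for illustration only.

def throughput(capacities):
    """A serial pipeline can deliver no faster than its slowest stage."""
    return min(capacities.values())


def bottleneck(capacities):
    """Name of the stage with the lowest capacity (the constraint)."""
    return min(capacities, key=capacities.get)


baseline = {"develop": 20, "test": 12, "provision": 4, "deploy": 15}

# Automating a non-bottleneck stage: deployment capacity quadruples...
automated_deploy = {**baseline, "deploy": 60}

# ...versus elevating the actual constraint.
faster_provision = {**baseline, "provision": 16}

for label, caps in [("baseline", baseline),
                    ("automate deploy", automated_deploy),
                    ("fix provisioning", faster_provision)]:
    print(f"{label:>16}: {throughput(caps)} changes/week, "
          f"bottleneck = {bottleneck(caps)}")
```

In this toy model, quadrupling deployment capacity leaves throughput at 4 changes per week because provisioning is still the constraint. Fixing provisioning raises throughput to 12 and moves the constraint to testing, which is where attention should go next. Real value streams also have queues, rework, and variability that this model ignores.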

Shifting bottlenecks

Here are some common examples of how companies unintentionally shift bottlenecks.

1) DevOps silo – In this scenario, a company forms a new silo called DevOps, which is usually made up only of sysadmin types. They start automating everything with configuration management tools like Chef, but do it without analyzing the entire value stream. They often collaborate little or not at all with the developers, and they frequently constrain the development process. All of a sudden the company has 100,000 lines of Chef code, and things that should not be done in configuration management are now being done by the “DevOps team”. This creates unnecessary handoffs between development and the “DevOps team” and introduces a new bottleneck.

2) Little picture thinking – As opposed to big-picture thinking, teams tend to solve only the problems within their control. In this case, there may be some huge bottlenecks that are impacting a team. Instead of working across silos or trying to break down the silos to fix these problems, they just work around them because it is easier. The workarounds give them temporary relief, but they create new waste by adding unnecessary processes to the value stream. These processes often live on for years after anyone remembers why they exist. One common example is long provisioning times. In many organizations, provisioning is performed by a separate group that is often totally disconnected from the development team. I have seen provisioning times range from days to 6 months. Instead of addressing the long provisioning times, teams work around them. I have seen a company order infrastructure 6 months in advance to accommodate the lead time it takes to get a system provisioned. That is a classic example of creating new bottlenecks instead of fixing the primary bottleneck.

3) Full stack engineering – There is a myth that DevOps means you can hire a team of full stack engineers who can do it all: coding, networking, security, operations, etc. The ease of use of public clouds has helped fuel this thinking. A talented engineer can do parts of all of it, but hardly anyone on this planet is an expert in all of those areas. What happens is that this team of full stack engineers starts delivering code to production at rapid speed. At first this looks like nirvana because they are able to quickly take the business’s requirements and deploy new features into production. The reality is that the team is introducing new security vulnerabilities with each release. The network architecture is suboptimal, which introduces technology bottlenecks into the production system. Users are calling developers directly because they no longer have a help desk to call. All of a sudden, development comes to a screeching halt as the full stack engineers resort to firefighting. Lesson learned: we still need all of these roles to be filled by experts. The real bottleneck is often the collaboration and processes between these groups. Fix that; don’t remove the experts.

4) Lack of metrics – Successful DevOps initiatives thrive on metrics that point to the next high-value bottleneck. If you are not armed with data, it is likely that the biggest bottlenecks are not being addressed. To have the right metrics that drive the optimal behavior, you must first eliminate the three issues listed above. The “DevOps team” will only consider metrics that address their silo. Little picture thinking only looks at metrics within a team’s own sphere of control. Full stack engineering lacks feedback from the appropriate peer groups. What is needed is a systems thinking approach to metrics gathering. What are the key metrics for the entire organization that will drive the appropriate behaviors to optimize the SDLC? How can we provide feedback early in the SDLC? Metrics are not just for developers or operators. What does the product owner need to know to balance the right mix of features so that the product is not only fast to market but also reliable? What does the audit and compliance team need? Security? What metrics can we put in place to automate decision-making so we no longer need all of these manual review gates? Metrics collection and reporting are critical for enabling continuous improvement.
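
One way to put systems-thinking metrics into practice is to measure how long work spends in every stage of the value stream, not just inside one team’s silo. The Python sketch below assumes a hypothetical event log recording the date each work item entered each stage; the stage names, item IDs, and dates are invented for illustration, and a real setup would pull this data from whatever ticketing and CI/CD tools an organization already uses. Time in a stage includes waiting, so the stage with the highest average is a candidate for the next bottleneck.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical event log: for each work item, the date it entered each stage,
# in order. The final "done" entry marks completion. All values are invented.
value_stream_log = {
    "CHG-1": [("develop", "2024-03-01"), ("review", "2024-03-04"),
              ("provision", "2024-03-05"), ("deploy", "2024-03-19"),
              ("done", "2024-03-20")],
    "CHG-2": [("develop", "2024-03-02"), ("review", "2024-03-06"),
              ("provision", "2024-03-08"), ("deploy", "2024-03-18"),
              ("done", "2024-03-19")],
}


def stage_durations(log):
    """Average days spent in each stage (working plus waiting) across items."""
    per_stage = defaultdict(list)
    for transitions in log.values():
        # Pair each stage entry with the next one; the gap is time in that stage.
        for (stage, start), (_, end) in zip(transitions, transitions[1:]):
            days = (date.fromisoformat(end) - date.fromisoformat(start)).days
            per_stage[stage].append(days)
    return {stage: mean(days) for stage, days in per_stage.items()}


def lead_times(log):
    """End-to-end days from first stage entry to completion, per work item."""
    return {item: (date.fromisoformat(t[-1][1]) - date.fromisoformat(t[0][1])).days
            for item, t in log.items()}


averages = stage_durations(value_stream_log)
worst = max(averages, key=averages.get)

for stage, days in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stage:>10}: {days:.1f} days")
print("Candidate bottleneck:", worst)
print("Lead times (days):", lead_times(value_stream_log))
```

With this made-up data, provisioning averages 12 days while every other stage takes a few days or less, which points at provisioning as the bottleneck to tackle first. Reporting end-to-end lead time alongside the per-stage numbers lets a local improvement be checked against its effect on the whole system.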

Summary

DevOps is hard. To achieve the ultimate dream of becoming a high-performing company, we must look at the system and the organization as a whole. Fixing bottlenecks in our own little silos is nice, but if the major bottlenecks within the entire SDLC are not being addressed, we are just moving bottlenecks from one place to the next. To become a high-performing company, change must be driven from the top, and everyone must share a vision of what DevOps is and what the desired future state is. Not all companies want to sign up for a transformation like this. If a company just wants to optimize things within a silo, that’s fine. Not every DevOps initiative needs to be an enterprise-wide transformation. But if leadership wants to transform the company, we must stop treating bottlenecks like hot potatoes that get passed from one place to the next.