As we wind our way deeper into the holiday season, I am reminded that all it takes is one IF clause to ruin your day. Or even to ruin your holiday, if the change was made during a continuous integration cycle. Just because we can speed up and automate tasks does not mean we always should. A case in point is the work I do writing a high-performance computing package: not Hadoop, but one that was developed by several folks for a specific bit of work that occurs billions of times a day. It is a fully threaded application, massive in scale for this company, the fastest package in the company’s industry, and this customer’s lifeblood.
An IF clause may seem innocuous, but when it runs more than a billion times a day, which works out to a million times a minute and more than 16,000 times a second across an entire cluster of hosts, a bad idea can multiply to phenomenal levels. Loads on hosts can increase sixtyfold in a heartbeat, which would set off red flags in most performance management metrics. This is why most companies at first roll changes from continuous integration (CI) out to only a subset of the environment, not the entire environment. A live test is all but required to see whether new code can handle the required volume.
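The multiplication is easy to check on the back of an envelope. The sketch below starts from the million-calls-a-minute rate quoted above; the 500 nanoseconds of extra cost per call is a purely hypothetical figure chosen for illustration, not a measurement from the package described here.

```python
# Back-of-the-envelope arithmetic for a hot code path.
# CALLS_PER_MINUTE comes from the text; the per-call overhead passed in
# below is a hypothetical number used only to show how a small cost scales.
CALLS_PER_MINUTE = 1_000_000

calls_per_second = CALLS_PER_MINUTE / 60
calls_per_day = CALLS_PER_MINUTE * 60 * 24


def added_cpu_seconds_per_day(extra_ns_per_call: float) -> float:
    """CPU time added per day by a per-call slowdown given in nanoseconds."""
    return calls_per_day * extra_ns_per_call / 1e9


print(f"{calls_per_second:,.0f} calls per second")
print(f"{calls_per_day:,} calls per day")
print(f"{added_cpu_seconds_per_day(500):,.0f} extra CPU-seconds per day at 500 ns per call")
```

Even a slowdown too small to notice in a single call adds up to minutes of CPU time every day at this rate, and that is before any knock-on effects such as contention between threads.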
But so is understanding the code: knowing how its different parts interact with the underlying system, whether that system is virtual or physical. Knowing how the code interacts all the way down to the hardware can help you make better decisions when you code. How instructions are issued and pipelined through CPUs (or GPUs) can make a measurable difference between an IF statement that does a straight “equal to” comparison and one that does a “greater than and less than” comparison. How each of these fetches data from memory changes the performance of the entire application.
Changing one IF clause to a simpler form, one that is better understood by the system and better optimized by the compiler (or byte-code encoder), is the difference between misery and happiness—between joy and depression.
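One way to avoid guessing is to time both forms of the condition. The following is a minimal sketch using Python's standard `timeit` module; the function names, the bounds, and the target value are all hypothetical. In an interpreted language the interpreter's own overhead dominates, so this illustrates the measuring habit rather than reproducing the pipeline-level effects described above, and the numbers will differ from machine to machine.

```python
import timeit

LOW, HIGH = 10, 20  # hypothetical bounds for the range check
TARGET = 15         # hypothetical value for the equality check


def is_target(value: int) -> bool:
    """Straight 'equal to' comparison."""
    return value == TARGET


def in_range(value: int) -> bool:
    """'Greater than and less than' comparison."""
    return LOW < value < HIGH


def seconds_per_call(func, value: int, calls: int = 100_000) -> float:
    """Average wall-clock seconds per call, taking the best of three runs."""
    best = min(timeit.repeat(lambda: func(value), number=calls, repeat=3))
    return best / calls


for func in (is_target, in_range):
    nanoseconds = seconds_per_call(func, 15) * 1e9
    print(f"{func.__name__}: {nanoseconds:.1f} ns per call")
```

Taking the best of several runs reduces noise from whatever else the host is doing, which matters when the difference being measured is only a few nanoseconds.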
Testing Is the Difference as Well
The best test is “twelve inches to the foot.” In other words, you need to test with the exact same setup you run in production. However, some code can only be tested in a part of production, as simulating the full quantity of traffic may not be possible. Telcos and other high-volume sites need to test against their real volume. This is why you deploy to a portion of your network, and only after you have run regression tests, unit tests, and the like. If you do not have a full understanding of what is running in production, it is hard to determine what is necessary for regression and unit tests. Testing code in a way in which it is not normally called or accessed is great for security testing (and should be done), but for functional testing, you may wish to test the functions and behavior through the entire application, not just a portion of it.
In working with complex high-performance computing systems, it is important to unit test to ensure you have implemented the functionality correctly, but you also need to do a full test from beginning to end to ensure the output is correct. Then, you should also run a self-made regression test to see if things break. This regression test should pull in all issues you have found in the past and test to ensure they do not occur again. Once that is done, it is time to load test.
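A self-made regression test can be as simple as a table of every issue found so far, replayed on each change. The sketch below assumes a hypothetical `classify` function standing in for the code under test; the issue labels and cases are invented for illustration.

```python
def classify(value: int) -> str:
    """Hypothetical function under test: buckets a value with simple IF clauses."""
    if value < 0:
        return "invalid"
    if value == 0:
        return "empty"
    return "ok"


# Each past issue becomes one permanent case: (issue label, input, expected result).
# These entries are made up; a real table would grow with every bug that is fixed.
PAST_ISSUES = [
    ("negative input was accepted", -1, "invalid"),
    ("zero was treated as ok", 0, "empty"),
    ("large values fell into the wrong bucket", 2**40, "ok"),
]


def run_regression() -> list[str]:
    """Replay every recorded issue and return a description of each failure."""
    failures = []
    for label, value, expected in PAST_ISSUES:
        actual = classify(value)
        if actual != expected:
            failures.append(f"{label}: expected {expected!r}, got {actual!r}")
    return failures


failures = run_regression()
print("regression clean" if not failures else "\n".join(failures))
```

Keeping the cases as data rather than as separate test functions makes it cheap to add a new entry whenever a bug is fixed, so the list of things that must never break again keeps pace with what production has taught you.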
Fully understanding the performance of the application comes first, because you need a baseline metric against which to measure the impact of a code change. That baseline is what lets you see the impact of one IF clause on the code. Testing requires you to understand your application well enough to understand the impact of subtle changes in code, to think through why those changes matter, and to make corrections as needed.
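As a rough illustration of working from a metric, the sketch below measures throughput for an old and a new version of a check and flags the change if throughput drops by more than a chosen budget. The helper names, the two check functions, and the 5 percent budget are all assumptions made for the example, not figures from any real system.

```python
import time


def measure_throughput(func, items, repeats: int = 5) -> float:
    """Best observed items-per-second for func over a few repeated passes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for item in items:
            func(item)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero elapsed time on a very small input.
    return len(items) / best if best > 0 else float("inf")


def within_budget(baseline: float, candidate: float, max_drop: float = 0.05) -> bool:
    """True if candidate throughput is no more than max_drop below the baseline."""
    return candidate >= baseline * (1.0 - max_drop)


def old_check(value: int) -> bool:  # hypothetical current code
    return value == 15


def new_check(value: int) -> bool:  # hypothetical proposed change
    return 10 < value < 20


items = list(range(100_000))
baseline = measure_throughput(old_check, items)
candidate = measure_throughput(new_check, items)
print(f"baseline {baseline:,.0f}/s, candidate {candidate:,.0f}/s, "
      f"within budget: {within_budget(baseline, candidate)}")
```

A single run on a developer machine is noisy, so a check like this is best treated as an early warning before the real load test on a subset of production, not as a replacement for it.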
A Bit of Whimsy
Understanding the impact of an IF clause is like understanding the Santa Clause: always read the fine print. In this case, the fine print is the code itself and the impact it has on the system. Plan to have all code well tested before major holidays, and do not put new code into production just before those holidays. In our case, while we are load testing the code on a subset of the network, we are taking no chances of letting our IF clause interfere with our holiday. There will be no impact on Santa, so the Santa Clause will not be in effect, and we can enjoy our holiday as expected.
This is crucial to good planning, and absolutely vital as agile pushes us toward ever more rapid deployment. Keeping up good practices or, more to the point, imposing good practices on business decisions can keep you from introducing a detrimental IF clause that brings the Santa Clause into effect, requiring all hands on deck over Christmas or any other holiday. This is not an outcome anyone wants.
As a developer, have you asked whether that deployment is absolutely crucial just before a holiday break? As a business manager, have you asked the same thing? Did you invoke your own Santa Clause, or were you Santa Claus, spreading joy to your family?