Holding Out for a Hero Number

So, who needs a million IOPS? Or the ability to deploy a million containers? How about a VM with a terabyte of RAM? We are all fairly sure that very few organizations have a workload that actually needs these performance numbers. So why do vendors continue to publish ridiculous numbers? We call these hero numbers. Vendors spend a lot of money getting hero numbers and want people to look up to these heroes. A lot of the time we dismiss hero numbers as irrelevant, but sometimes they may actually be useful.

Hero numbers are part of the marketing for every product that can be quantified. The old disinfectant killed 99% of germs; a new disinfectant has to kill 99.9%. A hero number tries to make a purchasing decision one-dimensional and easy, which appeals to people with only a shallow understanding of the differences between products. However, hero numbers don’t tell the whole story. Technology selection is not one-dimensional, and hero numbers are not all created equal: some are less trustworthy than others.

The problem with hero numbers is that achieving a higher hero number can become the goal of product development. Higher numbers should be the result of making the product better for customers, not an objective in their own right. The definition of the hero number also tends to drift away from how customers actually use the product. In storage, we see hero IOPS numbers based on a 100% read workload with tiny IO sizes and a working set that fits in the array’s RAM cache. No real customer has a workload like this, but it does produce great IOPS numbers. Such a hero number ends up reflecting the storage network rather than the array. Or we get a median latency number for an array that conceals a worst-case latency a hundred times higher. A hero number that does not reflect production use of a product is no hero.
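To see how a median can hide the tail, here is a minimal sketch using made-up latency samples (not measurements from any real array): 98% of IOs complete in 0.5 ms and 2% take 50 ms. The function names `nearest_rank_percentile` and `summarize` are hypothetical helpers for illustration; the percentile uses the simple nearest-rank method.

```python
import math
import statistics


def nearest_rank_percentile(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Report the headline median alongside the tail that a datasheet might omit."""
    return {
        "median_ms": statistics.median(latencies_ms),
        "p99_ms": nearest_rank_percentile(latencies_ms, 99),
        "max_ms": max(latencies_ms),
    }


# Hypothetical, made-up samples: 980 fast IOs and 20 slow ones out of 1,000.
latencies = [0.5] * 980 + [50.0] * 20
report = summarize(latencies)
print(report)
print("worst case / median:", report["max_ms"] / report["median_ms"])
```

The median reports 0.5 ms while the 99th percentile and the worst case are both 50 ms, a hundredfold gap that a median-only number never shows.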

So, how can we be responsible about how we use hero numbers? First off, realize that the hero number is the start of the story, not the end. You need to know how the number was achieved. You also need to know what you need to achieve. What does your workload require? Oh dear, it’s not a one-dimensional decision. If the hero number uses a workload that reflects your applications, then it can be helpful. But you may find that the very public hero number is irrelevant to you, a false hero. Don’t despair, there may still be a real hero for you. Look for a reference architecture (RA) for an application that reflects your workload, for example a database RA that describes the database workload in detail. Vendors’ RAs often include application performance metrics and resource utilization. This is even more useful than a lone hero number: a collection of better hero numbers that might translate into what your real applications could see. The RA is also a good place to look for platform tuning information. Just be careful: some RAs exist only to justify a false hero number, so the RA workload needs to resemble your workload.

What can we do with hero numbers? Good hero numbers help you identify whether you need to worry. If a reliable hero number says that a platform can handle more than five times your requirements, then there should be little reason to worry. If the best hero number is only 50% above your requirements, then maybe you need to spend some time investigating. As an example, for a long time VMware ESXi had a maximum virtual disk size of 2 TB. If I knew that my database would grow to 1.5 TB over a three-year system life span, I would be concerned. Now that vSphere supports virtual disks of up to 62 TB, I don’t need to worry about that limit, provided I am running a recent vSphere release.
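The headroom reasoning above can be sketched as a simple ratio check. This is a minimal illustration, not a sizing tool: the functions `headroom` and `assess` are hypothetical, the "more than 5x" and "50% above" thresholds are the rules of thumb from the text, and the "judgment call" band in between is an assumption added here. The ratio is only meaningful if the hero number was measured under a workload that resembles yours.

```python
def headroom(capability, requirement):
    """How many times over the requirement a platform's published number is."""
    if requirement <= 0:
        raise ValueError("requirement must be positive")
    return capability / requirement


def assess(capability, requirement):
    """Rough triage using the text's rules of thumb (middle band is an assumption)."""
    ratio = headroom(capability, requirement)
    if ratio > 5.0:
        return "little reason to worry"
    if ratio <= 1.5:
        return "investigate"
    return "judgment call"


# The virtual disk example: a database projected to reach 1.5 TB in three years.
projected_db_tb = 1.5
for limit_name, limit_tb in [("old 2 TB limit", 2.0), ("62 TB limit", 62.0)]:
    ratio = round(headroom(limit_tb, projected_db_tb), 1)
    print(limit_name, ratio, assess(limit_tb, projected_db_tb))
```

Against the old 2 TB limit the ratio is only about 1.3, which calls for investigation; against 62 TB it is about 41, comfortably past the point of worry.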

Hero numbers tend to make technology decisions appear one-dimensional. As you know, technology decisions are more complex than that, and the final choice is always a compromise among a set of criteria. Good hero numbers help you understand where to focus your investigation and where you do not need to worry.