It’s 2015, but you would think it was 1995 based on what we’re still using in our data centers for enterprise storage. We still have gobs and gobs of spinning disks, sucking power and boring us to death while they find our data. Convergence is largely unconverged: we still have separate Fibre Channel and IP data networks, and the only things that got converged were our bills of materials and the insides of our wallets. And for some inexplicable reason, we’re still debating how and when to use flash.
Here are six of the most important things that you should absolutely insist on, and score vendors against, in your upcoming RFP process if you decide to purchase a modern, twenty-first-century disk array this year.
Quality of Service
Most enterprises, as they’ve centralized their data on enterprise storage arrays, have encountered a form of the “noisy neighbor” problem. You know, the problem where your development workloads starve your production workloads of I/O from time to time. We’ve all run into this. The almost-universal fix is to segregate production workloads onto specific groups of disks, and perhaps onto their own front-end ports, too.
But how is that approach any better than direct-attached storage? It’s just as hard to manage, in that you still have to have a crystal ball to predict capacity and performance needs. Most of the array-level features you have access to, like snapshots, can be done locally. The SAN between your storage and your compute just drives up your latency, complexity, and costs. You can fix latency by adding local SSD cache, but that drives up complexity and costs, too. And with ever-increasing drive sizes (8 TB) and I/O capabilities on flash and RAM (50,000+ IOPS per device), we can’t afford to dedicate whole drives to individual workloads anymore, or to dedicate staff to manage all this complexity.
What do we do about this? We buy arrays that enforce quality of service at the hardware level. And not just I/O limiting—that isn’t QoS. We want QoS that is configurable and flexible, done on a per-allocation (per-LUN) basis, and has options to always cap workloads or let them run uninhibited when there’s no contention. A great example of a vendor leading with this feature is SolidFire, which has wonderful QoS baked natively into its all-flash arrays.
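If your array exposes knobs like those, enforcement boils down to a scheduling decision on every time slice. Here’s a minimal sketch in Python of how a floor-plus-cap policy might grant IOPS across volumes; the names and numbers are invented for illustration, not SolidFire’s (or anyone else’s) actual API:

# A minimal, hypothetical sketch of floor-plus-cap QoS scheduling.
# Every volume is guaranteed its floor first; leftover array capacity
# is then shared out, up to each volume's cap.

from dataclasses import dataclass

@dataclass
class VolumeQoS:
    name: str
    min_iops: int  # guaranteed floor, granted first even under contention
    max_iops: int  # hard cap, enforced even when the array is idle
    demand: int    # what the workload is asking for right now

def schedule(volumes, array_iops):
    """Grant each volume its floor, then share what's left up to each cap."""
    grants = {v.name: min(v.demand, v.min_iops) for v in volumes}
    spare = array_iops - sum(grants.values())
    for v in sorted(volumes, key=lambda v: v.demand - grants[v.name]):
        extra = min(v.demand, v.max_iops) - grants[v.name]
        give = max(min(extra, spare), 0)
        grants[v.name] += give
        spare -= give
    return grants

volumes = [
    VolumeQoS("prod-db", min_iops=20000, max_iops=80000, demand=50000),
    VolumeQoS("dev-test", min_iops=1000, max_iops=15000, demand=40000),
]
print(schedule(volumes, array_iops=60000))
# {'prod-db': 50000, 'dev-test': 10000}

In this run, prod-db gets everything it asked for while dev-test is held to its share of what’s left; during an uncontended period, dev-test could instead burst all the way to its max_iops.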
Flash as Primary Data Storage
Flash and SSD technologies have come down enough in price that we can stop using them as array-level caches and start affording to put them in as primary data storage. Caches complicate I/O and performance tuning, and a write cache can be dangerous in failure scenarios: lose the cache before it destages and you lose the writes it held. Caches are best implemented as close to the workload as possible (which is at the host, not the array). When data is written to a cache, it will always need to be written a second time, into its permanent home. Why not just write it into its permanent home to start with? The best I/O is the I/O you don’t have to do, especially with limited write cycles on flash memory.
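To put rough numbers on it, here’s a toy model (entirely made-up figures) of how a write cache in the data path doubles the writes the flash actually absorbs, and therefore halves its lifetime:

# A toy model of the "write it twice" cost of an array-side write cache.
# Every logical write that lands in a cache must later be destaged to its
# permanent home, so the media absorbs two writes instead of one.

def flash_lifetime_days(rated_total_writes, daily_writes, cache_in_path):
    # One write into the cache plus one destage, versus one direct write.
    write_amplification = 2 if cache_in_path else 1
    return rated_total_writes / (daily_writes * write_amplification)

RATED = 10_000_000_000  # hypothetical device endurance: 10B block writes
DAILY = 5_000_000       # hypothetical workload: 5M block writes per day

print(flash_lifetime_days(RATED, DAILY, cache_in_path=True))   # 1000.0 days
print(flash_lifetime_days(RATED, DAILY, cache_in_path=False))  # 2000.0 days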
A good example of a vendor leading here is Dell, with its Dell Compellent models. When Dell lowered the pricing on flash to be consistent with the pricing for 15,000-RPM disks, it changed the economics of the whole industry. Compellent Storage Centers don’t have very large write caches, precisely because of the benefits of writing straight to primary storage to start with.
Array-Wide Deduplication and Copy-on-Write Clones and Snapshots
You’re probably familiar with deduplication, or the process of removing duplicate blocks from stored data. Or maybe you aren’t, because your arrays don’t support it. It is surprising to me how many mainstream arrays don’t deduplicate. Vendors that don’t support it spout all manner of FUD when asked about it. “You don’t want that,” they’ll say. “It’s risky.” Or perhaps, “It hurts performance.” The truth is that in 2015, you do want it, or at least the option to have it, because it drastically saves storage space and write cycles on all that new flash storage you’re going to buy. Yes, there are management techniques you need to use to make sure you’re not wildly overcommitted, but those techniques aren’t new; just talk to anybody who’s run a NetApp array in the last decade. And don’t let a vendor sales rep tell you about another vendor’s product line.
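Mechanically, block-level deduplication is simple enough to sketch in a few lines: fingerprint every block and keep one physical copy per unique fingerprint. This is only the core idea; real arrays also deal with hash collisions, reference counting, and garbage collection:

# A minimal sketch of inline block-level deduplication: fingerprint each
# fixed-size block and store only one physical copy per unique fingerprint.

import hashlib

BLOCK_SIZE = 4096
block_store = {}  # fingerprint -> physical block
volume_map = []   # logical block address -> fingerprint

def write(data):
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        fp = hashlib.sha256(block).hexdigest()
        block_store.setdefault(fp, block)  # store only the first copy
        volume_map.append(fp)

# Ten identical 4 KiB blocks consume the space of one.
write(b"\x00" * BLOCK_SIZE * 10)
print(len(volume_map), "logical blocks,", len(block_store), "physical block(s)")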
You also want copy-on-write clones, which make clones and snapshots not by copying all your data, but by checkpointing the way the storage looked at a particular point in time. Subsequent changes to the storage are written somewhere else, preserving the way that checkpoint looks. This saves a great deal of disk space and time. In 2015, snapshots and clones should be instantaneous, resulting in no data movement or copying whatsoever.
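Here’s why that can be instantaneous: a copy-on-write snapshot copies pointers, not data. A minimal, purely illustrative sketch of the mechanism:

# A copy-on-write snapshot in miniature: taking the snapshot just copies
# the block map (pointers), not the data. A later overwrite allocates a
# new physical block and repoints only the live map, so the snapshot
# still sees the old block.

blocks = {0: b"original data"}  # physical block store
live_map = {0: 0}               # logical address -> physical block id

snapshot_map = dict(live_map)   # the "snapshot": a pointer copy, no data moved

# Overwriting logical block 0 allocates a new physical block...
blocks[1] = b"new data"
live_map[0] = 1

# ...so the live volume and the snapshot now diverge.
print(blocks[live_map[0]])      # b'new data'
print(blocks[snapshot_map[0]])  # b'original data'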
Many of the newer storage vendors use these techniques. A great example to start with is the Tintri VMstore.
There are three more things you absolutely need to insist on in your new storage array, which will be covered in part 2. Stay tuned!