At VMworld last week, walking the Solutions Exchange (and getting very sore feet in the process), I reintroduced myself to the near-line storage acceleration and flash-cache vendors: companies like Infinio and PernixData, and even HCI vendors like SimpliVity and Pivot3.

Yes, all-flash arrays (AFAs) and hybrid arrays are interesting, but I firmly believe that performance should live at the compute layer, not the storage layer. AFAs and hybrids have their place and will become more dominant as flash price-to-density ratios drop to spinning-rust levels. But performance is a compute thing. Accelerating the back-end storage will increase IOPS, but we all know that IOPS is not the be-all and end-all of storage performance. Bandwidth and latency are also key factors, and sub-microsecond media access times will do nothing for performance if your iSCSI or Fibre Channel stack adds too much latency, or if your bandwidth is contended by other traffic. This is why I mention HCI vendors in the same breath as traditional flash-acceleration vendors: they move the storage to the compute layer and remove the bandwidth and latency issues inherent in traditional arrays.
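Some back-of-the-envelope arithmetic makes the point. The figures below are illustrative assumptions, not measurements, but they show how quickly the transport path comes to dominate once the media itself gets fast:

```python
# Illustrative latency budget: media time vs. transport time on the IO path.
# All numbers are assumptions chosen for the arithmetic, not benchmarks.
media_us = {"15k disk": 2000, "SATA SSD": 100, "NVMe flash": 20}
transport_us = {"local PCIe path": 1, "Fibre Channel SAN": 500, "contended iSCSI": 1500}

for media, m in media_us.items():
    for transport, t in transport_us.items():
        total = m + t
        print(f"{media:>10} over {transport:<17}: {total:>5} us total, "
              f"{100 * t / total:3.0f}% spent in transport")
```

On those assumed numbers, swapping spinning disk for NVMe flash on a contended iSCSI path only drops total latency from about 3.5 ms to about 1.5 ms, with 99% of what remains spent in transit; the same media swap on a local path is nearly a hundredfold improvement.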

This is not a question of which technology is superior in form or approach. Both have their benefits and both have their issues, and in all cases their use depends on the business's requirements.

This article will investigate how that acceleration is achieved. The first thing to note is that acceleration, in this case, is really caching. This is true of both nearside acceleration and HCI flash-based technologies. They are not actually speeding up disk access; they are caching the most recently and most frequently used blocks of storage locally, on a flash SSD, a PCIe flash card, or even in DRAM, to reduce the number of calls that have to reach the underlying spinning-rust devices.
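At its core, this is nothing more exotic than a least-recently-used cache sitting in front of a slow device. Here is a minimal sketch; the class and method names are my own illustration, not any vendor's code:

```python
from collections import OrderedDict

class BlockCache:
    """Minimal LRU write-through block cache: a stand-in for the flash/DRAM
    tier that nearside accelerators put in front of spinning disk."""

    def __init__(self, backend, capacity_blocks):
        self.backend = backend            # slow device: read_block/write_block
        self.capacity = capacity_blocks   # how much flash/DRAM we can spend
        self.cache = OrderedDict()        # block address -> data, in LRU order

    def read_block(self, addr):
        if addr in self.cache:            # hit: served from fast media,
            self.cache.move_to_end(addr)  # no call to the slow disk at all
            return self.cache[addr]
        data = self.backend.read_block(addr)  # miss: go to spinning rust
        self._insert(addr, data)
        return data

    def write_block(self, addr, data):
        self.backend.write_block(addr, data)  # write-through for safety
        self._insert(addr, data)

    def _insert(self, addr, data):
        self.cache[addr] = data
        self.cache.move_to_end(addr)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least recently used
```

Every hit in the fast tier is one fewer round trip to disk; the whole value of the approach rests on how much of the working set fits within `capacity_blocks`.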

This is not a new concept. ZFS has used it for years to speed up its file system: the ZIL (ZFS Intent Log) accelerates synchronous writes, and the L2ARC provides a second-level read cache on flash behind the in-memory ARC.
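The write side deserves a sketch of its own, because it is not a cache in the read sense: a fast intent log lets the system acknowledge a synchronous write as soon as it is on stable fast media, then replay it to the slow pool in batches. This is a toy analogue of that idea, not ZFS code:

```python
class IntentLog:
    """Toy analogue of a ZIL-style write log on a fast device (illustrative
    only). Synchronous writes are acknowledged once appended to fast stable
    media; the slow pool sees them later, in a batched replay."""

    def __init__(self, pool):
        self.pool = pool    # slow backing store exposing write_block(addr, data)
        self.pending = []   # records held on the fast log device

    def sync_write(self, addr, data):
        self.pending.append((addr, data))  # fast append to the log device
        return "ack"        # the caller unblocks without waiting for the disks

    def flush(self):
        for addr, data in self.pending:        # replayed in batches, off the
            self.pool.write_block(addr, data)  # application's critical path
        self.pending.clear()
```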

That said, something is missing here. All of this technology operates purely at the block level; there is no real intelligence involved. It looks only at the data blocks for commonality, with no notion of what those blocks actually mean to the guest. There is a new kid on the block (no pun intended). The two companies that caught my attention around VMworld are Condusiv (which was not exhibiting at the conference) and PrimaryIO. Both approach acceleration from a different viewpoint.

Instead of looking at the IO stream, they look at the machine itself. Condusiv actively watches which files are being opened in a Windows machine and intelligently requests exactly the right amount of storage so that each file can be laid down in a contiguous manner. Let me explain this further:

In a normally operating Windows environment, the operating system will note that you want to save a file to its storage location and will proactively select a block of storage in which to lay down that file. This is the important point: it selects a single block, and once that block is full, a second block is selected, and so on until the entire file is written to disk. Each of these block selections costs a write IO, and each increases the chance of the file being laid down in non-contiguous sectors on the drive, degrading performance further. What Condusiv's product does is note the size of the file and then select a section of the disk that lets the file be laid down in a single write instruction. You could even say that it defrags your disk on the fly. Veterans may recognize the paradigm: it is an evolution of the old Diskeeper product (in fact, Condusiv is the former Diskeeper Corporation, rebranded). So, Condusiv has serious longevity in the Windows performance space. One of the few issues I see here is that it does not have a Linux play as yet.
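A toy model shows why allocation-by-the-blockful hurts. This is my own simplification for illustration, not Condusiv's actual algorithm; it simply contrasts size-unaware allocation with size-aware allocation over the same fragmented free map:

```python
def allocate_one_at_a_time(free, size):
    """Size-unaware allocation: take the next free cluster each time, as a
    filesystem must when it cannot see how big the file will become."""
    picked = [i for i, is_free in enumerate(free) if is_free][:size]
    for i in picked:
        free[i] = False
    # every cluster was a separate allocation decision (a separate write IO)
    fragments = 1 + sum(1 for a, b in zip(picked, picked[1:]) if b != a + 1)
    return len(picked), fragments      # (allocation calls, on-disk fragments)

def allocate_contiguously(free, size):
    """Size-aware allocation: knowing the file size up front, find one free
    run big enough and lay the file down as a single extent."""
    run = 0
    for i, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == size:
            for j in range(i - size + 1, i + 1):
                free[j] = False
            return 1, 1                # one allocation call, one fragment
    return None                        # no contiguous run large enough

# A fragmented free map: free clusters interleaved with used ones, then a gap.
free_map = [True, False, True, False, True, False] + [True] * 6
print(allocate_one_at_a_time(free_map.copy(), 4))   # -> (4, 4)
print(allocate_contiguously(free_map.copy(), 4))    # -> (1, 1)
```

Four allocation calls and four fragments collapse into one call and one extent, which is exactly the saving a contiguous lay-down buys you, both at write time and on every later sequential read.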

PrimaryIO, on the other hand, does. Although PrimaryIO pitches its technology as Application Performance Acceleration, it is similar in approach: it inspects the actual processes running inside the virtual machine and caches the IO that is pertinent to the application.
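Conceptually, this in-guest awareness amounts to a smarter cache-admission policy. Building on the BlockCache sketch above (again my own illustration, with a made-up hot_processes parameter; this is not how PrimaryIO's product is actually implemented):

```python
class ProcessAwareCache(BlockCache):
    """Only IO issued by processes that matter to the application is allowed
    to consume cache space; everything else bypasses the fast tier."""

    def __init__(self, backend, capacity_blocks, hot_processes):
        super().__init__(backend, capacity_blocks)
        self.hot_processes = set(hot_processes)   # e.g. {"sqlservr.exe"}

    def read_block(self, addr, process_name):
        if process_name not in self.hot_processes:
            return self.backend.read_block(addr)  # bypass: no cache pollution
        return super().read_block(addr)           # cache only what matters
```

The payoff is that a backup job or log scraper sweeping the disk can no longer evict the application's working set, so a smaller flash device goes further.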

I like this. It is not clunky, and I can see that, being focused and targeted, it will reduce the amount of nearside flash required. Traditional LUN-based flash acceleration has no in-guest intelligence; it has no knowledge of what is going on inside a guest or of what actually matters for the performance of the applications running on it.