SSD options for Virtual (and Physical) Environments, Part II: The call to duty, SSD endurance

By Greg Schulz, Server and StorageIO @storageio

Part I of this series provided the basics of NAND flash SSD. Let's continue by looking at endurance for storing and retaining data on NAND flash SSD. Understanding those basics is important for making informed decisions about what is best for your virtual or physical environment. In addition to SLC (high cost, improved duty cycles) and MLC (higher capacity, lower cost), there is also EMLC, or Enterprise MLC, which strives for a balance between SLC and MLC characteristics.

How manufacturers implement their flash translation layer (FTL), along with the associated controllers and firmware, has a bearing on the endurance of different flash media. Hence not all implementations of SLC, MLC or EMLC are the same, even if they leverage common dies (chip packages). This is also where vendors can add value with their controllers, drivers, and software tools. Metrics and terms involved in gauging NAND flash SSD durability include total bytes written (TBW) and the number of program/erase cycles (P/E cycles). TBW represents the total number of bytes that can be written to a NAND flash SSD over its lifetime. P/E cycles are the number of times that memory cells can be modified before they deteriorate. The Joint Electron Devices Engineering Council (JEDEC) has established standards (JESD218A and JESD219) for measuring NAND flash SSD endurance (www.jedec.org).
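To put these endurance metrics into perspective, the following is a minimal sketch of how a TBW rating and a daily write workload translate into an expected service life. The capacity, TBW rating, write workload, and write amplification figures are illustrative assumptions, not values from any particular vendor datasheet.

# Illustrative estimate of SSD service life from a TBW rating.
# Capacity, TBW, workload, and write amplification below are assumptions
# for the example, not vendor specifications.

capacity_gb = 400            # advertised capacity in GB
tbw_rating_tb = 700          # endurance rating: total terabytes written
host_writes_gb_per_day = 250 # average host writes per day
write_amplification = 1.5    # extra internal writes from the FTL (garbage collection, wear leveling)

# Actual data written to flash per day, including FTL overhead
flash_writes_tb_per_day = host_writes_gb_per_day * write_amplification / 1000

expected_life_days = tbw_rating_tb / flash_writes_tb_per_day
dwpd = host_writes_gb_per_day / capacity_gb  # drive writes per day

print(f"Drive writes per day (DWPD): {dwpd:.2f}")
print(f"Expected service life: {expected_life_days / 365:.1f} years")

With these assumed numbers the drive would sustain roughly 0.6 drive writes per day for about five years; plugging in your own workload and a vendor's published TBW gives a quick sanity check on whether a given SLC, MLC or EMLC device fits your write profile.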

If your applications do more writing than reading, awareness of SSD endurance, duty cycles, and techniques such as wear leveling becomes important for protecting your data as well as maximizing your investment. In addition to the different types of NAND flash media, or dies (the actual silicon), that are packaged and combined with a low-level controller, there are also various ways they are put together for different markets.
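As a rough illustration of the idea behind wear leveling, consider the simplified sketch below, in which a controller steers each new write to the block with the fewest program/erase cycles. Real flash translation layers track erase counts, free pools, and static versus dynamic data in far more sophisticated ways; the block count and logic here are purely hypothetical.

# Simplified sketch of wear leveling: direct each new write to the block
# with the lowest erase count so no single block wears out prematurely.
# Real FTLs are far more sophisticated; this only illustrates the idea.

erase_counts = {block_id: 0 for block_id in range(8)}  # 8 hypothetical blocks

def pick_block_for_write():
    block = min(erase_counts, key=erase_counts.get)  # choose the least-worn block
    erase_counts[block] += 1  # erase-before-write consumes one P/E cycle
    return block

for _ in range(100):
    pick_block_for_write()

print(erase_counts)  # erase counts stay nearly even across all blocks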

Storage optimization: performance (time) vs. capacity (space)

When compared strictly on a cost per gigabyte or terabyte basis, HDDs are cheaper. However, when compared on the ability to process I/Os, and on the number of HDDs, interfaces, controllers, and enclosures necessary to achieve the same level of IOPS, bandwidth, transactions, or useful work, SSDs should be more cost effective. The downside to DRAM compared to HDDs on a capacity basis is that electrical power is needed to preserve data. Some early-generation SSDs were based on DRAM combined with HDDs for persistence, using battery-backed power supplies so that memory could be written to disk when power was lost. Today the most common form of SSD is based on NAND flash, in a variety of different packaging and solutions.

A point to consider with flash memories is that their performance, particularly on writes, while faster than HDDs, is not as good as DRAM-based memories. Another concern with flash-based SSDs has been duty cycle, or cells wearing out over time. With current generations of enterprise-class flash, these duty cycles are much higher than for consumer flash products. In addition to newer generations of flash having longer duty cycles, storage system and controller vendors have also been optimizing their solutions to reduce wear and the resulting decrease in lifespan.

In an attempt to reduce excess storage capacity, consolidation is sometimes done without an eye on performance, looking only at the floor space, power, and cooling benefits of highly utilized storage. Then, to address storage performance bottlenecks, the storage is reallocated across more storage systems, and the cycle starts again. The left side of Figure 1 shows 16 HDDs attached to a storage system or controller configured to meet an application's performance requirement of at least 3600 IOPS. In this example, the available performance may not be enough if controller optimization or caching does not provide an additional benefit.

A by-product of the configuration shown in Figure 1 is underutilized storage capacity and missed quality-of-service SLAs. As a solution, I/O consolidation is shown on the right side of Figure 1. I/O consolidation involves using a high performance storage device, such as an SSD, capable of exceeding current IOPS and QoS requirements. The benefits, in addition to meeting QoS or performance requirements, are less wasted space (capacity as well as power, cooling, and physical footprint) and reduced complexity and cost.

Figure 1: Example of storage performance optimization (IO consolidation)
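
To make the arithmetic behind Figure 1 concrete, a quick back-of-the-envelope comparison shows how many devices are needed to reach the 3600 IOPS target. The per-device IOPS figures are illustrative assumptions, not measured benchmarks.

import math

# Back-of-the-envelope I/O consolidation comparison.
# Per-device IOPS figures are illustrative assumptions, not benchmarks.

target_iops = 3600
hdd_iops = 225       # assumed per high-RPM HDD
ssd_iops = 20000     # assumed per enterprise flash SSD

hdds_needed = math.ceil(target_iops / hdd_iops)
ssds_needed = math.ceil(target_iops / ssd_iops)

print(f"HDDs needed for {target_iops} IOPS: {hdds_needed}")   # 16
print(f"SSDs needed for {target_iops} IOPS: {ssds_needed}")   # 1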

Figure 2, while similar to Figure 1, is focused on consolidating storage space capacity instead of performance. In Figure 1 the focus is on consolidating I/O or performance, moving a relatively small amount of data onto lower capacity, high performance devices while considering cost per IOPS or IOPS per watt of energy. In Figure 2 the goal is consolidating storage devices, such as many smaller capacity HDDs with unused capacity and low performance requirements, onto fewer, higher capacity 2-TB (or larger) SAS and SATA HDDs.

Figure 2: Example of storage space optimization (capacity consolidation)
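
The same sort of quick math applies to the space consolidation shown in Figure 2. The drive counts, capacities, and utilization below are illustrative assumptions used only to show how the consolidation ratio works out.

import math

# Back-of-the-envelope capacity consolidation for Figure 2.
# Drive counts, capacities, and utilization are illustrative assumptions.

small_drives = 24            # existing lower capacity HDDs
small_drive_gb = 300         # capacity of each existing drive
utilization = 0.40           # how full those drives actually are

data_gb = small_drives * small_drive_gb * utilization   # actual data to keep
big_drive_gb = 2000                                      # 2-TB SAS/SATA HDD

big_drives_needed = math.ceil(data_gb / big_drive_gb)
print(f"Data to consolidate: {data_gb:.0f} GB")           # 2880 GB
print(f"2-TB drives needed: {big_drives_needed}")         # 2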

Read more in Part III of this series, SSD options for Virtual (and Physical) Environments.