Software-Defined Storage or Data Services

Software-defined storage (SDS) is about data services. Many think it is about automating storage. I can see that, but SDS is really about what storage can deliver. So, what is the basis for SDS? There are four critical components: analytics, augmentation, aggregation, and security. These four elements wrap storage to turn it into data services. Data services, and control of them, are therefore the key components of SDS. What data services can SDS provide that do not already exist? Is it enough to add deduplication, or is more necessary? Let us look at these data services in detail.

Before we delve into services, we need to define these four components more precisely.

Augmentation is the addition of data services and features to existing storage that may not already have them. For example, if deduplication is not available, we can augment the storage by adding it in the data path. We could also add data path optimization and other services.
Aggregation is the combining of other storage services into one control plane. It can also combine unlike storage into a larger storage pseudo-device. Aggregation ingests various storage protocols (iSCSI, NFS, FC, SMB, Memory, Object, SCSI) and presents the result through another storage protocol (usually iSCSI, NFS, or SMB).
Analytics is more than just determining the best cache for data to use; it also looks at the underlying storage capabilities and matches them to the goal set by an application. Such a goal could be latency, IOPS, encryption, or pretty much anything storage can do these days. It could include other data services.
Lastly, we have security, which these days is much more than just encryption. We extend security to include layered role-based access controls, namespace control, encryption in motion, encryption at rest, and use of a key manager, as well as data tokenization and investigation.
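
To make the analytics definition concrete, here is a minimal sketch of matching an application's goal to the capabilities of aggregated storage. Everything in it is an assumption made for illustration: the Backend and Goal classes, the field names, the sample numbers, and the rule of picking the slowest backend that still qualifies. No vendor's product is being described.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Backend:
    """Hypothetical description of one aggregated storage backend."""
    name: str
    latency_ms: float        # typical latency seen by the application
    iops: int                # sustained IOPS the backend can deliver
    encrypted_at_rest: bool


@dataclass(frozen=True)
class Goal:
    """Hypothetical goal an application sets for its data."""
    max_latency_ms: float
    min_iops: int
    require_encryption: bool = False


def satisfies(backend: Backend, goal: Goal) -> bool:
    """True when the backend meets every part of the goal."""
    return (
        backend.latency_ms <= goal.max_latency_ms
        and backend.iops >= goal.min_iops
        and (backend.encrypted_at_rest or not goal.require_encryption)
    )


def place(goal: Goal, backends: list[Backend]) -> Optional[Backend]:
    """Pick the slowest backend that still meets the goal, which leaves
    faster storage free for more demanding applications.
    Returns None when no backend qualifies."""
    candidates = [b for b in backends if satisfies(b, goal)]
    return max(candidates, key=lambda b: b.latency_ms, default=None)


# Illustrative pool; the names and numbers are made up.
pool = [
    Backend("nvme-local", latency_ms=0.2, iops=500_000, encrypted_at_rest=False),
    Backend("ssd-iscsi", latency_ms=1.5, iops=80_000, encrypted_at_rest=True),
    Backend("nas-nfs", latency_ms=8.0, iops=5_000, encrypted_at_rest=True),
]

choice = place(Goal(max_latency_ms=2.0, min_iops=50_000, require_encryption=True), pool)
print(choice.name if choice else "no backend meets the goal")
```

The point of the sketch is the direction of control: the application states a goal, and the analytics layer decides where the data lives. A real implementation would measure latency and IOPS continuously rather than rely on static figures.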

Now that we have defined our terms, what data services could SDS provide?

Data Resiliency, or how to protect your data using synchronous and asynchronous replication, recovery points, and recovery from backup.
Data Path Optimization, or which is the best way to reach data from the application? This includes adjusting the data path on the fly in response to network and storage failures.
Data Locality, or how to move data closer to the application. Unlike tiering, which moves data between slower and faster storage, data locality is about moving data closer to the application to reduce network and other forms of latency while increasing IOPS.
Data Migration, or how to move data between locations.
Data Tokenization, or how to protect PII, PHI, and PCI data via tokenization, redaction, or encryption.
Data Investigation is analytics around the type of data in use to ensure that the proper goals are met while protecting, reporting, or applying other data services: in essence, letting the data define the service, not the service define the data.
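
Of the services above, data path optimization is the easiest to sketch in code. The example below assumes a hypothetical DataPath record and PathSelector class, with made-up path names and latencies. It simply prefers the lowest-latency healthy path and re-evaluates whenever a failure or recovery is reported. It illustrates the idea only and is not how any particular SDS product works.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class DataPath:
    """Hypothetical route from an application to its data."""
    name: str
    latency_ms: float
    healthy: bool = True


class PathSelector:
    """Tracks known paths and picks the best one on demand."""

    def __init__(self, paths: Iterable[DataPath]) -> None:
        self._paths = {p.name: p for p in paths}

    def mark_failed(self, name: str) -> None:
        """Record a network or storage failure on a path."""
        self._paths[name].healthy = False

    def mark_recovered(self, name: str, latency_ms: float) -> None:
        """Bring a path back into rotation with a fresh latency reading."""
        path = self._paths[name]
        path.healthy = True
        path.latency_ms = latency_ms

    def best(self) -> Optional[DataPath]:
        """Lowest-latency healthy path, or None if every path is down."""
        healthy = [p for p in self._paths.values() if p.healthy]
        return min(healthy, key=lambda p: p.latency_ms, default=None)


# Illustrative paths; the names and latencies are made up.
selector = PathSelector([
    DataPath("fc-primary", 0.5),
    DataPath("iscsi-secondary", 1.2),
    DataPath("nfs-remote", 6.0),
])

print(selector.best().name)        # the FC path while it is healthy
selector.mark_failed("fc-primary")
print(selector.best().name)        # falls back to the next-best path
```

Because best() is evaluated on every call, the path changes on the fly as failures are reported, with no separate failover step.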

To deliver all of these services, we need to consider what is out there today. At the moment, we have many storage solutions that augment and aggregate, yet not many that perform analytics or provide advanced forms of data services. We need to move these solutions past deduplication and encryption to where the data, along with the application, determines the services. We have spoken many times on this site about being application-centric. Our data services provided by SDS must also be application-aware.
Policy and services can be attached to the data itself, rather than data being attached to a policy solely because of the type of application. We need a good mix between policy and services.
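
As a rough illustration of letting the data define the service, the sketch below runs a very simple form of data investigation and derives the services from what it finds, merged with whatever the application asked for. The regular expressions, the class labels, and the mapping from class to service are all assumptions chosen for the example. Real detection of PII or PCI data takes far more than two patterns.

```python
import re

# Deliberately simplistic patterns, for illustration only.
PATTERNS = {
    "pci": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),   # 16-digit card-like number
    "pii": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-like number
}

# Hypothetical mapping from what the data is to the services it needs.
SERVICES_BY_CLASS = {
    "pci": {"tokenization", "encryption_at_rest"},
    "pii": {"redaction", "encryption_at_rest"},
}


def investigate(text):
    """Return the set of data classes found in the text."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}


def services_for(text, app_requested=frozenset()):
    """Combine services the application asked for with services the data
    itself demands, so policy follows the data and the application."""
    services = set(app_requested)
    for label in investigate(text):
        services |= SERVICES_BY_CLASS[label]
    return services


record = "order 1138, card 4111-1111-1111-1111"
print(sorted(services_for(record, app_requested={"replication"})))
```

The application contributes its own requirements (replication here), while tokenization and encryption are added only because of what the data contains. The same application writing harmless data would receive only the services it requested.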
Can modern SDS provide any or all of these features? Not yet, but it is abundantly clear that some are striving to achieve this goal. If we look at ioFABRIC, Primary Data, and Hedvig, we see the start of data services, analytics, and some level of security being implemented. If we look at HPE, DataCore, and FalconStor, we see more about augmentation, aggregation, and other traditional mechanisms. Outside of the resiliency that comes from using clusters, there are few data services available.
Where will our storage lead us? What do you wish to see as a part of SDS?