Software-defined storage (SDS) in the container realm often ignores the storage itself. In essence, the SDS platform assumes some chunk of storage has already been mapped to a container host and takes over from there. SDS for containers is the orchestration through which persistent storage is mapped into a container, and that gives it a unique ability: it presents a mount point the SDS layer itself controls. It also gives it a unique view of the world. SDS for containers bypasses traditional storage management yet still provides retention, replication, erasure coding, and other features, without caring what storage sits underneath the container host. That assumption could lead to issues down the road, but first, how does this work?
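To make that mount-point control concrete, here is a minimal sketch assuming the legacy v1 Docker volume plugin protocol and Flask. The SDS layer answers a handful of JSON-over-HTTP calls, and whatever path it returns from Mount becomes the mount point Docker hands to the container. The plugin name, port, and backing directory are placeholders, not any vendor's implementation, and several endpoints of the full protocol are omitted.

```python
# Minimal (and incomplete) sketch of a Docker v1 volume plugin: the SDS layer
# decides what backs each volume and what mount point Docker ultimately uses.
# The full protocol also includes Path, Remove, Get, List, and Capabilities.
import os
from flask import Flask, jsonify, request

app = Flask(__name__)
BASE = "/var/lib/example-sds"   # assumed backing location, placeholder only

@app.route("/Plugin.Activate", methods=["POST"])
def activate():
    return jsonify({"Implements": ["VolumeDriver"]})

@app.route("/VolumeDriver.Create", methods=["POST"])
def create():
    name = request.get_json(force=True)["Name"]
    os.makedirs(os.path.join(BASE, name), exist_ok=True)
    return jsonify({"Err": ""})

@app.route("/VolumeDriver.Mount", methods=["POST"])
def mount():
    name = request.get_json(force=True)["Name"]
    # A real SDS driver would attach, replicate, or otherwise prepare the
    # volume here before handing the path back to Docker.
    return jsonify({"Mountpoint": os.path.join(BASE, name), "Err": ""})

@app.route("/VolumeDriver.Unmount", methods=["POST"])
def unmount():
    return jsonify({"Err": ""})

if __name__ == "__main__":
    # Registered with Docker via a .spec file under /etc/docker/plugins/.
    app.run(host="127.0.0.1", port=9000)
```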
Every container host, whether physical or virtual, has some amount of storage, presented via NFS, iSCSI, Fibre Channel, or direct attach. Docker's assumption is that all container storage is transient; persistent storage is the exception in a stateless container environment. The need for persistence has driven many companies to build products for Docker. Docker itself also allows parts of the host's underlying storage to be mounted into a container, and that is what creates persistent storage.
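A quick sketch of that host-path mount, using the docker Python SDK (docker-py); the host path, image, and password are illustrative only:

```python
import docker

client = docker.from_env()

# Bind-mount a host directory into the container. Data written under
# /var/lib/postgresql/data survives the container being removed, because it
# actually lives on the host at /srv/pgdata.
client.containers.run(
    "postgres:16",
    detach=True,
    environment={"POSTGRES_PASSWORD": "example"},
    volumes={"/srv/pgdata": {"bind": "/var/lib/postgresql/data", "mode": "rw"}},
)
```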
Portworx and others manage those mount points and layer on the necessary functionality. Their data services improve the protection of persistent data by persisting it outside the container host. As a participant in the Docker Tech Field Day Extra, I received a firsthand view of this class of storage.
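For illustration, here is roughly what consuming such a driver looks like with the docker Python SDK. The driver name ("pxd") and the size and replication options are stand-ins, not any specific product's documented syntax:

```python
import docker

client = docker.from_env()

# Create a named volume through a third-party volume driver; the driver, not
# Docker, decides where the data lives and how it is protected/replicated.
volume = client.volumes.create(
    name="orders-db",
    driver="pxd",                              # placeholder driver name
    driver_opts={"size": "20G", "repl": "3"},  # placeholder options
)

# Mount the named volume into a container by name.
client.containers.run(
    "postgres:16",
    detach=True,
    environment={"POSTGRES_PASSWORD": "example"},
    volumes={volume.name: {"bind": "/var/lib/postgresql/data", "mode": "rw"}},
)
```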
In reality, Portworx and the others that provide Docker persistent storage are not really about storage at all: they are about adding data services to whatever storage is presented to the container host. Today those data services are replication, erasure coding, and the like: elements of storage that are ideally pure software plays. These data services are incredibly important in a secure hybrid cloud.
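A toy example of why these services are pure software plays: a single XOR parity block (RAID-5 style) lets any one lost data block be rebuilt, with no dependence on the hardware underneath. Production systems use more general codes such as Reed-Solomon, but the principle is the same.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-sized blocks together byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]   # three equal-sized data blocks
parity = xor_blocks(data)            # one computed parity block

# Simulate losing block 1 and rebuilding it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```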
What many folks at DockerCon did not seem to realize is that data movement is crucial to the secure hybrid cloud. Presenting storage persistently to containers is great, but if the data does not exist wherever my container lands, it simply will not work: the application needs its data. Moving data between clouds therefore becomes crucial, and these tools solve most of those issues.
They do this by ignoring the underlying storage completely. Performance, however, often requires us to worry about the storage layer. Tools that gather storage analytics such as IOPS, failure rates, and latency make it possible to present the right storage to containers, yet Portworx and others simply ignore this data. That leads to underperforming containers, finger-pointing, and confusion. A developer has a goal for their storage; anything that persists storage to containers should also worry about those storage goals.
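As a sketch of what paying attention to that data could look like, the snippet below (with made-up backend names and numbers) checks observed analytics against a developer's stated goal before a volume is placed:

```python
# Observed per-backend metrics, as a monitoring tool might export them.
observed = {
    "nfs-bronze":   {"iops": 2_000,  "latency_ms": 12.0},
    "iscsi-silver": {"iops": 15_000, "latency_ms": 3.5},
    "nvme-gold":    {"iops": 80_000, "latency_ms": 0.4},
}

# The developer's storage goal for this application.
goal = {"min_iops": 10_000, "max_latency_ms": 5.0}

eligible = [
    name for name, m in observed.items()
    if m["iops"] >= goal["min_iops"] and m["latency_ms"] <= goal["max_latency_ms"]
]
print("backends meeting the goal:", eligible)   # ['iscsi-silver', 'nvme-gold']
```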
The future of storage is not replication and erasure coding; it is analytics and goals. Those goals need to fit within a budget, which moves storage and the business closer together. As we piece together storage across a hybrid cloud, we use different types of storage that meet different performance goals and that come with different costs and scaling mechanisms. How do Portworx and the others bring this to my attention?
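Extending the previous sketch with cost, and again using purely illustrative figures rather than vendor pricing, goal-plus-budget placement might look like this:

```python
tiers = {
    "object-archive": {"iops": 500,    "latency_ms": 40.0, "usd_per_gb_month": 0.01},
    "cloud-ssd":      {"iops": 16_000, "latency_ms": 2.0,  "usd_per_gb_month": 0.10},
    "onprem-nvme":    {"iops": 90_000, "latency_ms": 0.3,  "usd_per_gb_month": 0.25},
}

goal = {"min_iops": 10_000, "max_latency_ms": 5.0}
capacity_gb = 500
budget_usd_month = 100.0

# Keep only tiers that meet the performance goal and fit the budget,
# then pick the cheapest of them.
candidates = [
    (spec["usd_per_gb_month"] * capacity_gb, name)
    for name, spec in tiers.items()
    if spec["iops"] >= goal["min_iops"]
    and spec["latency_ms"] <= goal["max_latency_ms"]
    and spec["usd_per_gb_month"] * capacity_gb <= budget_usd_month
]
print(min(candidates))   # (50.0, 'cloud-ssd')
```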
The cloud is not necessarily inexpensive, which means cost is an issue even for the largest enterprises. Storage is a large cost, and it is a business cost. That is why copy data management is so important, and why I need to know the goals of the application. The cost of doing business changes when I use clouds, or even containers.
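A back-of-the-envelope example, with an assumed price per gigabyte, shows why copy data management matters: every copy of a data set, whether a replica, a test/dev clone, or a backup in another cloud, multiplies the monthly bill.

```python
dataset_gb = 2_000
copies = {"production": 1, "replicas": 2, "test/dev clones": 3, "cloud backup": 1}
usd_per_gb_month = 0.08   # illustrative price, not a quote

total_copies = sum(copies.values())
monthly_cost = dataset_gb * total_copies * usd_per_gb_month
print(total_copies, "copies ->", f"${monthly_cost:,.0f}/month")   # 7 copies -> $1,120/month
```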
We seem to be at the very beginning of storage for containers. Why not apply what we have already learned with virtual machines, where we are already solving these issues? Containers need tools like Portworx, but they also need more advanced tools to keep cost under control.
Where are you on container management? Where are you with storage? How do you present your existing storage to containers? Which data services are a must for containers? Do you believe the underlying storage really does not matter? What are your goals for storage?