There has been plenty of discussion about what the Internet of Things (IoT) means for IT and for storage vendors. The usual answer is that IoT will soon be the largest consumer of storage. The basic expectation is that IoT will feed a lot of object storage in the cloud or in central corporate data centers. Personally, I am deeply suspicious when I’m told that there is only one way to solve an IT problem. We are starting to see that IoT data is often processed near the IoT device, and only a subset of the data is transferred to the central object store. I think this will drive a lot of compute and storage to the edge of the network, close to the IoT devices, and that we will see a whole new category of products deployed close to IoT sensors.
I wrote a while ago about the cost of storing your IoT data, but data storage is not the only cost. There is a cost to moving data from your IoT devices to the cloud, and there is a cost for processing that data, too. If the data doesn’t provide enough value, then why centralize it? You can think of IoT data as being like the ore that is extracted from a gold mine. Most truckloads contain a tiny amount of gold in a huge amount of rock. The ore is refined to an impure form of gold at the mine before being shipped away for more refining. The value of a kilogram of gold is far higher than the value of a kilogram of ore, so the ore should not be transported far before being refined. Raw IoT data has a low value per MB, but some analytics will concentrate that value into a far smaller volume of data. Some local storage and processing will reduce the amount of data to be transferred and stored in the central object store.
Businesses are starting to ask for a way to process IoT data close to where it is generated. Locations such as wind farms, oil rigs, and even manufacturing plants have hundreds, or thousands, of sensors. All these sensors are spewing out data that needs to be collected and analyzed. The raw data is not refined enough to be worth sending to the cloud and analyzing there. It needs to be captured and analyzed right where it is generated. Then, the valuable parts can be sent to the cloud, where the final value can be extracted. Slow and expensive networks drive the need for local storage and processing. In some environments, all the raw data must be retained as well, which adds to the local storage requirement.
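To make the "refine at the mine" idea concrete, here is a minimal sketch of what local refinement could look like. It assumes readings arrive as (timestamp, value) pairs and that a simple out-of-range check is enough to decide which raw samples are worth keeping. The `Reading` shape, the `WindowSummary` fields, and the `refine` function are all hypothetical names chosen for illustration, not part of any vendor's product.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Tuple

Reading = Tuple[float, float]  # (unix timestamp, sensor value); hypothetical shape


@dataclass
class WindowSummary:
    start: float
    end: float
    count: int
    minimum: float
    maximum: float
    average: float
    # Raw readings are kept only when they fall outside the expected range.
    anomalies: List[Reading] = field(default_factory=list)


def refine(readings: List[Reading], low: float, high: float) -> WindowSummary:
    """Collapse one window of raw readings into a summary plus out-of-range samples."""
    if not readings:
        raise ValueError("window is empty")
    values = [v for _, v in readings]
    return WindowSummary(
        start=readings[0][0],
        end=readings[-1][0],
        count=len(values),
        minimum=min(values),
        maximum=max(values),
        average=mean(values),
        anomalies=[(t, v) for t, v in readings if v < low or v > high],
    )


# One minute of 1 Hz readings, with a single spike at t=30.
window = [(float(t), 20.0) for t in range(60)]
window[30] = (30.0, 95.0)
summary = refine(window, low=0.0, high=50.0)
print(summary.count, summary.maximum, summary.anomalies)
# prints: 60 95.0 [(30.0, 95.0)]
```

In this sketch, sixty raw readings shrink to one summary record and one anomalous sample. Only that refined result would need to cross the slow, expensive network, while the raw window could stay on local storage if it must be retained.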
AWS has certainly been asked for local storage and compute for IoT data. Snowball Edge is a physical appliance with an object store and compute engine. Snowball Edge is designed for permanent deployment to customer sites. The idea of physical AWS resources remaining at customer sites is completely new and very foreign to AWS. Each Snowball Edge delivers 100 TB of S3 object storage plus a local AWS Lambda service for the compute part. The Lambda functions do the local analytics and then send the refined data up to the AWS cloud. Lambda functions suit IoT uses because the code runs whenever an IoT sensor sends data. The Lambda service is highly parallel, so it can handle data from many IoT sensors at once.
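The event-driven pattern described above might look roughly like the following sketch. The `handler(event, context)` signature follows the usual convention for Python Lambda functions. Everything else is an assumption made for illustration: the event shape (a JSON payload under `"body"`), the `TEMPERATURE_LIMIT` threshold, and the `forward_to_cloud` stub, which simply collects records in a list rather than calling any real AWS API.

```python
import json
from typing import Any, Dict, List

# Hypothetical uplink: a real deployment would write to the central object
# store. Here it only records what would have been sent.
sent: List[Dict[str, Any]] = []


def forward_to_cloud(record: Dict[str, Any]) -> None:
    sent.append(record)


TEMPERATURE_LIMIT = 80.0  # assumed alert threshold, degrees Celsius


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Invoked once per sensor message; forwards only readings above the limit."""
    reading = json.loads(event["body"])  # assumed: JSON payload under "body"
    if reading["temperature"] > TEMPERATURE_LIMIT:
        forward_to_cloud(
            {"sensor": reading["sensor"], "temperature": reading["temperature"]}
        )
        return {"forwarded": True}
    return {"forwarded": False}


# Simulate three sensor messages arriving at the edge appliance.
for temp in (21.5, 85.0, 22.0):
    handler({"body": json.dumps({"sensor": "turbine-7", "temperature": temp})})
print(len(sent))
# prints: 1
```

Because each invocation handles a single message and keeps no state of its own, many copies can run side by side, which is what makes this style of function a natural fit for large numbers of sensors.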
I fully expect to see object storage and hyperconverged vendors getting in on this sector. Object storage vendors simply need to add a compute engine to their products. This could be something like IBM’s OpenWhisk, which provides a service similar to AWS Lambda. Alternatively, Kubernetes and Docker support would provide a very flexible compute environment alongside their storage. Object storage vendors could even bake an entire PaaS, like Cloud Foundry or Red Hat OpenShift, into their products. Hyperconverged vendors would need to strip features out, probably removing the ability to run general-purpose operating systems, like Windows. HCI vendors could even add container support to their storage appliances and then run the appliance directly on the physical hardware, without a hypervisor. There are already HCI products that are just for running containers. I saw Robin Systems at Tech Field Day 13. Its software-only product brings the simplicity of HCI to a container platform that runs on whatever physical servers you already have.
Data is not uniformly valuable; there is definitely a use for refining IoT data before it is transferred to the cloud. A whole new category of scale-out infrastructure will evolve to fill this niche. Both object storage and HCI vendors are well placed to adapt their existing products to do so.