The Data Dilemma: Part 1

A growing data dilemma is brewing. Businesses are caught between the need to have data available to improve the business and the need to keep that data private—and not only private, but secure, nonvolatile, protected, and available. At the same time, our data is neither invulnerable, nor, apparently, private. Organizations that collect data cannot keep that data safe, hence the dilemma. “Safe” means many things, but for the first part of this series, we use the word to mean “at low risk of being damaged by malware, ransomware, or exfiltration by bad actors.” However, this is at odds with even collecting the data in the first place.

The bad actors of the world are not looking to destroy corporate data; they really want it for themselves or to make it impossible for you to use. Now, think about this: why are people subject to attack? There are usually a few reasons, and understanding those reasons leads us to the data dilemma.

Espionage (corporate, nation-state, etc.)
Money (ransomware, etc.)
Activism (cause, expose data)
Havoc (ransomware with no keys)

Depending on the reason for the attack, the bad actor may gain access to your data. Can you keep your data safe? At low risk? What is the solution?
This is the real question. Many folks will say that we need to encrypt our data end-to-end. However, laws are being proposed and passed that would limit end-to-end encryption for messages and similar traffic. Most IoT data is passed via messages. Do these laws apply to that type of message as well? We do not know the answer. However, my guess is that any encrypted network, regardless of use, will be under fire. That issue aside, how can we make our data low risk?
One of the myriad Twitter statements that came out of DEF CON this year struck me as interesting and useful.

If you do not want to lose your data, do not collect it.

Collecting data implies storing it somewhere for the long term. This does not apply to short-term use, say one day only. Many systems handle vast quantities of transient data, staggering in size, that is never collected. Companies dealing with billions of transactions per day may choose not to store that transient data. Some have chosen not to keep anything unrelated to financial data for longer than it is needed. Yes, they throw out more data at the end of the day than they keep. These companies have decided they do not need it for the business in the long run. They do need the data for some aspect of the daily transactions, so they use microservices to analyze it as it comes in, perhaps at most on an hourly schedule, then toss the data when done.
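The analyze-then-toss pattern described above might look something like the following. This is only a minimal sketch, not how any particular company does it: the `Transaction` shape, the per-merchant totals, and the function names are all assumptions made for illustration, and the stream is assumed to arrive ordered by hour.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record shape, for illustration only.
    merchant: str
    amount_cents: int
    hour: int  # hour of day the transaction arrived, 0-23


def summarize_hour(transactions):
    """Reduce one hour of raw transactions to per-merchant totals."""
    totals = defaultdict(lambda: {"count": 0, "amount_cents": 0})
    for tx in transactions:
        entry = totals[tx.merchant]
        entry["count"] += 1
        entry["amount_cents"] += tx.amount_cents
    return dict(totals)


def process_stream(transactions):
    """Keep hourly summaries; drop raw records once each hour is summarized.

    Assumes the stream is ordered by hour.
    """
    summaries = {}
    buffer = []
    current_hour = None
    for tx in transactions:
        if current_hour is not None and tx.hour != current_hour:
            summaries[current_hour] = summarize_hour(buffer)
            buffer.clear()  # the raw transient data is discarded here
        current_hour = tx.hour
        buffer.append(tx)
    if buffer:
        summaries[current_hour] = summarize_hour(buffer)
        buffer.clear()
    return summaries
```

Only the small summaries outlive the hour; the raw records never reach long-term storage, so there is nothing there to exfiltrate or encrypt for ransom.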
The ability to not collect data could be a savvy data management solution, as long as the data is:

not needed to recreate the business
unnecessary to recreate the financial information
not highly regulated
truly transient
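The four conditions above can be written down as a simple check that runs before any new category of data is stored. This is a sketch under assumptions: the `DataCategory` fields and the example category are hypothetical, and a real organization would answer these questions with its legal and finance teams rather than with four booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCategory:
    # Hypothetical description of one kind of data a business handles.
    name: str
    needed_to_recreate_business: bool
    needed_for_financial_records: bool
    highly_regulated: bool
    transient: bool


def can_skip_collection(category):
    """True only when all four conditions from the list are met."""
    return (
        not category.needed_to_recreate_business
        and not category.needed_for_financial_records
        and not category.highly_regulated
        and category.transient
    )


# Example with a made-up category.
clickstream = DataCategory(
    name="clickstream",
    needed_to_recreate_business=False,
    needed_for_financial_records=False,
    highly_regulated=False,
    transient=True,
)
print(can_skip_collection(clickstream))
```

If any one condition fails, the data stays in scope and has to be managed rather than dropped.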

Could these companies do more with their data? Probably, but they have chosen not to do anything with it outside of their limited scope. They have chosen not to save, store, or otherwise be burdened with a large data management problem. It is a burden for some; storing data is not cheap. What they do store is sufficient. Should this be the goal of any organization? Perhaps. It truly depends on the application. However, since these companies do not store data for more than a day, they have limited their need for expensive storage systems and can concentrate on managing their data.
This is the crux of the matter. They do not worry about storage; they worry about their data. Since they have cut their data down to size, it has become quite manageable. Are we storing transient data—data we do not use, and data we may never use—in hopes that one day we can glean just one more iota of information out of it?
We are being inundated with data, and the surest way to protect it is to manage it rather than simply collect it, either with short retention schedules or through deliberate business decisions about what to keep. Those business decisions now matter. We are not talking about storage subsystems, but data management systems that include the ability to retain data, dump data, transform our data (encrypt, redact, etc.) and apply policy.
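To make "retain, dump, transform, and apply policy" concrete, here is a minimal sketch of a policy table and the function that applies it. Everything in it is an assumption for illustration: the record kinds, the retention periods, and the field names are made up, and redaction stands in for the broader set of transforms (encryption would need real key management and is left out).

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy table: record kind -> retention period and transform.
POLICY = {
    "ledger": {"max_age": None, "redact": False},  # retain indefinitely
    "session_log": {"max_age": timedelta(days=1), "redact": False},
    "support_ticket": {"max_age": timedelta(days=90), "redact": True},
}

SENSITIVE_FIELDS = ("email", "name")  # assumed field names


def redact(record):
    """Return a copy of the record with sensitive fields masked."""
    cleaned = dict(record)
    for field in SENSITIVE_FIELDS:
        if field in cleaned:
            cleaned[field] = "[REDACTED]"
    return cleaned


def apply_policy(record, now):
    """Return the record to keep (possibly transformed), or None to dump it."""
    rule = POLICY.get(record["kind"])
    if rule is None:
        return None  # no rule means no business reason to keep it
    if rule["max_age"] is not None and now - record["created"] > rule["max_age"]:
        return None  # past its retention period
    return redact(record) if rule["redact"] else record


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
ticket = {
    "kind": "support_ticket",
    "created": now - timedelta(days=5),
    "email": "user@example.com",
    "body": "help",
}
print(apply_policy(ticket, now))
```

The default matters as much as the rules: in this sketch, a kind of data nobody has written a policy for is dumped, not kept.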
This is our data dilemma: do we collect or manage? How?
Where are you on the data management path?