A bane of having data is the need to know: the need to know where all your sensitive data resides, what that data is, who has accessed it, and how it was accessed. Managing the who, what, where, why, and how of data is a struggle that’s as old as time. Scale changes this struggle. We continue our scale discussion on the Virtualization and Cloud Security podcast by delving into data management. Paula Long, CEO and cofounder of DataGravity, joins us to discuss data management at scale. How do we answer these questions?

The how, where, what, when, and why of data management have been tricky from the beginning. Managing them relies on the often underutilized concept of data classification. To protect our data, first we need to know which data to protect. Data classification helps us determine what we need to protect, but it’s far from easy. It is even harder at scale. Most people would balk at classifying a petabyte of data, not to mention data that is spread across several clouds and devices. Even if we did classify that much data, our data still moves around. It moves from site to site, from cloud to cloud, and between devices. It may travel over improper networks and into regions controlled by other jurisdictions. The data is not lost, per se; it’s just no longer where we thought it was. Compliance requires that we know where our data is and how it is protected.
To meet compliance requirements, there is a growing mentality of “encrypt everything.” But even if you encrypt everything, mistakes still get made. Data still goes where it shouldn’t. So, you might claim the answer is to encrypt everything at rest. Sure, that works for stored data, but it does not cover data in flight, and once the data is in use, encryption at rest no longer applies. Encryption at rest mainly addresses disk disposal and lost disks or tapes. We all agree that encryption at rest is crucial, but it is not a replacement for solid knowledge of which kinds of sensitive data reside where.
Sensitive data has a variety of definitions, many of them falling into the realm of regulatory compliance, such as PCI and HIPAA. Sensitive data can also include any kind of information that is proprietary and crucial to your business, such as designs, architectures, bids, parts, and source code, among others. My first actionable advice on the podcast is to discover what your organization considers to be its sensitive information. This requires talking to the business side of your organization. Granted, this is properly the job of the CISO, but you may have to figure it out yourself by discussing it with the business owners. IT security, operations, development, and administration all need to have this discussion. They all need to be on the same page. When they aren’t, data sensitivity tends to default to regulatory compliance requirements, which are often not enough.
Once you agree on which data is sensitive, and we know this is a difficult process, you can proceed to find it. You can use various tools, such as a copy data management platform, DataGravity, or even data loss prevention tools. The key is to have a well-defined set of queries ready to run. This isn’t easy, which is why tools exist. A good tool will:

  • Tell you where your data resides by type of data (PCI, HIPAA, intellectual property, etc.)
  • Tell you when your data was created, moved, copied, deleted, and last accessed
  • Tell you how your data was copied, created, deleted, moved, opened, and modified
  • Tell you what was deleted, moved, opened, copied, etc. within the data and give you a head start on finding out why data was modified or deleted; this requires deeper event integration, such as the timing of a new contract
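
To make the idea of a prepared set of queries concrete, here is a minimal sketch in Python of the first capability in the list above: reporting where data resides by type. Everything in it is an assumption for illustration, including the `QUERIES` patterns, the `classify` and `inventory` helpers, and the sample locations. It is not how DataGravity or any other product works; real tools use far richer detection than two regular expressions.

```python
import re
from collections import defaultdict

# Hypothetical query set: each data type maps to one regular expression.
# These patterns are illustrative only and will miss many real cases.
QUERIES = {
    # 13 to 16 digits, optionally separated by spaces or hyphens.
    "PCI": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    # Documents that mark themselves as proprietary.
    "intellectual property": re.compile(r"(?i)\b(?:confidential|proprietary)\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text: str) -> set:
    """Return the set of data types whose query matches `text`."""
    labels = set()
    for label, pattern in QUERIES.items():
        for match in pattern.finditer(text):
            # Reduce false positives: a card-like number must pass Luhn.
            if label == "PCI" and not luhn_valid(match.group()):
                continue
            labels.add(label)
            break
    return labels


def inventory(documents: dict) -> dict:
    """Map each data type to the sorted locations where it was found.

    `documents` maps a location (for example, a file path) to its text.
    """
    by_label = defaultdict(list)
    for location, text in documents.items():
        for label in classify(text):
            by_label[label].append(location)
    return {label: sorted(paths) for label, paths in by_label.items()}


if __name__ == "__main__":
    sample = {
        "share/orders.txt": "card 4111 1111 1111 1111",
        "share/bid.docx": "Confidential bid for the new contract",
        "share/notes.txt": "order number 1234 5678 9012 3456",
    }
    for label, locations in inventory(sample).items():
        print(label, locations)
```

The sketch covers only the "where, by type" question. Answering when, how, and who would require access and change events from the storage layer, which a content scan like this one cannot see.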

Closing Thoughts

Scale changes how we do things. Scale changes how we think about data. Scale does not change what the definition of sensitive data is. The business still needs to be involved to provide definitions outside of regulatory compliance. Each group, from legal to IT, has its own definitions of data sensitivity. The who, what, where, when, and why of data (the five W’s), along with the how, are crucial to understand. Once we have that understanding, we have a better chance of securing the data. We also have a better chance of meeting our compliance goals. This is another opportunity for security to get involved, not through draconian approaches, but through approaches that the rest of IT needs or already uses on a daily basis. Lightweight approaches are a must: this is where copy data and data management platforms really shine.