Knowing where your Data is: Backup Security

On the second Virtualization Security Podcast of 2011, Doug Hazelman of Veeam joined us as guest panelist to discuss backup security. Since most of backup security relies on the underlying storage security, we did not discuss that aspect much beyond noting that the state of the art is still to encrypt data at rest and in motion. What we did discuss is how to determine where your data has been within the virtual or cloud environment. This matters if you need to know which disks or devices have touched your data, which is an auditing requirement for high security locations. We can take several GRC and Confidentiality, Integrity, and Availability elements from this podcast:

  • For backup Integrity and Confidentiality, the state of the art is encryption of data at rest, which in many cases is handled by the underlying storage security.
  • Virtualization backup tools can only track where data has been based on what they see. Since data is generally contained within virtual disks, the hypervisor is responsible for tracking a virtual disk’s location.
  • Availability of data once backed up is in flux and varies per vendor. Some vendors, such as Veeam, provide the data via an NFS share mechanism, others require you to restore the image first, and still others have file level restore capability built in for specific operating systems.

So why is GRC so important for backups within the virtual environment? Because virtual environment and cloud backups are performed quite differently from the traditional mechanisms. In traditional backups the steps are usually:

  1. Via an Agent within the Target OS pull data across to a backup server
  2. Eventually place the data on tape

So generally we know at all times where the data has been. It was on the target server, the backup server, and then finally on a well labeled tape device.
These steps change when we introduce backup to the virtual environment, and in fact the change was already introduced with such products as EMC FAST, EMC GDA, and HP RISS. The change is the proliferation of data across many disks within the environment. This proliferation could cause a massive compliance problem or an auditing headache for those who need to know exactly where data has ever resided, which is often the case when disks need to be destroyed due to high security measures. The steps now become:

  1. Via vStorage or some other API transfer the data to another datastore within the virtual environment
  2. Copy the data from that Datastore to your backup server
  3. Eventually place the data on tape.

Here we know the data has been placed on a datastore somewhere within the virtual environment, then on a backup server, and eventually on tape. In addition, the data has been ‘seen’ by the virtualization backup virtual appliance. What complicates this even more is the introduction of Storage vMotion and cold migrations, which allow me to move data from one datastore to another at will.
An audit trail of where your data resides within the virtual environment will also help with digital forensics, if this is ever performed. When a virtual disk moves from LUN to LUN, the data is copied from one LUN to the other but not fully removed from the original LUN, which implies that the data could be recovered until it is overwritten. There is nothing a backup tool can do to change this; the hypervisor must implement secure deletes of data on such moves. However, since this data lives on the original LUN and could persist if the LUN does not change very much, we need to know it was once on that LUN so that we can either investigate the underlying disks or, if necessary, destroy those disks (in the case of getting rid of data completely).
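To make the idea of such an audit trail concrete, here is a minimal sketch in Python of a location ledger for virtual disks. It is an illustration only, not something any hypervisor or backup product provides: the `LocationLedger` and `LocationEvent` names, the reason strings, and the disk and LUN identifiers are all hypothetical, and the sketch assumes something (ideally the hypervisor) reports every placement or move.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LocationEvent:
    disk: str        # virtual disk identifier (hypothetical naming)
    location: str    # datastore, LUN, backup server, or tape label
    reason: str      # e.g. "storage-vmotion", "cold-migration"
    at: datetime     # when the placement was recorded


@dataclass
class LocationLedger:
    """Append-only record of every location a virtual disk has touched."""
    events: list = field(default_factory=list)

    def record(self, disk, location, reason, at=None):
        at = at or datetime.now(timezone.utc)
        self.events.append(LocationEvent(disk, location, reason, at))

    def ever_held(self, disk):
        """Every location that has held the disk, in first-seen order."""
        seen = []
        for event in self.events:
            if event.disk == disk and event.location not in seen:
                seen.append(event.location)
        return seen

    def residual_candidates(self, disk):
        """Locations other than the most recently recorded one.

        Data may still be recoverable there until it is overwritten.
        """
        history = [e.location for e in self.events if e.disk == disk]
        if not history:
            return []
        current = history[-1]
        return [loc for loc in self.ever_held(disk) if loc != current]


# Hypothetical history: one disk moved twice between LUNs.
ledger = LocationLedger()
ledger.record("vm01.vmdk", "LUN-A", "initial-placement")
ledger.record("vm01.vmdk", "LUN-B", "storage-vmotion")
ledger.record("vm01.vmdk", "LUN-C", "cold-migration")

print(ledger.ever_held("vm01.vmdk"))            # ['LUN-A', 'LUN-B', 'LUN-C']
print(ledger.residual_candidates("vm01.vmdk"))  # ['LUN-A', 'LUN-B']
```

The second query is the one an auditor or forensic investigator would care about: it lists the LUNs the disk has left behind, which are the ones to investigate or destroy.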
Auditing has not caught up with this all-important aspect of understanding where your data has been, so the hypervisor vendors need to be proactive and allow us to get this information for investigative and other compliance requirements. We have security covered as best we can with the state of the art, but now we need to improve GRC with respect to virtualization backups.