Data protection these days is not just about backup. It concentrates on two all-important concepts for a business: disaster recovery and business continuity. While backup is a part of disaster recovery, restoration is what matters most. If it is not possible to restore your data in a timely fashion, the backup has failed. So technologies that allow us to access our data immediately provide a level of business continuity. But how is this achieved?
What brings on this discussion? I recently suffered a critical storm of failures on an internal yet highly important system, which led to data corruption within an important database. Replacing the corrupted virtual machine was simple, but repairing the data was impossible for me to do. So instead I went to my backup to restore from the previous day, only to find out that the backup had failed. Actually, the backup had been failing for the last 6 months, yet I never knew. The software never informed me of any failures, either via email or via big red warning labels somewhere within my monitoring software.
The ultimate solution for this problem was to send the data out to be repaired by the company that owned the software, which took a few days, and then I was back up and running. During that time I concentrated on why the backup system failed and how to prevent such failures in the future.
The critical storm of errors was:
- A login to the VM hung, bringing down the application.
- The solution was either to log in remotely and free up the hung session or to reboot the VM; the reboot caused the application to crash and corrupt the data.
- A review of the backup showed it had worked exactly once, when the backup software was first installed.
- Failures were not emailed to the designated mail recipient.
- Restoration was impossible as the data was too old.
So several issues need to be fixed. First and foremost, the procedures around the backup software needed to be updated, the critical update being that physical verification of the backup subsystem must be employed. We can no longer trust that the backup software will react as it was designed, or that a failure mode will not break its failure notifications. Business continuity suffered due to bad practices. So that is fix #1. The team-up between Acronis (for backup) and VMturbo (for monitoring) pulls backup failures into my monitoring software. But even so, a daily check of backup success is still required.
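As a rough illustration of a daily check that does not depend on the backup software reporting its own failures, here is a minimal Python sketch. It assumes the backups land as files under a single directory and simply asks how old the newest file is. The function names, the directory layout, and the 26-hour threshold are all hypothetical choices for this example, not anything provided by Acronis, VMturbo, or any other product. A recent file does not prove the backup is restorable; it only catches the case where nothing has been written for months.

```python
from __future__ import annotations

import time
from pathlib import Path


def newest_backup_age_hours(backup_dir: str, now: float | None = None) -> float | None:
    """Return the age in hours of the newest file under backup_dir.

    Returns None if the directory is missing or contains no files.
    backup_dir is a hypothetical location; adjust it to your repository.
    """
    root = Path(backup_dir)
    if not root.is_dir():
        return None
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in root.rglob("*") if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600.0


def backup_is_fresh(backup_dir: str, max_age_hours: float = 26.0,
                    now: float | None = None) -> bool:
    """True only if a backup file exists and is newer than max_age_hours.

    26 hours is an assumed threshold: a daily job plus some slack.
    """
    age = newest_backup_age_hours(backup_dir, now)
    return age is not None and age <= max_age_hours
```

Run from a scheduler that is separate from the backup product, a check like this can raise its own alert when `backup_is_fresh` returns False, so a silent notification failure in the backup software does not go unnoticed.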
The other fixes were about the backup software itself. The requirements have changed to the following list:
- Simplest file restore possible: one-click is best. (provided by just about all vendors)
- Simplest form of VM restoration: launch the VM and go (Veeam, Quest, PhD Virtual)
- Backup integrity checks: restoration testing of backups (Veeam)
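To make the idea of restoration testing concrete, here is a minimal sketch of one way to check a test restore: hash every file in a reference copy and in the restored copy, then report anything missing, unexpected, or different. This is only an illustration and not how Veeam or any other vendor implements integrity checks. The function names and directory arguments are hypothetical, and for a live database the reference would have to be a snapshot or a manifest recorded at backup time, since the source keeps changing.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large images do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: str) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): sha256_of(p)
        for p in base.rglob("*")
        if p.is_file()
    }


def verify_restore(reference_dir: str, restored_dir: str) -> list:
    """Return a list of problems; an empty list means the trees match."""
    reference = tree_digests(reference_dir)
    restored = tree_digests(restored_dir)
    problems = []
    for rel in sorted(reference.keys() | restored.keys()):
        if rel not in restored:
            problems.append(f"missing: {rel}")
        elif rel not in reference:
            problems.append(f"unexpected: {rel}")
        elif reference[rel] != restored[rel]:
            problems.append(f"mismatch: {rel}")
    return problems
```

A scheduled test restore into a scratch location followed by `verify_restore` gives direct evidence that the data can actually be brought back, which a "backup succeeded" message alone does not.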
Becoming more important is a way to perform both local and offsite backups, in case the backup data repository is also corrupted, so multiple backup points become crucial. Any backup plan requires you to account for how to restore critical VMs quickly with minimal storage requirements (perhaps using local storage or reduced-performance storage devices), how to perform offsite restorations (perhaps into the cloud), and whether to use a hot site (perhaps a hot-ready cloud).
This failure, and the power failure my region experienced, pointed out to me the need for better integration between backup software and existing monitoring tools for virtual environments, as well as the need for restoration testing and deployment into a replication receiver cloud. Being able to quickly restore into a cloud environment as part of your backup process is becoming more critical. But the ability to get professional help to fix critical data is also very important. So where do you save your critical data so that it is readily restorable? Is your backup integrated into your monitoring software? Have you tested your restore today?