As we all know, data protection is not really about how we back up or replicate data. Instead, it is about how we recover our data. Recovery is not just about disasters; it is also about recovering individual files and about continually testing that recovery works. Data protection must not be “set and forget.” Our ever-changing hybrid cloud environments require proactive data protection. We need to detect changes to applications, and we need software that adjusts backup or replication to pull in more and more of the application as it grows. In essence, data protection should not require a human to be involved. Where are we in relation to this goal?

The goal is a human-less form of data protection. This implies that disaster recovery must be automated. It also implies that business continuity actions should be automated, and that testing for both should be automated as well. Some would argue that business continuity is more of a process and requires documentation. I agree, to a point. Business continuity is a technical reaction to a business impact, such as an outage; the solution is to restore or start up in a new site or a new part of the cloud. That last action may require a human to push the button, but the rest should be automated. We are quite far from that goal. Some vendors do this better than others; some do it poorly. Automation reigns in the hybrid cloud. Recovery testing, and feedback from that testing into the data protection process, is crucial.
[table id=9 filter='Recovery' filter_columns='Product,Veeam,Cristie Software,Zerto']

To understand the table above, we need to define the requirements:

  • File Level Recovery is the recovery of individual files, either through a catalog or through direct filesystem access. In addition, file-level recovery should be possible from any recovery point.
  • Synthetic Full (collapsed incremental) is a form of incremental backup in which a full restore is available without the need to restore the last full backup and then each individual incremental. Essentially, the incrementals are collapsed into one file (or set of files), which is considered a synthetic full backup or replica (see the first sketch after this list).
  • No Vendor Software Required is about being able to recover data, including the data protection server itself, without the data protection server needing to exist. Some vendors get around this by having a duplicate server elsewhere (as is the case for replication). However, in a disaster, first restoring the backup server could be time consuming if the software is not readily available. For site-to-site backups, this is often not an issue; within the same site, it is, unless you are using replication software that requires two components. Even so, if the data is not stored in a format native to the hypervisor, there needs to be some tool to restore it.
  • Recovery Tool Stored with Backups: A tool that can easily restore the critical components needs to be stored within the data protection repository (whether backup or replication). This could be a preconfigured container holding the data protection software, a recovery tool, or some other mechanism to recover directly from the repository. In many cases, this is not necessary unless there is a true disaster in which all you have left is a replica of the repository.
  • Scripted (F)ull/(P)artial Recovery is the use of a script to restore the backup or replica from the repository. Such a script could exist within the software or directly within the repository. The goal is to have a means to export or store the script in a useful place: with the protected data. “Full” means that everything, including network, storage, CPU, and other profiles, is also restored as part of this script; in other words, it correctly recreates the full environment for the restore target. “Partial” means that some setup is required before you restore. (A sketch of such a script appears after this list.)
  • Agent-less means that not only is the data protection agent-less, but the restoration process does not require any agents. This is not the case for many bare-metal restorations, but for all other forms it is a possibility.
  • Automated Recovery Testing means the data is properly recovered and then a test is run against it. A simple checksum isn’t sufficient; the test must also check that the data is usable by the application. This requires knowledge of the full application, not just its individual pieces. For example, SQL is a piece of an application, not an application unto itself. (This is sketched, together with the feedback loop, after this list.)
  • Instant VM Recovery is where the recovery can happen in such a way that the VM is ready to be booted immediately, whether off a disk presentation, or via some other means. The goal is to recover and use the protected data immediately.
  • Present as iSCSI is where the repository is presented as an iSCSI target through some gateway provided by the data protection software.
  • Present as FC is where the repository is presented as FC through some gateway provided by the data protection software.
  • Present as Memory is where the repository is presented as a memory construct (such as a cache layer) through some gateway provided by the data protection software. Many instant VM recovery mechanisms use caching to deliver data.
  • Present as Local is where the repository is presented as a local disk through some mechanism by the data protection software. This could be a local disk within a hypervisor.
  • Present as NFS is where the repository is presented as an NFS share through the data protection software.
  • Present as SMB is where the repository is presented as an SMB share through the data protection software.
  • Present as Object is where the repository is presented as an object store through the data protection software.
  • Disaster Recovery Plan is an automatically created plan built from the results of the recovery testing. The plan needs to be updated automatically as new data is protected. The technical side of business continuity planning also falls under this category. The use of boilerplate for common actions, and for processes outside the technical realm, is also expected to be part of such plans. For example, if there is a call tree for a disaster or continuity event, it would be included in such plans.
  • Feedback from Recovery Test means that the results of the recovery test are fed back into the data protection process in such a way that missing elements are added automatically to the data protection of the application. For example, if a SQL database is found to be missing, that database would be fed back into the front end of the data protection tool so that it is there for testing next time. Minimally, a report of what is missing needs to be provided.
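To make the “collapsed incremental” idea concrete, here is a minimal sketch in which each backup is simply a map of block offsets to block contents. Real products collapse block or file-extent maps inside the repository; everything below is illustrative only.

```python
# Minimal sketch of building a synthetic full by collapsing incrementals.
# Each backup is modeled as a dict of block offset -> block contents.

def synthesize_full(full, incrementals):
    """Collapse a chain of incrementals into a single synthetic full.

    The newest copy of each block wins, so a restore reads one consolidated
    image instead of replaying the full backup plus every incremental.
    """
    synthetic = dict(full)                 # start from the last real full
    for incremental in incrementals:       # apply oldest -> newest
        synthetic.update(incremental)      # changed blocks overwrite older ones
    return synthetic

# Example: block 2 changed on Monday, block 5 changed on Tuesday.
full = {0: "a", 1: "b", 2: "c", 5: "f"}
monday = {2: "c2"}
tuesday = {5: "f2"}
print(synthesize_full(full, [monday, tuesday]))
# {0: 'a', 1: 'b', 2: 'c2', 5: 'f2'}
```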
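For scripted recovery, the rough sketch below replays a recovery manifest that lives in the repository alongside the protected data: environment first, workloads second. The three callables it takes (create_network, create_datastore, restore_workload) are hypothetical placeholders for whatever the target hypervisor or cloud actually exposes, not any vendor’s script.

```python
# Minimal sketch of a "full" scripted recovery: recreate the environment
# (network, storage, compute profiles) before restoring the workloads.
# All callables passed in are hypothetical placeholders, not a real API.

def full_recovery(manifest, create_network, create_datastore, restore_workload):
    """Replay a recovery manifest: environment first, workloads second."""
    for net in manifest["networks"]:          # "full": recreate the profiles
        create_network(net)                   # the workloads expect...
    for store in manifest["datastores"]:
        create_datastore(store)
    for workload in manifest["workloads"]:    # ...then restore the workloads
        restore_workload(workload)            # themselves, in listed order

# Illustrative manifest, the kind of file that would be stored with the
# protected data. A "partial" recovery would skip the first two loops and
# assume an administrator has already prepared networks and storage by hand.
manifest = {
    "networks":   [{"name": "prod-vlan", "vlan": 100}],
    "datastores": [{"name": "prod-ds", "size_gb": 500}],
    "workloads":  [{"name": "app-vm"}, {"name": "sql-vm"}],
}
full_recovery(
    manifest,
    create_network=lambda n: print("create network", n["name"]),
    create_datastore=lambda d: print("create datastore", d["name"]),
    restore_workload=lambda w: print("restore", w["name"]),
)
```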
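Automated recovery testing and its feedback loop fit together naturally, so the sketch below shows both: restore into a sandbox, run an application-level health check rather than a checksum, compare what the application actually depends on with what the protection job covers, and propose the gaps for an administrator to approve. Every name here (ProtectionJob, discover_components, health_check) is a hypothetical placeholder, not a real product’s API.

```python
# Hedged sketch of an automated recovery test that feeds its findings back
# into the data protection job. All names are hypothetical placeholders.

from dataclasses import dataclass, field

@dataclass
class ProtectionJob:
    protected: set = field(default_factory=set)   # components currently protected
    proposed: set = field(default_factory=set)    # gaps awaiting approval

    def propose_addition(self, component):
        self.proposed.add(component)

def recovery_test(job, restore_point, discover_components, health_check):
    """Restore, verify the application is usable, and report what is missing."""
    sandbox = restore_point()                      # isolated copy of the app
    usable = health_check(sandbox)                 # app-level check, not a checksum
    needed = set(discover_components(sandbox))     # what the app actually depends on
    missing = needed - job.protected               # gaps in the protection job
    for component in missing:
        job.propose_addition(component)            # feedback for the admin to approve
    return {"usable": usable, "missing": sorted(missing)}

# Illustrative run: the test finds a SQL database the job never covered.
job = ProtectionJob(protected={"web-vm", "app-vm", "sql-vm"})
result = recovery_test(
    job,
    restore_point=lambda: "sandbox-1",
    discover_components=lambda sandbox: ["web-vm", "app-vm", "sql-vm", "reporting-db"],
    health_check=lambda sandbox: True,             # e.g., run a known query per database
)
print(result)        # {'usable': True, 'missing': ['reporting-db']}
print(job.proposed)  # {'reporting-db'}
```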

These are the basics of data protection recovery for the future as we move away from “set and forget.” The goal is to limit human interaction to just decision points. Should that missing bit be added or not? Ultimately, the goal should be to eliminate human interaction during the actual recovery process except to monitor and organize as needed. Without automation, this will be impossible. We are far from this today. Some data protection vendors feel that this is outside the realm of data protection. I disagree. Dynamic environments require dynamic data protection. We use analytics for nearly everything. It is time analytics was brought to data protection for the benefit of all.

Recovery: Closing Thoughts

The real question is whether your recovery mechanisms will work with very little human intervention. When to invoke this level of automation should be a human decision, and monitoring such recovery should be done by a human. Other than that, the data protection of the future needs to be responsive to the rapidly changing environment of the secure hybrid cloud. New applications will be created and torn down quickly, and data protection needs to keep up. Since data protection is often an afterthought, the tools need to catch up with the needs of a truly dynamic environment.
Currently, what needs to be protected is specified by container, not by application. The determination of what is to be protected is left up to people, not to the tools. I should be able to say “protect application A” and have the data protection tool, working with other tools in the ecosystem, automatically determine what needs to be protected within A, protect that data, and be fairly intelligent about it. Then the application should be recovery tested on an ongoing basis. Issues would then be raised, and the data protection administrator would either approve the changes to the data to be protected or send them on for further research. (A sketch of this idea follows.)
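As a sketch of what “protect application A” could look like, assume some discovery source (tags, a CMDB, or another ecosystem tool) can list the components an application uses today, with the administrator’s approval as the only human decision point. The names below are hypothetical, not any product’s API.

```python
# Minimal sketch, assuming a discovery source can enumerate an application's
# components. All names here are hypothetical placeholders.

def protect_application(app_name, discover, policy, approve):
    """Turn "protect application A" into concrete protection targets.

    discover(app_name) -> components the application uses today
    policy             -> set of components currently being protected
    approve(component) -> the one human decision point: add it or not
    """
    for component in discover(app_name):
        if component not in policy and approve(component):
            policy.add(component)
    return policy

# Illustrative run: discovery finds a new queue the application started using.
current_policy = {"app-A/web", "app-A/sql"}
protect_application(
    "app-A",
    discover=lambda app: ["app-A/web", "app-A/sql", "app-A/queue"],
    policy=current_policy,
    approve=lambda c: True,   # in practice, the administrator decides
)
print(current_policy)  # {'app-A/web', 'app-A/sql', 'app-A/queue'}
```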
Yet we always need to remember how to recover from a raw repository if no tools are readily available. That is why we need tools stored with the protected data, if necessary. Human-less data protection recovery is the ultimate goal.
For that, we need adaptive data protection. Where are you on this journey?