This is the first of many comparisons and commentaries on data protection within the hybrid cloud. Here we look at the mechanisms used to achieve data protection. Mechanisms may sound boring, yet from an architectural and data management view they become increasingly important, because the mechanisms available can affect the cost of your data protection. One example: data protection is often thought to be instantaneous. It isn't. It has a window of execution measured in hours, not microseconds. If you need microsecond-level protection, you may need other tools to fill that need.
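To put "hours, not microseconds" in perspective, the window can be estimated from the amount of data and the sustained throughput of the path to the repository. The sketch below is only a back-of-the-envelope calculation. The 2 TB size and 200 MB/s rate are illustrative assumptions, and compression, deduplication, and changed-block tracking would all shrink the real figure.

```python
def backup_window_hours(data_gb: float, throughput_mb_s: float) -> float:
    """Estimate how long moving data_gb takes at a sustained throughput.

    Assumes 1 GB = 1024 MB and a constant rate; ignores compression and dedup.
    """
    seconds = data_gb * 1024 / throughput_mb_s
    return seconds / 3600


# Illustrative numbers only: a 2 TB full backup over a 200 MB/s path
print(f"{backup_window_hours(2048, 200):.2f} hours")  # 2.91 hours
```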
The first things to decide are how quickly you need to recover your application (recovery time objective, or RTO) and how much data loss you can stomach during recovery (recovery point objective, or RPO). RPO determines how often data must be protected, while RTO governs how soon recovery must be complete once started. This pair of critical factors determines which mechanisms matter within your organization. Beyond those two, other equally important factors influence which recovery mechanisms are in use.
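A rough way to test a backup schedule against both objectives is sketched below. It assumes (our assumption, not a vendor rule) that a recovery point reflects the moment its job starts but is only usable once the job finishes. Under that assumption the worst-case data loss is the interval between jobs plus the job's own window. The function names and figures are hypothetical.

```python
def worst_case_data_loss_h(interval_h: float, window_h: float) -> float:
    # A failure just before a running job completes falls back to the previous
    # recovery point, which is one full interval plus this job's window old.
    return interval_h + window_h


def meets_objectives(interval_h: float, window_h: float, restore_h: float,
                     rpo_h: float, rto_h: float) -> tuple:
    """Return (meets_rpo, meets_rto) for a simple periodic backup schedule."""
    meets_rpo = worst_case_data_loss_h(interval_h, window_h) <= rpo_h
    meets_rto = restore_h <= rto_h
    return meets_rpo, meets_rto


# Nightly backups taking 3 hours, 4-hour restores, against a 24 h RPO / 8 h RTO
print(meets_objectives(interval_h=24, window_h=3, restore_h=4, rpo_h=24, rto_h=8))
# (False, True)
```

In this illustration a nightly schedule misses a 24-hour RPO even though each job "runs every 24 hours". That gap is where more frequent incrementals or near CDP start to matter.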

In the following table we look at mechanisms for many data protection tools. Most of these mechanisms are expected to be available in any data protection tool and are more about backup than about data protection as a whole. When we branch into recovery, we start thinking about where our data resides. Can it be sent to a cloud or clouds? Is this a feature of the data protection software or of the archival unit? My feeling is that it is part and parcel of any data protection plan.
[Table: Mechanisms]

To understand the table, we need to define these mechanisms:

  • Full Backup is the backup of the full target, including operating system, application, and state. This could be an entire container, virtual machine, or physical machine. The boundary of a full backup is the source container.
  • Incremental is the backup of only those things within the source container that have changed since the previous backup. This is usually a much smaller subset. To rebuild a full image, the incrementals are restored on top of the last full backup in the order in which they were taken. Some backup tools go further and create synthetic full backups, which merge the incrementals in advance so that only one restore is needed, not many. This saves restoration time.
  • Multiple Recovery Points is the storage of synthetic or full backups from multiple points in time, providing a choice of how far back in time one wishes to restore. This is required for ransomware data protection, because the most recent backup may already contain encrypted or corrupted data. It is also useful when upgrading parts of the operating system or application, since you can roll back to a point taken before the upgrade.
  • Application (A)ware/(P)artial is a new mechanism that addresses how the backup is done with respect to the application within the container or containers. Application Aware data protection understands the application and uses application mechanisms to ensure all data within the application is written to disk so that the backup is as complete as possible. This includes applications that span containers. A Partial rating applies if not all applications are understood, or if the data protection tool has no mechanism to call a script that would provide full application awareness. Partial also indicates that something may still need to be done by hand.
  • File System Indexing is the creation of a file-level catalog of what is backed up and where each item is located within the full and incremental repositories. In effect, it is the data needed to restore just one element or file as necessary. This becomes a crucial part of knowing where your data is at any given time.
  • CDP is continuous data protection. The source and target are in lockstep with each other: every write must be acknowledged by the target before the source can proceed. The round-trip latency this adds limits the distance between source and target. CDP often requires specialized drivers to implement.
  • Near CDP is almost CDP, or CDP that is asynchronous. In other words, what is written to the data protection repository is often slightly behind the source container, because some data may still be in flight. Over hybrid cloud distances, this may offer the lowest RPO achievable between clouds.
  • N-Source to Target Replication is the ability to replicate data from many source containers to one target repository or system. One such target system could be another cloud.
  • N-Source to Multiple Target Replication is the ability to replicate data from any source to multiple target repositories or systems, either at the same time or in a cascade. In other words, we can replicate source containers from one cloud to another, then between one availability zone and another, and then, or at the same time, to the data center.
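A toy model shows the ordering rule for incrementals and why a synthetic full helps. Here each backup is a hypothetical mapping from file path to content, with None marking a deletion. Real tools track changed blocks rather than whole files, so treat this as an illustration of ordering only, not of any product's format.

```python
from typing import Dict, List, Optional

Full = Dict[str, str]
Incremental = Dict[str, Optional[str]]  # path -> new content, None = deleted


def restore_chain(full: Full, incrementals: List[Incremental]) -> Full:
    """Rebuild the protected state: start from the full, replay oldest first."""
    state = dict(full)
    for inc in incrementals:            # order matters: each builds on the last
        for path, content in inc.items():
            if content is None:
                state.pop(path, None)   # file removed after the previous backup
            else:
                state[path] = content   # file added or changed
    return state


def synthesize_full(full: Full, incrementals: List[Incremental]) -> Full:
    """A synthetic full performs the same merge ahead of time on the backup
    side, so a later restore reads one image instead of len(incrementals) + 1."""
    return restore_chain(full, incrementals)


full = {"app.conf": "v1", "old.log": "v1"}
incs = [{"app.conf": "v2"}, {"old.log": None, "new.db": "v1"}]
print(restore_chain(full, incs))  # {'app.conf': 'v2', 'new.db': 'v1'}
```

Replaying the same incrementals in the wrong order would leave an older version of a file in place, which is why the chain must be applied oldest first.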
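Multiple recovery points matter for ransomware because the newest copy may already hold encrypted data. Suppose you can estimate when the infection began, which is often the hard part. Choosing a restore point then reduces to finding the newest point taken before that time. The helper below is a hypothetical sketch of that lookup.

```python
from bisect import bisect_left
from datetime import datetime
from typing import List, Optional


def latest_point_before(points: List[datetime], infected_at: datetime) -> Optional[datetime]:
    """Return the newest recovery point strictly before infected_at, or None."""
    ordered = sorted(points)
    i = bisect_left(ordered, infected_at)  # index of first point >= infected_at
    return ordered[i - 1] if i > 0 else None


daily = [datetime(2024, 3, d) for d in range(1, 6)]  # midnight, March 1 to 5
print(latest_point_before(daily, datetime(2024, 3, 3, 14, 0)))  # 2024-03-03 00:00:00
```

If only a single recovery point were retained and it was taken after the infection, the function would return None. That case is exactly the failure mode that keeping several points guards against.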
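The script-calling mechanism that separates full from partial application awareness usually takes the form of pre- and post-backup hooks. The sketch below assumes hypothetical hook commands supplied by the application owner (here, stand-in Python one-liners) and a snapshot function provided by the tool. The key design point is that the resume hook runs even if the snapshot fails, so the application is never left frozen.

```python
import subprocess
import sys
from typing import Callable, Sequence


def app_aware_backup(pre_cmd: Sequence[str], post_cmd: Sequence[str],
                     take_snapshot: Callable[[], str]) -> str:
    """Quiesce the application, snapshot it, then always resume it."""
    subprocess.run(pre_cmd, check=True)       # flush buffers and pause writes
    try:
        return take_snapshot()                # captures the flushed, consistent state
    finally:
        subprocess.run(post_cmd, check=True)  # resume even if the snapshot raised


# Hypothetical hooks: stand-ins for an application's own flush and resume scripts
freeze = [sys.executable, "-c", "print('writes paused')"]
thaw = [sys.executable, "-c", "print('writes resumed')"]
snapshot_id = app_aware_backup(freeze, thaw, lambda: "snap-0001")
```

A tool with no hook at all can still take the snapshot. The flush then becomes the manual step that a Partial rating warns about.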
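At its core, a file-level index is a map from each path to the backups that hold a copy of it. The hypothetical class below is a minimal sketch using integer timestamps and in-memory storage. It shows how such an index answers "which backup do I need to restore this one file as of time T?" without scanning every full and incremental.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Optional, Tuple


class FileIndex:
    """Hypothetical per-tool catalog: path -> [(backup_time, backup_id), ...]."""

    def __init__(self) -> None:
        self._copies: DefaultDict[str, List[Tuple[int, str]]] = defaultdict(list)

    def record(self, backup_id: str, backup_time: int, paths: Iterable[str]) -> None:
        """Register every path captured by one full or incremental backup."""
        for path in paths:
            self._copies[path].append((backup_time, backup_id))

    def locate(self, path: str, as_of: int) -> Optional[str]:
        """Return the backup holding the newest copy of path at or before as_of."""
        candidates = [c for c in self._copies.get(path, []) if c[0] <= as_of]
        return max(candidates)[1] if candidates else None


idx = FileIndex()
idx.record("full-01", 100, ["/etc/app.conf", "/data/orders.db"])
idx.record("inc-01", 200, ["/data/orders.db"])  # only the changed file
print(idx.locate("/data/orders.db", as_of=250))  # inc-01
print(idx.locate("/etc/app.conf", as_of=250))    # full-01
```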
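The distance limit on CDP follows from physics. A synchronous write is not acknowledged until the remote copy confirms it, so every write pays at least one network round trip. The sketch below estimates that floor using the approximation that light in optical fibre covers about 200 km per millisecond. It ignores switching, protocol, and storage latency, which only add to the total. For near CDP, the sketch treats replication lag (the time needed to drain writes still in flight) as the effective RPO. The backlog and link figures are illustrative assumptions.

```python
FIBER_KM_PER_MS = 200.0  # approximate one-way propagation speed in optical fibre


def sync_write_penalty_ms(distance_km: float) -> float:
    """CDP (synchronous): minimum added latency per write, propagation only."""
    return 2 * distance_km / FIBER_KM_PER_MS  # there and back before the ack


def async_lag_s(backlog_mb: float, link_mb_s: float) -> float:
    """Near CDP (asynchronous): how far the target trails the source, which is
    roughly the data-loss window if the source fails right now."""
    return backlog_mb / link_mb_s


for km in (50, 1000, 4000):  # metro, regional, cross-continent (illustrative)
    print(f"{km:>5} km: +{sync_write_penalty_ms(km):.1f} ms per write")
print(f"near-CDP lag: {async_lag_s(backlog_mb=500, link_mb_s=100):.1f} s")
```

A fraction of a millisecond per write at metro distance is tolerable for many applications. Tens of milliseconds added to every write usually is not, which is consistent with near CDP being the practical choice between clouds.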
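Fan-in (N sources to one target) and fan-out or cascade (one source to many targets) replication can be modelled as a small directed graph. The hypothetical topology below follows the example in the text. Source workloads replicate into one cloud, that cloud cascades between availability zones, and one workload also replicates to the data center. A breadth-first walk lists every location that ends up holding a copy and how many hops away it is. Each extra hop in a cascade typically adds replication lag, and therefore RPO, for the downstream copy.

```python
from collections import deque
from typing import Dict, List


def copy_locations(topology: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Breadth-first walk of a replication topology.

    Returns every location that receives a copy of `source`, mapped to its
    number of replication hops from the source.
    """
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for target in topology.get(node, []):
            if target not in hops:          # skip locations already reached
                hops[target] = hops[node] + 1
                queue.append(target)
    del hops[source]
    return hops


topology = {
    "vm-web": ["cloudB-az1"],                # N sources fan in to one target
    "vm-db": ["cloudB-az1", "datacenter"],   # one source fans out to two targets
    "cloudB-az1": ["cloudB-az2"],            # cascade between availability zones
}
print(copy_locations(topology, "vm-db"))
# {'cloudB-az1': 1, 'datacenter': 1, 'cloudB-az2': 2}
```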

These are the basic data protection mechanisms. There is much more involved in data protection, but these are so fundamental that nearly every tool should have them, or something like them, within its products. Going forward, File System Indexing (or indexing of what is backed up) is a necessity for knowing where your data is at all times. The indexes held within individual data protection tools would feed up into a catalog of catalogs. If the basic ability to index is missing, however, the higher-order functions are missing as well.
For mechanisms like these, the implementation method is unimportant. We do not need to know whether data protection is achieved using agents, out-of-band mechanisms, or even within a hypervisor. That is a detail for another comparison and discussion.
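One way to picture the catalog of catalogs is as a merge of each tool's file index into a single lookup that answers "where are all copies of this file?". In the hypothetical sketch below, each tool exports path -> [(location, backup_time)]. The tool names and locations are made up. A tool without indexing simply has nothing to contribute, which is the gap described above.

```python
from typing import Dict, List, Tuple

ToolIndex = Dict[str, List[Tuple[str, int]]]  # path -> [(location, backup_time)]
Copy = Tuple[str, str, int]                     # (tool, location, backup_time)


def catalog_of_catalogs(indexes: Dict[str, ToolIndex]) -> Dict[str, List[Copy]]:
    """Merge per-tool file indexes into one path -> copies map, newest first."""
    merged: Dict[str, List[Copy]] = {}
    for tool, index in indexes.items():
        for path, copies in index.items():
            for location, backup_time in copies:
                merged.setdefault(path, []).append((tool, location, backup_time))
    for copies in merged.values():
        copies.sort(key=lambda c: c[2], reverse=True)  # newest copy first
    return merged


indexes = {
    "tool-a": {"/data/orders.db": [("cloudA/repo1", 200)]},
    "tool-b": {"/data/orders.db": [("datacenter/vault", 150)],
               "/etc/app.conf": [("datacenter/vault", 150)]},
    "tool-c": {},  # no file-level indexing: invisible to the merged catalog
}
for copy in catalog_of_catalogs(indexes)["/data/orders.db"]:
    print(copy)
```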

Mechanisms: Closing Thoughts

The real question is how you can use this information to strengthen existing data protection within a secure hybrid cloud. When we look at the hybrid cloud, we also need to consider how to protect our data. How do we access critical information? How do we get it all back under our control? Data protection in the hybrid cloud is a “there and back again” implementation. Get your data into the cloud or clouds, but have it available at your fingertips as needed.
It is also not about any single cloud but about all clouds. Can data protection be used to replicate data between clouds so that it is available as needed? Yet why copy data to another cloud when it already exists there? For this we need knowledge. We need to know where all our data is at any given time. We need to understand our data better. More importantly, the tools need to understand our hybrid cloud and tell us which mechanisms are available, which will not work, and why. Deciding which options offer the lowest RPO, RTO, and cost, however, is still left to the data protection manager.
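The question of not copying data that a cloud already holds can be answered mechanically if the catalog records content fingerprints. The sketch below assumes, hypothetically, that we can obtain the set of SHA-256 hashes already stored in the target cloud. Only objects whose hash is missing are scheduled for replication. Real tools may deduplicate at the block level rather than the object level.

```python
import hashlib
from typing import Dict, List, Set


def fingerprint(data: bytes) -> str:
    """Content hash used to recognise identical data regardless of its name."""
    return hashlib.sha256(data).hexdigest()


def objects_to_replicate(objects: Dict[str, bytes], target_hashes: Set[str]) -> List[str]:
    """Names of objects whose content the target cloud does not already hold."""
    return [name for name, data in objects.items()
            if fingerprint(data) not in target_hashes]


source = {"orders.db": b"order rows", "app.conf": b"settings", "copy.conf": b"settings"}
already_in_target = {fingerprint(b"settings")}  # hypothetical catalog lookup
print(objects_to_replicate(source, already_in_target))  # ['orders.db']
```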
Please contact us at elh@virtualizationpractice.com if you wish to see your products in our comparisons.

3 replies on “Comparison – Hybrid Cloud Data Protection – Mechanisms”

  1. Good article and overview. One interesting thing to note in a future article might be the degrees of application awareness of block/image-level backup at the HOS level versus that capability natively at the application level. I'm hearing from quite a few IT professionals focused on application protection who increasingly want to hear specifically about how Microsoft SQL, Exchange, SharePoint, Oracle, and other applications are protected, and about the pros and cons of block/image-level backup vs application-native backup.

    1. Hello Mark,
      Thank you. We have also gathered information about various degrees of application awareness and will be discussing those as well. It is a very important area within data protection. However, application awareness seems to have several different meanings. For example, some would consider SQL an application, while others would consider it part of an application. Each of these views needs to be addressed by data protection these days.
      Best regards,
      Edward L. Haletky
