Replication Round-up - TVP Strategy

Whether you use replication as a means of disaster avoidance or disaster recovery, replication of your virtual environment between hot sites has always been a win. With current technology it is even possible to replicate to a replication receiver cloud which could provide a measure of business continuity as well. So who are the players and who provides what service, and how do they do it?
There are several different ways of performing replication:

As part of a regular backup

Replication in this form is a side effect of moving virtual disk data around during a backup cycle. Instead of only storing data on backup volumes, it is possible to also send the virtual disk data to a hot site, where it is stored either in a native format or in the format of the backup software, waiting to be restored. Those backup tools that replicate ready-to-run systems have an advantage here. Backup tools are optimized to reduce the overall network load and are also designed to produce non-crash consistent (that is, application-consistent) versions of the VMs for most if not all operating systems. Many have tools to work with applications to quiesce the data before transfer.
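As a rough illustration of how a backup-driven replication pass can limit network load, here is a minimal sketch that sends only the blocks that changed since the previous pass. It assumes fixed-size blocks compared by hash; the function names and the tiny block size are hypothetical, and real products more commonly rely on the hypervisor's own changed-block tracking than on hashing. The sketch does not handle a disk that shrinks.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # deliberately tiny for illustration; real tools use far larger blocks


def block_hashes(disk: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one SHA-256 digest per fixed-size block of the disk image."""
    return [
        hashlib.sha256(disk[i:i + block_size]).hexdigest()
        for i in range(0, len(disk), block_size)
    ]


def changed_blocks(previous: bytes, current: bytes,
                   block_size: int = BLOCK_SIZE) -> Dict[int, bytes]:
    """Map block index -> new contents for every block that differs or is new."""
    old = block_hashes(previous, block_size)
    new = block_hashes(current, block_size)
    delta = {}
    for index, digest in enumerate(new):
        if index >= len(old) or old[index] != digest:
            start = index * block_size
            delta[index] = current[start:start + block_size]
    return delta


def apply_delta(replica: bytes, delta: Dict[int, bytes],
                block_size: int = BLOCK_SIZE) -> bytes:
    """Apply the changed blocks to the hot-site copy and return the new image."""
    image = bytearray(replica)
    for index, data in sorted(delta.items()):
        start = index * block_size
        if start > len(image):
            # Pad with zeros if the source disk grew past the replica's end.
            image.extend(b"\x00" * (start - len(image)))
        image[start:start + len(data)] = data
    return bytes(image)
```

Only the contents of `delta` would need to cross the replication link; the hot site rebuilds the current image by applying them to the copy it already holds.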

By presenting replicating storage to the hypervisor

When you present replicating storage to a hypervisor, the hypervisor is generally not involved in the replication; the storage layer does all the work. This usually requires like hardware on both sides of the replication link and generally produces crash consistent copies of virtual disk files. To aid in producing non-crash consistent copies, VMware developed Site Recovery Manager (SRM), which interacts with many hardware vendors, but not all, and never between disparate storage devices. Non-VMware hypervisors have no means to manage this type of replication.

By tying into the virtual SCSI layers of the hypervisor

There is a new breed of replication tools that sit underneath the virtual machine but above the physical storage, and replicate running VMs by intercepting all blocks of data as they are written to the storage device. In addition, such tools can query and inspect the storage allocated to the virtual disks without the VM knowing, because this happens outside the VM. This form of replication still requires tools to talk to the application and quiesce its data. Be aware, however, that this form of replication is limited to running VMs only.
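The idea of intercepting writes between the VM and its storage can be sketched as a small write splitter. This is a toy model only: the `WriteSplitter` class and its methods are hypothetical, the "disks" are in-memory byte arrays, and no vendor's actual vSCSI filter is being described.

```python
from collections import deque


class WriteSplitter:
    """Hypothetical filter sitting between a VM and its storage device."""

    def __init__(self, primary: bytearray, replica: bytearray):
        self.primary = primary    # local virtual disk
        self.replica = replica    # copy held at the hot site
        self.journal = deque()    # ordered (offset, data) pairs awaiting shipment

    def write(self, offset: int, data: bytes) -> None:
        """Complete the write locally, then queue a copy for the hot site."""
        self.primary[offset:offset + len(data)] = data
        self.journal.append((offset, bytes(data)))

    def ship(self, max_writes=None) -> int:
        """Replay queued writes on the replica in their original order."""
        shipped = 0
        while self.journal and (max_writes is None or shipped < max_writes):
            offset, data = self.journal.popleft()
            self.replica[offset:offset + len(data)] = data
            shipped += 1
        return shipped
```

Because the journal preserves write order, the replica always reflects some earlier point in time of the primary, which is what makes the copy crash consistent. Application consistency still depends on quiescing the application before the journal is cut.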

These three mechanisms do their work in entirely different ways and as such use different paths to achieve replication. But all replicate blocks of a virtual machine to another site, and when it comes to a single-site disaster, any of these tool types would work. However, if you wish to achieve business continuity where the VMs have little or no downtime, then you are currently looking at the second approach: presenting replicating storage to the hypervisor. Business continuity in this form provides a mechanism that replicates data but keeps the VMs running during failover and failback movements, which can keep a business running during a planned or unplanned site outage.

Disaster Recovery provides a mechanism to recover from a failure, either by restoring or by powering on a hot site during or after a disaster. Your data is safe and secure outside the reach of the disaster. This should be a regular activity which includes backups and replication.
Disaster Avoidance is the act of planning for an impending disaster by bringing up the prepared hot site created during your standard disaster recovery mechanisms, moving hardware, copying data, or using some other means to get your data out of the disaster area. This is a planned activity with a firm deadline to complete.
Business Continuity is a method to keep a business running before, during, and after a disaster or other event. This is generally where hot-sites shine in any plan. However, it may not be a hot-site but a well designed application or hardware that can withstand many failures.

Some of the players in this field are:

	vSphere	Hyper-V	Xen KVM	vSCSI	Hardware	Present LUN	Failback	BC	DR	DA	IP Fix
VMware SRM
VMware Replicator
DataGardens		1	1								2
EMC VPLEX
ZeRTO
Veeam											3
Quest
FalconStor		1	1				4

¹ No integration with management tools
² Via DNS
³ Via sandbox
⁴ VMs rebooted (downtime)

Of the ones listed above, EMC VPLEX and DataGardens handle the most, as they provide a way to synchronously and asynchronously move data from hot-site to hot-site while also allowing for business continuity. In other words, VMs can be moved to the hot-site while they are running. Each does this in a different way, however.

EMC VPLEX uses hardware to replicate blocks, as they are written through the VPLEX hardware, to like devices synchronously or asynchronously. If the second device is within 100 km (60 miles), it is possible to use vTeleport to vMotion VMs en masse from one data center to another, then back again. This relies on the fact that the data already exists at the hot site due to using VPLEX.

DataGardens presents a replicating LUN to your hypervisor (much like VPLEX) and creates hidden replicas on the target LUNs. These hidden replicas are kept up to date and can reside in up to 64 different hot-sites/datacenters, so replication is one to many. At any time a VM can be migrated to a hot site using standard vMotion. At the end of the vMotion, DataGardens detects this and, instead of using the remote replica, turns to the hidden replica already residing at the hot-site and hooks it in as the running virtual disk.

ZeRTO won TechTarget’s Best of Show at VMworld US 2011 with their vSCSI-based replication. Not only do they replicate to hot-sites (using their own disk format), but they can also restore directly to the hot-site. They can also replicate directly to various partner clouds that run ZeRTO as well.

Veeam Backup and Replication is a combined product that not only backs up your virtual machines using standard virtual machine backup techniques but can also restore to a hot-site and test those restorations using its SureBackup technology. This restoration, however, is to a sandbox, which may be just fine given IP restrictions at a hot site. With the proper firewall you would have an NFS datastore in which you could run your VMs with no IP changes. With the proper scripts, SureBackup can test not only the restore of the VMs but also the integrity of the applications.

VMware SRM provides a mechanism for hardware replication and mirroring to occur while using VMware vSphere’s built-in snapshot quiesce mechanisms (without taking a snapshot).

VMware Replication uses the vSCSI mechanism to replicate only running disks to a hot site. As this is part of SRM, it also ties into the same quiesce mechanisms, which at this time work best with Windows.

Quest vReplicator takes the vRangerPro backup technology and adds the ability to create runnable VMs at a hot site.

FalconStor is a LUN presentation mechanism that replicates a presented LUN to a hot site, using all the standard backup techniques to limit the amount of data transferred. FalconStor provides BC, but there is a hiccup when a VM fails over to or back from a hot site, because the VM is rebooted at the destination (downtime).

Disaster Recovery is no longer just about recovering from a disaster, but also about avoiding one and providing continuity for the business. Businesses are inherently dependent upon the internet, so we cannot afford to leave our data unprotected from disasters. Our businesses need to continue to run. These tools provide a way to build out hot-sites easily.

However, before you go off and purchase one of these tools, you need to understand whether or not it is integrated with your set of applications. Most have integration with Microsoft VSS, and as long as the application integrates well with VSS, your replications will be non-crash consistent. However, not many support Linux very well or integrate into Linux applications. As a result, replicated disks could end up being only crash consistent, which implies that they may not even boot. Better integration with your applications is therefore warranted, which could be as simple as providing a script to quiesce your application from within the replication tool. For the same reason, it is paramount that hot-sites be tested regularly.
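Where a tool lets you supply your own quiesce script, the usual shape is a freeze step before the replication point and a thaw step afterwards. The sketch below shows only that shape, with hypothetical names; the `freeze`, `replicate`, and `thaw` callables stand in for whatever your application and replication tool actually provide.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(freeze, thaw):
    """Run the application's freeze hook, then always run its thaw hook."""
    freeze()  # if freezing itself fails, nothing was frozen, so thaw is skipped
    try:
        yield
    finally:
        thaw()  # runs even if the replication step raises


def replicate_consistently(freeze, replicate, thaw):
    """Take the replication point while the application is quiesced."""
    with quiesced(freeze, thaw):
        return replicate()


if __name__ == "__main__":
    events = []
    replicate_consistently(
        freeze=lambda: events.append("freeze"),
        replicate=lambda: events.append("replicate"),
        thaw=lambda: events.append("thaw"),
    )
    print(events)
```

The point of the `finally` clause is that a failed replication pass must never leave the application frozen.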

The final issue with replication is maintaining IP addressing at the hot-site. This ends up being one of the more difficult tasks, as IP networks at one site often do not match those at another site; as such, IP fix-up has to be addressed within any BC software. Currently there are three ways to do this: by manipulating DNS, by providing a sandbox, or by manipulating IP addresses within the VM. All have their drawbacks. Which to use depends on your needs and the type of software used. All these solutions need properly designed networks at the hot-site, whether to allow for IP changes, to use stretched layer-2 technologies so that a single IP subnet spans the hot-sites, or to support other techniques such as sandboxing.
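For the third option, manipulating IP addresses within the VM, one common convention is to keep each VM's host offset and swap only the subnet. The sketch below uses Python's standard `ipaddress` module to show that arithmetic; the `remap_ip` helper and the example subnets are hypothetical, and it assumes both sites use subnets of the same size.

```python
import ipaddress


def remap_ip(address: str, source_net: str, target_net: str) -> str:
    """Translate an address to the same host offset in the hot-site subnet."""
    src = ipaddress.ip_network(source_net)
    dst = ipaddress.ip_network(target_net)
    ip = ipaddress.ip_address(address)
    if ip not in src:
        raise ValueError(f"{address} is not in {source_net}")
    if src.prefixlen != dst.prefixlen:
        raise ValueError("source and target subnets must be the same size")
    # Distance of this host from the start of its subnet.
    offset = int(ip) - int(src.network_address)
    return str(dst.network_address + offset)


if __name__ == "__main__":
    # Hypothetical production and hot-site subnets.
    print(remap_ip("10.1.20.37", "10.1.20.0/24", "192.168.50.0/24"))
```

A mapping like this covers only the address itself. Gateways, DNS records, and any addresses hard-coded inside applications still have to be handled separately, which is part of why each approach has drawbacks.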

Choosing a product for replication will require fully understanding the business continuity needs of your environment as well as the data protection required for the data to be moved between sites. In addition, any such plan must address IP network issues, crash consistency and application quiesce issues, and how hot-sites will be tested on a regular, perhaps automated, basis.