IN THIS ARTICLE
This article outlines how to use failover and failback with Continuous Replication in Qumulo Core 2.11.2 and above.
- Admin privileges required
- Source and target cluster running the same version of Qumulo Core 2.11.2 or above
If you are running Qumulo Core 2.11.1 or below, see the article on Qumulo's legacy failover/failback UI for details.
Failover to the Secondary Cluster
To use the target directory on the secondary cluster for writes, select the Make Target Writable option in the Actions menu on the target relationship listing. Once the process is complete, the target directory becomes read-write and the relationship can be deleted. If the relationship had incomplete replication data, the target directory is reverted to the most recent recovery point.
Planned failover: You should disable Continuous Replication and wait for the last job to complete before deleting the relationship. This ensures that the target directory is in a point-in-time consistent state matching the last source snapshot taken.
As an extra precaution, you can make shares/exports read-only on the primary cluster to ensure that no writes are lost. You can also use the Make Target Writable action to ensure that the target is consistent with the latest replication, though there should be no difference.
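Unplanned failover: If the last replication job was still incomplete when the primary cluster became unavailable, the target directory may be left in an indeterminate state, with some files truncated or otherwise inconsistent. The Make Target Writable action resolves this by reverting the target directory to the most recent recovery point.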
Follow the instructions below to utilize failover with Continuous Replication:
- Click the Actions menu button on the replication relationship listing.
- Select Make Target Writable.
- Wait for the directory to be reverted to the last recovery point. Progress can be monitored by clicking Details.
- Once the relationship status displays Disconnected and target writable, click the Actions menu button and select Delete relationship.
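The ordering of the steps above can be sketched as a short script. This is only an illustration of the sequence (make writable, wait, then delete). The `ReplicationClient` interface, its method names, the `FakeClient` stand-in, and the intermediate `"Reverting"` status are all hypothetical and do not correspond to any documented Qumulo API. Only the final status string is taken from the UI text above.

```python
import time
from typing import Protocol

# Status text shown in the UI once the target has been reverted.
TARGET_WRITABLE = "Disconnected and target writable"


class ReplicationClient(Protocol):
    """Hypothetical interface used for illustration; not a real Qumulo API."""

    def make_target_writable(self, relationship_id: str) -> None: ...
    def get_status(self, relationship_id: str) -> str: ...
    def delete_relationship(self, relationship_id: str) -> None: ...


def fail_over(client: ReplicationClient, relationship_id: str,
              poll_seconds: float = 5.0, max_polls: int = 120) -> None:
    """Make the target writable, wait for the revert, then delete the relationship."""
    client.make_target_writable(relationship_id)
    for _ in range(max_polls):
        if client.get_status(relationship_id) == TARGET_WRITABLE:
            # Delete only after the directory is back at the last recovery point.
            client.delete_relationship(relationship_id)
            return
        time.sleep(poll_seconds)
    raise TimeoutError(f"{relationship_id} did not become writable in time")


class FakeClient:
    """In-memory stand-in so the sketch runs without a cluster."""

    def __init__(self, polls_until_writable: int) -> None:
        self.remaining = polls_until_writable
        self.calls: list[str] = []

    def make_target_writable(self, relationship_id: str) -> None:
        self.calls.append("make_target_writable")

    def get_status(self, relationship_id: str) -> str:
        self.calls.append("get_status")
        self.remaining -= 1
        # "Reverting" is a made-up intermediate status for this sketch.
        return TARGET_WRITABLE if self.remaining <= 0 else "Reverting"

    def delete_relationship(self, relationship_id: str) -> None:
        self.calls.append("delete_relationship")
```

The point of the sketch is the guard: the relationship is deleted only after the status check succeeds, never on a timer alone.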
While the target directory is being made consistent, migrate any of the following configuration that is needed on the secondary cluster and doesn't already exist there:
- NFS exports
- SMB shares
- AD/LDAP server(s)
- Snapshot policies
Remount all clients previously connected to the primary cluster that require access to the secondary cluster.
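One way to find what still needs to be migrated is to compare an inventory of each cluster's configuration. The following is a minimal sketch that assumes you have already collected the inventories by hand or by some other means. The category keys and item names are made up for illustration, and nothing here queries a cluster.

```python
def missing_on_secondary(primary: dict[str, set[str]],
                         secondary: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per category, items present on the primary but absent on the secondary."""
    gaps: dict[str, set[str]] = {}
    for category, items in primary.items():
        absent = items - secondary.get(category, set())
        if absent:
            gaps[category] = absent
    return gaps


# Example inventories (all names are hypothetical).
primary = {
    "nfs_exports": {"/projects", "/home"},
    "smb_shares": {"Projects"},
    "snapshot_policies": {"hourly"},
}
secondary = {
    "nfs_exports": {"/projects"},
    "smb_shares": {"Projects"},
}

gaps = missing_on_secondary(primary, secondary)
```

Here `gaps` would list the `/home` export and the `hourly` snapshot policy as items to create on the secondary cluster before remounting clients.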
Re-enable Continuous Replication after Failover
If you bring the primary cluster back online, you may want to re-enable Continuous Replication after a failover. Continuous Replication can immediately be re-established in the original direction (primary to secondary) if there is no need to keep the completed writes on the secondary cluster during the failover period.
You may want to do this if:
- The primary cluster came back online before client write traffic was redirected to the secondary cluster
- The writes to the secondary cluster were done as part of a test failover or DR readiness test and can therefore be discarded
To re-enable Continuous Replication in this case, you can simply re-create the original relationship. Replication will take ownership of the directory and sync the current version of the source directory to the target, overwriting any changes on the target directory.
Note that a warning dialog will display, which you must confirm in order to proceed.
The initial replication job after re-creating the relationship will complete a full tree walk of the replication directories on both sides to ensure the target is brought back into sync. Files that were replicated successfully and haven't changed since the original replication will not be resent.
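The effect of that first job can be pictured with a toy model. The sketch below is not how Qumulo Core detects changes. It simply assumes each file has some content fingerprint and shows the outcome described above: unchanged files are skipped, and anything new on the source or modified on the target is sent again from the source.

```python
def plan_resync(source: dict[str, str],
                target: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split source paths into (to_send, skipped) by comparing fingerprints.

    Both arguments map a relative path to a fingerprint of the file's content.
    """
    to_send: list[str] = []
    skipped: list[str] = []
    for path in sorted(source):
        if target.get(path) == source[path]:
            skipped.append(path)   # already replicated and unchanged
        else:
            to_send.append(path)   # new on the source, or changed on either side
    return to_send, skipped


# Hypothetical state after a test failover: b.txt was edited on the target,
# and c.txt was created on the source after the last replication.
source = {"a.txt": "v1", "b.txt": "v2", "c.txt": "v1"}
target = {"a.txt": "v1", "b.txt": "v1-test-edit"}

to_send, skipped = plan_resync(source, target)
```

In this example only `b.txt` and `c.txt` would be transferred. The test edit to `b.txt` on the target is overwritten by the source version.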
Failback to the Primary Cluster
To save writes on the secondary but continue using the original primary cluster, you can set up replication from the secondary to the primary cluster by recreating the relationship in that direction. Any changes on the primary that were made after the last successful replication will be overwritten. As stated above, the initial job will do a tree walk.
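The choice between the two recovery paths comes down to a single question: do the writes made on the secondary cluster during the failover period need to be kept? A small sketch of that decision, using hypothetical names, makes the direction of the new relationship and the side that gets overwritten explicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationshipPlan:
    source: str  # cluster whose data is kept
    target: str  # cluster whose diverging changes are overwritten


def plan_after_failover(keep_secondary_writes: bool) -> RelationshipPlan:
    """Pick the direction of the first relationship to create after a failover."""
    if keep_secondary_writes:
        # Failback: replicate secondary -> primary so the new writes survive.
        return RelationshipPlan(source="secondary", target="primary")
    # Writes on the secondary can be discarded: re-create the original direction.
    return RelationshipPlan(source="primary", target="secondary")
```

In both cases the cluster named as `target` loses any changes it received after the last successful replication, so confirm the direction before creating the relationship.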
Follow the instructions below to utilize failback with Continuous Replication:
- Create a new replication relationship for each directory you would like to restore from the secondary cluster back to the primary cluster.
- Begin replicating from the secondary cluster to the primary cluster.
- Discontinue writes to the source directories on the secondary cluster.
- Start and complete a final replication job. Keep the final replication job before the cutover as small as possible to minimize the time during which no I/O is allowed to the directory.
- Delete the relationship for each directory failing back once the final replication job completes.
- Create the following configuration on the primary cluster if it doesn't already exist:
  - NFS exports
  - SMB shares
  - AD/LDAP server(s)
  - Snapshot policies
- Remount all clients previously connected to the secondary cluster that require access to the primary cluster.
- Re-create all relationships (primary to secondary) to re-enable Continuous Replication.
Since the primary and secondary clusters were fully in sync when the relationship was deleted and the secondary cluster was in an inactive state, no data will be lost or overwritten on the secondary cluster.
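To see why the final replication job in the failback steps should be kept small, a rough estimate of the write outage can help. The sketch below assumes the outage is dominated by data transfer and ignores tree-walk and other overhead. The throughput and data sizes are invented figures for illustration, not measured Qumulo numbers.

```python
def estimate_cutover_seconds(changed_bytes: int,
                             throughput_bytes_per_sec: float) -> float:
    """Rough lower bound on how long writes stay paused during the final job."""
    if throughput_bytes_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return changed_bytes / throughput_bytes_per_sec


GIB = 1024 ** 3
MIB = 1024 ** 2

# Hypothetical link that sustains 500 MiB/s between the clusters.
throughput = 500 * MIB

# Final job run right after a recent replication: 2 GiB of changes.
small_window = estimate_cutover_seconds(2 * GIB, throughput)

# Final job run after a long gap: 50 GiB of changes.
large_window = estimate_cutover_seconds(50 * GIB, throughput)
```

Under these assumptions the pause is about 4 seconds in the first case and about 102 seconds in the second, which is why it helps to run a replication job shortly before discontinuing writes.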
You should now be able to successfully use failover and failback with Continuous Replication in Qumulo Core 2.11.2 and above.