IN THIS ARTICLE
Outlines how to use Qumulo's backup tool with cloud clusters in AWS.
REQUIREMENTS
- Cloud Cluster with Qumulo Core 3.0.3 or above
- AWS Console access
- SSH key-pair for accessing Qumulo instance
- IAM permissions for full access to EC2
- Command line (qq CLI) tools installed via API & Tools in the Web UI
- The qumulo_aws_backup tool downloaded from the release folder that corresponds to the Qumulo Core version of your cloud cluster
- Linux environment to run the tool
IAM PERMISSIONS
The table below lists the required IAM permissions for utilizing Qumulo's backup tool with cloud clusters in AWS.
ec2:CreateNetworkInterface | ec2:CreateTags | ec2:CreateSnapshot
*ec2:DeleteNetworkInterfaces | ec2:DeleteSnapshot | ec2:DescribeImages
ec2:DescribeInstances | ec2:DescribeNetworkInterfaces | ec2:DescribeSnapshots
ec2:DescribeVolumes | ec2:RegisterImage | ec2:RunInstances
* The ec2:DeleteNetworkInterfaces permission is only required if you run the tool in an AWS Lambda function (for VPC access).
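As a rough illustration, the permissions in the table could be collected into a single IAM policy document. The sketch below builds one in Python using the standard IAM JSON policy layout. The "Resource": "*" scope, the build_policy helper, and its include_lambda flag are assumptions made for this example and do not come from Qumulo's tooling. Action names are copied from the table as written, so check them against the AWS EC2 action reference before attaching the policy.

```python
import json

# Permissions copied verbatim from the table above.
REQUIRED_ACTIONS = [
    "ec2:CreateNetworkInterface",
    "ec2:CreateTags",
    "ec2:CreateSnapshot",
    "ec2:DeleteSnapshot",
    "ec2:DescribeImages",
    "ec2:DescribeInstances",
    "ec2:DescribeNetworkInterfaces",
    "ec2:DescribeSnapshots",
    "ec2:DescribeVolumes",
    "ec2:RegisterImage",
    "ec2:RunInstances",
]

# Only needed when the tool runs in a Lambda function (for VPC access).
LAMBDA_ONLY_ACTIONS = ["ec2:DeleteNetworkInterfaces"]


def build_policy(include_lambda=False):
    """Return an IAM policy document (as a dict) allowing the listed actions."""
    actions = list(REQUIRED_ACTIONS)
    if include_lambda:
        actions += LAMBDA_ONLY_ACTIONS
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                # Assumption: no resource scoping; tighten this if your policy allows.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(), indent=2))
```

The printed JSON can be pasted into the IAM policy editor in the AWS console.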
DETAILS
Qumulo's command line tool, qumulo_aws_backup, is designed to back up and restore an AWS cluster for disaster recovery scenarios using EBS volume snapshots. The EBS volume snapshots that make up a specific backup are identified by their tags, which include the backup ID, the time they were created, the node they belong to, the block device mapping for the original volume, and other information needed to restore a cluster.
EBS volume snapshots are backed by S3, so they have S3’s high durability, high availability, and low cost, making them an effective location for backup data. EBS volume snapshots do not appear in S3, but can be viewed in AWS’s Web UI under EC2 > Elastic Block Store > Snapshots. Users can search for snapshots belonging to a backup by their backup ID through this view, though the tool has commands to do this in a more user-friendly way.
Both the cluster’s data and configuration will be backed up and restored. Multiple backups of a cluster can be made, allowing the user to restore from different points in time. Backups can be listed and validated to know if they are usable for restoration.
IMPORTANT! This tool is not designed to be used to make multiple copies of a cluster; only the original cluster or one cluster restored from a backup may exist at a time.
Create a Backup of your AWS Cloud Cluster
The backup command creates a point-in-time backup of the entire cluster that can later be used to restore the cluster to that state. It takes the IP address of one of the nodes in the cluster and the path to the SSH key for the nodes. Optionally, you can specify additional tags for the EBS snapshots created as part of the backup:
./qumulo_aws_backup --region us-west-2 backup --ssh-key-path /home/me/.ssh/my_aws_key_file --tag User=Myself --tag "Reason=Felt like it" 1.2.3.4
The command returns a JSON object containing the identifier for the backup and the time at which the backup was taken:
{"QumuloBackupId": "34329b13-82bc-4182-b0b1-f347954a5f29", "QumuloBackupTime": "2020-03-02T20:05:52Z"}
The QumuloBackupId can be used with the other commands to interact with this backup. The EBS snapshots are also tagged so that they can be found in the AWS console. This output is displayed even if the command fails, because there may be a partial or complete backup whose EBS snapshots can be cleaned up using the delete command. Whether or not it fails, the backup command always attempts to return the file system to a usable state before it finishes. If that attempt fails, the command notifies you, and you will need to restart the instances manually.
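When the backup command is run from a script, the backup ID can be pulled out of the output for use with later commands. The following is a minimal sketch that parses the sample output shown above. It assumes the JSON object is printed on a single line, and the parse_backup_output helper is a hypothetical name, not part of the tool.

```python
import json


def parse_backup_output(stdout_text):
    """Return (backup_id, backup_time) from the text printed by the backup command."""
    # Assumption: the JSON object sits on one line; other lines are skipped.
    for line in stdout_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if "QumuloBackupId" in record:
            return record["QumuloBackupId"], record.get("QumuloBackupTime")
    raise ValueError("no QumuloBackupId found in the command output")


# Sample output taken from this article.
sample = (
    '{"QumuloBackupId": "34329b13-82bc-4182-b0b1-f347954a5f29", '
    '"QumuloBackupTime": "2020-03-02T20:05:52Z"}'
)
backup_id, backup_time = parse_backup_output(sample)
print(backup_id, backup_time)
```

Because the output is displayed even when the command fails, a script can record the ID first and decide afterwards whether to validate or delete the backup.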
The backup command should take a couple of minutes to complete, during which time the cluster will be briefly unavailable. The cluster is usable again while the EBS snapshots complete asynchronously in the background (this can take as little as a few minutes or up to several hours, depending on your cluster). Once the snapshots are complete, the backup can be used to restore a cloud cluster. If the EBS snapshots fail to complete, the backup is invalid and cannot be used for restoration.
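Since the snapshots finish in the background, a script may want to wait until the backup is reported as valid before relying on it. The sketch below polls the output of the list command with the --json flag, both of which are described later in this article. The function names, the polling interval, and the timeout are assumptions for illustration, and the sketch assumes the JSON is printed on standard output.

```python
import json
import subprocess
import time


def fetch_backups_with_cli(region="us-west-2", tool="./qumulo_aws_backup"):
    """Run the list command with --json and return the parsed list of backups."""
    result = subprocess.run(
        [tool, "--region", region, "list", "--json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def wait_until_valid(fetch_backups, backup_id, interval_s=60,
                     timeout_s=6 * 3600, sleep=time.sleep):
    """Poll until backup_id has BackupState "valid".

    Returns True once the backup is valid, or False if timeout_s elapses first.
    """
    waited = 0
    while True:
        for backup in fetch_backups():
            if (backup["QumuloBackupId"] == backup_id
                    and backup["BackupState"] == "valid"):
                return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

A call such as wait_until_valid(fetch_backups_with_cli, backup_id) would block until the backup validates. The timeout matters because a backup whose snapshots failed stays invalid permanently.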
IMPORTANT! Backups cannot be taken on clusters in a replication relationship. See the “Considerations with Replication” section below for more details.
Restore from a Backup
The restore command creates a running cluster from a backup using the backup ID, the instance type of the nodes in the restored cluster, and the AWS SSH key pair name to use for the restored cluster. Optionally, you can specify additional tags for the AMIs, ENIs, and instances created as part of the restore:
./qumulo_aws_backup --region us-west-2 restore --instance-type m5.4xlarge --aws-key-pair-name my_keys --tag User=Myself --tag "Reason=Felt like it" 34329b13-82bc-4182-b0b1-f347954a5f29
The command returns a JSON object containing the AMIs and instances created:
{"ImageIds": ["ami-06648dc0b47dea6d1", "ami-0621e948e12572d17", "ami-00082215a1044b724", "ami-0c9c9f7ce28f788ac"], "InstanceIds": ["i-0fafac1794d1866eb", "i-068afcff49d05e166", "i-077ebb8e96540785b", "i-06561699cb4b7185b"]}
This output may not be presented if the command fails. If AMIs and instances were created before the failure, they can be cleaned up by searching for AMIs and instances tagged with the backup ID.
Once the restore is complete, the cluster will be nearly identical to the original cluster. The restored cluster will include:
- All files and directories
- All Qumulo configuration, such as cluster name, quotas, and snapshots
- AWS configuration, including:
  - Subnet
  - VPC
  - IP addresses
  - IAM roles
  - Security groups
Keep in mind that the restored cluster will not reflect other configurations such as instance IDs, tags, or instance protection. The backed up cluster and the restored cluster cannot exist at the same time for the same reason that two restored clusters cannot exist at the same time.
If an unrelated instance takes any IP addresses used for the original cluster, the backup will be unrestorable. To protect against this event, configure the Elastic Network Interfaces (ENIs) of the cluster to be backed up to not delete on termination. In this case, the restore command will reuse ENIs with the expected IP addresses, or it will create new ENIs if they are not found.
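One way to apply the ENI setting described above is through the EC2 API. The following sketch uses the describe_network_interfaces and modify_network_interface_attribute calls of a boto3 EC2 client to turn off delete-on-termination for every ENI attached to an instance. The keep_enis_on_termination helper is a hypothetical name, pagination is ignored for brevity, and the caller needs the ec2:ModifyNetworkInterfaceAttribute permission, which is not among the permissions the backup tool itself requires.

```python
def keep_enis_on_termination(ec2_client, instance_id):
    """Set DeleteOnTermination to False on every ENI attached to instance_id.

    ec2_client is expected to behave like boto3.client("ec2").
    Returns the IDs of the network interfaces that were changed.
    """
    response = ec2_client.describe_network_interfaces(
        Filters=[{"Name": "attachment.instance-id", "Values": [instance_id]}]
    )
    updated = []
    for eni in response["NetworkInterfaces"]:
        attachment = eni.get("Attachment", {})
        # Skip interfaces that are already preserved on termination.
        if not attachment.get("DeleteOnTermination"):
            continue
        ec2_client.modify_network_interface_attribute(
            NetworkInterfaceId=eni["NetworkInterfaceId"],
            Attachment={
                "AttachmentId": attachment["AttachmentId"],
                "DeleteOnTermination": False,
            },
        )
        updated.append(eni["NetworkInterfaceId"])
    return updated
```

With boto3 installed, this could be called once per node, for example keep_enis_on_termination(boto3.client("ec2", region_name="us-west-2"), instance_id), where instance_id is the ID of one cluster node.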
List/View Existing Backups
The list command provides a short description of all backups that can be found in the specified region for the AWS account. The command returns a table with the cluster name, creation time, backup ID, and backup state, sorted by creation time:
./qumulo_aws_backup --region us-west-2 list
Cluster Name    Creation Timestamp    Backup ID                             Backup State
--------------  --------------------  ------------------------------------  --------------
qumulo          2020-02-27T18:36:46Z  cc2376fa-632c-42dd-bfac-b9e7aa868fb6  invalid
target          2020-02-27T18:52:37Z  0a0fd077-507d-432f-a598-518060eca91c  valid
qfsd            2020-02-27T20:42:34Z  144ccd5f-ce03-4925-994a-8797807b9960  valid
qumulo          2020-03-02T20:05:52Z  34329b13-82bc-4182-b0b1-f347954a5f29  valid
The list command can optionally return only backups from a specific cluster by providing a cluster name to search for:
./qumulo_aws_backup --region us-west-2 list --cluster-name qumulo
Cluster Name    Creation Timestamp    Backup ID                             Backup State
--------------  --------------------  ------------------------------------  --------------
qumulo          2020-02-27T18:36:46Z  cc2376fa-632c-42dd-bfac-b9e7aa868fb6  invalid
qumulo          2020-03-02T20:05:52Z  34329b13-82bc-4182-b0b1-f347954a5f29  valid
Additionally, the list command can return the output in JSON format:
./qumulo_aws_backup --region us-west-2 list --json
[{"BackupState": "invalid", "QumuloBackupId": "cc2376fa-632c-42dd-bfac-b9e7aa868fb6", "QumuloBackupTime": "2020-02-27T18:36:46Z", "QumuloClusterName": "qumulo"}, {"BackupState": "valid", "QumuloBackupId": "0a0fd077-507d-432f-a598-518060eca91c", "QumuloBackupTime": "2020-02-27T18:52:37Z", "QumuloClusterName": "target"}, {"BackupState": "valid", "QumuloBackupId": "144ccd5f-ce03-4925-994a-8797807b9960", "QumuloBackupTime": "2020-02-27T20:42:34Z", "QumuloClusterName": "qfsd"}, {"BackupState": "valid", "QumuloBackupId": "34329b13-82bc-4182-b0b1-f347954a5f29", "QumuloBackupTime": "2020-03-02T20:05:52Z", "QumuloClusterName": "qumulo"}]
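The JSON form is convenient for scripting. As a small sketch, the function below picks the most recent valid backup for a given cluster from output like the sample above. The field names come from that sample; the latest_valid_backup helper is a hypothetical name introduced for this example.

```python
import json


def latest_valid_backup(list_json_text, cluster_name):
    """Return the ID of the newest valid backup for cluster_name, or None."""
    backups = json.loads(list_json_text)
    candidates = [
        b for b in backups
        if b["QumuloClusterName"] == cluster_name and b["BackupState"] == "valid"
    ]
    if not candidates:
        return None
    # The timestamps are fixed-width ISO 8601 strings in UTC, so comparing
    # them as strings gives chronological order.
    newest = max(candidates, key=lambda b: b["QumuloBackupTime"])
    return newest["QumuloBackupId"]


# Two of the entries from the sample output in this article.
sample = json.dumps([
    {"BackupState": "invalid",
     "QumuloBackupId": "cc2376fa-632c-42dd-bfac-b9e7aa868fb6",
     "QumuloBackupTime": "2020-02-27T18:36:46Z",
     "QumuloClusterName": "qumulo"},
    {"BackupState": "valid",
     "QumuloBackupId": "34329b13-82bc-4182-b0b1-f347954a5f29",
     "QumuloBackupTime": "2020-03-02T20:05:52Z",
     "QumuloClusterName": "qumulo"},
])
print(latest_valid_backup(sample, "qumulo"))
```

The returned ID can then be passed to the view, validate, or restore commands.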
The view command provides a detailed description of a specific backup that includes node details like the node ID, IP address, subnet, VPC, IAM role, and security group. It also includes details about the individual volume snapshots and reports if the individual snapshots in the backup are malformed to a degree that they cannot be properly categorized.
NOTE: The absence of malformed snapshots does not indicate that the backup will be restorable; use the validate command to confirm, as detailed in the next section.
Include the backup ID in the view command to display:
./qumulo_aws_backup --region us-west-2 view 34329b13-82bc-4182-b0b1-f347954a5f29
Backup Metadata (retrieved from snapshot snap-0861b5668b9f1fb55):
-------------------  ------------------------------------
Backup ID            34329b13-82bc-4182-b0b1-f347954a5f29
Backup tool version  1
Cluster name         qumulo
Number of nodes      4
Backup time          2020-03-02T20:05:52Z
-------------------  ------------------------------------
Nodes Metadata:
Node ID    IP address    Subnet ID        VPC ID        IAM role ARN    Security group IDs    Number of volumes
---------  ------------  ---------------  ------------  --------------  --------------------  -------------------
1          10.81.7.49    subnet-29e9d14e  vpc-42c92624  None            ['sg-c5646fbe']       16
3          10.81.11.151  subnet-29e9d14e  vpc-42c92624  None            ['sg-c5646fbe']       16
2          10.81.3.67    subnet-29e9d14e  vpc-42c92624  None            ['sg-c5646fbe']       16
4          10.81.2.172   subnet-29e9d14e  vpc-42c92624  None            ['sg-c5646fbe']       16
Snapshots Metadata:
Node    Root    Malformed    Vol type    Dev name    Snapshot ID
------  ------  -----------  ----------  ----------  ----------------------
1       True    No           gp2         /dev/sda1   snap-0861b5668b9f1fb55
1       False   No           gp2         /dev/xvdb   snap-0cdaba5dae7adbe81
1       False   No           gp2         /dev/xvdc   snap-0f2fd6ca8f914a77d
1       False   No           gp2         /dev/xvdd   snap-09234953327e29c28
1       False   No           gp2         /dev/xvde   snap-001166334bb5fef64
1       False   No           gp2         /dev/xvdf   snap-0304b4f622289b62a
1       False   No           st1         /dev/xvdg   snap-016476ec7029b6ab2
1       False   No           st1         /dev/xvdh   snap-0753cef140ec6e2d0
1       False   No           st1         /dev/xvdi   snap-0e38bde5823f062d4
1       False   No           st1         /dev/xvdj   snap-002123b9b92d70d90
1       False   No           st1         /dev/xvdk   snap-0287a966f93f0964c
1       False   No           st1         /dev/xvdl   snap-0398743f749471413
1       False   No           st1         /dev/xvdm   snap-04264cfdcbcbbc455
1       False   No           st1         /dev/xvdn   snap-04a79a51d616ff140
1       False   No           st1         /dev/xvdo   snap-0a997aae1e8b1a74a
1       False   No           st1         /dev/xvdp   snap-0a89f955604f96efe
2       True    No           gp2         /dev/sda1   snap-0bac0e9bd7730b486
2       False   No           gp2         /dev/xvdb   snap-0b78f0562fa66793e
2       False   No           gp2         /dev/xvdc   snap-0b6cf294de897b2b2
2       False   No           gp2         /dev/xvdd   snap-0620600ed0a4e02f0
2       False   No           gp2         /dev/xvde   snap-03d49e34aaf3dc7b9
2       False   No           gp2         /dev/xvdf   snap-04c01360ed6c0d83f
2       False   No           st1         /dev/xvdg   snap-0330844a4946601db
2       False   No           st1         /dev/xvdh   snap-0eb03334b7820ddd0
2       False   No           st1         /dev/xvdi   snap-009aad661cca20fd5
2       False   No           st1         /dev/xvdj   snap-07bd184254a514d3f
2       False   No           st1         /dev/xvdk   snap-06374b200c4994851
2       False   No           st1         /dev/xvdl   snap-0785853dda785ff90
2       False   No           st1         /dev/xvdm   snap-06c0cc21fc4948e37
2       False   No           st1         /dev/xvdn   snap-08396bd77b191001b
2       False   No           st1         /dev/xvdo   snap-0edf193c633b8b115
2       False   No           st1         /dev/xvdp   snap-0b057eeb535d29609
3       True    No           gp2         /dev/sda1   snap-0a82f027fd936ab39
3       False   No           gp2         /dev/xvdb   snap-09b49a9c4289e81d3
3       False   No           gp2         /dev/xvdc   snap-08d29e5501fadd1df
3       False   No           gp2         /dev/xvdd   snap-0f4e503ad0b147fc6
3       False   No           gp2         /dev/xvde   snap-040a2001a78400352
3       False   No           gp2         /dev/xvdf   snap-00ff65f373380edbf
3       False   No           st1         /dev/xvdg   snap-00f9aa0a6145ef23c
3       False   No           st1         /dev/xvdh   snap-0db3383e773e1be2b
3       False   No           st1         /dev/xvdi   snap-05ddcf5a423f290e2
3       False   No           st1         /dev/xvdj   snap-0007943eaa044fe67
3       False   No           st1         /dev/xvdk   snap-0e1515464691254d8
3       False   No           st1         /dev/xvdl   snap-02de6eab4ef2e2e02
3       False   No           st1         /dev/xvdm   snap-0b9d3681fe375b3f9
3       False   No           st1         /dev/xvdn   snap-0fe05321b42496173
3       False   No           st1         /dev/xvdo   snap-06e4def7b1ac02075
3       False   No           st1         /dev/xvdp   snap-02ddef9ba7aa563df
4       True    No           gp2         /dev/sda1   snap-00fdac42c6bbe8a3e
4       False   No           gp2         /dev/xvdb   snap-08e022436879029eb
4       False   No           gp2         /dev/xvdc   snap-04922df1e5575dc43
4       False   No           gp2         /dev/xvdd   snap-0a76db7443fe814bf
4       False   No           gp2         /dev/xvde   snap-0c0657a907a1156ae
4       False   No           gp2         /dev/xvdf   snap-0adca9928812df34e
4       False   No           st1         /dev/xvdg   snap-0171303dae7c14846
4       False   No           st1         /dev/xvdh   snap-0ef203b706d76272f
4       False   No           st1         /dev/xvdi   snap-0c3c5ba67d7c88a72
4       False   No           st1         /dev/xvdj   snap-0bbebe2fb67417bad
4       False   No           st1         /dev/xvdk   snap-07e4298b7929be2ab
4       False   No           st1         /dev/xvdl   snap-0ab14e56999fef373
4       False   No           st1         /dev/xvdm   snap-057c7ec11507aafc9
4       False   No           st1         /dev/xvdn   snap-09cabce40b60f9cfa
4       False   No           st1         /dev/xvdo   snap-07c177cac27df6838
4       False   No           st1         /dev/xvdp   snap-0f944064a81d3e908
Validate a Backup
The validate command checks that a backup is restorable using the backup ID:
./qumulo_aws_backup --region us-west-2 validate 34329b13-82bc-4182-b0b1-f347954a5f29
Validation succeeded for backup id: 34329b13-82bc-4182-b0b1-f347954a5f29
If the validation succeeds, the command reports the success as shown above. If the validation fails, the command returns a list of errors, as in the following example:
./qumulo_aws_backup --region us-west-2 validate cc2376fa-632c-42dd-bfac-b9e7aa868fb6
Validation failed for backup id: cc2376fa-632c-42dd-bfac-b9e7aa868fb6.
Errors Encountered:
Snapshot-level errors:
---------------------
Snapshot snap-047c1a21586237603 is malformed: Root snapshot is not finalized.
Snapshot snap-0559b319b320f393e is malformed: Root snapshot is not finalized.
Snapshot snap-0623276860064cc2c is malformed: Root snapshot is not finalized.
Snapshot snap-06dcdc3d3c24ab129 is malformed: Root snapshot is not finalized.
Backups may fail to validate for the following reasons:
- EBS snapshots failed to complete
- EBS snapshots were deleted
- EBS snapshots had their tags modified
- EBS snapshots have not yet completed (this failure is temporary and clears once the snapshots complete)
- The backup command failed before completion
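To act on a failed validation from a script, the snapshot-level error lines can be collected into a mapping of snapshot ID to reason. The sketch below is based only on the line format in the sample validation output earlier in this section. Other kinds of errors may be formatted differently, and the malformed_snapshots helper is a hypothetical name.

```python
import re

# Matches lines such as:
#   Snapshot snap-047c1a21586237603 is malformed: Root snapshot is not finalized.
MALFORMED_RE = re.compile(r"^Snapshot (snap-[0-9a-f]+) is malformed: (.+)$")


def malformed_snapshots(validate_output):
    """Map snapshot ID to reason for each 'is malformed' line in the output."""
    found = {}
    for line in validate_output.splitlines():
        match = MALFORMED_RE.match(line.strip())
        if match:
            found[match.group(1)] = match.group(2)
    return found


# Part of the sample failure output from this article.
sample = """Validation failed for backup id: cc2376fa-632c-42dd-bfac-b9e7aa868fb6.
Errors Encountered:
Snapshot-level errors:
---------------------
Snapshot snap-047c1a21586237603 is malformed: Root snapshot is not finalized.
Snapshot snap-0559b319b320f393e is malformed: Root snapshot is not finalized.
"""
for snapshot_id, reason in malformed_snapshots(sample).items():
    print(snapshot_id, "->", reason)
```

If every reported reason is that a snapshot is not finalized, the snapshots may simply still be in progress, and running the validate command again later may succeed.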
Delete a Backup
The delete command will delete a backup by removing all associated EBS snapshots:
./qumulo_aws_backup --region us-west-2 delete 34329b13-82bc-4182-b0b1-f347954a5f29
Successfully deleted backup "34329b13-82bc-4182-b0b1-f347954a5f29"
Any backup, valid or invalid, can be deleted using the backup ID. However, a backup cannot be deleted while any of its snapshots are referenced by AMIs, and AMIs created by the restore command may still exist. These AMIs are safe to delete; once they are removed, the backup can be deleted.
Considerations with Replication
If a backup is attempted on a cluster that is part of a replication relationship (either as source or as target), qumulo_aws_backup will fail with the following message:
Unable to create a backup of a cluster that is a part of a replication relationship. Please contact customer support.
Our qumulo_aws_backup tool cannot back up a cluster and guarantee replication data consistency. Moving replication configuration and relationships’ states to a different cluster is not supported; therefore, backing up a cluster with any replication relationship present is also not supported.
You can work around this limitation by deleting all replication relationships from the cluster before running qumulo_aws_backup and then reconfiguring the clusters with the same relationships once the backup is complete. Note that this is not recommended, since the new replication relationships will have to do a full tree walk and may have to copy their entire data set to the target.
RESOLUTION
You should now be able to successfully use Qumulo's backup tool with your AWS cloud cluster.
ADDITIONAL RESOURCES
Qumulo in AWS: Add a Node to an Existing Cloud Cluster