Qumulo in AWS: Cloud Cluster Backup Tool

IN THIS ARTICLE

This article outlines how to use Qumulo's backup tool with cloud clusters in AWS.

REQUIREMENTS

  • Cloud Cluster with Qumulo Core 3.0.3 or above 
  • AWS Console access
  • SSH key-pair for accessing Qumulo instance
  • IAM permissions for full access to EC2
  • Command line (qq CLI) tools installed via API & Tools in the Web UI
  • The qumulo_aws_backup tool downloaded from the release folder that corresponds to the Qumulo Core version of your cloud cluster
  • Linux environment to run the tool

IAM PERMISSIONS

The table below lists the IAM permissions required to use Qumulo's backup tool with cloud clusters in AWS.

ec2:CreateNetworkInterface    ec2:CreateTags                   ec2:CreateSnapshot
*ec2:DeleteNetworkInterface   ec2:DeleteSnapshot               ec2:DescribeImages
ec2:DescribeInstances         ec2:DescribeNetworkInterfaces    ec2:DescribeSnapshots
ec2:DescribeVolumes           ec2:RegisterImage                ec2:RunInstances

*The ec2:DeleteNetworkInterface permission is required only if you run the tool in an AWS Lambda function (for VPC access).
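
As an illustration, the permissions above could be collected into a single IAM policy document. The sketch below is not an official Qumulo policy: the build_backup_policy function, the run_in_lambda flag, and the "Resource": "*" scope are assumptions made for this example, and you may want to narrow the resource scope for your account.

```python
import json

# Actions listed in the IAM PERMISSIONS section of this article.
BASE_ACTIONS = [
    "ec2:CreateNetworkInterface",
    "ec2:CreateTags",
    "ec2:CreateSnapshot",
    "ec2:DeleteSnapshot",
    "ec2:DescribeImages",
    "ec2:DescribeInstances",
    "ec2:DescribeNetworkInterfaces",
    "ec2:DescribeSnapshots",
    "ec2:DescribeVolumes",
    "ec2:RegisterImage",
    "ec2:RunInstances",
]

# Needed only when the tool runs in a Lambda function with VPC access.
# AWS names this action ec2:DeleteNetworkInterface (singular).
LAMBDA_ONLY_ACTIONS = ["ec2:DeleteNetworkInterface"]


def build_backup_policy(run_in_lambda=False):
    """Return an IAM policy document (as a dict) allowing the listed actions."""
    actions = list(BASE_ACTIONS)
    if run_in_lambda:
        actions += LAMBDA_ONLY_ACTIONS
    return {
        "Version": "2012-10-17",
        "Statement": [
            # "Resource": "*" is an assumption for brevity, not a recommendation.
            {"Effect": "Allow", "Action": sorted(actions), "Resource": "*"}
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_backup_policy(), indent=2))
```

The printed JSON can be pasted into the IAM policy editor in the AWS Console and adjusted as needed.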

DETAILS

Qumulo's command line interface tool, qumulo_aws_backup, is designed to back up and restore an AWS cluster for disaster recovery scenarios using EBS volume snapshots. The EBS volume snapshots that make up a backup are identified by their tags, which include the backup ID, the time they were created, the node they belong to, the block device mapping for the original volume, and other information needed to restore a cluster.

EBS volume snapshots are backed by S3, so they have S3’s high durability, high availability, and low cost, making them an effective location for backup data. EBS volume snapshots do not appear in S3, but can be viewed in AWS’s Web UI under EC2 > Elastic Block Store > Snapshots. Users can search for snapshots belonging to a backup by their backup ID through this view, though the tool has commands to do this in a more user-friendly way.

Both the cluster’s data and configuration will be backed up and restored. Multiple backups of a cluster can be made, allowing the user to restore from different points in time. Backups can be listed and validated to know if they are usable for restoration.

IMPORTANT! This tool is not designed to be used to make multiple copies of a cluster; only the original cluster or one cluster restored from a backup may exist at a time.

Create a Backup of your AWS Cloud Cluster

The backup command creates a point-in-time backup of the entire cluster that can later be used to restore the cluster to that state. It takes the IP address of one of the nodes in the cluster and the path to the SSH key for the nodes. Optionally, you can specify additional tags for the EBS snapshots created as part of the backup:

./qumulo_aws_backup --region us-west-2 backup --ssh-key-path /home/me/.ssh/my_aws_key_file --tag User=Myself --tag "Reason=Felt like it" 1.2.3.4

The command returns a JSON object containing the identifier for the backup and the time at which the backup was taken:

{"QumuloBackupId": "34329b13-82bc-4182-b0b1-f347954a5f29", "QumuloBackupTime": "2020-03-02T20:05:52Z"}

The QumuloBackupId can be used with the other commands to interact with this backup. The EBS snapshots are also tagged so that they can be discovered in the AWS console. This output is displayed even if the command fails, because there may be a partial or complete backup whose EBS snapshots can be cleaned up using the delete command. Regardless of whether a failure occurs, the backup command always attempts to return the file system to a usable state before it finishes. If that attempt fails, the command notifies you, and you will need to restart the instances manually.
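
If you script the backup, you will likely want to capture the backup ID from this output. The following is a minimal sketch that parses the JSON shown above; the parse_backup_output function name is hypothetical, and the timestamp format is inferred from the sample output rather than from a published specification.

```python
import json
from datetime import datetime, timezone


def parse_backup_output(stdout_text):
    """Return (backup_id, backup_time) from the backup command's JSON output."""
    record = json.loads(stdout_text)
    backup_id = record["QumuloBackupId"]
    # Format inferred from the sample output: UTC with a trailing 'Z'.
    backup_time = datetime.strptime(
        record["QumuloBackupTime"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return backup_id, backup_time


# Sample output copied from this article.
sample = (
    '{"QumuloBackupId": "34329b13-82bc-4182-b0b1-f347954a5f29", '
    '"QumuloBackupTime": "2020-03-02T20:05:52Z"}'
)
backup_id, backup_time = parse_backup_output(sample)
print(backup_id, backup_time.isoformat())
```

Because the output is displayed even when the command fails, keep the parsed ID in that case as well so the partial backup can be passed to the delete command.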

The backup command should take a couple of minutes to complete, during which time the cluster will be briefly unavailable. The cluster is usable again while the EBS snapshots complete asynchronously in the background (this can take as little as a few minutes or up to several hours, depending on your cluster). Once the snapshots are complete, the backup can be used to restore a cloud cluster. If the EBS snapshots fail to complete, the backup will be invalid and unusable for restoration.

IMPORTANT! Backups cannot be taken on clusters in a replication relationship. See the “Considerations with Replication” section below for more details. 

Restore from a Backup

The restore command creates a running cluster from a backup using the backup ID, the instance type of the nodes in the restored cluster, and the AWS SSH key pair name to use for the restored cluster. Optionally, you can specify additional tags for the AMIs, ENIs, and instances created as part of the restore:

./qumulo_aws_backup --region us-west-2 restore --instance-type m5.4xlarge --aws-key-pair-name my_keys --tag User=Myself --tag "Reason=Felt like it" 34329b13-82bc-4182-b0b1-f347954a5f29

The command returns a JSON object containing the AMIs and instances created:

{"ImageIds": ["ami-06648dc0b47dea6d1", "ami-0621e948e12572d17", "ami-00082215a1044b724", "ami-0c9c9f7ce28f788ac"], "InstanceIds": ["i-0fafac1794d1866eb", "i-068afcff49d05e166", "i-077ebb8e96540785b", "i-06561699cb4b7185b"]}

This output may not be displayed if the command fails. If AMIs and instances were created before the failure, you can clean them up by searching for AMIs and instances tagged with the backup ID.
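
Since the restore output may be missing after a failure, a script that consumes it should not assume the JSON is present. The sketch below shows one way to handle both cases; parse_restore_output is a hypothetical helper, and returning two empty lists when nothing parseable was printed is a design choice for this example.

```python
import json


def parse_restore_output(stdout_text):
    """Return (image_ids, instance_ids); both are empty if no JSON was printed."""
    try:
        record = json.loads(stdout_text or "")
    except json.JSONDecodeError:
        # The restore command may print nothing usable when it fails.
        return [], []
    if not isinstance(record, dict):
        return [], []
    return list(record.get("ImageIds", [])), list(record.get("InstanceIds", []))


# Sample output shortened from the example in this article.
sample = (
    '{"ImageIds": ["ami-06648dc0b47dea6d1", "ami-0621e948e12572d17"], '
    '"InstanceIds": ["i-0fafac1794d1866eb", "i-068afcff49d05e166"]}'
)
image_ids, instance_ids = parse_restore_output(sample)
print(len(image_ids), "AMIs,", len(instance_ids), "instances")
```

When both lists come back empty after a failed restore, fall back to searching for AMIs and instances tagged with the backup ID, as described above.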

Once the restore is complete, the cluster will be nearly identical to the original cluster. The restored cluster will include:

  • All Files and Directories
  • All Qumulo configuration, such as cluster name, quotas, snapshots
  • AWS configurations including:
    • Subnet
    • VPC
    • IP addresses
    • IAM roles
    • Security groups

Keep in mind that the restored cluster will not reflect other configurations such as instance IDs, tags, or instance protection. The backed up cluster and the restored cluster cannot exist at the same time for the same reason that two restored clusters cannot exist at the same time.

If an unrelated instance takes any of the IP addresses used by the original cluster, the backup cannot be restored. To protect against this, configure the Elastic Network Interfaces (ENIs) of the cluster being backed up so that they are not deleted on termination. In this case, the restore command will reuse ENIs with the expected IP addresses, or it will create new ENIs if they are not found.
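
One possible way to apply the ENI setting in bulk is through the EC2 API. The sketch below only builds the request parameters: it takes a response shaped like the one returned by the boto3 EC2 client's describe_network_interfaces call and produces keyword arguments intended for modify_network_interface_attribute. The keep_eni_requests helper and the sample IDs are hypothetical, and actually sending the requests (with boto3 or another client) is left out.

```python
def keep_eni_requests(describe_response):
    """Build one modify request per attached ENI that is still set to be
    deleted on termination."""
    requests = []
    for eni in describe_response.get("NetworkInterfaces", []):
        attachment = eni.get("Attachment")
        if not attachment or not attachment.get("DeleteOnTermination"):
            continue  # unattached, or already kept on termination
        requests.append(
            {
                "NetworkInterfaceId": eni["NetworkInterfaceId"],
                "Attachment": {
                    "AttachmentId": attachment["AttachmentId"],
                    "DeleteOnTermination": False,
                },
            }
        )
    return requests


# Hypothetical response fragment; the IDs are placeholders, not real resources.
sample_response = {
    "NetworkInterfaces": [
        {
            "NetworkInterfaceId": "eni-aaaa1111",
            "Attachment": {
                "AttachmentId": "eni-attach-aaaa1111",
                "DeleteOnTermination": True,
            },
        },
        {
            "NetworkInterfaceId": "eni-bbbb2222",
            "Attachment": {
                "AttachmentId": "eni-attach-bbbb2222",
                "DeleteOnTermination": False,
            },
        },
    ]
}
for request in keep_eni_requests(sample_response):
    print(request)
```

Each resulting dictionary could then be passed as keyword arguments to the EC2 client, once per ENI, before the backup is taken.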

List/View Existing Backups

The list command provides a short description of all backups that can be found in the specified region for the AWS account. The command returns a table with the cluster name, creation time, backup ID, and backup state, sorted by creation time:

./qumulo_aws_backup --region us-west-2 list

Cluster Name    Creation Timestamp    Backup ID                             Backup State
--------------  --------------------  ------------------------------------  --------------
qumulo          2020-02-27T18:36:46Z  cc2376fa-632c-42dd-bfac-b9e7aa868fb6  invalid
target          2020-02-27T18:52:37Z  0a0fd077-507d-432f-a598-518060eca91c  valid
qfsd            2020-02-27T20:42:34Z  144ccd5f-ce03-4925-994a-8797807b9960  valid
qumulo          2020-03-02T20:05:52Z  34329b13-82bc-4182-b0b1-f347954a5f29  valid

The list command can optionally return only backups from a specific cluster by providing a cluster name to search for:

./qumulo_aws_backup --region us-west-2 list --cluster-name qumulo

Cluster Name    Creation Timestamp    Backup ID                             Backup State
--------------  --------------------  ------------------------------------  --------------
qumulo          2020-02-27T18:36:46Z  cc2376fa-632c-42dd-bfac-b9e7aa868fb6  invalid
qumulo          2020-03-02T20:05:52Z  34329b13-82bc-4182-b0b1-f347954a5f29  valid

Additionally, the list command can return the output in JSON format:

./qumulo_aws_backup --region us-west-2 list --json

[{"BackupState": "invalid", "QumuloBackupId": "cc2376fa-632c-42dd-bfac-b9e7aa868fb6", "QumuloBackupTime": "2020-02-27T18:36:46Z", "QumuloClusterName": "qumulo"}, {"BackupState": "valid", "QumuloBackupId": "0a0fd077-507d-432f-a598-518060eca91c", "QumuloBackupTime": "2020-02-27T18:52:37Z", "QumuloClusterName": "target"}, {"BackupState": "valid", "QumuloBackupId": "144ccd5f-ce03-4925-994a-8797807b9960", "QumuloBackupTime": "2020-02-27T20:42:34Z", "QumuloClusterName": "qfsd"}, {"BackupState": "valid", "QumuloBackupId": "34329b13-82bc-4182-b0b1-f347954a5f29", "QumuloBackupTime": "2020-03-02T20:05:52Z", "QumuloClusterName": "qumulo"}]
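
The JSON form is convenient for scripting. As a rough sketch, the function below picks the most recent valid backup for a given cluster from this output; latest_valid_backup is a hypothetical helper, and the field names are taken from the sample output above.

```python
import json


def latest_valid_backup(list_json_text, cluster_name):
    """Return the newest backup marked valid for cluster_name, or None."""
    backups = json.loads(list_json_text)
    candidates = [
        b
        for b in backups
        if b["QumuloClusterName"] == cluster_name and b["BackupState"] == "valid"
    ]
    if not candidates:
        return None
    # Timestamps in this fixed UTC format sort correctly as plain strings.
    return max(candidates, key=lambda b: b["QumuloBackupTime"])


# Sample output shortened from the example in this article.
sample_list_output = (
    '[{"BackupState": "invalid", "QumuloBackupId": "cc2376fa-632c-42dd-bfac-b9e7aa868fb6", '
    '"QumuloBackupTime": "2020-02-27T18:36:46Z", "QumuloClusterName": "qumulo"}, '
    '{"BackupState": "valid", "QumuloBackupId": "0a0fd077-507d-432f-a598-518060eca91c", '
    '"QumuloBackupTime": "2020-02-27T18:52:37Z", "QumuloClusterName": "target"}, '
    '{"BackupState": "valid", "QumuloBackupId": "34329b13-82bc-4182-b0b1-f347954a5f29", '
    '"QumuloBackupTime": "2020-03-02T20:05:52Z", "QumuloClusterName": "qumulo"}]'
)
best = latest_valid_backup(sample_list_output, "qumulo")
print(best["QumuloBackupId"] if best else "no valid backup found")
```

A backup that the list command reports as valid should still be checked with the validate command before you rely on it for a restore.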

The view command provides a detailed description of a specific backup that includes node details like the node ID, IP address, subnet, VPC, IAM role, and security group. It also includes details about the individual volume snapshots and reports if the individual snapshots in the backup are malformed to a degree that they cannot be properly categorized.

NOTE: The absence of malformed snapshots does not indicate that the backup will be restorable; use the validate command to confirm, as detailed in the next section.

Pass the backup ID to the view command to display these details:

./qumulo_aws_backup --region us-west-2 view 34329b13-82bc-4182-b0b1-f347954a5f29

Backup Metadata (retrieved from snapshot snap-0861b5668b9f1fb55):
-------------------  ------------------------------------
Backup ID            34329b13-82bc-4182-b0b1-f347954a5f29
Backup tool version  1
Cluster name         qumulo
Number of nodes      4
Backup time          2020-03-02T20:05:52Z
-------------------  ------------------------------------

Nodes Metadata:
  Node ID  IP address    Subnet ID        VPC ID        IAM role ARN    Security group IDs      Number of volumes
---------  ------------  ---------------  ------------  --------------  --------------------  -------------------
        1  10.81.7.49    subnet-29e9d14e  vpc-42c92624  None            ['sg-c5646fbe']                        16
        3  10.81.11.151  subnet-29e9d14e  vpc-42c92624  None            ['sg-c5646fbe']                        16
        2  10.81.3.67    subnet-29e9d14e  vpc-42c92624  None            ['sg-c5646fbe']                        16
        4  10.81.2.172   subnet-29e9d14e  vpc-42c92624  None            ['sg-c5646fbe']                        16

Snapshots Metadata:
  Node  Root    Malformed    Vol type    Dev name    Snapshot ID
------  ------  -----------  ----------  ----------  ----------------------
     1  True    No           gp2         /dev/sda1   snap-0861b5668b9f1fb55
     1  False   No           gp2         /dev/xvdb   snap-0cdaba5dae7adbe81
     1  False   No           gp2         /dev/xvdc   snap-0f2fd6ca8f914a77d
     1  False   No           gp2         /dev/xvdd   snap-09234953327e29c28
     1  False   No           gp2         /dev/xvde   snap-001166334bb5fef64
     1  False   No           gp2         /dev/xvdf   snap-0304b4f622289b62a
     1  False   No           st1         /dev/xvdg   snap-016476ec7029b6ab2
     1  False   No           st1         /dev/xvdh   snap-0753cef140ec6e2d0
     1  False   No           st1         /dev/xvdi   snap-0e38bde5823f062d4
     1  False   No           st1         /dev/xvdj   snap-002123b9b92d70d90
     1  False   No           st1         /dev/xvdk   snap-0287a966f93f0964c
     1  False   No           st1         /dev/xvdl   snap-0398743f749471413
     1  False   No           st1         /dev/xvdm   snap-04264cfdcbcbbc455
     1  False   No           st1         /dev/xvdn   snap-04a79a51d616ff140
     1  False   No           st1         /dev/xvdo   snap-0a997aae1e8b1a74a
     1  False   No           st1         /dev/xvdp   snap-0a89f955604f96efe
     2  True    No           gp2         /dev/sda1   snap-0bac0e9bd7730b486
     2  False   No           gp2         /dev/xvdb   snap-0b78f0562fa66793e
     2  False   No           gp2         /dev/xvdc   snap-0b6cf294de897b2b2
     2  False   No           gp2         /dev/xvdd   snap-0620600ed0a4e02f0
     2  False   No           gp2         /dev/xvde   snap-03d49e34aaf3dc7b9
     2  False   No           gp2         /dev/xvdf   snap-04c01360ed6c0d83f
     2  False   No           st1         /dev/xvdg   snap-0330844a4946601db
     2  False   No           st1         /dev/xvdh   snap-0eb03334b7820ddd0
     2  False   No           st1         /dev/xvdi   snap-009aad661cca20fd5
     2  False   No           st1         /dev/xvdj   snap-07bd184254a514d3f
     2  False   No           st1         /dev/xvdk   snap-06374b200c4994851
     2  False   No           st1         /dev/xvdl   snap-0785853dda785ff90
     2  False   No           st1         /dev/xvdm   snap-06c0cc21fc4948e37
     2  False   No           st1         /dev/xvdn   snap-08396bd77b191001b
     2  False   No           st1         /dev/xvdo   snap-0edf193c633b8b115
     2  False   No           st1         /dev/xvdp   snap-0b057eeb535d29609
     3  True    No           gp2         /dev/sda1   snap-0a82f027fd936ab39
     3  False   No           gp2         /dev/xvdb   snap-09b49a9c4289e81d3
     3  False   No           gp2         /dev/xvdc   snap-08d29e5501fadd1df
     3  False   No           gp2         /dev/xvdd   snap-0f4e503ad0b147fc6
     3  False   No           gp2         /dev/xvde   snap-040a2001a78400352
     3  False   No           gp2         /dev/xvdf   snap-00ff65f373380edbf
     3  False   No           st1         /dev/xvdg   snap-00f9aa0a6145ef23c
     3  False   No           st1         /dev/xvdh   snap-0db3383e773e1be2b
     3  False   No           st1         /dev/xvdi   snap-05ddcf5a423f290e2
     3  False   No           st1         /dev/xvdj   snap-0007943eaa044fe67
     3  False   No           st1         /dev/xvdk   snap-0e1515464691254d8
     3  False   No           st1         /dev/xvdl   snap-02de6eab4ef2e2e02
     3  False   No           st1         /dev/xvdm   snap-0b9d3681fe375b3f9
     3  False   No           st1         /dev/xvdn   snap-0fe05321b42496173
     3  False   No           st1         /dev/xvdo   snap-06e4def7b1ac02075
     3  False   No           st1         /dev/xvdp   snap-02ddef9ba7aa563df
     4  True    No           gp2         /dev/sda1   snap-00fdac42c6bbe8a3e
     4  False   No           gp2         /dev/xvdb   snap-08e022436879029eb
     4  False   No           gp2         /dev/xvdc   snap-04922df1e5575dc43
     4  False   No           gp2         /dev/xvdd   snap-0a76db7443fe814bf
     4  False   No           gp2         /dev/xvde   snap-0c0657a907a1156ae
     4  False   No           gp2         /dev/xvdf   snap-0adca9928812df34e
     4  False   No           st1         /dev/xvdg   snap-0171303dae7c14846
     4  False   No           st1         /dev/xvdh   snap-0ef203b706d76272f
     4  False   No           st1         /dev/xvdi   snap-0c3c5ba67d7c88a72
     4  False   No           st1         /dev/xvdj   snap-0bbebe2fb67417bad
     4  False   No           st1         /dev/xvdk   snap-07e4298b7929be2ab
     4  False   No           st1         /dev/xvdl   snap-0ab14e56999fef373
     4  False   No           st1         /dev/xvdm   snap-057c7ec11507aafc9
     4  False   No           st1         /dev/xvdn   snap-09cabce40b60f9cfa
     4  False   No           st1         /dev/xvdo   snap-07c177cac27df6838
     4  False   No           st1         /dev/xvdp   snap-0f944064a81d3e908

Validate a Backup

The validate command checks that a backup is restorable using the backup ID:

./qumulo_aws_backup --region us-west-2 validate 34329b13-82bc-4182-b0b1-f347954a5f29

Validation succeeded for backup id: 34329b13-82bc-4182-b0b1-f347954a5f29

The command reports whether the validation succeeded. If the validation is not successful, the command returns a list of errors, as in the example below:

./qumulo_aws_backup --region us-west-2 validate cc2376fa-632c-42dd-bfac-b9e7aa868fb6

Validation failed for backup id: cc2376fa-632c-42dd-bfac-b9e7aa868fb6.
Errors Encountered:

Snapshot-level errors:
---------------------
Snapshot snap-047c1a21586237603 is malformed: Root snapshot is not finalized.
Snapshot snap-0559b319b320f393e is malformed: Root snapshot is not finalized.
Snapshot snap-0623276860064cc2c is malformed: Root snapshot is not finalized.
Snapshot snap-06dcdc3d3c24ab129 is malformed: Root snapshot is not finalized.

Backups may fail to validate for the following reasons:

  • EBS snapshots failed to complete
  • EBS snapshots were deleted
  • EBS snapshot tags were modified
  • EBS snapshots have not yet completed
    • This failure is temporary and clears once the EBS snapshots have completed
  • The backup command failed before completion
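
Because a validation failure can be temporary while EBS snapshots are still completing, a script might retry the validate command for a while before giving up. The following is a sketch only: wait_until_valid and its parameters are hypothetical, run_validate stands for any callable you supply that runs the validate command for a backup ID and returns its text output, and the success check relies on the "Validation succeeded" message shown in the example above.

```python
import time

# Prefix of the success message shown in this article's validate example.
SUCCESS_PREFIX = "Validation succeeded for backup id:"


def wait_until_valid(run_validate, backup_id, attempts=10,
                     delay_seconds=60.0, sleep=time.sleep):
    """Retry validation until it succeeds or the attempts run out.

    Returns True on success and False otherwise. The loop cannot tell a
    temporary failure from a permanent one; it simply retries every failure.
    """
    for attempt in range(attempts):
        output = run_validate(backup_id)
        if output.strip().startswith(SUCCESS_PREFIX):
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


# Demonstration with a stand-in for the real command: fails twice, then succeeds.
_outputs = iter([
    "Validation failed for backup id: demo.",
    "Validation failed for backup id: demo.",
    "Validation succeeded for backup id: demo",
])
result = wait_until_valid(lambda _id: next(_outputs), "demo",
                          attempts=5, sleep=lambda _s: None)
print("backup is valid" if result else "backup did not validate in time")
```

If the function returns False, review the errors from the validate command to decide whether the backup is permanently invalid and should be deleted.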

Delete a Backup

The delete command will delete a backup by removing all associated EBS snapshots:

./qumulo_aws_backup --region us-west-2 delete 34329b13-82bc-4182-b0b1-f347954a5f29

Successfully deleted backup "34329b13-82bc-4182-b0b1-f347954a5f29"

Any backup, valid or invalid, can be deleted using the backup ID. However, a backup cannot be deleted while any of its snapshots are referenced by AMIs, and AMIs created by the restore command may still exist. These AMIs are safe to delete, and deleting them allows the backup to be deleted.

Considerations with Replication

If a backup is attempted on a cluster that is part of a replication relationship, either as source or as target, qumulo_aws_backup exits with the following error message:

Unable to create a backup of a cluster that is a part of a replication relationship. Please contact customer support.

The qumulo_aws_backup tool cannot back up a cluster and guarantee replication data consistency. Moving replication configuration and relationship state to a different cluster is not supported; therefore, backing up a cluster with any replication relationship present is also not supported.

You can work around this limitation by deleting all replication relationships from the cluster before running qumulo_aws_backup and then reconfiguring the clusters with the same relationships once the backup is complete. Note that this is not recommended, since the new replication relationships will have to do a full tree walk and may have to copy their entire data set to the target.

RESOLUTION

You should now be able to successfully use Qumulo's backup tool with your AWS cloud cluster.

ADDITIONAL RESOURCES

Qumulo in AWS: Add a Node to an Existing Cloud Cluster

QQ CLI: Cluster Configuration

 
