IN THIS ARTICLE
Provides guidance and resources for getting started with a Qumulo cloud cluster in AWS.
REQUIREMENTS
- Amazon Machine Image (AMI) of Qumulo Core obtained from AWS Marketplace or Qumulo
- Amazon Web Services (AWS) account
- AWS account number and target region
- IAM permissions for full access to EC2 and CloudFormation
- SSH key pair for accessing Qumulo instances
- Virtual Private Cloud (VPC) with at least one subnet configured in the target region
NOTE: Modifying the type or size of the EBS volumes in the Qumulo AMI will render the software non-functional. Use the block device layout provided in the original AMI.
Welcome to our Getting Started Guide for Qumulo cloud clusters in AWS! Here you’ll find all the details you need to launch AWS instances, configure your cloud cluster, and hit the ground running as a new customer. While this guide serves as a great starting point, there’s so much more you can do with Qumulo! Be sure to take a look at other articles and videos here in the Knowledge Base to discover everything Qumulo has to offer.
If you have any additional questions or want to provide feedback, we would love to hear from you! Feel free to open a case, send us an email, or ping us in your private Slack channel so that we can get you the answers you need.
Deploy Qumulo in AWS
Deploying Qumulo in AWS involves creating unique EC2 instances backed by EBS storage that are then clustered together into a distributed file system. In general, performance correlates with the number of instances in your cluster: higher instance counts equate to higher throughput and IOPS. Before creating your Qumulo cloud cluster in AWS, consider the amount of storage and the number of instances that you need for your environment and workflow.
Qumulo uses CloudFormation templates to launch a cluster into your AWS account. Before deploying a Qumulo cluster, check your account’s service quotas to ensure that there is enough capacity available to launch your selected instance type and the volumes used by the cluster. At a minimum, check the following service quotas:
- Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances
- Storage for General Purpose SSD (gp2) volumes
- Storage for Throughput Optimized HDD (st1) volumes
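The quota check described above can be sketched as a simple comparison between what a planned cluster needs and the headroom left in the account. This is a minimal illustration, not a Qumulo tool: the `ClusterPlan` type, the quota labels, and the example numbers are all hypothetical, and the vCPU count and volume capacity per node depend on the instance type and AMI you select. AWS measures the On-Demand Standard instance quota in vCPUs and the EBS storage quotas in TiB.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    """Hypothetical description of a planned cluster (not a Qumulo data structure)."""
    node_count: int
    vcpus_per_node: int      # depends on the chosen EC2 instance type
    gp2_tib_per_node: float  # total gp2 capacity attached to one node
    st1_tib_per_node: float  # total st1 capacity attached to one node (0 for all-SSD)


def quota_shortfalls(plan, available):
    """Return {quota_label: missing_amount} for every quota the plan would exceed.

    `available` maps a quota label to the headroom left in the account
    (quota value minus current usage), in vCPUs or TiB.
    """
    required = {
        "standard_on_demand_vcpus": plan.node_count * plan.vcpus_per_node,
        "gp2_storage_tib": plan.node_count * plan.gp2_tib_per_node,
        "st1_storage_tib": plan.node_count * plan.st1_tib_per_node,
    }
    return {
        name: need - available.get(name, 0)
        for name, need in required.items()
        if need > available.get(name, 0)
    }
```

An empty result means the plan fits within the headroom you supplied; any entry in the result is a quota to raise before launching the CloudFormation stack.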
For step-by-step instructions and additional info, check out the articles below.
- Qumulo in AWS: Build a Multi-Instance Cluster with CloudFormation
- AWS Service Quotas
- Managing AWS Regions
Install, Configure, and Use the AWS Command Line Interface (CLI)
The AWS Command Line Interface (AWS CLI) is an open source tool for interacting with AWS services from your command-line shell. With minimal configuration, you can run commands from your favorite terminal program that provide functionality equivalent to the browser-based AWS Management Console. See the AWS CLI documentation for additional details.
AWS Authentication and Authorization
When using Qumulo software with AWS, confirm that the credentials or resources used in a workflow are authorized through the AWS IAM service to perform every action that the workflow involves.
To learn more about authentication and authorization, take a look at the following articles.
- AWS Identity and Access Management
- AWS IAM Users
- AWS Policies and Permissions
- Managing IAM Policies
- Managing Passwords
- Managing Access Keys for IAM Users
- Security Best Practices in IAM
Some Qumulo software features require permission to make AWS API calls directly from the instances in your cluster by associating a service role with your instance. Review the IAM roles for Amazon EC2 documentation to learn more about IAM roles for EC2 instances. For certain features, the cluster must be configured with AWS credentials via a Qumulo API. To determine which method to use, reference the support article for the feature you are configuring.
Please follow the IAM best practices to keep your AWS accounts secure. We recommend using an account’s root user only to create your first IAM user, and granting each user or resource only the minimal permissions necessary to accomplish a workflow.
Note that any article in Qumulo Care's Knowledge Base that outlines the configuration of an AWS feature will list the minimal set of required AWS IAM permissions. While it is possible to use IAM policies to limit access to a specific set of AWS resources, this strategy is not effective for all AWS services. Rather, we recommend that user groups are sandboxed to an account within your organization so that they can’t tamper with resources outside of their account.
Secret and Key Management
Customers can authenticate with Qumulo using Local Users or Active Directory, or both simultaneously for different accounts. For authenticating with Local Users, consider using AWS Secrets Manager with IAM credentials to store usernames and passwords, especially for automation users. AWS Secrets Manager can also be configured to automatically rotate passwords for a Qumulo cluster via an AWS Lambda Function.
- Create Users and Groups
- Join your Qumulo Cluster to Active Directory
- AWS Secrets Manager Tutorial: Creating and Retrieving a Secret
- Rotating AWS Secrets Manager Secrets for Other Databases or Services
- AWS Secrets Manager Guide
- Creating and Managing Secrets
- Managing AWS Access Keys
- AWS Security Credentials
- AWS Lambda deployment package in Python
- Example password rotation AWS Lambda in Python
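For the Local Users case described in this section, an automation script might read its Qumulo username and password from Secrets Manager along the following lines. This is a minimal sketch: it assumes the secret is stored as a JSON string with `username` and `password` keys (a layout chosen for this example, not one required by Qumulo), and it takes the client as a parameter so the caller decides how it is created, for example with `boto3.client("secretsmanager")`.

```python
import json


def load_qumulo_credentials(secrets_client, secret_id):
    """Fetch a JSON secret and return (username, password).

    `secrets_client` is expected to be a boto3 Secrets Manager client.
    The secret layout {"username": ..., "password": ...} is an
    assumption of this sketch.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return secret["username"], secret["password"]
```

Passing the client in, rather than creating it inside the function, keeps the IAM credentials used to read the secret under the caller's control and makes the function easy to exercise with a stub.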
Tagging AWS Resources
To help you manage, identify, organize, search for, and filter resources, you can create tags that assign metadata to your AWS resources. Each tag is a label consisting of a user-defined key and value that can be utilized to categorize resources by purpose, owner, environment, or other criteria. We recommend tagging all Qumulo instances inside your account to categorize and distinguish them from other non-Qumulo Instances.
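As an illustration of the key and value structure, the sketch below converts a plain dictionary into the tag list format that the EC2 API expects and applies it to a set of instances. The tag keys and values in the usage note are hypothetical examples, and `ec2_client` is expected to be a boto3 EC2 client supplied by the caller.

```python
def to_aws_tags(tags):
    """Convert {"key": "value"} into the [{"Key": ..., "Value": ...}] list EC2 expects."""
    return [{"Key": key, "Value": value} for key, value in sorted(tags.items())]


def tag_instances(ec2_client, instance_ids, tags):
    """Apply the same tags to every listed instance.

    `ec2_client` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    ec2_client.create_tags(Resources=list(instance_ids), Tags=to_aws_tags(tags))
```

For example, `tag_instances(ec2, ["i-0123456789abcdef0"], {"Application": "qumulo", "Owner": "storage-team"})` would label one node so that it can be filtered apart from non-Qumulo instances.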
Qumulo Encryption at Rest
Qumulo's CloudFormation template includes built-in encryption via EBS that requires no additional configuration for launching AWS cloud clusters. When launching new nodes for an existing AWS cluster, encryption must be manually enabled for each of the instance’s volumes. For additional details, check out the Qumulo Care and AWS support articles below.
- Qumulo Core's Encryption at Rest
- Qumulo in AWS: Build a Multi-Instance Cluster with CloudFormation
- Qumulo in AWS: Launch a New Node for an Existing Cloud Cluster
- Amazon EBS encryption
AWS Certificate Manager
Qumulo clusters come with a self-signed SSL certificate to encrypt the connections to your browser sessions. Additionally, you can use certificates issued by your organization’s Certificate Authority, including AWS Certificate Manager, to provide trusted authentication when connecting over SSL.
Integration with AWS Certificate Manager can be done by first generating a certificate through AWS Certificate Manager, and then importing that certificate into your Qumulo Cluster. AWS Certificate Manager is capable of automatically renewing the certificate given to the Qumulo Cluster.
Review the topics listed below to find out more about managing certificates.
- SSL: Generate a Certificate Signing Request
- SSL: Install a Signed SSL Certificate
- AWS Certificate Manager
- Security in AWS Certificate Manager
- Managed Renewal for ACM's Amazon-Issued Certificates
Monitor your AWS Cloud Cluster with Qumulo Sidecar and CloudWatch
The Qumulo Sidecar can be used to deploy AWS services that are useful in monitoring and maintaining a Qumulo cloud cluster in AWS. This custom Qumulo tool operates as an always-active service alongside the cluster and can be configured once your AWS cluster is up and running to perform the following actions.
Send Cluster Metrics to AWS CloudWatch
The Sidecar deploys an AWS Lambda Function that collects cluster metrics once every minute and sends them to CloudWatch. In CloudWatch, you can create dashboards, which are custom views for monitoring the status of your cluster. You can configure a dashboard to display the alarms or metrics that are most important for your workflow, making it easy to determine the health of your cluster and identify problems.
Detect and Repair EBS Volume Failures
The Sidecar deploys an AWS Lambda Function that polls the Qumulo cluster for disk failures every 10 minutes. Once a disk failure is detected, the Lambda function automatically replaces the corresponding EBS volume.
We recommend launching the Qumulo Sidecar alongside each Qumulo cloud cluster launched in AWS. For more information, check out the topics listed below.
- Qumulo in AWS: Qumulo Sidecar
- Qumulo in AWS: Automatic EBS Volume Replacement
- Qumulo in AWS: Create a CloudWatch Dashboard
- Qumulo in AWS: Configure CloudWatch Alarms
- Using Amazon CloudWatch Dashboards
Audit logging in Qumulo Core provides a mechanism for tracking filesystem operations. As connected clients issue requests to the cluster, log messages are generated describing each attempted operation. These log messages are then sent over the network to the CloudWatch logs service.
- Qumulo in AWS: Audit Logging with CloudWatch
- Qumulo Core Audit Logging
- Monitoring Audit Logs
- Viewing Audit Logs in CloudWatch Logs
Upgrade your AWS Cloud Cluster
Qumulo offers simple and fast upgrades that customers rely on to stay up to date with Qumulo’s continuous delivery of new features and enhancements. With this model, we aim to quickly adapt to your needs so that improvements and changes can be made in weeks instead of years. Our upgrade process is incredibly simple and only takes a few clicks. Depending on the protocol used, upgrades will not stop applications from running and will complete in under five minutes.
The Qumulo Sidecar is an additional service that is launched separately alongside a Qumulo Cluster and must be upgraded separately.
For more information about upgrading Qumulo Core and Qumulo Sidecar, please review the corresponding upgrade articles in the Knowledge Base.
To detect an AWS Region failure, you can use the AWS status page to review the status of each service in each region. Qumulo Core software depends on the EC2 service.
There is no direct method for detecting an Availability Zone failure. However, if all EC2 instances of a Qumulo cluster within the same Availability Zone send failure alerts, this can indicate an Availability Zone failure.
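The Availability Zone heuristic can be expressed as a small set comparison: a zone is suspect only when every cluster instance in it is reporting a failure. The sketch below is illustrative only. How you collect the instance-to-zone mapping and the set of failing instances is left to your monitoring setup, and the function name is hypothetical.

```python
from collections import defaultdict


def suspected_az_failures(instance_az, failed_instances):
    """Return the sorted Availability Zones in which every cluster instance has failed.

    instance_az: {instance_id: availability_zone} for all nodes in the cluster.
    failed_instances: instance IDs currently sending failure alerts.
    """
    failed = set(failed_instances)
    by_az = defaultdict(set)
    for instance_id, az in instance_az.items():
        by_az[az].add(instance_id)
    # A zone is suspect only if all of its instances are in the failed set.
    return sorted(az for az, members in by_az.items() if members <= failed)
```

A result is only a hint: several instances in one zone can fail for unrelated reasons, so treat it as a prompt to check the AWS status page rather than as confirmation.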
EC2 Instance and EBS Volume
To detect an AWS EC2 Instance failure, a CloudWatch alarm can be set on an instance to send an SNS topic notification that a failure has occurred.
Additionally, you can use the Qumulo Sidecar service to detect EBS volume failures. If a volume failure occurs, the service automatically sends a notification through an SNS topic. The Qumulo Sidecar also automatically replaces failed volumes to save you time and make your cluster more resilient.
An alternative to monitoring EC2 or EBS failures is to create a CloudWatch event rule with the service name "Health." Depending on whether you want to monitor the Availability Zone, EC2 instances, or EBS volumes, you can specify either "EC2" or "EBS" during configuration. From there, specify the resources that should be monitored and set the event’s target(s) to your desired AWS services, such as SNS, SQS, or Lambda. See Monitoring AWS Health events with Amazon CloudWatch Events for more information.
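A rule like the one described above is driven by an event pattern. The sketch below builds one for AWS Health events limited to chosen services and, optionally, to specific resources. The `source` and `detail-type` values are the ones AWS Health events carry; the function name and the example resource ID in the usage note are hypothetical.

```python
import json


def health_event_pattern(services, resources=None):
    """Build an event pattern (as a JSON string) matching AWS Health events."""
    pattern = {
        "source": ["aws.health"],
        "detail-type": ["AWS Health Event"],
        "detail": {"service": list(services)},
    }
    if resources:
        # Optional: only match events that name these resource IDs.
        pattern["resources"] = list(resources)
    return json.dumps(pattern)
```

The returned string can be supplied as the `EventPattern` argument when creating the rule, for instance through a boto3 `events` client's `put_rule` call, with the SNS topic, SQS queue, or Lambda function attached afterwards as the rule's target. For example, `health_event_pattern(["EC2", "EBS"], ["i-0123456789abcdef0"])` limits matching to events that mention that one instance.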
You can create a CloudWatch alarm to be notified of a cluster failure via the Qumulo Sidecar service. Select a metric from the custom namespace Qumulo/Metrics and select the ClusterName subcategory. To monitor your cluster’s health, use the RemainingNodeFailures metric, which tracks the number of nodes that can fail before the cluster goes down. Select the cluster name as the target and set up an alarm that triggers an applicable SNS topic notification when there is insufficient data or the metric dips below the specified threshold. CloudWatch alarms configured on these metrics give you a detailed picture of the status of a cluster. For more info, check out the articles below.
- Qumulo in AWS: Qumulo Sidecar
- Qumulo in AWS: Configure CloudWatch Alarms
- Qumulo in AWS: Create a CloudWatch Dashboard
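The cluster health alarm described in this section can also be defined programmatically. The sketch below only builds the arguments for CloudWatch's PutMetricAlarm operation. The namespace, metric name, and dimension name are taken from the text above, while the alarm name, statistic, period, and default threshold are illustrative assumptions that you should adjust to your cluster.

```python
def remaining_node_failures_alarm(cluster_name, sns_topic_arn, threshold=1):
    """Build keyword arguments for CloudWatch's PutMetricAlarm operation.

    The alarm fires when RemainingNodeFailures drops below `threshold`
    or when the metric stops reporting (insufficient data).
    """
    return {
        "AlarmName": f"{cluster_name}-remaining-node-failures",  # hypothetical naming scheme
        "Namespace": "Qumulo/Metrics",
        "MetricName": "RemainingNodeFailures",
        "Dimensions": [{"Name": "ClusterName", "Value": cluster_name}],
        "Statistic": "Minimum",           # assumption: alert on the worst value in the period
        "Period": 60,                     # assumption: matches once-per-minute metric collection
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],
        "InsufficientDataActions": [sns_topic_arn],
    }
```

With a boto3 CloudWatch client, the result could be applied as `cloudwatch.put_metric_alarm(**remaining_node_failures_alarm("my-cluster", topic_arn))`, where `my-cluster` and `topic_arn` stand in for your own cluster name and SNS topic.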
A Qumulo cluster can be configured to replicate data to another Qumulo cluster. With replication, you can create a one-way relationship that synchronizes files between a directory on a source cluster and a directory on a target cluster. Three different replication modes can be selected:
- Continuous replication repeatedly replicates the data from the source directory to the target directory if changes have occurred.
- Snapshot policy replication replicates snapshots to the target cluster on a given schedule. The snapshots continue to exist on both clusters until the policy expires them.
- A mixture of continuous replication and snapshot policy replication lets the target cluster maintain a history of snapshots while also keeping the source and target directories as in sync as possible.
When using continuous replication, the Recovery Point Objective (RPO) will be under 10 minutes since the source is being replicated to the target about as fast as it is being modified. If you are using snapshot policy replication without continuous replication, the RPO will generally be about the amount of time between snapshots that are getting replicated. For example, the RPO will be about an hour if the replicated snapshot policy takes a snapshot every hour.
The Recovery Time Objective (RTO) for a continuous replication relationship is also generally under 10 minutes. The variable time component of failing over to the target cluster is making the target directory writable, because this process must revert the target directory’s data to the last successful replication. Reverting takes about as long as the unfinished replication job had been running, since that job needs to be undone. With continuous replication, an unfinished replication job should have been running for no more than a couple of minutes. If snapshot policy replication is being used without continuous replication, this time will be closer to the interval between snapshot policy snapshots. If no replication job was in progress when the target directory is made writable, the revert is instant since there is nothing to revert. One exception to consider: if the reverted replication job deleted many files, reverting those changes takes longer, since deleting files is faster than recreating them.
Once the target directory is reverted successfully, the data will be writable and the clients can now use the directory.
Use the instructions in the Replication: Failover and Failback with 2.12.0 and above article to test the failover process and learn more about this functionality.
Keep in mind that you may need to increase the AWS service limits when configuring replication to account for the replication target cluster. Note that this cluster does not need to be the same size or configuration as the replication source cluster and additional resources are not needed during failover or failback.
Qumulo Cloud Cluster Backup Tool
The Qumulo Cloud Cluster Backup Tool is designed to back up and restore an AWS cluster for disaster recovery scenarios via EBS volume snapshots. The tool creates an EBS snapshot for each volume in the cluster, tags the snapshots to track them, and can then use the EBS snapshots to restore the cluster if needed.
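To make the snapshot-and-tag idea concrete, the sketch below creates one tagged EBS snapshot per volume through the EC2 API. It is not the Qumulo Cloud Cluster Backup Tool and does not reflect how that tool is implemented: the function name and the tag keys `backup-id` and `source-volume` are hypothetical, and the sketch makes no attempt to keep the snapshots consistent with one another across volumes.

```python
def snapshot_cluster_volumes(ec2_client, volume_ids, backup_id):
    """Create one tagged EBS snapshot per volume and return the snapshot IDs.

    `ec2_client` is expected to be a boto3 EC2 client. The tag keys
    used here are a hypothetical naming choice for this sketch.
    """
    snapshot_ids = []
    for volume_id in volume_ids:
        response = ec2_client.create_snapshot(
            VolumeId=volume_id,
            Description=f"Cluster backup {backup_id}",
            TagSpecifications=[{
                "ResourceType": "snapshot",
                "Tags": [
                    {"Key": "backup-id", "Value": backup_id},
                    {"Key": "source-volume", "Value": volume_id},
                ],
            }],
        )
        snapshot_ids.append(response["SnapshotId"])
    return snapshot_ids
```

Tagging every snapshot with a shared backup identifier is what lets a later restore step find the complete set of snapshots that belong together.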
Since it only needs to create a cluster from EBS snapshots, the tool has an RTO under 10 minutes. By using the same IP addresses as the original cluster, no reconfiguration of clients is necessary for restoration.
Unlike the RTO, the RPO will vary based on the number of write operations completed between backups. AWS does not provide RPO or RTO guidance for EBS volume snapshots.
This backup solution does not have a testing strategy. Since the restored cluster is an exact replica of the original cluster using the same IP addresses as the original cluster, testing the restoration process is destructive to the original cluster.
During restoration, components of the failed and recovered clusters may exist concurrently. For a smooth experience, make sure that the AWS account hosting the cluster has service limits high enough for another cluster of the same configuration.
NOTE: The replication feature is incompatible with the Qumulo Cloud Cluster Backup Tool.
Determine the Best Disaster Recovery Solution
In general, replication is the preferred disaster recovery solution for Qumulo cloud clusters in AWS. This feature has a better RPO, is testable, has more configuration options, can protect from region failures, and can operate on individual directories. Additionally, replication can be less expensive than the alternatives. EBS snapshots require payment for all provisioned EBS space per snapshot per month, while replication requires paying only for a second cluster, regardless of the number of snapshots. However, EBS snapshots do excel in being S3-backed, which results in higher durability and a simpler set of recovery operations.
Automatic Instance Recovery
Qumulo Core versions 3.1.0 and above automatically handle instance failures and reset the specified nodes. For previous versions of Qumulo Core in AWS, you can set up automated instance recovery.
A client is connected to a single node while reading from and writing to the cluster. If the node becomes unresponsive, the connection is interrupted and the client is disconnected; it won’t be able to reconnect until the node comes back online.
IP failover swaps the connection from an unresponsive node to a healthy one allowing the client to reconnect to the cluster via another node. When a node goes offline with IP failover configured, a healthy node is selected to take its place to service new requests from clients.
NOTE: IP failover only protects you from node failure and not cluster-wide failures.
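The general idea behind IP failover can be sketched as moving the addresses owned by an unresponsive node onto the remaining healthy nodes. The following is a conceptual illustration only, under the assumption of a simple "fewest addresses first" placement rule. It is not Qumulo Core's actual algorithm, and all names in it are hypothetical.

```python
def reassign_floating_ips(assignments, healthy_nodes):
    """Move floating IPs from unhealthy nodes to healthy ones.

    assignments: {node_id: [ip, ...]} current ownership of floating IPs.
    healthy_nodes: set of node IDs that are still responsive.
    Returns a new mapping that contains only healthy nodes.
    """
    if not healthy_nodes:
        # Mirrors the note above: with no healthy node, failover cannot help.
        raise ValueError("no healthy node left; IP failover cannot help")
    result = {node: list(assignments.get(node, [])) for node in sorted(healthy_nodes)}
    orphaned = [
        ip
        for node, ips in sorted(assignments.items())
        if node not in healthy_nodes
        for ip in ips
    ]
    for ip in orphaned:
        # Give each orphaned IP to the healthy node that currently holds the fewest.
        target = min(result, key=lambda node: (len(result[node]), node))
        result[target].append(ip)
    return result
```

Because clients reconnect to the same address, they reach a healthy node without any reconfiguration on the client side.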
AWS can schedule events that require the instances to stop or restart. If an event is scheduled, an email will be sent to the registered address indicating the specified time that the event will automatically trigger the required action. Since Qumulo is designed to be tolerant to instance failures, it is safe to allow AWS to take these actions.
To increase availability during scheduled events, use the details above to configure the cluster to remain highly available during instance failures. If the scheduled events will impact multiple instances concurrently, consider manually restarting the instances one at a time to preserve availability.
You should now have an overall understanding of how to get started with a Qumulo cloud cluster in AWS using the resources and details provided.