Qumulo in AWS: Getting Started

IN THIS ARTICLE

Provides guidance and resources for getting started with a Qumulo cloud cluster in AWS

REQUIREMENTS

  • Amazon Machine Image (AMI) of Qumulo Core obtained from AWS Marketplace or Qumulo
  • Amazon Web Services (AWS) account
    • AWS account number and target region
    • IAM permissions for full access to EC2 and CloudFormation
    • SSH key-pair for accessing Qumulo instance
    • Virtual Private Cloud (VPC) with at least one subnet configured in the target region

NOTE: Modifying the type or size of the EBS volumes in the Qumulo AMI will render the software non-functional. Please use the block device layout provided in the original AMI.

DETAILS

Welcome to our Getting Started Guide for Qumulo cloud clusters in AWS! Here you’ll find all the details you need to launch AWS instances, configure your cloud cluster, and hit the ground running as a new customer. While this guide serves as a great starting point, there’s so much more you can do with Qumulo! Be sure to take a look at other articles and videos here in the Knowledge Base to discover everything Qumulo has to offer.

If you do have any additional questions or want to provide some feedback, we would love to hear from you! Feel free to open a case, shoot an email over to care@qumulo.com, or ping us in your private Slack channel so that we can get you the answers you need.

Deploy Qumulo in AWS

Deploying Qumulo in AWS involves creating unique EC2 instances backed by EBS storage that are then clustered together into a distributed file system. In general, performance correlates to the number of instances in your cluster, where higher instance counts equate to higher throughput and IOPS. Before creating your Qumulo cloud cluster in AWS, consider the amount of storage and number of instances that you need for your environment and workflow. 

Qumulo uses CloudFormation templates to launch a cluster into your AWS account. Before deploying a Qumulo Cluster, check your account’s service quotas to ensure that there is enough capacity available to launch your selected instance type and the volumes used by the cluster. The minimum service quotas that should be checked include:

  • Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances
  • Storage for General Purpose SSD (gp2) volumes
  • Storage for Throughput Optimized HDD (st1) volumes
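
As a rough illustration of the quota check described above, the following Python sketch compares a planned cluster against the remaining headroom for the three quotas listed. It assumes the instance quota is counted in vCPUs and the gp2 and st1 quotas in TiB. The `ClusterPlan` fields, the sizing numbers, and the quota key names are hypothetical and are not taken from Qumulo's CloudFormation template.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    """Hypothetical sizing inputs; every value used below is illustrative."""
    instance_count: int
    vcpus_per_instance: int
    gp2_gib_per_instance: int
    st1_gib_per_instance: int


def quota_shortfalls(plan, quotas):
    """Return {quota_name: amount_missing} for each quota the plan exceeds.

    `quotas` holds the remaining headroom in the account: vCPUs for the
    On-Demand Standard instance quota and TiB for the gp2 and st1 quotas.
    """
    gib_per_tib = 1024
    needed = {
        "standard_vcpus": plan.instance_count * plan.vcpus_per_instance,
        "gp2_tib": plan.instance_count * plan.gp2_gib_per_instance / gib_per_tib,
        "st1_tib": plan.instance_count * plan.st1_gib_per_instance / gib_per_tib,
    }
    return {
        name: amount - quotas[name]
        for name, amount in needed.items()
        if amount > quotas[name]
    }


# Example: four instances with made-up sizes against made-up headroom.
plan = ClusterPlan(instance_count=4, vcpus_per_instance=16,
                   gp2_gib_per_instance=512, st1_gib_per_instance=4096)
print(quota_shortfalls(plan, {"standard_vcpus": 32, "gp2_tib": 50, "st1_tib": 50}))
```

An empty result means the plan fits within the headroom you supplied. Any entry in the result is a quota to raise before launching the cluster.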

For step-by-step instructions and additional info, check out the articles below.

AWS Architecture

[Figure: AWS architecture diagram]

Install, Configure, and Use the AWS Command Line Interface (CLI)

The AWS Command Line Interface (AWS CLI) is an open source tool that enables you to interact with AWS services using commands in your command-line shell. With minimal configuration, the AWS CLI enables you to start running commands that implement functionality equivalent to that provided by the browser-based AWS Management Console from the command prompt in your favorite terminal program. Reference the AWS documentation below for additional details.

AWS Authentication and Authorization

When using Qumulo software with AWS, you must confirm that the credentials or resources involved in a workflow are authorized with the AWS IAM service to perform every action that the workflow requires.

To learn more about authentication and authorization, take a look at the following articles.

Some Qumulo software features require permission to make AWS API calls directly from the instances in your cluster by associating a service role with your instance. Review the IAM roles for Amazon EC2 documentation to learn more about IAM roles for EC2 instances. For certain features, the cluster must be configured with AWS credentials via a Qumulo API. To determine which method to use, reference the support article for the feature you are configuring.

Please follow the IAM best practices to keep your AWS accounts secure. We recommend only using the root user of an account for creating your first IAM user and setting the minimal permissions necessary for user or resource authorization to accomplish a workflow. 

Note that any article in Qumulo Care's Knowledge Base that outlines the configuration of an AWS feature will list the minimal set of required AWS IAM permissions. While it is possible to use IAM policies to limit access to a specific set of AWS resources, this strategy is not effective for all AWS services. Rather, we recommend that user groups are sandboxed to an account within your organization so that they can’t tamper with resources outside of their account.

Secret and Key Management

Customers can authenticate with Qumulo using Local Users or Active Directory, or both simultaneously for different accounts. For authenticating with Local Users, consider using the AWS Secrets Manager with IAM credentials to store usernames and passwords, especially for automation users. The AWS Secrets Manager can also be configured to automatically rotate passwords for a Qumulo cluster via an AWS Lambda Function.
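
A minimal sketch of how an automation script might consume such a secret is shown below. It assumes the secret is stored as a JSON string with "username" and "password" keys. That layout, the function name, and the secret name in the comment are choices made for this example, not a Qumulo requirement.

```python
import json


def parse_local_user_secret(secret_string):
    """Extract local-user credentials from a Secrets Manager SecretString.

    Assumes the secret was stored as JSON with "username" and "password"
    keys; that layout is a convention chosen for this sketch.
    """
    data = json.loads(secret_string)
    missing = {"username", "password"} - data.keys()
    if missing:
        raise KeyError(f"secret is missing keys: {sorted(missing)}")
    return data["username"], data["password"]


# In a real script the string would come from Secrets Manager, for example
# with boto3:
#   client.get_secret_value(SecretId="qumulo/automation-user")["SecretString"]
# (the secret name is hypothetical). A literal stands in for it here.
example = json.dumps({"username": "automation", "password": "example-only"})
user, password = parse_local_user_secret(example)
print(user)
```

Keeping the parsing separate from the AWS call makes it easy to validate the secret's shape before the credentials are used to log in to the cluster.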

AWS Tagging Resources

To help you manage, identify, organize, search for, and filter resources, you can create tags that assign metadata to your AWS resources. Each tag is a label consisting of a user-defined key and value that can be utilized to categorize resources by purpose, owner, environment, or other criteria. We recommend tagging all Qumulo instances inside your account to categorize and distinguish them from other non-Qumulo Instances.
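
For illustration, the sketch below builds the two data shapes typically needed when tagging from a script: the Key/Value list that the EC2 API accepts when creating tags, and the `tag:<key>` filters used to find tagged instances later. The tag keys and values are hypothetical, so pick names that match your own conventions.

```python
def to_ec2_tags(labels):
    """Convert a plain dict into the Key/Value list the EC2 API expects."""
    return [{"Key": key, "Value": value} for key, value in sorted(labels.items())]


def tag_filters(labels):
    """Build EC2 describe filters that match resources carrying all labels."""
    return [{"Name": f"tag:{key}", "Values": [value]}
            for key, value in sorted(labels.items())]


# Hypothetical tag keys and values for a Qumulo cluster.
qumulo_labels = {"Application": "Qumulo", "Cluster": "prod-cluster-1"}

tags = to_ec2_tags(qumulo_labels)
filters = tag_filters(qumulo_labels)
print(tags)
print(filters)
```

With boto3, these values could be passed as `Tags=tags` to `create_tags` and as `Filters=filters` to `describe_instances`.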

Qumulo Encryption at Rest

Qumulo's CloudFormation template includes built-in encryption via EBS that requires no additional configuration for launching AWS cloud clusters. When launching new nodes for an existing AWS cluster, encryption must be manually enabled for each of the instance’s volumes. For additional details, check out the Qumulo Care and AWS support articles below.

AWS Certificate Manager

Qumulo clusters come with a self-signed SSL Certificate to encrypt the connections to your browser sessions. Additionally, you can use certificates generated by your organization’s Certificate Authority to provide trusted authentication when connecting over SSL, including AWS Certificate Manager. 

Integration with AWS Certificate Manager can be done by first generating a certificate through AWS Certificate Manager, and then importing that certificate into your Qumulo Cluster. AWS Certificate Manager is capable of automatically renewing the certificate given to the Qumulo Cluster.

Review the topics listed below to find out more about managing certificates.

Monitor your AWS Cloud Cluster with Qumulo Sidecar and CloudWatch

The Qumulo Sidecar can be used to deploy AWS services that are useful in monitoring and maintaining a Qumulo cloud cluster with AWS. This custom Qumulo tool operates as an always-active service alongside the cluster and can be configured once your AWS cluster is up and running in order to perform the following actions.

Send Cluster Metrics to AWS CloudWatch

The Sidecar deploys an AWS Lambda Function that collects cluster metrics once every minute and then sends them to CloudWatch. In CloudWatch, you can create dashboards, which are custom views that allow you to monitor the status of your cluster. You can configure a dashboard to display the alarms or metrics that are most important for your workflow, making it easy to determine the health of your cluster and identify problems.

Detect and Repair EBS Volume Failures

The Sidecar deploys an AWS Lambda Function that polls the Qumulo cluster for disk failures every 10 minutes. Once a disk failure is detected, the Lambda function automatically replaces the corresponding EBS volume.

We recommend launching the Qumulo Sidecar alongside each Qumulo cloud cluster launched in AWS. For more information, check out the topics listed below.

Audit Logging

Audit logging in Qumulo Core provides a mechanism for tracking filesystem operations. As connected clients issue requests to the cluster, log messages are generated describing each attempted operation. These log messages are then sent over the network to the CloudWatch logs service.

Upgrade your AWS Cloud Cluster

Qumulo offers simple and fast upgrades that customers rely on to stay up to date with Qumulo’s continuous delivery of new features and enhancements. With this model, we aim to quickly adapt to your needs so that improvements and changes can be made in weeks instead of years. Our upgrade process is incredibly simple and only takes a few clicks. Depending on the protocol used, upgrades will not stop applications from running and will complete in under five minutes.

The Qumulo Sidecar is an additional service that is launched separately alongside a Qumulo Cluster and must be upgraded separately.

For more information about upgrading Qumulo Core and Qumulo Sidecar, please review the articles below.

AWS Outposts

AWS Outposts is ideal for workloads that require low latency access to on-premises systems, local data processing, or local data storage. It offers the same AWS hardware infrastructure, services, APIs, and tools to build and run your applications on premises as in the cloud. Unlike other hybrid solutions that require the use of different APIs, manual software updates, and the purchase of third-party hardware and support, Outposts provides a consistent experience for developers and IT operations across both environments.

You can access the full range of AWS services available in your Region to build, manage, and scale your on-premises applications using familiar AWS services and tools alongside AWS compute, storage, database, and other services that run locally on Outposts. If you're in a highly regulated industry or located in a country with data residency requirements, Outposts provides a solution by securely storing and processing customer data that needs to remain on-premises or in countries where there is no AWS region. And just like in the cloud, both your Outposts infrastructure and AWS services are managed, monitored, and updated by AWS. 

To get started with AWS Outposts, log in to the AWS Management Console to create a site, select an Outpost configuration, and place your order. For more details on AWS Outposts, reference the following articles.

Additional Considerations

  • Due to limited capacities available on AWS Outposts (up to 55TB) and only having a single EBS volume type (gp2), AWS Outposts is only compatible with 600GB AF SKUs at this time.
  • High availability, including failover and automatic instance recovery, is fully supported with AWS Outposts and is outlined in the High Availability section below.
  • Qumulo's data residency practices are the same for cloud clusters utilizing AWS Outposts and can be reviewed in the Data Residency section below.

AWS GovCloud (US)

Qumulo’s AWS GovCloud (US) offering provides federal, state, and local agencies and organizations the powerful file data platform they need to enable file-based data workflows inheriting the compliance standards from the AWS GovCloud (US) region and cloud services (including FedRAMP HIGH, CJIS, or DoD SRG impact levels 2, 4, 5).

With Qumulo's AWS GovCloud, you can:

  • Meet compliance mandates
  • Safeguard sensitive data
  • Strengthen identity management by restricting access to sensitive data and API calls
  • Improve cloud visibility with auditing
  • Protect accounts and workloads with continuous security monitoring

Qumulo’s AWS Marketplace listings simplify software procurement for government and enterprise organizations by allowing the consumption of Qumulo services on any approved AWS contracts and vehicles that you may already have in place. 

Check out the articles below for additional details on Qumulo's AWS GovCloud (US) offering.

Disaster Detection

Region

To detect an AWS Region failure, you can use the AWS status page to review the status of each service in each region. Qumulo Core software depends on the EC2 service. 

Availability Zone

To detect an Availability Zone failure, there is no direct method. However, if all EC2 instances of a Qumulo cluster within the same Availability Zone send failure alerts, this can indicate an Availability Zone failure.

EC2 Instance and EBS Volume

To detect an AWS EC2 Instance failure, a CloudWatch alarm can be set on an instance to send an SNS topic notification that a failure has occurred.

Additionally, you can use the Qumulo Sidecar service to detect EBS volume failures. If any volume failure occurs, the service will automatically notify via an SNS topic. The Qumulo Sidecar will automatically replace failed volumes to save you time and make your cluster extra resilient.

An alternative to monitoring EC2 or EBS failures is to create a CloudWatch event rule with the service name "Health." Depending on whether you want to monitor the Availability Zone, EC2 instances, or EBS volumes, you can specify either "EC2" or "EBS" during configuration. From there, specify the resources that should be monitored and set the event’s target(s) to your desired AWS services, such as SNS, SQS, or Lambda. See Monitoring AWS Health events with Amazon CloudWatch Events for more information.
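
As a sketch, the event pattern behind such a rule could be generated as follows. The `source` and `detail-type` values are the ones AWS Health events carry. The function name and the restriction to "EC2" and "EBS" are choices made for this example.

```python
import json


def health_event_pattern(services):
    """Build an event pattern matching AWS Health events for the given services."""
    allowed = {"EC2", "EBS"}
    unknown = set(services) - allowed
    if unknown:
        raise ValueError(f"unsupported services for this sketch: {sorted(unknown)}")
    return {
        "source": ["aws.health"],
        "detail-type": ["AWS Health Event"],
        "detail": {"service": sorted(services)},
    }


pattern = health_event_pattern(["EC2", "EBS"])
print(json.dumps(pattern, indent=2))
```

The resulting JSON can be pasted into the rule's custom event pattern field, with SNS, SQS, or Lambda configured as the target.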

Qumulo Cluster

You can create a CloudWatch alarm to be notified in the event of a cluster failure via the Qumulo Sidecar service. Select a metric from the custom namespace Qumulo/Metrics and select the ClusterName subcategory. To monitor your cluster’s health, use the RemainingNodeFailures metric, which tracks the number of nodes that can fail before the cluster goes down. Select the cluster name as the target and set up an alarm that triggers an applicable SNS topic notification when there is insufficient data or the metric dips below the specified threshold. The CloudWatch alarms configured on these metrics will give you a detailed picture of the status of a cluster. For more info, check out the articles below.
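
The alarm described above might be expressed as parameters for CloudWatch's PutMetricAlarm operation, as in this sketch. The namespace and metric name come from this article. The dimension name `ClusterName`, the alarm name, the threshold, the evaluation window, and the SNS topic ARN are assumptions to adjust for your cluster.

```python
def remaining_node_failures_alarm(cluster_name, sns_topic_arn, threshold=1.0):
    """Build keyword arguments for CloudWatch's PutMetricAlarm operation.

    The alarm fires when RemainingNodeFailures drops below `threshold`, and
    also when data is missing, because missing data is treated as breaching.
    """
    return {
        "AlarmName": f"{cluster_name}-remaining-node-failures",  # hypothetical name
        "Namespace": "Qumulo/Metrics",
        "MetricName": "RemainingNodeFailures",
        # Assumed dimension name, based on the ClusterName subcategory.
        "Dimensions": [{"Name": "ClusterName", "Value": cluster_name}],
        "Statistic": "Minimum",
        "Period": 60,              # the Sidecar publishes metrics every minute
        "EvaluationPeriods": 5,    # illustrative choice
        "Threshold": threshold,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",
        "AlarmActions": [sns_topic_arn],
    }


alarm = remaining_node_failures_alarm(
    "prod-cluster-1",
    "arn:aws:sns:us-west-2:123456789012:qumulo-alerts",  # placeholder ARN
)
print(alarm["AlarmName"])
```

With boto3, the dictionary could be passed as `cloudwatch.put_metric_alarm(**alarm)`. Treating missing data as breaching is one way to cover the insufficient-data case. A separate `InsufficientDataActions` entry is another.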

Disaster Recovery

Replication

A Qumulo cluster can be configured to replicate data to another Qumulo cluster. With replication, you can create a one-way relationship that synchronizes files between a directory on a source cluster and a directory on a target cluster. Three different replication modes can be selected. Continuous replication will repeatedly replicate the data from the source directory to the target directory if changes have occurred. Snapshot policy replication will replicate snapshots to the target cluster on a given schedule. The snapshots will continue to exist on both clusters until the policy expires them. The third and final mode is a mixture of both Continuous replication and snapshot policy replication, where the target cluster can maintain a history of snapshots while also keeping the source and target directories as in-sync as possible.

When using continuous replication, the Recovery Point Objective (RPO) will be under 10 minutes since the source is being replicated to the target about as fast as it is being modified. If you are using snapshot policy replication without continuous replication, the RPO will generally be about the amount of time between snapshots that are getting replicated. For example, the RPO will be about an hour if the replicated snapshot policy takes a snapshot every hour.
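
The RPO reasoning above can be captured in a few lines. This is only a planning aid under the figures stated in this section, and the function and constant names are made up for the example.

```python
from datetime import timedelta

# "Under 10 minutes" for continuous replication, per the paragraph above.
CONTINUOUS_RPO_BOUND = timedelta(minutes=10)


def estimated_rpo(continuous, snapshot_interval=None):
    """Return a rough upper bound on RPO for a replication relationship."""
    if continuous:
        # Continuous replication keeps the target close to the source, so
        # the snapshot schedule does not drive the RPO.
        return CONTINUOUS_RPO_BOUND
    if snapshot_interval is None:
        raise ValueError("snapshot policy replication needs a snapshot interval")
    # Without continuous replication, data written since the last replicated
    # snapshot is at risk, so the RPO is about one snapshot interval.
    return snapshot_interval


hourly = estimated_rpo(continuous=False, snapshot_interval=timedelta(hours=1))
mixed = estimated_rpo(continuous=True, snapshot_interval=timedelta(hours=1))
print(hourly, mixed)
```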

The Recovery Time Objective (RTO) for a continuous replication relationship is also generally under 10 minutes. The variable time component of failing over to the target cluster is making the target directory writable. This process needs to revert the target directory’s data to the last successful replication. The process takes about as long as the unsuccessful replication job had been running, since that job needs to be undone. An unfinished replication job should take no more than a couple of minutes if continuous replication was being used. If snapshot policy replication is being used without continuous replication, this time will be closer to the amount of time between snapshot policy snapshots. If no replication job was in progress when making the target directory writable, the process of reverting the data to the last successful replication state will be instant since there is nothing to revert. One exception to consider: if the reverted replication job included many file deletions, reverting those changes will take longer, since deleting files is faster than recreating them.

Once the target directory is reverted successfully, the data will be writable and the clients can now use the directory.

Use the instructions in the Replication: Failover and Failback with 2.12.0 and above article to test the failover process and learn more about this functionality. 

Keep in mind that you may need to increase the AWS service limits when configuring replication to account for the replication target cluster. Note that this cluster does not need to be the same size or configuration as the replication source cluster and additional resources are not needed during failover or failback.

Qumulo Cloud Cluster Backup Tool

The Qumulo Cloud Cluster Backup Tool is designed to back up and restore an AWS cluster for disaster recovery scenarios via EBS volume snapshots. The tool creates an EBS snapshot for each volume in the cluster, tags the snapshots to track them, and can then use the EBS snapshots to restore the cluster if needed.
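
To make the snapshot-and-tag idea concrete, here is a sketch that builds one EC2 CreateSnapshot request per volume. It is not the Backup Tool's actual implementation. The tag keys, the description format, and the volume IDs are invented for illustration.

```python
def snapshot_requests(cluster_name, volume_ids, backup_id):
    """Build one EC2 CreateSnapshot request per EBS volume, tagged for tracking."""
    requests = []
    for volume_id in volume_ids:
        requests.append({
            "VolumeId": volume_id,
            "Description": f"{cluster_name} backup {backup_id}",
            "TagSpecifications": [{
                "ResourceType": "snapshot",
                # Hypothetical tag keys used to find all snapshots that
                # belong to the same backup when restoring.
                "Tags": [
                    {"Key": "QumuloCluster", "Value": cluster_name},
                    {"Key": "BackupId", "Value": backup_id},
                    {"Key": "SourceVolume", "Value": volume_id},
                ],
            }],
        })
    return requests


reqs = snapshot_requests("prod-cluster-1",
                         ["vol-0aaa1111", "vol-0bbb2222"],  # placeholder IDs
                         backup_id="2020-06-01T00-00")
print(len(reqs))
```

Each dictionary could be passed to boto3 as `ec2.create_snapshot(**request)`. Tagging every snapshot with the same backup identifier is what allows a restore to gather a consistent set.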

Since it only needs to create a cluster from EBS snapshots, the tool has an RTO under 10 minutes. Because the restored cluster uses the same IP addresses as the original cluster, no reconfiguration of clients is necessary after restoration.

Unlike the RTO, the RPO will vary based on the number of write operations completed between backups. AWS does not provide RPO or RTO guidance for EBS volume snapshots.

This backup solution does not have a testing strategy. Since the restored cluster is an exact replica of the original cluster using the same IP addresses as the original cluster, testing the restoration process is destructive to the original cluster.

During restoration, components of the failed and recovered clusters may exist concurrently. For a smooth experience, make sure that the AWS account hosting the cluster has sufficiently high service limits for another cluster of the same configuration.

NOTE: The replication feature is incompatible with the Qumulo Cloud Cluster Backup Tool.

Determine the Best Disaster Recovery Solution

In general, replication is the preferred disaster recovery solution for Qumulo cloud clusters in AWS. This feature has better RPO, is testable, has more configuration options, can protect from region failures, and can operate on individual directories. Additionally, replication can be less expensive than the alternatives. EBS snapshots require payment for all provisioned EBS space per snapshot per month while replication requires only paying for a second cluster regardless of the number of snapshots. However, EBS snapshots do excel in being S3 backed resulting in higher durability and a simpler set of recovery operations.
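
The cost trade-off above comes down to simple arithmetic, sketched below. Snapshot cost scales with the number of retained snapshots, while the cost of a second cluster does not. All prices and sizes are hypothetical placeholders, expressed in whole cents to keep the arithmetic exact, so substitute current figures for your region and configuration.

```python
def snapshot_monthly_cost_cents(provisioned_gib, cents_per_gib_month, retained_snapshots):
    """Monthly cost if every retained snapshot is billed for all provisioned space."""
    return provisioned_gib * cents_per_gib_month * retained_snapshots


def break_even_snapshots(provisioned_gib, cents_per_gib_month, second_cluster_cents):
    """Smallest retained-snapshot count that costs more than a second cluster."""
    per_snapshot = provisioned_gib * cents_per_gib_month
    return second_cluster_cents // per_snapshot + 1


# Made-up inputs: 10,000 GiB provisioned, 5 cents per GiB-month for snapshots,
# and a replication target cluster costing 320,000 cents (3,200 dollars) a month.
count = break_even_snapshots(10_000, 5, 320_000)
print(count)
print(snapshot_monthly_cost_cents(10_000, 5, count))
```

With these placeholder numbers, keeping more than a handful of snapshots already costs more than the second cluster. The real break-even point depends entirely on your pricing and retention policy.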

High Availability

Automatic Instance Recovery

Qumulo Core versions 3.1.0 and above automatically handle instance failures and reset the specified nodes. For previous versions of Qumulo Core in AWS, you can set up automated instance recovery.

IP Failover

A client is connected to a single node during reads and writes to and from the cluster. If the node becomes unresponsive, the connection is interrupted and the client is disconnected. The client won’t be able to reconnect until the node comes back online.

IP failover swaps the connection from an unresponsive node to a healthy one allowing the client to reconnect to the cluster via another node. When a node goes offline with IP failover configured, a healthy node is selected to take its place to service new requests from clients. 
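
The toy model below illustrates the idea of IP failover: addresses served by an unresponsive node are handed to healthy nodes so clients can reconnect. It is a conceptual sketch only. The selection rule (least-loaded node first), the node names, and the addresses are assumptions and do not describe how Qumulo Core chooses the replacement node.

```python
def reassign_ips(assignments, healthy_nodes):
    """Move IPs held by unhealthy nodes onto the least-loaded healthy nodes.

    `assignments` maps a node name to the client-facing IPs it serves.
    """
    if not healthy_nodes:
        # Matches the NOTE in this section: failover cannot help when the
        # whole cluster is down.
        raise RuntimeError("no healthy nodes left; IP failover cannot help")
    result = {node: list(assignments.get(node, [])) for node in healthy_nodes}
    orphaned = [ip for node, ips in assignments.items()
                if node not in healthy_nodes for ip in ips]
    for ip in orphaned:
        # Pick the healthy node with the fewest IPs; break ties by name.
        target = min(result, key=lambda node: (len(result[node]), node))
        result[target].append(ip)
    return result


before = {"node-1": ["10.0.0.11"], "node-2": ["10.0.0.12"], "node-3": ["10.0.0.13"]}
after = reassign_ips(before, healthy_nodes=["node-1", "node-3"])
print(after)
```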

NOTE: IP failover only protects you from node failure and not cluster-wide failures.

Scheduled Events

AWS can schedule events that require the instances to stop or restart. If an event is scheduled, an email will be sent to the registered address indicating the specified time that the event will automatically trigger the required action. Since Qumulo is designed to be tolerant to instance failures, it is safe to allow AWS to take these actions.

To increase availability during scheduled events, use the details above to configure the cluster so that it remains available through instance failures. If the scheduled events will impact multiple instances concurrently, consider manually restarting the instances one at a time to preserve availability.

Data Residency

By default, Qumulo does not interact with other services or hosts outside of the individual nodes in the cluster and client connections via the supported protocols.

If Qumulo's Cloud-Based Monitoring is enabled, diagnostic data (e.g., performance and capacity statistics) will be sent over an encrypted connection to a Qumulo cloud instance, where a proprietary application aggregates the cluster diagnostic data. File data, file and path names, client IP addresses, and login information (like usernames and passwords) will not be collected.

Keep in mind that you can restrict communication to Qumulo's Cloud-Based Monitoring using ACL rules or by simply not providing an egress route to the internet, a standard default behavior of any VPC. Additionally, you can choose to disable the feature at any time in the Web UI or via the qq command-line.

RESOLUTION

You should now have an overall understanding of how to get started with a Qumulo cloud cluster in AWS using the resources and details provided.

ADDITIONAL RESOURCES

Qumulo in AWS: Build a Multi-Instance Cluster with CloudFormation

Qumulo in AWS: Qumulo Sidecar

Qumulo in AWS: Cloud Cluster Backup Tool

Qumulo in AWS: Automatic EBS Volume Replacement

Qumulo in AWS: Create a CloudWatch Dashboard

Qumulo in AWS: Configure CloudWatch Alarms

 
