Qumulo Trends is where we at Qumulo experiment with potential new features and gather your feedback. Qumulo Trends features are not part of our regular Qumulo Core releases and are updated and maintained on their own schedule. We do not guarantee the same service level agreement for Qumulo Trends features as we do for Qumulo Core.
IN THIS ARTICLE
Outlines the different features of Qumulo Trends
- Timeseries Graphs
- Capacity Forecast
- Event Timeline
- Latency & Ops
- Settings & Email Alerts
- Cluster running Qumulo Core
- SSO Account with Qumulo
Problems signing in? Click Login with Qumulo Single Sign-On and select Don't Remember your password
- Qumulo Trends is only available to customers who have purchased a licensed product from Qumulo and is not available to trial users at this time
- QF2 cloud-based monitoring service enabled in the Web UI
Qumulo Trends offers data that complements your current analytics and can be accessed anywhere at any time via the cloud. While the metrics in the Web UI dashboard provide analytics in real time, Qumulo Trends provides a look over a period of time so you can monitor cluster performance, capacity, latency, and events.
The home page is your starting point into Qumulo Trends where you can see a quick overview of your cluster (or clusters). Here you can:
- Click Timeseries Graphs to take a deep look at your cluster's overall performance
- Click the capacity bar to see your Capacity trend and forecast
- Select Event Timeline to review activity over the last few weeks on the cluster
- See latency trends and operation counts by clicking the Latency & Ops line graph. Note that the grey bars correspond to your typical latency range during the past 2 weeks and the blue line is the latency during the last 24 hours.
Our timelines provide a deeper look at the cluster's overall performance. You can see what the cluster is doing on an hour-to-hour basis and adjust the start and end times to focus on specific periods. Keep in mind that because of sampling and decay built into the collection of the active files, certain visualizations may not always perfectly reflect the data on the cluster at a given instant.
- Used Cluster Capacity shows capacity over time at a granular level of 15 minutes. Note that data capacity and metadata capacity are shown separately.
- If you have billions of files, you can expect to see large capacity usage for metadata.
- File and Directory Count shows the total file and directory count across the cluster over time at a granular level of 15 minutes.
- Read and Write Throughput shows average read and write throughput for data at 15-minute granularity. Keep in mind that this visualization includes data written to and read from the cluster, and does not include network throughput between nodes or other non-data operations.
- The max throughputs, which are disabled by default, are the highest throughputs recorded in any 1 minute of the 15-minute window.
- Read and Write IOPS shows the average number of read and write operations (data and metadata) per second performed by the cluster at 15-minute granularity.
- The max IOPS, which are disabled by default, are the highest IOPS recorded in any 1 minute of the 15-minute window.
- Disk Utilization is a percentage from 0 to 100 indicating the busyness of the SSDs or HDDs.
- Because Qumulo distributes the workload evenly across all drives, we are able to reduce the metric to a single number.
- CPU Utilization is the average CPU utilization of all nodes and cores across the cluster.
- Data Read & Write Latency shows the latency of read and write data operations.
- Metadata Latency shows the latency of read and write metadata operations.
- Active Clients shows the number of clients reading and writing files based on data in Qumulo's activity by client report and API (qq current_activity_get).
- Files Read/Written shows the number of files being read and written based on data in Qumulo's activity by path report and API (qq current_activity_get).
- Data Read and Write Size shows the average number of bytes written and read per file operation based on data in Qumulo's activity by path report and API (qq current_activity_get).
- Note that while Qumulo has a great deal of support for smaller read and write sizes as well as smaller file sizes, in general, the larger the chunks of data, the more throughput you will achieve.
- System Temperatures is an average of all temperatures (reported in degrees Celsius) measured inside the chassis per node reported via IPMI. Depending on the model this may include PSU, CPU, NIC, memory, and front/rear temps.
- Node Load Balance shows a vertical bar per node that indicates the amount of work the node is doing in support of all protocol operations. Specifically, it is the sum of operation time (latency) of all operations the node completed in the hour, displayed on a logarithmic scale.
- If a bar is colored orange, it means the node is performing more than the ideal amount of protocol operations. Specifically, it has a load that is at least 50% larger than the ideal load of 1 / (number of nodes). For example, on a 10-node cluster, the ideal load is 10%, so a bar would be orange if the load was at least 15% of the cluster-wide load.
- Network Throughput outlines the throughput on a per node basis at 15 minute granularity.
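Two of the calculations described in the list above lend themselves to a short illustration: how a 15-minute window yields both an average and a max value, and when a Node Load Balance bar would turn orange. The Python sketch below is illustrative only and is not the code Qumulo Trends runs. The function names and inputs are hypothetical; it assumes one sample per minute and a per-node total of operation time for the hour.

```python
from statistics import mean


def summarize_window(per_minute_samples):
    """Return (average, max) for one 15-minute window of 1-minute samples."""
    if len(per_minute_samples) != 15:
        raise ValueError("expected 15 one-minute samples")
    return mean(per_minute_samples), max(per_minute_samples)


def overloaded_nodes(op_time_by_node, threshold=1.5):
    """Return the nodes whose share of total operation time is at least
    `threshold` times the ideal share of 1 / (number of nodes)."""
    total = sum(op_time_by_node.values())
    node_count = len(op_time_by_node)
    if total == 0:
        return []
    # op_time / total >= threshold / node_count, rearranged to avoid
    # floating-point rounding right at the threshold.
    return [
        node
        for node, op_time in op_time_by_node.items()
        if op_time * node_count >= threshold * total
    ]
```

For example, on a hypothetical 4-node cluster the ideal share is 25%, so a node would be flagged once it carries at least 37.5% of the total operation time.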
The capacity trend and forecast displays your cluster's used capacity growth over time and forecasts your future growth based on the trend. This forecast provides visibility into the usable capacity versus your current storage and shows a projected window of time in which your cluster may reach capacity.
- The high estimate is based on your cluster getting filled at a rate that is the 80th percentile of your daily capacity growth.
- The middle estimate is the 50th percentile and the lower estimate is the 20th percentile.
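The percentile-based estimates above can be illustrated with a small sketch. This is not Qumulo's forecasting code: it assumes a simple linear projection from a percentile of observed daily growth, and the function names, the nearest-rank percentile method, and the units are all hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def days_until_full(daily_growth_tb, free_tb):
    """Project the days until `free_tb` is consumed at the 80th, 50th, and
    20th percentile daily growth rates. Returns None where growth <= 0."""
    estimates = {}
    for label, pct in (("high", 80), ("middle", 50), ("low", 20)):
        rate = percentile(daily_growth_tb, pct)
        estimates[label] = free_tb / rate if rate > 0 else None
    return estimates
```

Note that in this sketch the "high" estimate uses the fastest growth rate, so it produces the earliest projected date for reaching capacity.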
The event timeline shows a variety of events tracked by Qumulo in reverse chronological order. Events shown are color-coded by importance and include upgrades, snapshots, quotas, quorum changes, LDAP, and more.
- RED is high importance
- ORANGE is medium importance
- GREEN is low importance
LATENCY AND OPS
The latency and protocol op details visualization is one of Qumulo Trends' more advanced visualizations. Here you can see what changed in your workload to cause latency or review the operations in play.
- Use the slider (drag or change ends) to select an interesting time range from the previous week.
- Blue line is the average latency
- Green bar is the operation count
- The list of operations over time below the slider is sorted by total time (op count * average latency), with the highest totals at the top.
- The rectangles are made up of latency (blue line) and operation count (green bar). Both default to a square root scale due to the high variability across workloads.
- Click the Normalize by op type checkbox to adjust each row's op count (green bar) to be size relative to the other bars in the row. Selecting this option makes it easier to see big changes in your workload.
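The row ordering and the two display options described above can be sketched as follows. The data layout, function names, and sample values are hypothetical, and the normalization shown (scaling each row by its own largest value) is one plausible reading of the checkbox rather than a description of Qumulo's implementation.

```python
import math


def sort_by_total_time(ops):
    """Sort operation types by total time (op count * average latency),
    highest first. `ops` maps op name -> (op_count, avg_latency_ms)."""
    return sorted(ops, key=lambda name: ops[name][0] * ops[name][1], reverse=True)


def sqrt_scale(values):
    """Apply a square-root scale so large values do not dwarf small ones."""
    return [math.sqrt(v) for v in values]


def normalize_row(op_counts):
    """Scale one row's op counts relative to the largest count in that row."""
    peak = max(op_counts, default=0)
    if peak == 0:
        return [0.0 for _ in op_counts]
    return [count / peak for count in op_counts]
```

With per-row normalization, a row with few operations overall still shows its own peaks and dips clearly, which matches the stated goal of making big workload changes easier to see.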
SETTINGS & EMAIL ALERTS
To see a list of your clusters in Qumulo Trends, navigate to the Settings Menu at the top of the page. Here you can check the boxes for any of your clusters to enable the following:
- Show on Home Page: selected clusters will be displayed on the Qumulo Trends home page upon login
- Email Alerts: selected clusters will trigger email alerts when an event occurs, such as an upgrade, a quorum event, or a drive loss
- Qumulo Care highly recommends enabling this feature so that you can proactively monitor your cluster's activity
Qumulo Trends is the go-to place to explore and gather additional details and visualizations on your cluster. Keep in mind that Qumulo Trends is a work in progress and will continue to improve with feedback from Qumulo customers.
Head on over to Qumulo Community to let us know what you think including what works, what doesn't, and additional data you would like to see.
You should now be able to successfully utilize the data available in Qumulo Trends.