IN THIS ARTICLE
Outlines the capabilities of Qumulo Core's Capacity Trends
Requirements
- Cluster running Qumulo Core
That little fuel gauge on your car’s dashboard is critical. Do I need to fill up today? Do I have enough gas to get across town? Without that gauge, you’d soon be hitchhiking with a gas can.
On your storage system, information about your capacity consumption is equally critical. Do I have enough storage to start a new project? When will I run out of storage? Is my business going to grind to a halt because there’s no space left? To answer these questions, you need data. Not only do you need to know your current capacity consumption, but you also need to know how it’s changing over time.
Qumulo Core’s Capacity Trends graph shows your consumption over 72 hours, 30 days, or 1 year. You can also pull this data from the API to build and save your own views of the capacity consumption.
All cars have a fuel gauge. Some cars can tell you more about your gas consumption. Maybe they tell you average MPG or how many miles you can go before empty. But that’s simple stuff. Imagine if your car could tell you that driving with your golf clubs in the trunk uses 4% more gas. Or that your teenage son used 1.3 gallons of gas at 2:30 am yesterday. Likewise, every storage product can tell you something about your capacity consumption. You need more than that, however, to be able to truly manage your data.
For example, let’s say you left work on Friday and the storage was 50% full. When you come in on Monday, the storage is 90% full. What the heck happened? Did a scientist write a script that generated a ton of data? Or did an intern upload his entire music collection?
With Qumulo Core, you can click on one of the bars in the Capacity Trends graph and see the most significant capacity changes by path, both additions and deletes, during that time period. Now you know what data changed.
All storage systems can tell you something about your storage, but only Qumulo Core gives you real-time analytics to tell you about your data.
How it Works
At the top of each hour we save the overall used capacity and begin a background process to collect a list of the paths that consume a significant amount (> 0.1%) of the overall used capacity. The background data collection process doesn’t have any significant impact on overall performance. This information is stored on the cluster, and will consume a small part (up to 4MB/day, but usually much less) of the usable storage.
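Because the 0.1% significance threshold scales with the cluster's used capacity, the same change might be recorded on a small cluster but omitted on a large one. A minimal sketch of that threshold logic (the function names and data shapes here are illustrative assumptions, not Qumulo's implementation):

```python
# Illustrative sketch of the 0.1% significance threshold described
# above. Function names and data shapes are assumptions, not Qumulo's
# actual code.

def threshold_for_inclusion(total_used_bytes):
    """A path is "significant" if it consumes at least 0.1% of the
    overall used capacity at collection time."""
    return total_used_bytes // 1000  # 0.1% of the total, in bytes

def significant_paths(path_sizes, total_used_bytes):
    """Keep only the paths whose consumption meets the threshold."""
    limit = threshold_for_inclusion(total_used_bytes)
    return {path: size for path, size in path_sizes.items() if size >= limit}

# On a cluster with 100 TB used, only paths holding >= 100 GB qualify:
total = 100 * 10**12
print(threshold_for_inclusion(total))  # 100000000000 (100 GB)
```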
When you click a bar in the graph to get the details of what changed during a time period (tₙ), the UI compares the lists of paths from tₙ and tₙ₋₁. It then displays up to 20 paths and the amount of change, either positive (writes) or negative (deletes). Moving a large file within the cluster will show as both a delete from the source directory and a write to the destination.
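That comparison can be sketched as a diff of two hourly snapshots, ranked by magnitude of change. This is an illustration under assumed data shapes (a path-to-bytes mapping per hour), not Qumulo's actual implementation:

```python
# Illustrative sketch (not Qumulo's code): diff two hourly snapshots,
# where each snapshot maps a path to the bytes it consumed at that hour.

def capacity_changes(previous, current, limit=20):
    """Return up to `limit` paths with the largest absolute change.

    Positive deltas are writes, negative deltas are deletes; a file
    moved within the cluster appears as both."""
    all_paths = set(previous) | set(current)
    deltas = {p: current.get(p, 0) - previous.get(p, 0) for p in all_paths}
    changed = [(p, d) for p, d in deltas.items() if d != 0]
    changed.sort(key=lambda item: abs(item[1]), reverse=True)
    return changed[:limit]

hour_1 = {"/projects": 500_000, "/home": 200_000}
hour_2 = {"/projects": 900_000, "/home": 150_000, "/scratch": 60_000}
print(capacity_changes(hour_1, hour_2))
# [('/projects', 400000), ('/scratch', 60000), ('/home', -50000)]
```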
As with our other analytics pages, we do “smart aggregation” of the paths to simplify what gets shown to the user. If the only significant changes happen deep within the file tree (/foo/bar/baz/qux/norf), you’ll see only that path in the detailed change data and not the parent directories (/foo/, /foo/bar/, etc...). If lots of small files and directories were changed in /foo/ that add up to a significant change, you’ll see only /foo/. If both types of change happen simultaneously, you’ll see changes associated with /foo/bar/baz/qux/norf/ as well as /foo/.
This design means that we are not collecting or showing all changes for a given time period. Instead we are showing only the most significant changes. Significant is defined as consuming 0.1% or more of the total used capacity. This feature is not an audit log. It’s intended to show high level trends.
There are two API calls for this feature.
- This takes two timestamps and an interval as inputs and returns the used-capacity history.
- This takes a timestamp as an input and returns the paths that used a significant amount of the capacity, along with a “threshold_for_inclusion” value. The inclusion threshold is the number used to determine whether a path is significant (greater than 0.1% of total used capacity at that time).
Note that if you wanted to reconstruct the UI experience, you’d have to calculate the differences between timestamps on your own.
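A hedged sketch of driving those two calls yourself follows. The URL paths, query-parameter names, and cluster address below are assumptions for illustration only; consult your cluster's API reference for the exact endpoints and response shapes.

```python
# Illustrative URL builders for the two calls described above. The
# endpoint paths and parameter names are assumptions, not the
# documented API.
from urllib.parse import urlencode

BASE = "https://cluster.example.com:8000"  # hypothetical cluster address

def history_url(begin, end, interval="hourly"):
    """First call: used-capacity history between two timestamps."""
    query = urlencode({"begin-time": begin, "end-time": end,
                       "interval": interval})
    return f"{BASE}/v1/analytics/capacity-history/?{query}"

def details_url(timestamp):
    """Second call: significant paths (plus threshold) at one timestamp."""
    return f"{BASE}/v1/analytics/capacity-history/{timestamp}/"

# The API returns absolute values per timestamp; to reconstruct the
# UI's view, fetch two adjacent timestamps with any HTTP client and
# compute delta[path] = current[path] - previous.get(path, 0) yourself.
```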
There are several cases that may cause the data to be incomplete or less detailed than desired.
- Unable to get the total used capacity - There are several reasons why the data for a given hour may not be available. If the cluster is down or was 100% full, it won’t be possible to store the data point. The graph will show a gap in the line and say “no data.”
- Unable to collect details - To gather the data needed to list the detailed breakdown, we run a background task that collects the aggregate data from the file system. If this takes more than 55 minutes (due to very deep directories or a cluster under heavy load), we stop the process. The process can also be interrupted by a quorum event. In those cases, when you select a time period, you’ll see “Sorry, no details are available for [time window].”
- Details are not very granular - In the cases of very wide directories or lots of small changes, the only significant change we’ll report could be root or a high level directory. Since the threshold for showing up as a significant change is a percentage of total used capacity, changes smaller than that won’t show at all. On a large cluster, this threshold may be quite high.
You should now have a general understanding of the capabilities of Qumulo Core’s Capacity Trends.