IN THIS ARTICLE
This article outlines how snapshots work within Qumulo Core, including:
- Snapshots and Storage
- Snapshot Deletes
- Snapshot Capacity
Requirement: a cluster running Qumulo Core 2.5.0 or above
The Qumulo Core file system has an entry for each version of every file. If any data in a file is modified after a snapshot has been taken, we create a new entry for that version of the file which points to just the new data. This has several benefits including:
- Instantaneous snapshots
- No storage is consumed when a snapshot is taken
- No performance penalty for taking a snapshot
- Little to no performance penalty for reading and writing data from the file system in the presence of snapshots
New data is written alongside the original. When possible, the new version of the file shares data with the old. Once data is written to a snapshotted file, the file system stores two entries, one for each version of the same file. As more data is modified, more file data is added to the newer entry.
This allows us to preserve snapshotted data for everything in the Qumulo Core file system including file data, directory entries, creation/modification time, permissions, and more.
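The versioning scheme described above can be sketched as a small data model. This is an illustrative sketch only, not Qumulo Core's actual on-disk format: the `FileVersion` and `VersionedFile` classes, the block-index dictionary, and the method names are all hypothetical and chosen for the example. It shows a new entry being created only when data is modified after a snapshot, with that entry holding just the new data and sharing everything else with the older version.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileVersion:
    """One entry (version) of a file. Hypothetical model, not Qumulo's format."""
    # Only the blocks written in this version are stored here.
    blocks: Dict[int, bytes] = field(default_factory=dict)
    # Older version whose unmodified data this version shares.
    parent: Optional["FileVersion"] = None

    def read(self, index: int) -> Optional[bytes]:
        # Walk back through older versions until the block is found.
        version: Optional[FileVersion] = self
        while version is not None:
            if index in version.blocks:
                return version.blocks[index]
            version = version.parent
        return None


class VersionedFile:
    def __init__(self) -> None:
        self.live = FileVersion()
        self.snapshots: List[FileVersion] = []
        self._live_is_snapshotted = False

    def take_snapshot(self) -> FileVersion:
        # Taking a snapshot copies nothing; it only records the current entry.
        self.snapshots.append(self.live)
        self._live_is_snapshotted = True
        return self.live

    def write(self, index: int, data: bytes) -> None:
        # First modification after a snapshot creates a new entry that
        # points to just the new data and shares the rest with the old one.
        if self._live_is_snapshotted:
            self.live = FileVersion(parent=self.live)
            self._live_is_snapshotted = False
        self.live.blocks[index] = data


example = VersionedFile()
example.write(0, b"old-0")
example.write(1, b"old-1")
snapshot = example.take_snapshot()
example.write(1, b"new-1")
```

In this sketch, `snapshot.read(1)` still returns the old data, `example.live.read(1)` returns the new data, and `example.live.read(0)` is served from the shared older entry.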
Snapshots and Storage
In Qumulo Core, a snapshot is taken instantaneously and no extra storage overhead is needed. Snapshot data then accumulates over time: as data in the file system is modified, new storage space is allocated for the modified data and linked into the file metadata.
Consider the following 4MB file:
- First, the file is filled with 4MB of data and a snapshot is taken
- Then, 1MB in the middle of the file is modified
- The file system allocates a new 1MB region for the modified data
- 3MB of the original 4MB of data is shared between both versions of the file
- 1MB of new data exists in the newest, or “live,” version of the file
- 1MB of old data exists only in the snapshotted version of the file
- The total storage usage for this file is now 5MB
If that particular 1MB is rewritten again, the existing live data is simply overwritten without allocating new space. If a different region of the file is rewritten instead, Qumulo Core allocates more storage.
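The 4MB example can be checked with a toy model. This is a sketch under simplifying assumptions, not how Qumulo Core tracks storage internally: the file is treated as fixed 1MB regions, each region points at a numbered extent, and the `SnapshottedFile` class and its methods are hypothetical names.

```python
from typing import Dict, Optional


class SnapshottedFile:
    """Toy model: each 1MB region of the file points at one 1MB extent."""

    def __init__(self, size_mb: int) -> None:
        self._next_extent = 0
        # region index -> extent id holding the live data for that region
        self.live: Dict[int, int] = {r: self._alloc() for r in range(size_mb)}
        self.snapshot: Optional[Dict[int, int]] = None

    def _alloc(self) -> int:
        extent = self._next_extent
        self._next_extent += 1
        return extent

    def take_snapshot(self) -> None:
        # Copies references only; no new extents are allocated.
        self.snapshot = dict(self.live)

    def overwrite(self, region: int) -> None:
        shared = (
            self.snapshot is not None
            and self.snapshot[region] == self.live[region]
        )
        if shared:
            # Region is still shared with the snapshot: allocate new space.
            self.live[region] = self._alloc()
        # Otherwise the live-only extent is overwritten in place.

    def storage_used_mb(self) -> int:
        extents = set(self.live.values())
        if self.snapshot is not None:
            extents.update(self.snapshot.values())
        return len(extents)


f = SnapshottedFile(size_mb=4)
f.take_snapshot()
after_snapshot = f.storage_used_mb()      # 4: the snapshot itself uses no space
f.overwrite(1)
after_first_write = f.storage_used_mb()   # 5: one new 1MB region allocated
f.overwrite(1)
after_rewrite = f.storage_used_mb()       # 5: same region overwritten in place
f.overwrite(2)
after_other_region = f.storage_used_mb()  # 6: a different region was modified
```

The model reproduces the numbers above: 3MB stays shared between the two versions after the first write, and usage grows only when a region that is still shared with the snapshot is modified.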
Storage usage is driven directly by the workload, and data is only duplicated when necessary or on demand. There is no performance penalty for taking a snapshot and little to no performance penalty for reading and writing data from the file system in the presence of snapshots.
Snapshot Data Consumption
As highlighted above, Qumulo's snapshots are taken instantly and contain a point-in-time representation of the current state of a directory. These snapshots reference the data present in a directory at the time the snapshot is taken.
When changes are made to a file, they are internally tracked as being different from the latest snapshot, which results in different versions of files. Essentially, data exists in two forms:
- Saved data stored in a snapshot
- Live data that has not yet been captured in a snapshot
This creates a lineage of snapshots that are independent of one another. When a snapshot is deleted, we remove the data covered only by that snapshot. Data referenced in any other snapshot is retained so that a full representation of each file within the other snapshots can still be provided. When checking the size of a snapshot, only the data referenced in that snapshot is shown in the output of the command below:
qq snapshot_get_capacity_used_per_snapshot --id SNAPSHOT_ID
When data is referenced in more than one snapshot, it is covered. Covered data cannot be released until all covering snapshots are deleted. The total amount of snapshot data, which includes covered data as well as the data in existing snapshots, is included in the output of the following command:
qq snapshot_get_total_used_capacity
For example, running qq snapshot_get_total_used_capacity reports a usage of 1319413953331 Bytes (1.2 TB Used).
However, if you add up the values that qq snapshot_get_capacity_used_per_snapshot reports for every snapshot currently on the system, the sum is only 2147483648 Bytes (2 GB Used).
The former is the total snapshot data, which includes the data that is covered by snapshots. The latter is the sum of the differences, or changes to the data, stored in each specific snapshot; looking at snapshots individually does not include the unchanged portions of files. To see the total covered data, including data no longer present in a snapshot, you can include multiple snapshot IDs (comma separated) in the qq snapshot_get_capacity_used_per_snapshot command.
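The gap between the total and the per-snapshot sum can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of how the qq commands compute their output: each snapshot is modeled as a set of block labels, the usage of a group of snapshots is assumed to be the blocks that only that group references, and `capacity_used_by` is a hypothetical helper.

```python
from typing import Dict, Iterable, Set

Blocks = Set[str]


def capacity_used_by(
    snapshot_ids: Iterable[int],
    snapshots: Dict[int, Blocks],
    live: Blocks,
) -> int:
    """Count blocks referenced only by the given snapshots (assumed model)."""
    group = set(snapshot_ids)
    held_by_group: Blocks = set()
    held_elsewhere: Blocks = set(live)
    for snapshot_id, blocks in snapshots.items():
        if snapshot_id in group:
            held_by_group |= blocks
        else:
            held_elsewhere |= blocks
    return len(held_by_group - held_elsewhere)


# A four-block file: block A changes before snapshot 2,
# then blocks B and C change in the live file.
snapshots = {
    1: {"A", "B", "C", "D"},
    2: {"A2", "B", "C", "D"},
}
live = {"A2", "B2", "C2", "D"}

per_snapshot = {sid: capacity_used_by([sid], snapshots, live) for sid in snapshots}
per_snapshot_sum = sum(per_snapshot.values())        # 1: only block A
total = capacity_used_by(snapshots, snapshots, live)  # 3: blocks A, B and C
covered = total - per_snapshot_sum                    # 2: B and C, held by both
```

Blocks B and C are covered: neither snapshot holds them alone, so they appear in no individual snapshot's figure, but they show up once both snapshot IDs are considered together.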
Imagine a scenario where you have a 1TB file that was modified over time:
Snapshot 1: 1099511627776 Bytes (Contains the full 1TB file)
Snapshot 2: 1073741824 Bytes (Contains 1GB worth of changes to the 1TB file.)
Snapshot 3: 1073741824 Bytes (Contains an additional 1GB worth of changes to the 1TB file.)
If Snapshot 1 is deleted, you end up with 1023GB of data that is covered by Snapshots 2 and 3. That 1023GB is not freed until every snapshot that references the file is deleted, because the data must be retained for as long as any snapshot references it. Without the covered data (which is no longer present in the active Snapshots 2 and 3), there can be no representation of the full file.
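The 1023GB figure can be reproduced with a short calculation. This sketch makes several assumptions that are not stated in the scenario: data is tracked in 1GB blocks, each 1GB change replaces a different original block, and references from the live file are ignored. The `retained_gb` helper and the block labels are hypothetical.

```python
GIB = 1024 ** 3

# Sizes from the scenario above, in bytes.
SNAPSHOT_BYTES = {1: 1099511627776, 2: 1073741824, 3: 1073741824}

# Model the 1TB file as 1024 one-gigabyte blocks.
original = {("original", i) for i in range(SNAPSHOT_BYTES[1] // GIB)}

snap1 = set(original)
# Snapshot 2: one original block was replaced by 1GB of changes.
snap2 = (snap1 - {("original", 0)}) | {("changed", 0)}
# Snapshot 3: a second, different block was replaced by another 1GB.
snap3 = (snap2 - {("original", 1)}) | {("changed", 1)}


def retained_gb(deleted, remaining):
    """Blocks of a deleted snapshot that remaining snapshots still reference."""
    still_referenced = set().union(*remaining)
    return len(deleted & still_referenced)


retained_by_2_and_3 = retained_gb(snap1, [snap2, snap3])  # 1023
retained_by_3_only = retained_gb(snap1, [snap3])          # 1022
retained_by_none = retained_gb(snap1, [])                 # 0
```

Under these assumptions, deleting Snapshot 1 alone frees only the single block that changed before Snapshot 2; the remaining 1023GB stays allocated until the later snapshots are deleted as well.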
Snapshot Deletes
When you delete a snapshot, it is immediately removed from the list of snapshots and is no longer accessible. The space consumed by that snapshot is then recovered in a background process that frees data blocks that are no longer used by any snapshot.
Although the snapshot disappears immediately, the background process may take some time to reclaim space for the file system. To monitor progress, you can check the total snapshot capacity during deletion (for example, by running qq snapshot_get_total_used_capacity) and track the reclaimed space as the snapshot capacity decreases.
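Monitoring can also be scripted as a simple polling loop. The sketch below is only an illustration: `watch_reclaim` and its parameters are hypothetical, and `read_used_bytes` stands for whatever function you supply to return the current snapshot capacity in bytes. It might wrap qq snapshot_get_total_used_capacity, but parsing that command's output is not shown because the output format is not described in this article.

```python
import time
from typing import Callable, List


def watch_reclaim(
    read_used_bytes: Callable[[], int],
    interval_s: float = 30.0,
    stable_polls: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Poll snapshot capacity until it stops decreasing.

    Returns every reading taken. Polling stops once `stable_polls`
    consecutive readings show no further decrease.
    """
    readings = [read_used_bytes()]
    unchanged = 0
    while unchanged < stable_polls:
        sleep(interval_s)
        current = read_used_bytes()
        # Reset the counter whenever more space has been reclaimed.
        unchanged = unchanged + 1 if current >= readings[-1] else 0
        readings.append(current)
    return readings


# Usage with a stand-in reader that replays fixed values.
fake_values = iter([100, 80, 60, 60, 60, 60])
history = watch_reclaim(lambda: next(fake_values), sleep=lambda _: None)
reclaimed = history[0] - history[-1]
```

Treating a few unchanged readings as completion is a heuristic: a quiet period does not guarantee that the background process has finished, so choose the interval and `stable_polls` conservatively.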
You should now have an overall understanding of how snapshots work within Qumulo Core.