Identify File and Byte Range Changes between Snapshots

Using Qumulo's API, you can identify the file changes between two snapshots of the same directory. Each entry in the returned list names the change operation (CREATE, MODIFY, or DELETE) and contains the path of the changed file. For a specific file, you can also use the API to determine which byte ranges changed between the older snapshot and the newer one.

Identify the Snapshot IDs

First, identify the snapshot IDs for the two points in time you want to compare.

  1. Log in to the Qumulo Core Web UI.
  2. Hover over Cluster and click Saved Snapshots.

    snapshots_menu.png

  3. Copy the snapshot IDs for the first and last points in time from the Created/Snapshot Name column.

    snapshot_id.png

Identify the File Changes

Once you have the IDs, run the following qq command to list the file changes, replacing each ID_HERE with the appropriate snapshot ID.

qq snapshot_diff --newer-snapshot ID_HERE --older-snapshot ID_HERE 

Change operations appear in the results list according to the following rules:

  • Directories are shown with a trailing slash in their path
  • When directories are created or deleted, it’s assumed that all directories and files under them are recursively created or deleted as well
  • Renames appear as a delete of the old name and a create of the new name
  • Adding links to a file (hardlinking) appears as a create of the new link’s path

IMPORTANT! The time to discover all the changes is proportional to the number of files changed and not to the number of total files in the directory.

Example of snapshot_diff 

Below you'll find an output example where the following steps were taken before using the snapshot_diff API:

  • Snapshot an empty directory /mchmiel/
  • Add a directory /mchmiel/new_dir/
  • Make a new file in that directory /mchmiel/new_dir/file
  • Make a new top level file /mchmiel/new_file
  • Snapshot the directory /mchmiel/

snapshot_diff__1_.png

The results list above shows that new_dir/ was created; any data inside that directory is assumed to have been created as well. The output also shows that new_file was created. Lastly, the directory /mchmiel/ was modified because children were added to it, which updated its timestamps.

The output below displays the returned list after deleting all the data in the /mchmiel/ directory.

removed_snapshot__1_.png 

The directory and all its contents, including the file, were deleted as described. The parent directory was modified since it had children deleted, updating its timestamps.
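The listing rules and the first example above can be modeled in a few lines of Python. This is only a sketch: it assumes each result has already been parsed into a dictionary with "op" and "path" keys, which is a hypothetical shape chosen for illustration and not the documented output format. The helper names are likewise made up for this example.

```python
from collections import defaultdict

# Assumption: each entry is {"op": "CREATE" | "MODIFY" | "DELETE", "path": str}.
# This shape is hypothetical; adapt it to the output your cluster returns.
EXAMPLE_ENTRIES = [
    {"op": "CREATE", "path": "/mchmiel/new_dir/"},
    {"op": "CREATE", "path": "/mchmiel/new_file"},
    {"op": "MODIFY", "path": "/mchmiel/"},
]


def summarize_changes(entries):
    """Group paths by operation, using the trailing slash to spot directories."""
    summary = defaultdict(lambda: {"dirs": [], "files": []})
    for entry in entries:
        kind = "dirs" if entry["path"].endswith("/") else "files"
        summary[entry["op"]][kind].append(entry["path"])
    return {op: groups for op, groups in summary.items()}


def is_implied_by(path, entries, op):
    """Return True if `path` lies under a directory listed with operation `op`.

    Created or deleted directories imply the same operation for everything
    beneath them, so such children are not listed individually.
    """
    return any(
        e["op"] == op
        and e["path"].endswith("/")
        and path != e["path"]
        and path.startswith(e["path"])
        for e in entries
    )


summary = summarize_changes(EXAMPLE_ENTRIES)
```

Here /mchmiel/new_dir/file never appears in the list, yet is_implied_by("/mchmiel/new_dir/file", EXAMPLE_ENTRIES, "CREATE") reports it as created because its parent directory was.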

Identify the Changed Byte Ranges of a File 

Once you have the file ID or path and two snapshots that include the file, you can use the API to determine the byte ranges that have changed in the newer snapshot from the older snapshot. Each change is reported as either FILE_REGION_DATA or FILE_REGION_HOLE. A DATA region marks a range where new bytes were written, while a HOLE region marks a newly deallocated range.

qq snapshot_file_diff --newer-snapshot ID_HERE --older-snapshot ID_HERE {--path FILE_PATH or --id FILE_ID} 
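If you script this command, a small helper can assemble the argument list and enforce that the file is identified by either a path or an ID. The sketch below uses only the flags shown above; the function name is hypothetical, the snapshot ID values are placeholders, and treating --path and --id as mutually exclusive is an assumption based on the "or" in the syntax line.

```python
def build_file_diff_command(newer_snapshot, older_snapshot, path=None, file_id=None):
    """Build the argv list for `qq snapshot_file_diff`.

    Assumption: exactly one of `path` or `file_id` should be supplied.
    """
    if (path is None) == (file_id is None):
        raise ValueError("pass exactly one of path or file_id")

    command = [
        "qq", "snapshot_file_diff",
        "--newer-snapshot", str(newer_snapshot),
        "--older-snapshot", str(older_snapshot),
    ]
    if path is not None:
        command += ["--path", path]
    else:
        command += ["--id", str(file_id)]
    return command


# Placeholder snapshot IDs; substitute the IDs copied from the Web UI.
example_command = build_file_diff_command(2, 1, path="/myfile")
```

The resulting list can be handed to subprocess.run(example_command, capture_output=True, text=True). Any connection or login options that your qq setup needs are outside the scope of this sketch.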

Example of snapshot_file_diff 

Below you'll find an output example where the following steps were taken before using the snapshot_file_diff API:

  • Create a file /myfile with 1 MByte of data
  • Create Snapshot 1
  • Write “Hello World.” at the beginning of the file
  • Truncate the file to 2 MByte
  • Append “Good night world.” to the file (17 bytes of data)
  • Create Snapshot 2

Now compare the old and the new version of the file as shown in the example below:

snapshot_example.png

  • The first entry represents the first block of data being overwritten with “Hello World.” at offset 0. Even though the string is only 12 bytes long, the whole block has been newly written.
  • The second entry represents the hole created by truncating the file to a larger size.
  • The third entry represents the extra data written at the end. The trailing piece’s size may not be block-aligned if the file size is not a multiple of the block size.

NOTE: This API requires the file to be present in both snapshots.
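A common use for the region list is to work out how much data changed and which ranges need to be copied. The following Python sketch illustrates that idea under several assumptions: each region is a dictionary with "type", "offset", and "size" keys (a hypothetical shape), the block size is 4096 bytes, and 1 MByte means 1,048,576 bytes. The numbers approximate the three entries described above and are not copied from the actual output.

```python
BLOCK_SIZE = 4096           # assumed block size, for illustration only
ONE_MBYTE = 1024 * 1024     # assumes binary megabytes

# Hypothetical regions modeled on the example: an overwritten first block,
# a hole from extending the file, and 17 appended bytes at the tail.
EXAMPLE_REGIONS = [
    {"type": "FILE_REGION_DATA", "offset": 0, "size": BLOCK_SIZE},
    {"type": "FILE_REGION_HOLE", "offset": ONE_MBYTE, "size": ONE_MBYTE},
    {"type": "FILE_REGION_DATA", "offset": 2 * ONE_MBYTE, "size": 17},
]


def bytes_by_type(regions):
    """Total the number of bytes reported for each region type."""
    totals = {}
    for region in regions:
        totals[region["type"]] = totals.get(region["type"], 0) + region["size"]
    return totals


def data_ranges(regions):
    """Return (start, end) byte ranges, end exclusive, that hold new data."""
    return [
        (r["offset"], r["offset"] + r["size"])
        for r in regions
        if r["type"] == "FILE_REGION_DATA"
    ]


totals = bytes_by_type(EXAMPLE_REGIONS)
ranges = data_ranges(EXAMPLE_REGIONS)
```

With these illustrative values, only 4,113 bytes of new data would need to be transferred, even though the file grew by more than 1 MByte; the hole can be recreated at the destination without copying any bytes.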

Pagination of Results

  • You can set a page size to retrieve up to 10,000 results per call. The default is 1,000 results.
  • If there are more results to retrieve, a “next” endpoint is generated and included in the output. Calling this endpoint retrieves the next set of results.

Keep in mind that differences between two snapshots will always be ordered in the same way no matter how many times you call the API or what combination of page sizes you use.
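Following the “next” endpoint until it runs out can be expressed as a short loop. The Python sketch below leaves the actual request to a caller-supplied function, so it makes no claim about the REST URL or authentication. It assumes each page is a dictionary with an "entries" list and a "next" value that is empty or missing on the last page; both key names are assumptions for illustration.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_all_entries(fetch_page: Callable[[str], Dict], first_endpoint: str) -> Iterator[Dict]:
    """Yield every entry across pages by following the "next" endpoint.

    `fetch_page` is a hypothetical callable that takes an endpoint and
    returns the parsed page as a dictionary.
    """
    endpoint: Optional[str] = first_endpoint
    while endpoint:
        page = fetch_page(endpoint)
        yield from page.get("entries", [])
        # An empty or missing "next" value ends the loop.
        endpoint = page.get("next") or None


# Stand-in for a real API: two fake pages keyed by endpoint name.
FAKE_PAGES = {
    "page-1": {"entries": [{"op": "CREATE", "path": "/a"}], "next": "page-2"},
    "page-2": {"entries": [{"op": "DELETE", "path": "/b"}], "next": ""},
}

all_entries = list(iter_all_entries(FAKE_PAGES.__getitem__, "page-1"))
```

Because the ordering of differences is stable across calls and page sizes, concatenating pages this way should give the same sequence regardless of the page size chosen.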
