
Configuring VMware vSphere for Handling Cluster Upgrades and Reboots

REQUIREMENTS

  • Running Qumulo Core OVA
  • VMware vSphere 

DETAILS

At Qumulo headquarters, we have applied a configuration setting to our internal VMware vSphere 6 hosts that allows them to better handle Qumulo cluster upgrades and reboots.

A storage device is considered to be in an All Paths Down (APD) state when it remains unreachable for a specified length of time. The default on vSphere 6 is 140 seconds, which is typically not long enough to survive a reboot of the host. If the host marks the datastore as APD, further I/O will cause Linux guests to see read-only file systems and has the potential to blue-screen Windows systems.
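The timing relationship described above can be sketched in a few lines of Python. This is only an illustration of the comparison between an outage and the timeout, not VMware's implementation; the function name and the five-minute outage are hypothetical.

```python
DEFAULT_APD_TIMEOUT_S = 140  # vSphere 6 default cited in this article


def exceeds_apd_timeout(outage_s: float,
                        apd_timeout_s: int = DEFAULT_APD_TIMEOUT_S) -> bool:
    """Return True if storage stays unreachable longer than the APD timeout."""
    return outage_s > apd_timeout_s


if __name__ == "__main__":
    outage = 5 * 60  # hypothetical five-minute storage outage during a reboot
    # With the default timeout, the outage outlasts the APD window.
    print(exceeds_apd_timeout(outage))
    # With a longer timeout such as 1200 seconds, it does not.
    print(exceeds_apd_timeout(outage, apd_timeout_s=1200))
```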

To help avoid these types of issues during an upgrade, we recommend increasing the APD timeout values:

  • From Hosts and Clusters, select the Host
  • Click the Manage tab > Settings
  • Select Advanced System Settings
  • Ensure Misc.APDHandlingEnable is set to 1
  • Set Misc.APDTimeout to the desired value, in seconds (the default is 140)
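If you prefer the ESXi shell to the vSphere client, the same two settings can usually be changed with esxcli. The sketch below only builds the command lines so that you can review them before running them on a host. It assumes that the advanced option paths /Misc/APDHandlingEnable and /Misc/APDTimeout correspond to the setting names shown in the client; that mapping is not taken from this article, so verify it against your ESXi version. The helper name is hypothetical.

```python
from typing import List


def apd_esxcli_commands(timeout_s: int) -> List[str]:
    """Build esxcli commands that enable APD handling and set the APD timeout.

    The option paths are assumed to match Misc.APDHandlingEnable and
    Misc.APDTimeout; nothing is executed here.
    """
    if isinstance(timeout_s, bool) or not isinstance(timeout_s, int) or timeout_s <= 0:
        raise ValueError("timeout_s must be a positive integer number of seconds")
    return [
        # Make sure APD handling is enabled.
        "esxcli system settings advanced set -o /Misc/APDHandlingEnable -i 1",
        # Set the APD timeout, in seconds.
        f"esxcli system settings advanced set -o /Misc/APDTimeout -i {timeout_s}",
        # Show the resulting value for verification.
        "esxcli system settings advanced list -o /Misc/APDTimeout",
    ]


if __name__ == "__main__":
    for command in apd_esxcli_commands(1200):
        print(command)
```

Run the printed commands on each host that mounts storage from the Qumulo cluster.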

As seen in the screenshot below, we use a setting of 1200, which gives the hosts twenty minutes to retry accessing physical storage. It is also best to keep VMware Tools up to date on your guest operating systems so that the latest SCSI timeouts are applied.
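To check which SCSI timeouts are currently in effect inside a Linux guest, you can read the per-disk timeout that the kernel exposes under sysfs. The following is a minimal sketch, assuming the guest's disks appear as sd* devices and that /sys/block/<device>/device/timeout is readable; the function name is hypothetical, and the values you should expect depend on your VMware Tools version.

```python
from pathlib import Path
from typing import Dict


def read_scsi_timeouts(sys_block: str = "/sys/block") -> Dict[str, int]:
    """Return a mapping of SCSI disk name to its timeout in seconds."""
    timeouts: Dict[str, int] = {}
    for timeout_file in sorted(Path(sys_block).glob("sd*/device/timeout")):
        # Path layout: <sys_block>/<device>/device/timeout
        device = timeout_file.parent.parent.name
        timeouts[device] = int(timeout_file.read_text().strip())
    return timeouts


if __name__ == "__main__":
    for device, seconds in read_scsi_timeouts().items():
        print(f"{device}: {seconds} s")
```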

manage_apdhandling.png


TIP: You should also consider setting CPU Reservations and High Latency Sensitivity on any virtual domain controllers. This ensures that they continue to perform at a high level regardless of neighboring VM workloads, which can improve metadata latency on your Qumulo clusters.
