After upgrading from ESXi 4.1/ESXi 5.0 to ESXi 5.5 u2, I began noticing increased latency events on hosts. More troubling, the hosts were frequently dropping all of their datastores, though they would reconnect within a few seconds.
While there are many possible causes to explore for these sorts of connectivity issues, one that is often overlooked is how ESXi actually heartbeats to its datastores. Starting with ESXi 5.5 u2, the default heartbeating method for VMFS5 datastores switched to Atomic Test and Set (ATS). Previous releases of ESXi updated the heartbeat with plain SCSI reads and writes to confirm the datastores were still present.
With ATS, the host sends a heartbeat to the datastore every 8 seconds to make sure the datastores are still available. If the storage array is VAAI-aware and compatible with ATS heartbeating, this works fantastically. If the array can't identify this heartbeat and respond to it very quickly, well, the word "Atomic" is fitting.
Storage arrays that don't use VAAI plugins to identify ATS heartbeats treat those requests like any other storage request. During periods of high I/O, the array puts the ATS heartbeat into the queue and responds to it eventually. ATS heartbeating is very sensitive to latency, so the host starts to panic and sends another request. Eventually it just drops the connection and reconnects.
Since the storage is still present, the reconnect is quick; however, the momentary drop pauses the running VMs. Many applications aren't written to withstand a 7-second disconnect, and this can lead to application crashes and cranky end users.
So, how do we fix this? Ideally, you would check with the storage vendor to see if there's an update or a plugin that adds ATS heartbeat support. That isn't always possible, so luckily for us there is an easy, scriptable way to change this setting. It uses PowerCLI and can cover an entire datacenter in one line.
To start, let's check the setting on one host:
Get-VMHost "hostname" | Get-AdvancedSetting -Name VMFS3.UseATSForHBOnVMFS5
This returns the setting's current value: 1 means ATS heartbeating is enabled, 0 means it is disabled. In 5.5 u2 and later it is enabled by default.
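Checking hosts one at a time gets tedious. The sketch below is a hypothetical helper (`ConvertTo-AtsHeartbeatReport` is my own name, not a PowerCLI cmdlet) that turns the objects returned by `Get-AdvancedSetting` into a one-row-per-host summary. It assumes each object exposes `Entity` (the VMHost, with a `Name`) and `Value`, which is what `Get-AdvancedSetting` emits for a host setting.

```powershell
# Hypothetical helper, NOT part of PowerCLI: summarizes the
# VMFS3.UseATSForHBOnVMFS5 value for each host piped into it.
function ConvertTo-AtsHeartbeatReport {
    [CmdletBinding()]
    param(
        # Assumed shape: objects with .Entity (a VMHost with .Name) and .Value,
        # as returned by Get-AdvancedSetting -Name VMFS3.UseATSForHBOnVMFS5
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $Setting
    )
    process {
        [pscustomobject]@{
            VMHost       = $Setting.Entity.Name
            Value        = $Setting.Value
            # 1 = ATS heartbeating on (the 5.5 u2 default), 0 = off
            AtsHeartbeat = if ($Setting.Value -eq 1) { 'Enabled' } else { 'Disabled' }
        }
    }
}

# Example usage against a connected vCenter:
# Get-Datacenter "datacenter" | Get-VMHost |
#     Get-AdvancedSetting -Name VMFS3.UseATSForHBOnVMFS5 |
#     ConvertTo-AtsHeartbeatReport | Format-Table -AutoSize
```

Running this before and after the change gives a quick record of which hosts were touched.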
Now we can disable the setting on that host:
Get-VMHost "hostname" | Get-AdvancedSetting -Name VMFS3.UseATSForHBOnVMFS5 | Set-AdvancedSetting -Value 0 -Confirm:$false
I included -Confirm:$false so I don't have to confirm the change for every host when I do the full datacenter.
Get-Datacenter "datacenter" | Get-VMHost | Get-AdvancedSetting -Name VMFS3.UseATSForHBOnVMFS5 | Set-AdvancedSetting -Value 0 -Confirm:$false
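This identifies the datacenter in your vCenter Server, gets all of the hosts in it, and disables ATS heartbeating for VMFS5 datastores on each one. ATS is still used for VMFS locking; only the heartbeat goes back to the older method.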
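To confirm the change landed everywhere, a minimal sketch is a filter that returns only the hosts still reporting a non-zero value. `Get-HostWithAtsHeartbeat` is a hypothetical name of my own, not a PowerCLI cmdlet, and it assumes the same `Entity`/`Value` shape that `Get-AdvancedSetting` returns.

```powershell
# Hypothetical helper, NOT part of PowerCLI: emits the name of every host
# whose VMFS3.UseATSForHBOnVMFS5 value is still non-zero.
function Get-HostWithAtsHeartbeat {
    [CmdletBinding()]
    param(
        # Assumed shape: objects with .Entity (a VMHost with .Name) and .Value
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $Setting
    )
    process {
        # Anything other than 0 means ATS heartbeating is still in use
        if ($Setting.Value -ne 0) {
            $Setting.Entity.Name
        }
    }
}

# Example usage against a connected vCenter; empty output means every host
# in the datacenter now has ATS heartbeating disabled:
# Get-Datacenter "datacenter" | Get-VMHost |
#     Get-AdvancedSetting -Name VMFS3.UseATSForHBOnVMFS5 |
#     Get-HostWithAtsHeartbeat
```

If you want to preview the datacenter-wide change first, Set-AdvancedSetting accepts -WhatIf; to go back to the default later, run the same pipeline with Set-AdvancedSetting -Value 1.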
You can find more information on this and other VAAI issues in VMware KB 1033665.