- ESXi - Misc.HeartbeatPanicTimeout
- VPXD (aka vCenter) - vpxd.das.heartbeatPanicMaxTimeout
esxcli system settings advanced list | grep -A10 /Misc/HeartbeatPanicTimeout
    [root@esx01:~] esxcli system settings advanced list | grep -A10 /Misc/HeartbeatPanicTimeout
       Path: /Misc/HeartbeatPanicTimeout
       Type: integer
       Int Value: 14
       Default Int Value: 14
       Min Value: 1
       Max Value: 86400
       String Value:
       Default String Value:
       Valid Characters:
       Description: Interval in seconds after which to panic if no heartbeats received
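If only the current value is needed, the "Key: Value" listing can be filtered down to the `Int Value` field. The following is a minimal sketch, not an official tool: `get_int_value` is a hypothetical helper name, and the sample text simply repeats the output shown above so the snippet can be tried without an ESXi host.

```shell
# get_int_value: hypothetical helper that reads esxcli-style "Key: Value"
# lines on stdin and prints the value of the "Int Value" field only.
# "Default Int Value" is skipped because the key must start with "Int".
get_int_value() {
    awk -F': ' '$1 ~ /^[[:space:]]*Int Value$/ { print $2; exit }'
}

# Sample text copied from the listing above (no ESXi host required).
sample_output='   Path: /Misc/HeartbeatPanicTimeout
   Type: integer
   Int Value: 14
   Default Int Value: 14
   Min Value: 1
   Max Value: 86400'

printf '%s\n' "$sample_output" | get_int_value

# On a real host the same filter could be fed directly, for example:
#   esxcli system settings advanced list -o /Misc/HeartbeatPanicTimeout | get_int_value
```

The `-o` option restricts the listing to a single advanced option, which avoids the `grep -A10` guesswork about how many lines follow the path.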
- On a standalone ESXi host that is not part of an HA cluster, the effective value is 900 seconds.
- On an ESXi host that is a member of a vSphere HA cluster, the value is 14 seconds.
Side note: This was very different in vSphere 4/ESXi 4, because the HA cluster was rewritten from scratch in vSphere 5, but that is history now and I hope nobody uses vSphere 4 anymore.
The behavior described above makes perfect sense if you ask me. If you have a standalone ESXi host and it is experiencing a hardware issue, it is better to wait 900 seconds (15 minutes) before ESXi goes into a PSOD state, because the virtual machines running on this host cannot be automatically restarted on other ESXi hosts anyway. And guess what: if an ESXi host has a significant hardware failure, it most probably has a negative impact on the virtual machines running on that host, right? Unfortunately, if you have just a single ESXi host, vSphere cannot do anything for you.
On the other hand, if the affected ESXi host is a member of a vSphere HA cluster, it is better to wait only 14 seconds (by default), or at most 60 seconds, and put the ESXi host into PSOD sooner, because the HA cluster will restart the affected virtual machines automatically. This helps mitigate the risk of unavailable virtual machines and, with them, of the application services running inside these virtual machines.
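The two cases can be summarized as a small decision function. This is only a sketch of the behavior as described in this post, not VMware code: `effective_timeout` and its arguments are hypothetical names, and the numbers 900, 14 and 60 are the ones quoted above.

```shell
# effective_timeout ROLE [CONFIGURED]
#   ROLE        "standalone" or "ha" (hypothetical labels for this sketch)
#   CONFIGURED  configured timeout in seconds, HA case only (default 14)
# Prints the timeout the host would effectively use, per the description above.
effective_timeout() {
    role=$1
    configured=${2:-14}
    case "$role" in
        standalone)
            # Standalone host: 900 seconds (15 minutes).
            echo 900
            ;;
        ha)
            # HA cluster member: default 14 seconds, capped at 60 seconds.
            if [ "$configured" -gt 60 ]; then
                echo 60
            else
                echo "$configured"
            fi
            ;;
        *)
            echo "unknown role: $role" >&2
            return 1
            ;;
    esac
}

effective_timeout standalone   # prints 900
effective_timeout ha           # prints 14
effective_timeout ha 120       # prints 60
```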
So that's the explanation of how the ESXi setting /Misc/HeartbeatPanicTimeout behaves. Now we can look at what the vpxd.das.heartbeatPanicMaxTimeout setting is. My understanding is that vpxd.das.heartbeatPanicMaxTimeout is the vCenter (VPXD) global configuration for the ESXi advanced setting Misc.HeartbeatPanicTimeout. But don't forget that the HA cluster caps the Misc.HeartbeatPanicTimeout value on ESXi hosts, as described above.
You can read further details about vpxd.das.heartbeatPanicMaxTimeout in VMware KB 2033250, but I think the following description is a little bit misleading.
"This option impacts how long it takes for a host impacted by a PSOD to release file locks and hence allow HA to restart virtual machines that were running on it. If not specified, 60s is used. HA sets the host Misc.HeartbeatPanicTimeout advanced option to the value of this HA option. The HA option is in seconds."
I would rather describe it like this:
"This option is in seconds and impacts how long it takes for an ESXi host experiencing some critical issue to go into a PSOD. Setting vpxd.das.heartbeatPanicMaxTimeout is a global setting used for the vCenter-managed ESXi advanced option Misc.HeartbeatPanicTimeout; however, Misc.HeartbeatPanicTimeout is adjusted automatically in certain situations.
In a standalone ESXi host, 900s is used. In a vSphere HA cluster ESXi host, it is automatically changed to 14s and capped to a maximum of 60s. This setting has an indirect impact on the time when file locks are released and hence allows the HA cluster to restart virtual machines that were running on the affected ESXi host."
Potential side effects and impacts
- ESXi HA Cluster restart of virtual machines - if your Misc.HeartbeatPanicTimeout is set to 60 seconds, then the HA cluster will most probably try to restart VMs on other ESXi hosts because the network heartbeat (also 14 seconds) will not be received. However, because the host is not yet in PSOD, the file locks still exist and the VM restart will be unsuccessful.
- ESXi Host Profiles - if you use the same host profile for both HA-protected and non-protected ESXi hosts, the compliance check can report a difference in Misc.HeartbeatPanicTimeout.
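The timing argument in the first item can be sketched as a simple comparison. This is a deliberately simplified model of the reasoning in this post, not a description of how HA is implemented: `ha_restart_outcome` is a hypothetical function, and the 14-second network heartbeat is the figure mentioned above.

```shell
# ha_restart_outcome PANIC_TIMEOUT [NET_HEARTBEAT]
#   PANIC_TIMEOUT  Misc.HeartbeatPanicTimeout in seconds
#   NET_HEARTBEAT  seconds after which HA misses the network heartbeat
#                  (default 14, as mentioned above)
# Simplified model: if the host panics later than HA notices the missing
# heartbeat, the restart attempt happens while file locks are still held.
ha_restart_outcome() {
    panic_timeout=$1
    net_heartbeat=${2:-14}
    if [ "$panic_timeout" -gt "$net_heartbeat" ]; then
        echo "restart-before-psod"   # locks likely still held, restart fails
    else
        echo "psod-before-restart"   # host already in PSOD when HA reacts
    fi
}

ha_restart_outcome 60   # prints restart-before-psod
ha_restart_outcome 14   # prints psod-before-restart
```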
- Scott Norris : Exception 14 and no heartbeat (2/2 IPIs received) PSOD
- VMware KB : ESXi/ESX host using Emulex 10Gb NIC cards fails with a purple diagnostic screen with the error: PCPU #: no heartbeat (2053642)
- HPE.com : Proliant DL380e Gen8 keeps crashing (PSOD) after upgrade to HP customized VMWare ESXi 5.5 U2