Thursday, September 09, 2021

vSphere design : ESXi protection against network port flapping

I've just finished a root cause analysis of VM restart in customer production environment, so let me share with you the symptoms of the problem, current customer's vSphere design and recommended improvement to avoid similar problems in the future. 

After the further discussion with customer we have identified following symptoms:

  • VM was restarted in different ESXi host
  • original ESXi host, where VM was running before the restart, was isolated (network isolation)
  • vSAN was partitioned

What does it mean?

Well, for those understanding how vSphere HA Cluster works it is pretty simple diagnosis.

  • ESXi was isolated from the network
  • HA Cluster "Response for Host Isolation Response" was set to "Power Off and restart VMs"
    • this is recommended setting for IP storage, because when network is not available, there is a huge probability, the storage is not available and VM is in trouble
    • customer has vSAN, which is a IP storage, therefore such setting makes perfect sense

That having said, this was the reason VM was restarted and it is expected behavior to achieve higher VM availability in cost of some small unavailability because of VM restart.   

However, there is a logical question.

Why was ESXi isolated from network when there is network teaming (vmnic1 + vmnic3) configured?

The customer environment is depicted on design drawing below.

When vSAN is used, vSphere HA heart beating is happening across vSAN network, therefore vmk3 L3 interface (vSAN) is in use, leveraging vmnic1 and vmnic3 uplinks. Customer has both uplinks active with "Route based on originating virtual port", therefore the traffic goes either through vminc1 or vmnic3. This is called uplink pinning and only one uplink is used for vSphere HA heart beat traffic.

Customer is using VMware LogInsight (syslog + data analytics) for central log management, therefore troubleshooting was a piece of cake. We have found vmnic3 flapping (link up, down, up, down, ...) and Fault Domain Manager (aka FDM) log message about the host isolation and VM restart.

Cool, we know the Root Cause, but what options do we have to avoid such situation?

Well, the issue described above is called Network Port Flapping and in such single port issue, in our case with vmnic3, the vmk3 (vSAN, HA heart beat) interface was originally pinned to vmnic3 and when vmnic3 went down, vmk3 was failed over from vmnic3 to vmnic1. However, because vmnic3 went up, the fail-over process was stooped and kept on vmnic3. Nevertheless, vmnic3 went down, up, down, up, etc. again and as network was very unstable, vSphere HA heart beating failed. As we do not have traditional datastores, there is no vSphere HA storage heart beating and we only rely on network heart beating which failed, thus ESXi host was claimed as isolated, and VM was Powered Off and restarted on another ESXi within vSphere cluster, where VM can provide application services running within VM again. This is actually the goal of vSphere HA, to increase VM services availability and network availability is part of the availability.

So, what is port flapping?

Source: https://lantern.splunk.com/IT_Use_Case_Guidance/Infrastructure_Performance_Monitoring/Network_Monitoring/Managing_Cisco_IOS_devices/Port_flapping_on_Cisco_IOS_devices

Port flapping is a situation in which a physical interface on the switch continually goes up and down, three or more times a second for at least 10 seconds.

Common causes for port flapping are bad, unsupported, or non-standard cable or other link synchronization issues. The cause for port flapping can be intermittent or permanent. You need a search to identify when it happens on your network so you can investigate and resolve the problem.

How to avoid port flapping consequences in vSphere Cluster?

(1) Link Dampening. There are some possibilities in Ethernet switch side. I was blogging about "Dell Force10 Link Dampening" few years ago, which should help in these situations.

(2) There is VMware vSwitch "Teaming and failover" option Failback=No available through GUI.


(3) And there is ESXi advanced setting "Net.teampolicyupdelay" which is something like "Link Dampening" described above. Source: https://kb.vmware.com/s/article/2014075

Each option above has their own benefits and drawbacks

+ means benefit

- means drawback

Let's go option by option and discuss pluses and minuses.

Option 1: Physical Ethernet switch Link Dampening

+ per physical switch port setting, therefore not too much places to set, but still some effort. Some switches supports profile configuration which can have positive impact on manageability.

- such feature might or might not be available for particular network vendor and if available, configuration varies vendor by vendor

- must be done by network admin, therefore vSphere admin does not have rights and clue about such setting and you must explain and justify it to network admin, network manager, etc.

Option 2: VMware vSwitch "Teaming and failover / Failback=No"

+ per vSwitch port group setting, therefore, single and straight forward setting in case of Distributed Virtual Switch (aka VDS)

- In case of Standard Virtual Switch (aka VSS), the setting must be done for vSwitch on each ESXi host, which has negative impact on manageability

- it will failover all trafic from flapping vmnic to fully operated vmnic, but it will never failback until ESXi restart. It has a positive impact on availability but potentially negative impact on performance and throughput

Option 3: ESXi advanced setting "Net.teampolicyupdelay"

- per ESXi advanced setting, which is not perfect from manageability point of view

+ it has a positive impact on availability and also performance, because in case of temporary flapping issue, it can failover traffic back after some longer time, lets say 5 or 10 second.

- Unfortunately, there is no such granularity like Force10 Link Dampening, which can penalize the interface based on flap frequency and decays exponentially depending on the configured half-life.

Conclusion

What option should customer implement? To be honest, it is up to cross team discussion, because each option has some advantages and disadvantages. Nevertheless, there are some options to consider to increase system availability and resiliency.

Hope you have found this write up useful. This is my give back to VMware community. I believe that sharing the knowledge is the only way how to improve not only technology but human civilization. Do you have another opinion, options or experience, please do not hesitate to write a comment below this article.

Thursday, September 02, 2021

ESXi, Intel NICs and LLDP

This will be a very short blog post because Dusan Tekeljak has already written a blog post about this topic. Nevertheless, I was not aware about such Intel NIC driver behavior which is pretty interesting, thus writing this blog post for broader awareness.

My customer who is modernizing their physical networking and implementing Cisco ACI, therefore moving from CDP (Cisco Discovery Protocol) to LLDP (Link Layer Discovery Protocol), which is industry standard. They have observed, that LLDP does not work on some NICs and after further troubleshooting, they realized it happens only on ESXi hosts with Intel X710 NIC. The NICs from other vendors (Broadcom, QLogic) worked as expected.

After some further research, they found following internet articles

and after opening this topic with server vendor (HPE), they got the following Customer Avisory

Long story short Intel NIC driver contains the "LLDP agent" which is by default enabled and consumes LLDP ethernet frames. By disabling LLDP agent within the Intel NIC driver, LLDP is not handled by NIC driver and can be observed within ESXi hypervisor, thus vSwitch network discovery via LLDP works as expected.

If you have four (4) port Intel NIC you must use following command to disable LLDP agent on all four NIC port.

esxcli system module parameters set -m i40en -p LLDP=0,0,0,0

On HPE server, this behavior can be controlled via BIOS. The feature for disabling the Link Layer Discovery Protocol (LLDP) in the BIOS/Platform Configuration (RBSU) has been included in the HPE Intel Online Firmware Upgrade released in 20 Dec 2019.