Tuesday, August 15, 2017

NSX Basic Concepts, Tips and Tricks

NSX and Network Teaming

There are multiple ways to achieve network teaming from ESXi to the physical network. For more information, see my other blog post "Back to the basics - VMware vSphere networking".

In a nutshell, there are generally three supported methods to connect NSX VTEP(s) to the physical network:
  1. Explicit failover - only a single physical NIC is active at any given time, therefore there is no load balancing at all
  2. LACP - a single aggregated virtual interface where load balancing is done by a hashing algorithm
  3. Switch independent teaming achieved by multiple VTEPs, where each VTEP is bound to a different ESXi pNIC.
Let's assume we have switch independent teaming with multiple independent uplinks to the physical network. Now the question is how to check the VM vNIC to ESXi host pNIC mapping. I'm aware of at least four methods to check this mapping:
  1. ESXTOP
  2. ESXCLI
  3. NSX Controller
  4. NSX Manager
1/ ESXTOP method
  • ssh to ESXi
  • run esxtop
  • Press key [n] to switch to network view
  • Check column TEAM-PNIC - it should show a different vmnic (ESXi pNIC) for each VM
2/ ESXCLI method
  • ssh to ESXi
  • Use command “esxcli network vm list” and locate the World ID of the VM
  • Use “esxcli network vm port list -w ” and check the “Team Uplink” value. It should be a different vmnic (ESXi pNIC) for each VM
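If you want to collect the “Team Uplink” value for many VMs, the esxcli output can be post-processed. The following is only a minimal sketch: it assumes the output of “esxcli network vm port list” consists of "Key: Value" lines and that every port starts with a "Port ID" line. The sample text, the MAC address and the function names are illustrative assumptions, not captured from a real host.

```python
def parse_port_list(text):
    """Parse 'Key: Value' lines into one dict per port.

    Assumption: each port section begins with a 'Port ID' line.
    """
    ports = []
    for line in text.splitlines():
        # Split on the first colon only, so MAC addresses stay intact.
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Port ID":
            ports.append({})
        if ports:
            ports[-1][key] = value
    return ports


def team_uplinks(text):
    """Return (MAC address, team uplink) pairs for every port found."""
    return [(p.get("MAC Address"), p.get("Team Uplink"))
            for p in parse_port_list(text)]


# Illustrative sample only; the values are made up.
SAMPLE = """\
   Port ID: 50331660
   vSwitch: DSwitch
   MAC Address: 00:50:56:aa:bb:01
   IP Address: 0.0.0.0
   Team Uplink: vmnic2
   Uplink Port ID: 50331650
"""

if __name__ == "__main__":
    for mac, uplink in team_uplinks(SAMPLE):
        print(mac, uplink)
```

Running the parser against the output of every VM on a host gives a quick list of which vmnic each vNIC is pinned to.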
3/ NSX Controller method
  • Identify MAC address of VM
  • Log in to the NSX Controller nodes (ssh or console) one by one
  • Use command “show control-cluster logical-switches mac-table ” to show MAC address to VTEP mappings. I assume a multi VTEP configuration where each VTEP is statically bound to a particular ESXi pNIC (vmnic)
4/ NSX Manager method
  • Identify MAC address of VM
  • Log in to NSX Manager (ssh or console)
  • Go through all controllers and show the MAC address table, which also tells you behind which VTEP a particular MAC address sits
  • i) show controller list all
  • ii) show logical-switch controller controller-1 vni 10001 mac
  • iii) show logical-switch controller controller-2 vni 10001 mac
  • iv) show logical-switch controller controller-3 vni 10001 mac
The appropriate method is typically chosen based on the administrator's role and Role Based Access Control. A vSphere Administrator will probably use esxtop or esxcli, and a Network Administrator will use NSX Manager or NSX Controller.
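For the NSX Manager method, the same command has to be repeated for each controller. A small helper can generate the command list so nothing is missed. This is just a sketch; the function name is hypothetical, and the controller IDs and VNI are the example values used in the steps above.

```python
def mac_table_commands(controller_ids, vni):
    """Build one NSX Manager CLI command per controller for the given VNI."""
    return [
        f"show logical-switch controller {cid} vni {vni} mac"
        for cid in controller_ids
    ]


if __name__ == "__main__":
    # Example values taken from the steps above.
    for cmd in mac_table_commands(
        ["controller-1", "controller-2", "controller-3"], 10001
    ):
        print(cmd)
```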

Distributed Logical Router (DLR)

DLR is a virtual router distributed across multiple ESXi hosts. You can imagine it as a chassis with multiple line cards. The chassis is virtual (software based) and the line cards are software modules spread across multiple ESXi hosts (physical x86 servers).

The basic concept of DLR is that every routing decision is made locally. NSX DLR always performs routing on the DLR instance running in the kernel of the ESXi host where the workload that initiates the communication is running. When VM traffic needs to be routed to another logical switch, it first goes to the DLR on the same ESXi host where the VM is running. Each DLR line card module (ESXi host) has all logical switches (VXLANs) connected locally, so the DLR forwards the packet to the appropriate destination logical switch. If the target VM runs on another ESXi host, the packet is encapsulated on the local ESXi host and decapsulated on the target ESXi host.

It is good to know that DLR always uses the same MAC address for the default gateway addresses on all logical switches. This MAC address is called VMAC. It is the MAC address used for the DLR logical L3 interfaces (LIFs) connected to logical switches (VXLANs).

However, there must be some coordination between the DLR "line card" modules (ESXi hosts), therefore each DLR module must also have a physical MAC address. This MAC address is called PMAC.

To show the DLR PMAC and VMAC, run the following command on an ESXi host:
net-vdr -l -C

Distributed Logical Firewall (DFW) - firewall rules

NSX Distributed Firewall applies firewall rules directly to VM vNICs. Each vNIC has the concept of slots, where different services are bound and chained together. NSX DFW sits in slot 2 and, for example, a third party firewall sits in slot 4.

DFW firewall rules are applied automatically on each vNIC, so the question is how to double check which rules are in place at the vNIC level.

There are two methods to check it:
  1. ESXi commands
  2. NSX Manager commands
1/ ESXi method
  • ssh to ESXi
  • Use command “summarize-dvfilter”, locate the VM of your interest and find its vNIC name in slot 2, which is used by agent vmware-sfw
  • grep can help us here ... "summarize-dvfilter | grep -A 10 "
  • The vNIC name should look similar to nic-24565940-eth0-vmware-sfw.2
  • Now you can list firewall rules by command "vsipioctl getfwrules -f nic-24565940-eth0-vmware-sfw.2"
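The lookup of the slot 2 filter name can also be scripted. The sketch below assumes that the “summarize-dvfilter” output contains lines of the form "name: nic-...-vmware-sfw.2", matching the example name above. The sample text is simplified and illustrative (the VM name web-01 and the slot 4 agent name are made up), and the function names are hypothetical.

```python
import re

# Matches filter names of the DFW agent (vmware-sfw) in slot 2.
FILTER_RE = re.compile(
    r"^\s*name:\s*(nic-\d+-eth\d+-vmware-sfw\.2)\s*$", re.MULTILINE
)


def dfw_filters(text):
    """Return all DFW (slot 2) filter names found in the given text."""
    return FILTER_RE.findall(text)


def getfwrules_command(filter_name):
    """Build the vsipioctl command that lists rules for one filter."""
    return f"vsipioctl getfwrules -f {filter_name}"


# Simplified, illustrative sample; not captured from a real host.
SAMPLE = """\
world 24565940 vmm0:web-01
 port 50331660 web-01.eth0
  vNic slot 2
   name: nic-24565940-eth0-vmware-sfw.2
   agentName: vmware-sfw
  vNic slot 4
   name: nic-24565940-eth0-serviceinstance-1.4
   agentName: serviceinstance-1
"""

if __name__ == "__main__":
    for name in dfw_filters(SAMPLE):
        print(getfwrules_command(name))
```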

2/ NSX Manager method (https://kb.vmware.com/kb/2125482)
  • Log in to the NSX Manager with the admin credentials
  • To display a summary of DVFilter information, run the command "show dfw host host-id summarize-dvfilter"
  • To display detailed information about a vnic, run the command "show dfw host host-id vnic"
  • To display the rules configured on the filter, run the command "show dfw host host-id vnic vnic-id filter filter-name rules"
  • To display the addrsets configured on the filter, run the command "show dfw host host-id vnic vnic-id filter filter-name addrsets"
And again, the appropriate method is typically chosen based on the administrator role and Role Based Access Control. 

Distributed Logical Firewall (DFW) - third party integration and availability considerations

NSX Distributed Firewall supports integration with third party solutions. This integration is also called service chaining. The third party solution is hooked to a particular vNIC slot, and usually some selected traffic, or potentially all traffic (not recommended), can be redirected to the third party solution agent running on each ESXi host as a special Virtual Machine. The third party solution can inspect the traffic and allow or deny it. However, what happens when the agent VM is not available? It is easy to test: you can power off the agent VM and see what happens. The behavior depends on the Service failOpen/failClosed policy. You can check the policy setting as depicted in the screenshot below.

Service failOpen/failClosed policy
If failOpen is set to false, the virtual machine traffic will be dropped when the agent is unavailable. This has a negative impact on availability but a positive impact on security. If failOpen is set to true, the VM traffic will be allowed and everything works even when the agent is not available. In that situation the security policy cannot be enforced, which is a potential security risk. This is a typical design decision point where the decision depends on customer specific requirements.
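The failOpen behavior described above can be summarized as a small decision function. This is only a conceptual model of the policy, not NSX code; the function and parameter names are hypothetical.

```python
def forward_decision(agent_available, agent_verdict, fail_open):
    """Model what happens to redirected VM traffic.

    agent_available: True if the third party agent VM is running.
    agent_verdict:   "allow" or "deny", used only when the agent is available.
    fail_open:       the service failOpen policy setting.
    """
    if agent_available:
        # The agent inspects the traffic and decides.
        return "allow" if agent_verdict == "allow" else "drop"
    # Agent unavailable: the failOpen policy decides.
    return "allow" if fail_open else "drop"


if __name__ == "__main__":
    print(forward_decision(False, None, fail_open=False))  # availability suffers
    print(forward_decision(False, None, fail_open=True))   # security suffers
```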

Now the question is how the failOpen setting can be changed. My understanding is that it depends on the third party solution. Here is the link to the TrendMicro how-to: "Set vNetwork behavior when appliances shut down"


FlyingLlama said...

Thanks for the post! You've had this up for over a year and haven't received any comments so I wanted to say posts like these DO provide value even if you don't hear it.

vikram M said...

Awesome summary, Please write more blogs which could help us.