Tuesday, February 11, 2020

Host cannot communicate with one or more other nodes in the vSAN enabled cluster

I work as a VMware HCI Specialist, so I have to do a lot of vSAN testing and demonstrations in my home lab. The only reasonable way to effectively test and demonstrate different vSAN configurations and topologies is to run vSAN in a nested environment. Thanks to nested virtualization, I can very easily and quickly build any type of vSAN cluster.

Recently I experienced an issue in a 3-node (nested) vSAN cluster. The vSAN datastore showed the capacity of just a single node instead of three nodes, and the hosts displayed the error message "Host cannot communicate with one or more other nodes in the vSAN enabled cluster".

My first thought was a networking issue, but ping between the nodes was working OK, so it was not a physical network issue. This is a lab environment, so all services (mgmt, vMotion, vSAN) are enabled on a single VMkernel interface (vmk0), which keeps everything pretty straightforward.

So what's the problem?

I did some google searching and found that some people were seeing the same error message when experiencing problems with vSAN unicast agents.

Here is the command to list the unicast agents on a vSAN node:

esxcli vsan cluster unicastagent list

I tested it in my environment.
Grrrr. The list is empty!!!! It is empty on all ESXi hosts in my 3-node vSAN cluster.

Let's try to configure it manually.

Each vSAN node should have a connection to agents on other vSAN nodes in the cluster.

For example, each node in a 4-node vSAN cluster should have 3 connections:

 [root@n-esx04:~] esxcli vsan cluster unicastagent list  
 NodeUuid               IsWitness Supports Unicast IP Address    Port Iface Name Cert Thumbprint  
 ------------------------------------ --------- ---------------- -------------- ----- ---------- -----------------------------------------------------------  
 5e3ec640-c033-7c7d-888f-00505692f54d     0       true 192.168.11.105 12321       18:F3:B7:9F:66:C4:C4:3E:0F:7D:69:BB:55:92:BC:A3:AC:E4:DD:5F  
 5df792b0-f49f-6d76-45af-005056a89963     0       true 192.168.11.107 12321       20:4C:C1:48:F5:2D:04:16:55:F1:D3:F1:4C:26:B5:C4:23:E5:B4:12  
 5e3e467a-1c1b-f803-3d0f-00505692ddc7     0       true 192.168.11.106 12321       53:99:00:B8:9D:1A:97:42:C0:10:C0:AF:8C:AD:91:59:22:8E:C9:79  

We need to get the local UUID of the cluster node.

 [root@n-esx08:~] esxcli vsan cluster get  
 Cluster Information  
   Enabled: true  
   Current Local Time: 2020-02-11T08:32:55Z  
   Local Node UUID: 5df792b0-f49f-6d76-45af-005056a89963  
   Local Node Type: NORMAL  
   Local Node State: MASTER  
   Local Node Health State: HEALTHY  
   Sub-Cluster Master UUID: 5df792b0-f49f-6d76-45af-005056a89963  
   Sub-Cluster Backup UUID:  
   Sub-Cluster UUID: 52c99c6b-6b7a-3e67-4430-4c0aeb96f3f4  
   Sub-Cluster Membership Entry Revision: 0  
   Sub-Cluster Member Count: 1  
   Sub-Cluster Member UUIDs: 5df792b0-f49f-6d76-45af-005056a89963  
   Sub-Cluster Member HostNames: n-esx08.home.uw.cz  
   Sub-Cluster Membership UUID: f8d4415e-aca5-a597-636d-005056997c1d  
   Unicast Mode Enabled: true  
   Maintenance Mode State: ON  
   Config Generation: 7ef88f9d-a402-48e3-8d3f-2c33f951fce1 6 2020-02-10T21:58:16.349  

So here are my nodes
n-esx08 - 192.168.11.108 - 5df792b0-f49f-6d76-45af-005056a89963
n-esx09 - 192.168.11.109 - 5df792b0-f49f-6d76-45af-005056a89963
n-esx10 - 192.168.11.110 - 5df792b0-f49f-6d76-45af-005056a89963

And now the problem is clear. All vSAN nodes have the same UUID.
Why?  Let's check ESXi system UUIDs on each ESXi host.

 [root@n-esx08:~] esxcli system uuid get  
 5df792b0-f49f-6d76-45af-005056a89963  
 [root@n-esx08:~]  

 [root@n-esx09:~] esxcli system uuid get  
 5df792b0-f49f-6d76-45af-005056a89963  
 [root@n-esx09:~]  

 [root@n-esx10:~] esxcli system uuid get  
 5df792b0-f49f-6d76-45af-005056a89963  
 [root@n-esx10:~]  
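
Comparing UUIDs by eye works for three hosts, but it gets tedious in a larger nested lab. The following is a minimal sketch of the same check, assuming the host names and UUIDs have already been collected as "hostname uuid" pairs; the function name find_duplicate_uuids is made up for this illustration, and the SSH loop in the comment assumes SSH is enabled on the hosts.

```shell
#!/bin/sh
# Report system UUIDs that are shared by more than one host.
# Input: one "hostname uuid" pair per line on stdin.
# (find_duplicate_uuids is a hypothetical helper, not a VMware tool.)
find_duplicate_uuids() {
    awk '
        { count[$2]++; hosts[$2] = hosts[$2] " " $1 }
        END {
            for (uuid in count)
                if (count[uuid] > 1)
                    print uuid ":" hosts[uuid]
        }'
}

# Sample data taken from the outputs above. In a real lab the pairs could be
# collected with something like this (assumes SSH access to every host):
#   for h in n-esx08 n-esx09 n-esx10; do
#       echo "$h $(ssh root@$h esxcli system uuid get)"
#   done | find_duplicate_uuids
printf '%s\n' \
    'n-esx08 5df792b0-f49f-6d76-45af-005056a89963' \
    'n-esx09 5df792b0-f49f-6d76-45af-005056a89963' \
    'n-esx10 5df792b0-f49f-6d76-45af-005056a89963' |
    find_duplicate_uuids
```

Any line printed by the function names a UUID together with the hosts that share it; no output means all UUIDs are unique.
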

So the root cause is obvious.
I use nested ESXi hosts to test vSAN and I forgot to regenerate system UUID after the clone. 
The solution is easy. Just delete UUID from /etc/vmware/esx.conf and restart ESXi hosts.

ESXi system UUID in /etc/vmware/esx.conf

You can do it from command line as well

sed -i 's/system\/uuid.*//' /etc/vmware/esx.conf
reboot
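
Because esx.conf holds the whole host configuration, it may be worth trying the edit on a scratch copy before touching the real file. The sketch below is one possible variant, not the command used above: it keeps a backup and deletes the whole line instead of blanking it. The helper name remove_system_uuid and the /example/key entry are invented for the illustration, and the assumed line format is /system/uuid = "...".

```shell
#!/bin/sh
# Remove the /system/uuid entry from an esx.conf-style file, keeping a backup.
# The path is a parameter so the function can be tried on a scratch copy first.
# (remove_system_uuid is a hypothetical helper name.)
remove_system_uuid() {
    conf=$1
    cp "$conf" "$conf.bak" || return 1
    # Delete the whole line rather than blanking it, so no empty line is left.
    sed '/^\/system\/uuid/d' "$conf.bak" > "$conf"
}

# Dry run against a scratch file. The /example/key entry is made up;
# only the /system/uuid line mirrors what this post is about.
scratch=$(mktemp)
printf '%s\n' \
    '/example/key = "value"' \
    '/system/uuid = "5df792b0-f49f-6d76-45af-005056a89963"' > "$scratch"
remove_system_uuid "$scratch"
cat "$scratch"    # only the /example/key line remains
rm -f "$scratch" "$scratch.bak"
```

On a real host the function would be called with /etc/vmware/esx.conf and followed by a reboot, exactly as in the command above.
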

So we have identified the problem and we are done. After the ESXi hosts restart, the vSAN cluster node UUIDs are changed automatically and the vSAN unicast agents are automatically configured on the vSAN nodes as well.

However, if you are interested in how to manually add a connection to a unicast agent on a particular node, you would execute the following command

esxcli vsan cluster unicastagent add -a [IP address of the unicast agent] -U [supports unicast] -u [Local Node UUID of the remote node] -t [type]

Anyway, such a manual configuration should not be necessary and you should do it only when instructed by VMware support.
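
To illustrate what "one connection per other node" means, here is a small sketch that only prints the commands a node would need and does not execute anything. It assumes a list of "uuid ip" pairs for all cluster members, the type node for regular hosts, and port 12321 as shown in the unicast agent listing above. The function name and the LOCAL-UUID placeholder are hypothetical.

```shell
#!/bin/sh
# Print (not run) one "unicastagent add" command per remote node.
# $1    : Local Node UUID of the host the commands are meant for
# stdin : one "uuid ip" pair per cluster member
# (print_unicastagent_commands is a hypothetical helper name.)
print_unicastagent_commands() {
    local_uuid=$1
    while read -r uuid ip; do
        # A node does not need an agent entry pointing at itself.
        [ "$uuid" = "$local_uuid" ] && continue
        echo "esxcli vsan cluster unicastagent add -t node -u $uuid -U true -a $ip -p 12321"
    done
}

# Peers taken from the 4-node listing above; LOCAL-UUID stands in for the
# local node, whose UUID and address are not shown in this post.
printf '%s\n' \
    'LOCAL-UUID 192.168.11.104' \
    '5e3ec640-c033-7c7d-888f-00505692f54d 192.168.11.105' \
    '5e3e467a-1c1b-f803-3d0f-00505692ddc7 192.168.11.106' \
    '5df792b0-f49f-6d76-45af-005056a89963 192.168.11.107' |
    print_unicastagent_commands LOCAL-UUID
```

Printing the commands first makes it easy to review them, or to hand them to VMware support, before anything is changed on a host.
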

Hope this helps someone else in VMware community.

Saturday, February 01, 2020

vSphere Integrated Containers - PoC in my home lab

vSphere Integrated Containers (aka VIC) is VMware Enterprise Container Infrastructure. Any VMware customer having VMware vSphere Enterprise Plus can get enterprise container infrastructure to help IT Ops run traditional and containerized applications side-by-side on a common platform with vSphere Integrated Containers. Supporting containers in your virtualized environments means IT teams get the security, isolation, and management of VMs, while developers enjoy the speed and agility of containers—all within vSphere. The VIC project is available on GitHub at https://vmware.github.io/vic-product/

The overall concept is very interesting, especially for customers who want to use Docker containers but do not have the requirements or skills to operate Kubernetes. The most interesting part of VIC is the networking concept, which is explained very well in the following video: https://www.youtube.com/watch?v=QLi9KasWLCM&feature=youtu.be


As one of my customers is considering VIC to provide containers to their developers, I have decided to test it in my home lab.

The first step is to deploy VIC from OVF. It's pretty straightforward, so I'm not going to document any details.

The second step is to create the first VCH (Virtual Container Host), which acts as a remote Docker server. You can have multiple VCHs and they can be grouped into projects. Let's put architectural decisions aside for the moment and focus on testing the technology itself.

VCH deployment is typically done via the vic-machine CLI utility, as the GUI is too slow an interface for a DevOps approach. Documentation is available at
https://vmware.github.io/vic-product/assets/files/html/1.5/vic_vsphere_admin/using_vicmachine.html

vic-machine is the primary tool for the DevOps admin. I will discuss roles, tooling, and RBAC for particular actors in the DevOps approach in the next blog post, together with the overall architecture.

We can download vic-machine from VIC appliance deployed in step 1. In my case, it is available at https://vic.home.uw.cz:9443/

Once vic-machine is available on our DevOps workstation, we can start acting as a DevOps engineer.

Note 1: 
If you have a self-signed certificate, as I have in my lab, you need to get the vCenter thumbprint. You have to SSH to the vCenter Server Appliance and get the fingerprint.
openssl x509 -in /etc/vmware-vpx/ssl/rui.crt -fingerprint -sha1 -noout
The vCenter thumbprint in my lab is
SHA1 Fingerprint=64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9
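
The --thumbprint option used in the commands below takes only the colon-separated value, without the "SHA1 Fingerprint=" label. A minimal sketch for stripping the label follows; the helper name extract_thumbprint is made up for this example.

```shell
#!/bin/sh
# Keep only the part after the first "=" of an openssl fingerprint line.
# (extract_thumbprint is a hypothetical helper name.)
extract_thumbprint() {
    sed 's/^[^=]*=//'
}

# On the vCenter Server Appliance this could be combined with the
# command above, for example:
#   openssl x509 -in /etc/vmware-vpx/ssl/rui.crt -fingerprint -sha1 -noout | extract_thumbprint
echo 'SHA1 Fingerprint=64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9' |
    extract_thumbprint
# prints 64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9
```
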
Note 2: 
vic-machine tool is available for various operating systems (windows, linux, darwin).  As my workstation is Mac OS X (aka darwin), in commands below I will use ./vic-machine-darwin

Before the first VCH deployment, we have to enable the firewall rules in a particular vSphere cluster (the name of vSphere Cluster in my home lab is CLUSTER).

./vic-machine-darwin update firewall --target vc01.home.uw.cz --thumbprint 64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9 --user administrator@uw.cz --compute-resource CLUSTER --allow

Now we are ready to create our first VCH.

Here is the command to create the VCH.

./vic-machine-darwin create --name vch01 --container-name-convention vch01-{name} --compute-resource CLUSTER --image-store vsan-Underlay --base-image-size 8GB --bridge-network VCH01-BRIDGE --bridge-network-range 172.16.0.0/12 --public-network MGMT --dns-server 192.168.4.4 --container-network CONTAINER01:container01 --container-network-ip-range CONTAINER01:192.168.51.0/24 --container-network-gateway CONTAINER01:192.168.51.254/24 --container-network-dns CONTAINER01:192.168.4.4 --container-network-firewall CONTAINER01:published --tls-cname vch01 --certificate-key-size 2048 --no-tlsverify --user admin@UW.CZ --thumbprint 64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9 --target vc01.home.uw.cz/SDDC --ops-user administrator

The output from the command looks similar to ...

INFO[0068] VCH Admin Portal:                            
INFO[0068] https://192.168.4.140:2378                   
INFO[0068]                                              
INFO[0068] VCH Default Bridge Network Range: 172.16.0.0/12 
INFO[0068] VCH Default Bridge Network Width: 16         
INFO[0068]                                              
INFO[0068] Published ports can be reached at:           
INFO[0068] 192.168.4.140                                
INFO[0068]                                              
INFO[0068] Management traffic will use:                 
INFO[0068] 192.168.4.140                                
INFO[0068]                                              
INFO[0068] Docker environment variables:                
INFO[0068] DOCKER_HOST=192.168.4.140:2376 COMPOSE_TLS_VERSION=TLSv1_2 
INFO[0068]                                              
INFO[0068] Environment saved in vch2/vch2.env           
INFO[0068]                                              
INFO[0068] Connect to docker:                           
INFO[0068] docker -H 192.168.4.140:2376 --tls info      
INFO[0068] Installer completed successfully             

Davids-MacBook-Pro:vic-machine dpasek$ 

My first VCH has been created with a single container network (network name CONTAINER01). To add another container network (CONTAINER02), we need to know the VCH ID, so the inspect command has to be used.

./vic-machine-darwin inspect --target vc01.home.uw.cz --thumbprint 64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9 --user administrator@uw.cz --compute-resource CLUSTER --name vch2

INFO[0005] VCH ID: vm-299                               
INFO[0005]                                              
INFO[0005] VCH Admin Portal:                            
INFO[0005] https://192.168.4.140:2378                   
INFO[0005]                                              
INFO[0005] VCH Default Bridge Network Range: 172.16.0.0/12 
INFO[0005] VCH Default Bridge Network Width: 16         
INFO[0005]                                              
INFO[0005] Published ports can be reached at:           
INFO[0005] 192.168.4.140                                
INFO[0005]                                              
INFO[0005] Management traffic will use:                 
INFO[0005] 192.168.4.140                                
INFO[0005]                                              
INFO[0005] Docker environment variables:                
INFO[0005] DOCKER_HOST=192.168.4.140:2376 COMPOSE_TLS_VERSION=TLSv1_2 
INFO[0005]                                              
INFO[0005] Connect to docker:                           
INFO[0005] docker -H 192.168.4.140:2376 --tls info      

INFO[0005] Completed successfully  
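
If the VCH ID is needed in a script, it can be picked out of the inspect output shown above. This is only a sketch that parses the "VCH ID:" log line; the helper name extract_vch_id is hypothetical, and whether vic-machine writes its log to stdout or stderr is an assumption to verify (a 2>&1 redirect may be needed).

```shell
#!/bin/sh
# Print the last field of the first line containing "VCH ID:".
# (extract_vch_id is a hypothetical helper name.)
extract_vch_id() {
    awk '/VCH ID:/ { print $NF; exit }'
}

# Sample lines copied from the inspect output above. Against a live system
# the inspect command would be piped into the function instead, e.g.
#   ./vic-machine-darwin inspect ... 2>&1 | extract_vch_id
printf '%s\n' \
    'INFO[0005] VCH ID: vm-299                               ' \
    'INFO[0005]                                              ' \
    'INFO[0005] VCH Admin Portal:                            ' |
    extract_vch_id
# prints vm-299
```
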

Note that to configure an additional network, we have to include all existing networks in the command as well.

./vic-machine-darwin configure --target vc01.home.uw.cz --thumbprint 64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9 --user administrator@uw.cz --id vm-299 --container-network CONTAINER01 --container-network CONTAINER02
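
Since every existing network has to be repeated, the list of --container-network options grows with each network added. The sketch below shows one way to generate that part of the command from a list of network names. The helper name container_network_flags is made up, and the result is only printed, not passed to vic-machine.

```shell
#!/bin/sh
# Print one "--container-network NAME" option per argument.
# (container_network_flags is a hypothetical helper name.)
container_network_flags() {
    for net in "$@"; do
        printf '%s %s ' '--container-network' "$net"
    done
}

# Existing network first, then the one being added.
echo "$(container_network_flags CONTAINER01 CONTAINER02)"
# prints --container-network CONTAINER01 --container-network CONTAINER02
```
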

Docker client is using VCH (aka Virtual Container Host) as a docker server

export DOCKER_HOST=192.168.4.140:2376
docker --tls info
docker --tls network ls

Here is the output of the networks available in my Docker host (aka Virtual Container Host)

dpasek@photon-machine [ ~ ]$ docker --tls network ls   
NETWORK ID          NAME                DRIVER              SCOPE
4313b6b0ab2e        CONTAINER01         external            
6eb9e8911266        bridge              bridge              

dpasek@photon-machine [ ~ ]$ 

After running the vic-machine command that adds the additional network CONTAINER02, you can see the network in the container host (VCH) and use it for your containers.

dpasek@photon-machine [ ~ ]$ docker --tls network ls
NETWORK ID          NAME                DRIVER              SCOPE
4a8318cb2738        CONTAINER01         external            
a011cf3ced72        CONTAINER02         external            
7445a9e714ed        bridge              bridge              

dpasek@photon-machine [ ~ ]$ 

Of course, we can use other standard docker commands to manage our docker images and containers.

docker --tls images -a
docker --tls ps -a

Run NGINX and access the service through NAT
docker --tls run --name nginx1 -d -p 8080:80 nginx
docker --tls ps -a
docker --tls images -a

Run NGINX and access the service directly through container network (IP address assigned via DHCP) 
docker --tls run --name nginx2 --network="container01" -d -p 80 nginx

Run NGINX and access the service directly through container network (IP address assigned statically)
docker --tls run --name nginx30 --network="container01" --ip="192.168.51.30" -d -p 80 nginx

Execute a Shell Command from a Container
docker --tls run busybox date
docker --tls ps -a

In the screenshot below you see what vSphere Admin has in his environment.

Containers visibility for vSphere Admin

Specific container (nginx) visibility for vSphere Admin

Docker Volumes

If you need persistent storage for your containers, you can create volumes which are persistently stored in your enterprise infrastructure and visible not only to DevOps admin and Developer, but also to your infrastructure administrator.

Create a Docker Volume
docker --tls volume create --opt Capacity=2GB --name volume-test
docker --tls volume ls
docker --tls volume inspect volume-test

Attach a Docker Volume to a Container
docker --tls run --name busybox -it -v volume-test:/data/volume-test busybox
# cd /data
/data # ls
volume-1
/data # ls -l
total 4
drwxr-xr-x
/data # df -h

This is what Developer and DevOps admin see via Docker Client

dpasek@photon-machine [ ~ ]$ docker --tls volume ls
DRIVER              VOLUME NAME

vsphere             volume-test

And this is what vSphere Admin can see in his environment

Docker volume (2 GB) is visible as 1.92 GB Hard Disk

List VCH instances

DevOps admin manages Virtual Container Hosts. Here is the command to list all VCH's from a particular vCenter.

./vic-machine-darwin ls --target vc01.home.uw.cz --thumbprint 64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9 --user administrator@uw.cz

Davids-MacBook-Pro: vic-machine dpasek$ ./vic-machine-darwin ls --target vc01.home.uw.cz --thumbprint 64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9 --user administrator@uw.cz 
INFO[0000] vSphere password for administrator@uw.cz:    
INFO[0003] ### Listing VCHs ####                        
INFO[0003] Validating target                            

ID            PATH                                         NAME        VERSION                      UPGRADE STATUS
vm-299        /SDDC/host/UNDERLAY/CLUSTER/Resources        vch2        v1.5.4-21221-b3d3b06f        Up to date

Davids-MacBook-Pro:vic-machine dpasek$ 


Delete VCH

And here is the command to destroy particular VCH.

./vic-machine-darwin delete --target vc01.home.uw.cz --thumbprint 64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9 --user administrator@uw.cz --name vch2 --compute-resource CLUSTER

Conclusion

DevOps is a big topic nowadays. It is worth saying that DevOps is about methodology and not the tooling. At the end of the day, it doesn't matter if you use Docker, Kubernetes, Ansible or even more traditional tools. What matters is the final product, which has to be developed in sync with the business owner; otherwise, there is a high chance you will not meet the business owner's expectations. I'm an ex-developer and Software Engineer / Architect, and I now work as an Infrastructure Technical Designer / Architect, so I know that the DevOps methodology is the best way to develop and deliver a business application to the business owner. The key principles of the DevOps methodology are to identify the core business requirements, quickly develop the application prototype, present it to the business owner, validate whether the business owner's expectations are met and, if so, move the prototype into production and continuously develop and deploy additional business requirements in short iterations with continuous feedback from the business owner. For such an approach, you need to deploy your application into infrastructure dynamically and quickly. Traditional Enterprise Infrastructure is not ready for such agility.

In this blog post, we tested a tool (VMware vSphere Integrated Containers) that can help Developers, DevOps admins and infrastructure admins cooperate in an agile fashion. It might or might not help to deliver the business project successfully. However, it definitely helps DevOps teams to have an agile DevOps infrastructure with an Enterprise level of support, security, availability, and cooperation between the Developer, Security, and Infrastructure (compute, storage, network) teams, which each live in a totally different world and do not understand each other :-)

In my opinion, vSphere Integrated Containers is a great technology to start DevOps in the Enterprise environment. We all know that Enterprises will eventually adopt Kubernetes, as it is nowadays a "de facto" standard for DevOps infrastructure; however, huge Docker hosts backed by solid enterprise infrastructure (VMware vSphere) are good enough for anybody just starting with the DevOps methodology. In the near future, VMware vSphere will have integrated Kubernetes (watch Project Pacific) and DevOps teams will be able to take their DevOps tooling to another level.

In the next post, I will focus on the Architecture Design of DevOps environment leveraging VMware vSphere Integrated Containers.

Hope this blog post helps at least one other person in the great VMware community.