Saturday, March 21, 2020

What's new in VMware vSphere 7

vSphere 7 has been announced and will be GA and available to download into our labs very soon. Let's briefly summarize what's new in vSphere 7 and put some links to other resources.

vSphere with Kubernetes

Project Pacific evolved into Integrated Kubernetes and Tanzu. vSphere has been transformed in order to support both VMs and containers. Tanzu Kubernetes Grid Service is how customers can run fully compliant and conformant Kubernetes with vSphere. However, when complete conformance with the open-source project isn’t required, the vSphere Pod Service can provide optimized performance and improved security through VM-like isolation. Both of these options are available through VMware Cloud Foundation 4.

The important takeaway is that Kubernetes is now built into vSphere which allows developers to continue using the same industry-standard tools and interfaces they’ve been using to create modern applications. vSphere Admins also benefit because they can help manage the Kubernetes infrastructure using the same tools and skills they have developed around vSphere.

Improved DRS

DRS used to focus on the cluster state and the algorithm would recommend a vMotion when it would benefit the balance of the cluster as a whole. This meant that DRS used to achieve cluster balance by using a cluster-wide standard deviation model. The new DRS logic computes a VM DRS score on the hosts and moves the VM to a host that provides the highest VM DRS score. This means DRS cares less about the ESXi host utilization and prioritizes the VM “happiness”. The VM DRS score is also calculated every minute and this results in a much more granular optimization of resources.


Another new feature is "DRS Scalable Shares". Scalable Shares solves a problem many have been facing over the last decade or so, which is that DRS does not take the number of VMs in the pool into account when it comes to allocating resources.
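
To make the problem concrete, here is a small worked calculation. It is only a sketch of the classic (non-scalable) share model under contention. The pool names, share values (8000 and 4000), and VM counts are illustrative assumptions, and the helper name per_vm_percent is hypothetical.

```shell
#!/bin/sh
# Classic resource pool shares: a pool's entitlement is split among its VMs,
# so the per-VM slice shrinks as the pool fills up.
# Usage: per_vm_percent POOL_SHARES TOTAL_SHARES VM_COUNT
per_vm_percent() {
  awk -v pool="$1" -v total="$2" -v vms="$3" \
    'BEGIN { printf "%.1f\n", 100 * pool / total / vms }'
}

# Hypothetical cluster with two sibling pools, 12000 shares in total.
echo "High pool   (8000 shares, 8 VMs): $(per_vm_percent 8000 12000 8)% per VM"
echo "Normal pool (4000 shares, 2 VMs): $(per_vm_percent 4000 12000 2)% per VM"
```

In this made-up case each VM in the "High" pool ends up with about half the entitlement of a VM in the "Normal" pool. That is the dilution effect Scalable Shares is meant to address by taking the number of VMs in a pool into account.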

Refactored vMotions

Improvements in live migrations of monster workloads. Monster VMs with a large memory & CPU footprint, like SAP HANA and Oracle database backends, had challenges being live-migrated using vMotion. The performance impact during the vMotion process and the potentially long stun-time during the switchover phase meant that customers were not comfortable using vMotion for these large workloads. With vSphere 7, we are bringing back that capability as we have greatly improved the vMotion logic.

How was the improvement achieved?
  • Multi-threading
  • A dedicated vCPU is used for page tracing which means that the VM and its applications can keep working while the vMotion processes are occurring. Prior to vSphere 7, page tracing occurred on all vCPUs within a VM, which could cause the VM and its workload to be resource-constrained by the migration itself. 

Assignable Hardware

There is a new framework called Assignable Hardware that was developed to extend support for vSphere features when customers utilize hardware accelerators. It introduces vSphere DRS (for initial placement of a VM in a cluster) and vSphere High Availability (HA) support for VMs equipped with a passthrough PCIe device or an NVIDIA vGPU. Related to Assignable Hardware is the new Dynamic DirectPath I/O, which is a new way of configuring passthrough to expose PCIe devices directly to a VM. The hardware address of a PCIe device is no longer directly mapped to the configuration (vmx) file of a virtual machine. Instead, it is now exposed as a PCIe device capability to the VM.

Together, Dynamic DirectPath I/O, NVIDIA vGPU, and Assignable Hardware are a powerful new combination unlocking some great new functionality. For example, let’s look at a VM that requires an NVIDIA V100 GPU. Assignable Hardware will now interact with DRS when that VM is powered on (initial placement) to find an ESXi host that has such a device available, claim that device, and register the VM to that host. If there is a host failure and vSphere HA kicks in, Assignable Hardware also allows for that VM to be restarted on a suitable host with the required hardware available.


Bitfusion

Bitfusion stays in vSphere 7 as a Tech Preview feature. It allows us to leverage hardware accelerators (GPUs) across an infrastructure (over network fabric) and integrate it with evolving technologies such as FPGAs and custom ASICs using the same infrastructure. This is actually the first implementation of software-defined composable infrastructure within the VMware SDDC stack, therefore it is a very promising and much-needed technology for modern applications such as ML/AI workloads.


Precision Time Protocol (PTP)

Precision Time Protocol is helpful for financial and scientific applications requiring sub-millisecond accuracy. PTP requires VM Hardware 17 and it must be enabled on both an in-guest device and an ESXi service. Thus, you have to choose between NTP and PTP.


VM Template Management (Content Library)

VM templates now support check-in and check-out operations with versioning. Content Library should also support controlled replication into remote locations. With these vSphere 7 Content Library improvements, the Content Library is now a mature and very useful tool for VM template management.


vSphere Lifecycle Manager (vLCM)

Desired state of the ESXi host image (drivers & firmware) and host configuration, assigned to vSphere Clusters. It requires integration with hardware vendor system management like Dell OMIVV (OpenManage Integration for VMware vCenter) or HPE OneView for VMware vCenter.


vSphere Update Planner

Update Planner is part of vLCM and it monitors current interoperability based on VMware HCL.


vCenter Server Profiles

Export / Import of VCSA (vCenter) configuration. This is good for effective management of a lot of vCenters but please, do NOT expect export/import of vCenter objects like Clusters, VM Folders, Resource Pools, Virtual Switches, etc... This is export / import of VCSA configurations.

VCSA multihoming

VCSA now supports multiple (up to 4) vNICs. The first vNIC (vNIC0) is for management, the second (vNIC1) is dedicated for vCenter Server HA and other vNICs can be used for other purposes like a backup or so.

vCenter and SSO Architecture

vCenter Server Appliance (VCSA) with an embedded Platform Services Controller (PSC). External PSC is not supported anymore, which leads to a simple SSO topology.

Simplified Certificate Management

Much simpler SSL certificate management. Fewer certificates to manage. For example, vCenter has only two SSL certificates, a Machine SSL certificate, and Certification Authority Certificate. vSphere 7 introduced some vSphere Client UI improvements and also the REST API for certificate management for environments with more vCenters to manage. This is, of course, beneficial for environments implemented based on VMware Validated Designs (VVD) or VMware Cloud Foundation (VCF) environments which is the automated implementation of VVD.


Identity Federation

vCenter is not the key Identity Management System anymore. vSphere Client is using external authentication providers to optimize IDM integration in customer's environments. The first implementation supports only Microsoft Active Directory Federation Services (ADFS), however, VMware SSO still exists, therefore the customer can choose if he will use the brand new Identity Federation or keep existing AD/LDAP authentication through VMware SSO.



vSphere Trust Authority (vTA)

In vSphere 7, vCenter is not the trusted authority anymore. vSphere 7 introduces vTA, which creates a hardware root of trust using a separate ESXi host cluster.


vSGX - Support of Intel Software Guard Extensions (SGX)

vSphere 7 introduces support of Intel Software Guard Extensions. I was blogging about SGX a few years ago in the blog post Intel Software Guard Extensions (SGX) in VMware VM. Intel SGX allows applications to work with hardware to create a secure enclave that cannot be viewed by the guest OS or hypervisor. With SGX, applications can move sensitive logic and storage into this enclave. SGX is an Intel-only feature. AMD has SEV, which is a different approach.


vSphere 7 Configuration Maximums

Hosts per single vCenter: 2,500
Powered-on VMs on single vCenter: 30,000

Hosts per SSO domain (vCenters in linked mode): 15,000
Powered-on VMs per SSO domain (vCenters in linked mode): 150,000

vCenter Server Latency - vCenter <-> vCenter: 150 ms
vCenter Server Latency - vCenter <-> ESXi: 150 ms
vCenter Server Latency - vSphere Client (web browser) <-> vCenter: 100 ms

The improvements between vSphere 6.7 and 7 are clearly visible in figure below.


For further configuration maximums, look at https://configmax.vmware.com/

Skyline Health for vSphere 7

Skyline Health for vSphere 7 is the unified health check tool for vSphere which works exactly as Skyline Health for vSAN, available since vSphere 6.7 U3. It brings into infrastructure operations an approach similar to what developers do in agile development methods: automated testing. You can think of it as a set of tests (health check tests) continually verifying that everything works as expected.


Conclusion

vSphere 7 is another major vSphere release. For those who have worked with VMware virtual infrastructures for ages (see old ESX 3i below), it is amazing how far the VMware virtualization platform (vSphere 7, ESXi 7) has evolved and what is possible nowadays.

Good old ESXi from Virtual Infrastructure 3, from 2006-ish :-)

Nowadays, there are totally different reasons to upgrade to the latest vSphere version in comparison to the old days of server consolidation, TCO reduction, and better manageability. The top reasons to upgrade to vSphere 7 are:
  • Scalability: The fastest path to the Hybrid/Multi-Cloud and increase scalability through leveraging HCI (Hyper-Converged Infrastructure) 
  • Security: Infrastructure security, secure audits, and account management
  • Performance: maximize performance and efficiency
  • Manageability: Reduce complexity, simplify software patching and hardware upgrades, proactive support technology and services
VMware vSphere 7 new features and incorporation of containers (Kubernetes) into the single platform is another step into VMware's vision to run any app on any cloud. On vSphere 7, you can run
  • monster workloads such as SAP HANA
  • traditional applications in virtual machines
  • modern distributed applications (Cloud Native Applications, CNA) containerized and orchestrated by Kubernetes
This is a great message to all of us, who invested a lot of time (years) to learn, test, design, implement and operate VMware technologies. I can honestly say, ... I LOVE VMWARE ...

Thursday, March 19, 2020

Home Lab 2019/2020

First things first. Why do I have home lab(s)?

Well, I really need at least one home lab to test and demonstrate VMware vSphere, vSAN, NSX and other components of VMware SDDC stack.

The other reason is that from time to time I discuss home lab configurations with other VMware folks, and some of these people have blog posts about their labs. I have never written a blog post about my home lab so far, but I realized it is quite useful to document at least some basic information about the lab to quickly show lab details during these discussions and demonstrations. So, here it is.

At the moment, I have two home labs
  1. One in a garage - GARAGE LAB
  2. Second in a flat - APARTMENT LAB
Here are the photos and quick descriptions of my home labs. 

GARAGE LAB




GARAGE LAB vSphere/vSAN Cluster specification
  • 4-node vSphere Cluster (4x ESXi on Dell PE R620) with hybrid vSAN Enabled (4x NVMe 512 GB as cache disks, 8x SATA 500 GB as capacity disks)
  • Each node has 1 CPU Socket (Intel Xeon CPU E5-2620 @ 2.00GHz), 128 GB RAM, 4x 1Gb Ethernet Port, 1x NVMe 512 GB, 2x SATA 500 GB
GARAGE LAB external storage
  • Flash NAS (NFS) Storage - Synology DS115j, 1x SSD 840 Series 512 GB (465.76 GB SSD)
  • Flash iSCSI Storage - Synology DS115j, 1x SanDisk Ultra II 960 GB (894.3 GB SSD)
  • SATA NAS (NFS) Storage - Synology DS214se, 2x SATA Disk 3TB (2794.52 GB HDD) 

APARTMENT LAB


APARTMENT LAB vSphere/vSAN Cluster specification
  • 4-node vSphere Cluster (4x ESXi on Intel NUC) with All-Flash vSAN Enabled (4x SATA M.2 SSD 180 GB as cache disks, 4x SATA SSD 480 GB as capacity disks)
  • Each node is Intel NUC (6i3SYH) having 1 CPU Socket (Intel Core i3-6100U CPU @ 2.30GHz), 32 GB RAM, 1x 1Gb Ethernet Port, 1x SATA M.2 SSD 180 GB, 1x SATA SSD 480 GB

VMware Licensing

As I'm a VMware Certified Design Expert, I'm automatically (after application) awarded VMware vExpert status and thus entitled to use licenses for almost all VMware products. This is IMHO one of the biggest advantages of being a VCDX and/or participating in the VMware vExpert program.

Tuesday, February 11, 2020

Host cannot communicate with one or more other nodes in the vSAN enabled cluster

I work as a VMware HCI Specialist, therefore I have to do a lot of vSAN testing and demonstrations in my home lab. The only reasonable way to effectively test and demonstrate different vSAN configurations and topologies is to run vSAN in a nested environment. Thanks to nested virtualization, I can very easily and quickly build any type of vSAN cluster.

Recently I experienced an issue in a 3-node (nested) vSAN cluster. The vSAN datastore showed the capacity of just a single node instead of three nodes, and the hosts reported the error message "Host cannot communicate with one or more other nodes in the vSAN enabled cluster".

My first idea was a networking issue, but ping between the nodes was working fine, so it was not a physical network issue. This is a lab environment, so all services (mgmt, vMotion, vSAN) are enabled on a single VMKNIC (vmknic0) and everything is pretty straightforward.

So what's the problem?

I did some google searching and found that some people were seeing the same error message when experiencing problems with vSAN unicast agents.

Here is the command to list unicast agents on a vSAN node

esxcli vsan cluster unicastagent list

I tested it in my environment.
Grrrr. The list is empty!!!! It is empty on all ESXi hosts in my 3-node vSAN cluster.

Let's try to configure it manually.

Each vSAN node should have a connection to agents on other vSAN nodes in the cluster.

For example, one vSAN node in a 4-node vSAN cluster should have 3 connections

 [root@n-esx04:~] esxcli vsan cluster unicastagent list  
 NodeUuid               IsWitness Supports Unicast IP Address    Port Iface Name Cert Thumbprint  
 ------------------------------------ --------- ---------------- -------------- ----- ---------- -----------------------------------------------------------  
 5e3ec640-c033-7c7d-888f-00505692f54d     0       true 192.168.11.105 12321       18:F3:B7:9F:66:C4:C4:3E:0F:7D:69:BB:55:92:BC:A3:AC:E4:DD:5F  
 5df792b0-f49f-6d76-45af-005056a89963     0       true 192.168.11.107 12321       20:4C:C1:48:F5:2D:04:16:55:F1:D3:F1:4C:26:B5:C4:23:E5:B4:12  
 5e3e467a-1c1b-f803-3d0f-00505692ddc7     0       true 192.168.11.106 12321       53:99:00:B8:9D:1A:97:42:C0:10:C0:AF:8C:AD:91:59:22:8E:C9:79  

We need to get the local UUID of the cluster node.

 [root@n-esx08:~] esxcli vsan cluster get  
 Cluster Information  
   Enabled: true  
   Current Local Time: 2020-02-11T08:32:55Z  
   Local Node UUID: 5df792b0-f49f-6d76-45af-005056a89963  
   Local Node Type: NORMAL  
   Local Node State: MASTER  
   Local Node Health State: HEALTHY  
   Sub-Cluster Master UUID: 5df792b0-f49f-6d76-45af-005056a89963  
   Sub-Cluster Backup UUID:  
   Sub-Cluster UUID: 52c99c6b-6b7a-3e67-4430-4c0aeb96f3f4  
   Sub-Cluster Membership Entry Revision: 0  
   Sub-Cluster Member Count: 1  
   Sub-Cluster Member UUIDs: 5df792b0-f49f-6d76-45af-005056a89963  
   Sub-Cluster Member HostNames: n-esx08.home.uw.cz  
   Sub-Cluster Membership UUID: f8d4415e-aca5-a597-636d-005056997c1d  
   Unicast Mode Enabled: true  
   Maintenance Mode State: ON  
   Config Generation: 7ef88f9d-a402-48e3-8d3f-2c33f951fce1 6 2020-02-10T21:58:16.349  

So here are my nodes
n-esx08 - 192.168.11.108 - 5df792b0-f49f-6d76-45af-005056a89963
n-esx09 - 192.168.11.109 - 5df792b0-f49f-6d76-45af-005056a89963
n-esx10 - 192.168.11.110 - 5df792b0-f49f-6d76-45af-005056a89963

And now the problem is clear. All vSAN nodes have the same UUID.
Why?  Let's check ESXi system UUIDs on each ESXi host.

 [root@n-esx08:~] esxcli system uuid get  
 5df792b0-f49f-6d76-45af-005056a89963  
 [root@n-esx08:~]  

 [root@n-esx09:~] esxcli system uuid get  
 5df792b0-f49f-6d76-45af-005056a89963  
 [root@n-esx09:~]  

 [root@n-esx10:~] esxcli system uuid get  
 5df792b0-f49f-6d76-45af-005056a89963  
 [root@n-esx10:~]  


Note: if you want to check the UUID of all ESXi hosts, use the following PowerCLI

 Get-VMHost | Select Name,  
   @{N='HW BIOS Uuid';E={$_.Extensiondata.Hardware.SystemInfo.Uuid}},  
   @{N='ESXi System UUid';E={(Get-Esxcli -VMHost $_).system.uuid.get()}}  
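
With more than a handful of hosts, duplicates are easier to spot with a small filter than by eye. The following is a minimal sketch, assuming the host names and system UUIDs have already been collected as "host uuid" pairs, one per line. The function name find_duplicate_uuids and the input format are assumptions for illustration only.

```shell
#!/bin/sh
# Print every UUID used by more than one host, followed by the hosts sharing it.
# Input on stdin: one "hostname uuid" pair per line.
find_duplicate_uuids() {
  awk '{ hosts[$2] = hosts[$2] " " $1; count[$2]++ }
       END { for (u in count) if (count[u] > 1) print u ":" hosts[u] }'
}

# The three nested hosts from this post.
printf '%s\n' \
  'n-esx08 5df792b0-f49f-6d76-45af-005056a89963' \
  'n-esx09 5df792b0-f49f-6d76-45af-005056a89963' \
  'n-esx10 5df792b0-f49f-6d76-45af-005056a89963' | find_duplicate_uuids
```

An empty result means every host has a unique system UUID.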

So the root cause is obvious.
I use nested ESXi hosts to test vSAN and I forgot to regenerate system UUID after the clone. 
The solution is easy. Just delete UUID from /etc/vmware/esx.conf and restart ESXi hosts.

ESXi system UUID in /etc/vmware/esx.conf

You can do it from command line as well

sed -i 's#/system/uuid.*##' /etc/vmware/esx.conf
reboot
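
Before touching the real file, it can help to try the expression on a scratch copy first. This is a minimal sketch, assuming the UUID is stored on a line beginning with /system/uuid. The sample file content is made up, the helper name strip_system_uuid is hypothetical, and the expression differs from the one-liner above in that it deletes the whole line and only prints the result instead of editing in place.

```shell
#!/bin/sh
# Print FILE without its /system/uuid line; the file itself is not modified.
strip_system_uuid() {
  sed '/^\/system\/uuid/d' "$1"
}

# Scratch file with made-up sample content, NOT a real esx.conf.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/system/hostname = "n-esx08"
/system/uuid = "5df792b0-f49f-6d76-45af-005056a89963"
/system/other = "unchanged"
EOF

strip_system_uuid "$sample"
rm -f "$sample"
```

Keeping a backup copy of esx.conf before any in-place edit is a cheap safety net in a lab as well.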

So we have identified the problem and we are done. After the ESXi hosts restart, the vSAN cluster node UUIDs are regenerated automatically and the vSAN unicast agents are automatically configured on the vSAN nodes as well.

However, if you are interested in how to manually add a connection to a unicast agent on a particular node, you would execute the following command

esxcli vsan cluster unicastagent add -a [ip address unicast agent] -U [supports unicast] -u [Local UUID] -t [type]

Anyway, such a manual configuration should not be necessary and you should do it only when instructed by VMware support.

Hope this helps someone else in the VMware community.

Saturday, February 01, 2020

vSphere Integrated Containers - PoC in my home lab

vSphere Integrated Containers (aka VIC) is VMware Enterprise Container Infrastructure. Any VMware customer having VMware vSphere Enterprise Plus can get enterprise container infrastructure to help IT Ops run traditional and containerized applications side-by-side on a common platform with vSphere Integrated Containers. Supporting containers in your virtualized environments means IT teams get the security, isolation, and management of VMs, while developers enjoy the speed and agility of containers—all within vSphere. The VIC project is available on GitHub at https://vmware.github.io/vic-product/

The overall concept is very interesting, especially for customers willing to use Docker containers but not having requirements or skills to operate Kubernetes. The most interesting part of VIC is the networking concept, very well explained in the following video https://www.youtube.com/watch?v=QLi9KasWLCM&feature=youtu.be


As one of my customers is considering VIC to provide containers to their developers, I have decided to test it in my home lab.

The first step is to deploy VIC from OVF. It's pretty straightforward so I'm not going to document any details.

The second step is to create the first VCH (Virtual Container Host), which acts as a remote Docker server. You can have multiple VCHs and they can be grouped into projects. Let's put architectural decisions aside for the moment and focus on testing the technology itself.

VCH deployment is typically done via the vic-machine CLI utility, as the GUI is too slow an interface for the DevOps approach. Documentation is available at
https://vmware.github.io/vic-product/assets/files/html/1.5/vic_vsphere_admin/using_vicmachine.html

VIC-MACHINE ... is the primary tool for DevOps admin. I will discuss roles, tooling, and RBAC for particular actors in DevOps approach in the next blog post together with the overall architecture.

We can download vic-machine from VIC appliance deployed in step 1. In my case, it is available at https://vic.home.uw.cz:9443/

When we have vic-machine available on our DevOps workstation, we can start acting as a DevOps engineer.

Note 1: 
If you have a self-signed certificate as I have in my lab, you need to get the vCenter thumbprint. You have to SSH to the vCenter Server Appliance and get the fingerprint.
openssl x509 -in /etc/vmware-vpx/ssl/rui.crt -fingerprint -sha1 -noout
The vCenter thumbprint in my lab is
SHA1 Fingerprint=64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9
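
The vic-machine commands below expect only the colon-separated hex value, without the label that openssl prints in front of it. A small sketch of stripping that label follows; the helper name thumbprint_only is hypothetical.

```shell
#!/bin/sh
# Keep only the colon-separated hex digits from openssl's fingerprint line.
thumbprint_only() {
  sed 's/^.*Fingerprint=//'
}

# Sample line copied from the output above.
echo 'SHA1 Fingerprint=64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9' | thumbprint_only

# On the appliance, the same filter could be chained to the openssl command:
#   openssl x509 -in /etc/vmware-vpx/ssl/rui.crt -fingerprint -sha1 -noout | thumbprint_only
```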
Note 2: 
vic-machine tool is available for various operating systems (windows, linux, darwin).  As my workstation is Mac OS X (aka darwin), in commands below I will use ./vic-machine-darwin

Before the first VCH deployment, we have to enable the firewall rules in a particular vSphere cluster (the name of vSphere Cluster in my home lab is CLUSTER).

./vic-machine-darwin update firewall --target vc01.home.uw.cz --thumbprint 64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9 --user administrator@uw.cz --compute-resource CLUSTER --allow

Now we are ready to create our first VCH.

Here is the command to create a VCH.

./vic-machine-darwin create \
  --name vch01 \
  --container-name-convention vch01-{name} \
  --compute-resource CLUSTER \
  --image-store vsan-Underlay \
  --base-image-size 8GB \
  --bridge-network VCH01-BRIDGE \
  --bridge-network-range 172.16.0.0/12 \
  --public-network MGMT \
  --dns-server 192.168.4.4 \
  --container-network CONTAINER01:container01 \
  --container-network-ip-range CONTAINER01:192.168.51.0/24 \
  --container-network-gateway CONTAINER01:192.168.51.254/24 \
  --container-network-dns CONTAINER01:192.168.4.4 \
  --container-network-firewall CONTAINER01:published \
  --tls-cname vch01 \
  --certificate-key-size 2048 \
  --no-tlsverify \
  --user admin@UW.CZ \
  --thumbprint 64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9 \
  --target vc01.home.uw.cz/SDDC \
  --ops-user administrator

The output from the command looks similar to ...

INFO[0068] VCH Admin Portal:                            
INFO[0068] https://192.168.4.140:2378                   
INFO[0068]                                              
INFO[0068] VCH Default Bridge Network Range: 172.16.0.0/12 
INFO[0068] VCH Default Bridge Network Width: 16         
INFO[0068]                                              
INFO[0068] Published ports can be reached at:           
INFO[0068] 192.168.4.140                                
INFO[0068]                                              
INFO[0068] Management traffic will use:                 
INFO[0068] 192.168.4.140                                
INFO[0068]                                              
INFO[0068] Docker environment variables:                
INFO[0068] DOCKER_HOST=192.168.4.140:2376 COMPOSE_TLS_VERSION=TLSv1_2 
INFO[0068]                                              
INFO[0068] Environment saved in vch2/vch2.env           
INFO[0068]                                              
INFO[0068] Connect to docker:                           
INFO[0068] docker -H 192.168.4.140:2376 --tls info      
INFO[0068] Installer completed successfully             

Davids-MacBook-Pro:vic-machine dpasek$ 

My first VCH has been created with a single container network (network name CONTAINER01). To add another container network (CONTAINER02), we need to know the VCH ID, so the inspect command has to be used.

./vic-machine-darwin inspect --target vc01.home.uw.cz --thumbprint 64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9 --user administrator@uw.cz --compute-resource CLUSTER --name vch2

INFO[0005] VCH ID: vm-299                               
INFO[0005]                                              
INFO[0005] VCH Admin Portal:                            
INFO[0005] https://192.168.4.140:2378                   
INFO[0005]                                              
INFO[0005] VCH Default Bridge Network Range: 172.16.0.0/12 
INFO[0005] VCH Default Bridge Network Width: 16         
INFO[0005]                                              
INFO[0005] Published ports can be reached at:           
INFO[0005] 192.168.4.140                                
INFO[0005]                                              
INFO[0005] Management traffic will use:                 
INFO[0005] 192.168.4.140                                
INFO[0005]                                              
INFO[0005] Docker environment variables:                
INFO[0005] DOCKER_HOST=192.168.4.140:2376 COMPOSE_TLS_VERSION=TLSv1_2 
INFO[0005]                                              
INFO[0005] Connect to docker:                           
INFO[0005] docker -H 192.168.4.140:2376 --tls info      

INFO[0005] Completed successfully  

Note that to configure an additional network, we have to include all already existing networks in the command as well.

./vic-machine-darwin configure --target vc01.home.uw.cz --thumbprint 64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9 --user administrator@uw.cz --id vm-299 --container-network CONTAINER01 --container-network CONTAINER02
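
Because every existing network has to be repeated, the flag list is easy to get wrong by hand once a VCH has several networks. Here is a small sketch of generating it. The helper name container_network_flags is hypothetical, and only the --container-network flag from the command above is used.

```shell
#!/bin/sh
# Emit one --container-network flag per network name given as an argument.
# Pass the existing networks first, then the ones being added.
container_network_flags() {
  flags=""
  for net in "$@"; do
    flags="${flags:+$flags }--container-network $net"
  done
  printf '%s\n' "$flags"
}

# Existing network CONTAINER01 plus the new CONTAINER02.
container_network_flags CONTAINER01 CONTAINER02
```

The printed flags can then be appended to the configure command in place of the hand-written list.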

Docker client is using VCH (aka Virtual Container Host) as a docker server

export DOCKER_HOST=192.168.4.140:2376
docker --tls info
docker --tls network ls

Here is the output of the networks available in my Docker host (aka Virtual Container Host)

dpasek@photon-machine [ ~ ]$ docker --tls network ls   
NETWORK ID          NAME                DRIVER              SCOPE
4313b6b0ab2e        CONTAINER01         external            
6eb9e8911266        bridge              bridge              

dpasek@photon-machine [ ~ ]$ 

After the vic-machine command adds the additional network CONTAINER02, you can see the network in the container host (VCH) and use it for your containers.

dpasek@photon-machine [ ~ ]$ docker --tls network ls
NETWORK ID          NAME                DRIVER              SCOPE
4a8318cb2738        CONTAINER01         external            
a011cf3ced72        CONTAINER02         external            
7445a9e714ed        bridge              bridge              

dpasek@photon-machine [ ~ ]$ 

Of course, we can use other standard docker commands to manage our docker images and containers.

docker --tls images -a
docker --tls ps -a

Run NGINX and access the service through NAT
docker --tls run --name nginx1 -d -p 8080:80 nginx
docker --tls ps -a
docker --tls images -a

Run NGINX and access the service directly through container network (IP address assigned via DHCP) 
docker --tls run --name nginx2 --network="container01" -d -p 80 nginx

Run NGINX and access the service directly through container network (IP address assigned statically)
docker --tls run --name nginx30 --network="container01" --ip="192.168.51.30" -d -p 80 nginx

Execute a Shell Command from a Container
docker --tls run busybox date
docker --tls ps -a

In the screenshot below you see what vSphere Admin has in his environment.

Containers visibility for vSphere Admin

Specific container (nginx) visibility for vSphere Admin

Docker Volumes

If you need persistent storage for your containers, you can create volumes which are persistently stored in your enterprise infrastructure and visible not only to DevOps admin and Developer, but also to your infrastructure administrator.

Create a Docker Volume
docker --tls volume create --opt Capacity=2GB --name volume-test
docker --tls volume ls
docker --tls volume inspect volume-test

Attach a Docker Volume to a Container
docker --tls run --name busybox -it -v volume-test:/data/volume-test busybox
# cd /data
/data # ls
volume-1
/data # ls -l
total 4
drwxr-xr-x
/data # df -h

This is what Developer and DevOps admin see via Docker Client

dpasek@photon-machine [ ~ ]$ docker --tls volume ls
DRIVER              VOLUME NAME

vsphere             volume-test

And this is what vSphere Admin can see in his environment

Docker volume (2 GB) is visible as 1.92 GB Hard Disk

List VCH instances

DevOps admin manages Virtual Container Hosts. Here is the command to list all VCH's from a particular vCenter.

./vic-machine-darwin ls --target vc01.home.uw.cz --thumbprint 64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9 --user administrator@uw.cz

Davids-MacBook-Pro: vic-machine dpasek$ ./vic-machine-darwin ls --target vc01.home.uw.cz --thumbprint 64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9 --user administrator@uw.cz 
INFO[0000] vSphere password for administrator@uw.cz:    
INFO[0003] ### Listing VCHs ####                        
INFO[0003] Validating target                            

ID            PATH                                         NAME        VERSION                      UPGRADE STATUS
vm-299        /SDDC/host/UNDERLAY/CLUSTER/Resources        vch2        v1.5.4-21221-b3d3b06f        Up to date

Davids-MacBook-Pro:vic-machine dpasek$ 


Delete VCH

And here is the command to destroy a particular VCH.

./vic-machine-darwin delete --target vc01.home.uw.cz --thumbprint 64:06:CD:4E:D8:39:8B:E8:80:2D:D3:25:50:C7:B9:7D:E1:6F:8B:E9 --user administrator@uw.cz --name vch2 --compute-resource CLUSTER

Conclusion

DevOps is a big topic nowadays. It is worth saying that DevOps is about methodology and not the tooling. At the end of the day, it doesn't matter if you use Docker, Kubernetes, Ansible or even more traditional tools. What matters is the final product, which has to be developed in sync with the business owner; otherwise, there is a high chance you will not meet the business owner's expectations. I'm an ex-developer and Software Engineer / Architect, and now I work as an Infrastructure Technical Designer / Architect, so I know the DevOps methodology is the best way to develop and deliver a business application to the business owner.

The key principles of the DevOps methodology are to identify the core business requirements, quickly develop an application prototype, present it to the business owner, validate whether the business owner's expectations are met and, if so, move the prototype into production and continuously develop and deploy additional business requirements in short iterations with continuous feedback from the business owner. For such an approach, you need to deploy your application into infrastructure dynamically and quickly. Traditional Enterprise Infrastructure is not ready for such agility.

In this blog post, we tested a tool (VMware vSphere Integrated Containers) which can help Developers, DevOps admins, and infrastructure admins cooperate in an agile fashion. It might or might not help to successfully deliver the business project. However, it definitely helps DevOps teams to have an agile DevOps infrastructure with an Enterprise level of support, security, availability, and cooperation between Developers, Security, and Infrastructure (compute, storage, network) teams, each of which lives in a totally different world and does not understand the others :-)

In my opinion, vSphere Integrated Containers is a great technology to start DevOps in the Enterprise environment. We all know that Enterprises will eventually adopt Kubernetes as it is nowadays a "de facto" standard for DevOps infrastructure; however, huge Docker hosts backed by solid enterprise infrastructure (VMware vSphere) are good enough for anybody just starting with the DevOps methodology. In the near future, VMware vSphere will have integrated Kubernetes (watch Project Pacific) and DevOps teams will be able to take their DevOps tooling to another level.

In the next post, I will focus on the Architecture Design of DevOps environment leveraging VMware vSphere Integrated Containers.

Hope this blog post helps at least one other person in the great VMware community.

Friday, January 31, 2020

VMware vSphere Replication

VMware vSphere Replication is a software-based replication solution for virtual machines running on vSphere infrastructure. It is storage agnostic so it can replicate VMs from any source storage to any target storage. Such flexibility and simplicity is the biggest value of vSphere Replication. It doesn't matter if you have Fibre Channel, DAS, NAS, iSCSI or vSAN based datastores; you can simply start a full replication from any datastore to any other, and based on the defined RPO your data is kept in sync between the two places.

It is a solution that is very simple to install and use, and it is also very cost-effective, as it is included in vSphere Essentials Plus and all higher vSphere editions.

The installation is straightforward:

  1. Download the installation package (ISO)
  2. Mount ISO
  3. Deploy OVF (Virtual Appliance in Open Virtual Format)
  4. Configure Virtual Appliance to register into vCenter

The installation package is in the OVF format. You can deploy the package by using the Deploy OVF wizard in the vSphere Client. 

The installation package contains:
  • vSphere_Replication_OVF10.ovf - Use this file to install all VR components, including the vSphere Replication Management Server and a vSphere Replication Server.
  • vSphere_Replication_AddOn_OVF10.ovf - Use this file to install an optional additional vSphere Replication Server.
Of course, the solution has some limitations. Let's write down some of them:
  1. vSphere Replication supports an RPO of 5 minutes and higher. It is therefore an asynchronous replication and you cannot use it if you have a strict RPO Zero requirement. However, does your business really require RPO Zero? Do not hesitate to ask and challenge whoever is responsible for the Business Impact Analysis and the costs associated with it. Do not simply assume you cannot lose any data. There should always be a risk management exercise.
  2. vSphere Replication is a one-to-one VM mapping and not one-to-many. While you can replicate to multiple vCenters (target sites), a particular VM can be replicated to only one target vCenter. Here are further details and supported topologies
  3. vSphere Replication supports up to 24 recovery points. These 24 recovery points can have retention for up to 24 days, or you can configure up to 24 snapshots per day. So you can spread these 24 recovery points within these guide rails based on your specific business requirements.
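The retention guide rails from point 3 come down to simple arithmetic. Here is a minimal Python sketch, assuming the limit applies to the number of recovery points per day multiplied by the number of retention days. The function name and the constant are illustrative only and are not part of any VMware API.

```python
# Illustrative helper only, not a VMware API.
MAX_RECOVERY_POINTS = 24  # limit stated in point 3 above


def retention_is_valid(points_per_day: int, days: int) -> bool:
    """Return True if the retention settings fit within 24 recovery points.

    Assumption: the total number of recovery points kept is
    points_per_day * days.
    """
    if points_per_day < 1 or days < 1:
        return False
    return points_per_day * days <= MAX_RECOVERY_POINTS


if __name__ == "__main__":
    print(retention_is_valid(1, 24))  # one recovery point per day, 24 days
    print(retention_is_valid(24, 1))  # 24 recovery points within one day
    print(retention_is_valid(4, 6))   # a mix of both
    print(retention_is_valid(5, 5))   # 25 recovery points in total
```

The first three combinations stay within the limit, while the last one would need 25 recovery points and therefore does not fit.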
As vSAN does not have any native storage replication, vSphere Replication is a great add-on to the VMware HCI solution. If you have higher requirements, you can leverage 3rd party solutions like EMC RecoverPoint, which is, by the way, included in the VxRail hardware appliance.

For more information see the vSphere Replication online documentation at http://www.vmware.com/support/pubs/vsphere-replication-pubs.html

Saturday, January 18, 2020

How to configure Jumbo Frames not only for vSAN

Not only vSAN but also vMotion, NFS and other types of traffic can benefit from Jumbo Frames configured on an Ethernet network, as the network traffic should consume fewer CPU cycles and achieve higher throughput.

Jumbo Frames must be configured end-to-end. We should therefore start the configuration in the network core on Physical Switches, then continue to Virtual Switches and finish on VMkernel ports (vmk). These three configuration places are depicted in the schema below.

Physical Switch
Jumbo Frames on physical switches can be configured either for the whole switch or per switch port. It depends on the particular physical switch, but my Force10 switch supports configuration only per switch port, as shown in the screenshot below. Configuration for the whole switch would be easier, with less to configure, and as far as I know, some Cisco switches support it.


If you have multiple physical switches, all ports in the path must be configured for Jumbo Frames.

Virtual Switch
In the screenshot below, you can see the Jumbo Frame configuration on my VMware Virtual Distributed Switch.


VMkernel port
And last but not least, the configuration on the VMkernel port, in this case the vmk interface used for vSAN traffic.


Final test
After any implementation, we should test that the implementation was successful and everything is working as expected. We should log in to the ESXi host via ssh and use the following ping command

vmkping -I vmk5 -s 8972 -d 192.168.26.122

-d                  set DF bit (IPv4) or disable fragmentation (IPv6)
-I                   outgoing interface
-s                   set the number of ICMP data bytes to be sent.
                      The default is 56, which translates to a 64 byte
                      ICMP frame when added to the 8 byte ICMP header.
                      (Note: these sizes does not include the IP header).

and here is the result in case everything is configured correctly.


In case the message is longer than the configured MTU, we would see the following ...


You may ask why we use size 8972 and not 9000.
The reason for the 8972 on *nix devices is that the ICMP/ping implementation doesn’t encapsulate the 28 byte ICMP (8) + IP (20) (ping + standard internet protocol packet) header – thus we must take the 9000 and subtract 28 = 8972. [source & credits for the answer]
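The same subtraction works for any MTU. Here is a small Python sketch of the arithmetic, assuming an IPv4 header without options (20 bytes) and the 8 byte ICMP header. The function name is just illustrative.

```python
IPV4_HEADER_BYTES = 20  # assumption: IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def ping_payload_size(mtu: int) -> int:
    """Return the largest ICMP payload (the -s value) that fits into one frame."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


if __name__ == "__main__":
    print(ping_payload_size(9000))  # 8972, Jumbo Frames
    print(ping_payload_size(1500))  # 1472, standard Ethernet MTU
```

So for a standard 1500 byte MTU, the equivalent test would use -s 1472.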

Hope this helps.

Sunday, December 22, 2019

How to remove VMFS datastore and reuse local disks for vSAN

I'm upgrading the hardware in my home lab to leverage vSAN. I have 4x Dell PowerEdge R620, each having 2x 500 GB SATA disks but no SSD for cache disks. Cost is always the constraint for any home lab, but I've recently found an M.2 NVMe PCI-e adapter for M.2 NVMe SSDs in my local computer shop. The total cost of 1x M.2 NVMe PCI-e adapter + 1x M.2 NVMe 512 GB SSD is just $100.




Such a hardware upgrade for only $400 would allow me to have a vSAN datastore with almost 4 TB of raw space, because I would have a 4-node HYBRID vSAN where each node has 1x NVMe disk as a cache disk and 2x 500 GB SATA disks as capacity disks. The vSAN raw space will probably be 4 TB minus 10% after disk formatting, but 3.6 TB of raw space and about 2 TB of usable space, after subtracting 25% for slack space and an additional 25% for RAID 5 protection, is still a pretty good deal.
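The capacity estimate can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch using the percentages from this post (10% format overhead, 25% slack space, 25% protection overhead), applied one after another. It is not an official vSAN sizing formula, and the function and parameter names are made up for illustration.

```python
def vsan_capacity_estimate(
    nodes: int,
    disks_per_node: int,
    disk_tb: float,
    format_overhead: float = 0.10,      # assumption taken from this post
    slack: float = 0.25,                # assumption taken from this post
    protection_overhead: float = 0.25,  # assumption taken from this post
) -> tuple[float, float, float]:
    """Return (raw, formatted, usable) capacity in TB."""
    raw = nodes * disks_per_node * disk_tb
    formatted = raw * (1 - format_overhead)
    usable = formatted * (1 - slack) * (1 - protection_overhead)
    return raw, formatted, usable


if __name__ == "__main__":
    # 4 nodes, each with 2x 500 GB (0.5 TB) capacity disks
    raw, formatted, usable = vsan_capacity_estimate(4, 2, 0.5)
    print(f"raw: {raw:.3f} TB")
    print(f"after format: {formatted:.3f} TB")
    print(f"usable: {usable:.3f} TB")
```

The cache disks are intentionally left out, because they do not contribute to vSAN capacity.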

The issue I'm describing in this blog post usually happens in environments where you use local disks as backing storage for local VMFS datastores. Local VMFS datastores work perfectly fine until you want to remove a VMFS datastore and reuse the local disks for something else, for example vSAN. That was exactly the case in my home lab, where I have four ESXi hosts, each with 2x 500 GB SATA disks and a local VMFS datastore on the two disks.

When I tried to remove the local datastore (ESX22-Local-SATA-01), it failed with the following error message:

The resource 'Datastore Name: ESX22-Local-SATA-01 VMFS uuid: 5c969e10-1d37088c-3a57-90b11c142bbc' is in use.




Why is the datastore in use? Well, there can be several reasons. All of them were very well described back in 2014 in the Virten blog post "Cannot remove datastore * because file system is busy."

Here is Virten's LUN removal checklist:
  • No virtual machine, template, snapshot or CD/DVD image resides on the datastore
  • The datastore is not part of a Datastore Cluster
  • Storage I/O Control is disabled for the datastore
  • The datastore is not used for vSphere HA heartbeat
  • The LUN is not used as an RDM
  • The Datastore is not used as a scratch location
  • The Datastore is not used as VMkernel Dump file location (/vmkdump/)
  • The Datastore is not used as active vsantraced location (/vsantrace/)
  • The Datastore is not used to store VM swap files.
The root cause of my issue was the usage of the "scratch location". I blogged about this topic back in 2012 in "Set the Scratch Partition from the vSphere Client".

When you have another datastore available on the ESXi host, the solution is very easy. You can simply change the "scratch location". It is much trickier if you do not have any alternative datastore. Fortunately, in my home lab, I have three Synology NAS boxes leveraged as shared datastores over NFS and iSCSI, so the fix was quick. If you need to do it for more than a few ESXi hosts, a PowerCLI script can be handy.

In case you do not have any other datastore and you need to remove the VMFS datastore, you have two options:

  1. Reboot the computer into some alternative system (Linux, FreeBSD, etc.) and destroy the MBR or GPT partition table on the particular disk device. Something like gpart destroy -F /dev/ad0 in FreeBSD.
  2. Physically remove the disk from your computer. When you boot it up, VMware should automatically default back to the temp scratch location (assuming you don't have any other available datastores on that box). You can then reinsert the disk and correctly remove the Datastore from the ESXi host.