Sunday, November 29, 2020

Virtual Machine Advanced Configuration Options

First and foremost, changing advanced settings is not recommended unless you know what you are doing and are fully aware of all potential impacts. VMware default settings are the best choice for general use and cover the majority of use cases. However, when you have specific requirements, you might need to tune the VM and change some advanced virtual machine configuration options. In this blog post, I document the advanced configuration options I've found useful in specific design decisions.

Time synchronization

    • Description:
    • Type: Boolean
    • Values:
      • true / 1 (default)
      • false / 0
  • time.synchronize.restore
    • Description:
    • Type: Boolean
    • Values:
      • true / 1 (default)
      • false / 0
  • time.synchronize.shrink
    • Description:
    • Type: Boolean
    • Values:
      • true / 1 (default)
      • false / 0
  • time.synchronize.continue
    • Description:
    • Type: Boolean
    • Values:
      • true / 1 (default)
      • false / 0
  • time.synchronize.resume.disk
    • Description:
    • Type: Boolean
    • Values:
      • true / 1 (default)
      • false / 0
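
As a minimal sketch, the options listed above could be rendered as .vmx-style key/value lines that switch the corresponding synchronization events off. The helper name render_vmx_lines is hypothetical, and the key = "VALUE" quoting is my assumption about the usual .vmx syntax, so verify keys and values against the VMware documentation before applying them to a real VM.

```python
# Keys copied from the list above. The name of the first option is not shown
# in the list, so it is deliberately left out here.
TIME_SYNC_OPTIONS = [
    "time.synchronize.restore",
    "time.synchronize.shrink",
    "time.synchronize.continue",
    "time.synchronize.resume.disk",
]


def render_vmx_lines(options, enabled):
    """Return .vmx-style lines setting every option to TRUE or FALSE.

    Hypothetical helper for illustration; the quoting style is an assumption.
    """
    value = "TRUE" if enabled else "FALSE"
    return [f'{name} = "{value}"' for name in options]


if __name__ == "__main__":
    # Print the lines that would disable all listed synchronization events.
    for line in render_vmx_lines(TIME_SYNC_OPTIONS, enabled=False):
        print(line)
```

Generating the lines from one list keeps the set of changed options explicit and easy to review before touching a production VM.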

Relevant resources:



With the isolation option, you can restrict file operations between the virtual machine and the host system, and between the virtual machine and other virtual machines.

VMware virtual machines can work both in a vSphere environment and on hosted virtualization platforms such as VMware Workstation and VMware Fusion. Certain virtual machine parameters do not need to be enabled when you run a virtual machine in a vSphere environment. Disable these parameters to reduce the potential for vulnerabilities.

The following advanced settings are Booleans (true/false) with a default value of false. You disable the corresponding feature by changing the value to true.

  • isolation.bios.bbs.disable


Remote Display

Tuesday, November 24, 2020

vSAN 7 Update 1 - What's new in Cloud Native Storage

vSAN 7 U1 also brings new features in the Cloud Native Storage area, so let's look at what's new.

PersistentVolumeClaim expansion

Kubernetes has offered volume expansion by editing the PersistentVolumeClaim object since v1.11. Please note that volume shrinking is not supported and the expansion must be done offline. Online expansion is not supported in U1 but is planned on the roadmap.
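
To illustrate the "expand only, never shrink" rule, here is a minimal sketch that builds the patch body for a PersistentVolumeClaim resize. The function names are hypothetical and only binary suffixes (Ki, Mi, Gi, Ti) are parsed; spec.resources.requests.storage is the standard PVC field for the requested size. Because expansion is offline in U1, the workload using the volume would have to be stopped before such a patch is applied.

```python
import json

# Binary suffixes only; decimal suffixes such as "G" are out of scope here.
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity):
    """Convert a quantity such as '10Gi' (or a plain byte count) to bytes."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def build_expand_patch(current, requested):
    """Return a merge-patch body that grows a PVC; refuse anything else."""
    if to_bytes(requested) <= to_bytes(current):
        raise ValueError("a PersistentVolumeClaim can only be expanded")
    return {"spec": {"resources": {"requests": {"storage": requested}}}}


if __name__ == "__main__":
    # Grow a hypothetical claim from 10Gi to 20Gi and show the patch body.
    print(json.dumps(build_expand_patch("10Gi", "20Gi")))
```

The resulting JSON is what you would submit as a merge patch against the claim, for example with kubectl patch.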

Static Provisioning in Supervisor Cluster

This feature allows exposing an existing storage volume within a K8s cluster integrated within vSphere Hypervisor Cluster (aka Supervisor Cluster, vSphere with K8s, Project Pacific).

vVols Support for vSphere K8s and TKG Service

External storage deployments using vVols are now supported on vK8s and TKG.

Data Protection for Modern Applications

vSphere 7.0 U1 comes with support for Dell PowerProtect and Velero backup for Pacific Supervisor and TKG clusters. The Velero-only option initiates snapshots from the Supervisor Velero plugin and stores them on S3.

vSAN Direct

vSAN Direct is a feature introducing Direct-Attached Storage (typically physical HDDs) for object storage solutions running on top of vSphere.

There is no shared vSAN Datastore as in a typical vSAN deployment. Instead, vSAN Direct Datastores allow physical disks to be connected directly to virtual appliances or containers running on top of a vSphere/vSAN Cluster, which provide Object Storage services and bypass the traditional vSAN data path.

Hope you find it useful.

Monday, November 23, 2020

Why is HTTPS faster than HTTP?

Recently, I was planning, preparing, and executing a network performance test plan, including TCP, UDP, HTTP, and HTTPS throughput benchmarks. The intention of the test plan was to compare the network throughput of two particular NICs:

  • Intel X710
  • QLogic FastLinQ QL41xxx

There was a reason for such an exercise (reproduction of specific NIC driver behavior), and I will probably write another blog post about it, but today I would like to raise another topic. During the analysis of the test results, I observed very interesting HTTPS throughput results in comparison to HTTP throughput. These results were observed on both types of NICs; therefore, they should not be a benefit of specific NIC hardware or a specific driver.

Here is the Test Lab Environment:

  • 2x ESXi hosts
    • Server Platform: HPE ProLiant DL560 Gen10
    • CPU: Intel Cascade Lake based Xeon
    • BIOS: U34 | Date (ISO-8601): 2020-04-08
    • NIC1: Intel X710, driver i40en version: 1.9.5, firmware 10.51.5
    • NIC2: QLogic QL41xxx, driver qedentv version:, firmware mfw storm 
    • OS/Hypervisor: VMware ESXi 6.7.0 build-16075168 (6.7 U3)
  • 1x Physical Switch
    • 10Gb switch ports (an intentional network bottleneck, because the customer uses 10Gb switch ports as well)

Below are the observed interesting HTTP and HTTPS results.




We have observed

  • HTTP throughput between 5 and 6 Gbps
  • HTTPS throughput between 8 and 9 Gbps

which means roughly 50% higher throughput of HTTPS over HTTP. Normally, we would expect HTTP transfers to be faster than HTTPS, as HTTPS requires encryption, which should add some CPU overhead. The size of the encryption overhead is debatable, but nobody would expect HTTPS to be significantly faster than HTTP, right? That's why I was asking myself,

why did HTTPS outperform HTTP in the HPE lab with the latest Intel CPUs?
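
The "roughly 50%" figure can be sanity-checked directly from the observed ranges. This is just a small arithmetic sketch; the helper names are mine, and the inputs are the 5 to 6 Gbps (HTTP) and 8 to 9 Gbps (HTTPS) ranges quoted above.

```python
HTTP_GBPS = (5.0, 6.0)   # observed HTTP throughput range
HTTPS_GBPS = (8.0, 9.0)  # observed HTTPS throughput range


def relative_gain(baseline, improved):
    """Return the gain of `improved` over `baseline` in percent."""
    return (improved / baseline - 1.0) * 100.0


def midpoint(value_range):
    """Return the middle of a (low, high) range."""
    return sum(value_range) / 2.0


if __name__ == "__main__":
    mid = relative_gain(midpoint(HTTP_GBPS), midpoint(HTTPS_GBPS))
    # Most pessimistic pairing: best HTTP result vs. worst HTTPS result.
    low = relative_gain(max(HTTP_GBPS), min(HTTPS_GBPS))
    # Most optimistic pairing: worst HTTP result vs. best HTTPS result.
    high = relative_gain(min(HTTP_GBPS), max(HTTPS_GBPS))
    print(f"gain at range midpoints: {mid:.0f}%")
    print(f"gain across the ranges: {low:.0f}% to {high:.0f}%")
```

Comparing the range midpoints gives a gain of about 55%, and the extremes of the two ranges span roughly 33% to 80%, so "50% higher" is a fair summary of the measurement.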

Here is my process of troubleshooting the "issue" or, better said, root cause analysis.


  • In my home lab, I have old Intel CPU models (Intel Xeon CPU E5-2620 0 @ 2.00GHz), which is why HTTP and HTTPS throughputs are identical there.
  • In the HPE test lab, there are the latest Intel CPU models; therefore, HTTPS can be offloaded, and client/server communication can leverage the asynchronous advantages for web servers using Intel® QuickAssist Technology, introduced in the Intel Xeon E5-2600 v3 product family.
  • It is worth mentioning that it is not only about CPU hardware acceleration but also about the software, which must be written in a way that lets it use hardware acceleration for a positive impact on performance. This is the case with OpenSSL 1.1.0 and NGINX 1.10, which boost HTTPS server efficiency.

Lesson learned

When you are virtualizing network functions, it is worth considering the latest CPUs, as they can have a significant impact on overall system performance and throughput. It does not matter whether such network function virtualization is done by VMware NSX or by another virtualization or containerization platform.

Investigation continues

To be honest, I do not know if I fully understand the root cause of this behavior. I still wonder why HTTPS is 50% faster than HTTP and whether CPU offloading is the only factor behind such a performance gain.

I'll try to run the test plan on other hardware platforms, compare the results, and do some further research to understand it more deeply. Unfortunately, I do not have direct access to the latest x86 servers of other vendors, so it can take a while. If you have access to some modern x86 hardware and want to run my test plan yourself, you can download the test plan document from here. If you invest some time in the testing, please share your results in the comments below this article or simply send me an e-mail.

Hope this blog post is informative, and as always, any comment or idea is very welcome. 

Saturday, November 21, 2020

Understanding vSAN Architecture Components

VMware vSAN is becoming more and more popular and is therefore more often used as primary storage in data centers and server rooms. Sometimes, as with any IT technology, troubleshooting is necessary. Understanding the architecture and component interactions is essential for effective vSAN troubleshooting. Over the years, I have collected some vSAN architectural information into a slide deck I made available at

The slide deck contains slides with the following sections ...

vSAN Terminology

  • CMMDS - Cluster Monitoring, Membership, and Directory Service
  • CLOMD - Cluster Level Object Manager Daemon
  • OSFSD - Object Storage File System Daemon
  • CLOM - Cluster Level Object Manager
  • OSFS - Object Storage File System
  • RDT - Reliable Datagram Transport
  • VSANVP - Virtual SAN Vendor Provider
  • SPBM - Storage Policy-Based Management
  • UUID - Universally unique identifier
  • SSD - Solid-State Drive
  • MD - Magnetic disk
  • VSA - Virtual Storage Appliance
  • RVC - Ruby vSphere Console

Architecture components

  • CMMDS
    • Cluster Monitoring, Membership, and Directory Service
  • CLOM
    • Cluster Level Object Manager Daemon
  • DOM
    • Distributed Object Manager
    • Each object in a vSAN cluster has a DOM owner and a DOM client
  • LSOM
    • Local Log Structured Object Manager
    • LSOM works with local disks
  • RDT
    • Reliable Datagram Transport

Components interaction

Architecture & I/O Flow

Troubleshooting tools
  • RVC
    • vsan.disks_info
    • vsan.disks_stats
    • vsan.disk_object_info
    • vsan.cmmds_find
    • esxcli vsan debug disk list
  • Objects tools
    • /usr/lib/vmware/osfs/bin/objtool
How to use vSAN Observer

  • SSH to a system where RVC is available, for example VCSA or HCIBench
    • ssh root@[IP-ADDRESS-OF-VCSA]
  • Run the RVC command-line interface and connect to the vCenter that manages your vSphere cluster with the vSAN service enabled. RVC requires the password of the administrator in your vSphere domain.
    • rvc administrator@[IP-ADDRESS-OF-VCSA]
  • Start vSAN Observer on your vSphere cluster with vSAN service enabled
    • -r /localhost/[vDatacenter]/computers/[vSphere & vSAN Cluster]
  • Go to vSAN Observer web interface
    • vSAN Observer is available at https://[IP-ADDRESS-OF-VCSA]:8010
The slide deck includes a little more info, so download it from

Hope it helps the broader VMware community.

If you know some other detail or troubleshooting tool, please leave a comment below this post.

Thursday, November 05, 2020

NSX-T Edge Node performance profiles

It is good to know that the NSX-T Edge Node has multiple performance profiles. These profiles change the number of vCPUs dedicated to DPDK and therefore leave more or fewer vCPUs for other services such as the load balancer (LB):

  • default (best for L2/L3 traffic)
  • LB TCP (best for L4 traffic)
  • LB HTTP (best for HTTP traffic)
  • LB HTTPS (best for HTTPS traffic)

Now you may ask how to choose the load balancer performance profile. SSH to the Edge Node and use the CLI.

 nsx-edgebm3> set load-balancer perf-profile  
  http   Performance profile type argument  
  https   Performance profile type argument  
  l4    Performance profile type argument  
 Note: You may be prompted to restart the dataplane or reboot the Edge Node if there are changes in the profile in # of cores used by LB.  
 To go back to default profile:  
 nsx-edgebm3> clear load-balancer perf-profile  

Changing the profile from L4 to HTTP helped me achieve ~3x higher HTTP throughput through the L7 NSX-T load balancer. Hope this helps someone else as well.

Tuesday, September 22, 2020

vSAN - vLCM Capable ReadyNode

VMware vSphere Lifecycle Manager (aka vLCM) is one of the very interesting features in vSphere 7.  vLCM is a powerful new approach to simplified consistent lifecycle management for the hypervisor and the full stack of drivers and firmware for the servers powering your data center.

There are only a few server vendors who have implemented firmware management with vLCM.

At the moment of writing this article, these vendors are:

  • Dell and HPE for vSphere 7.0
  • Dell, HPE, Lenovo for vSphere 7.0 Update 1

Recently, I got the following question from one of my customers.

"Where I can find official information about certified vLCM server vendors?"

It is a very good question. I would expect such information in the VMware Compatibility Guides (VCG); however, there is no such information in the "Systems / Servers" VCG, but you can find it in the "vSAN" VCG.

The vSAN VCG contains "vSAN ReadyNodes Additional Features", where one of the features is "vLCM Capable ReadyNode". So, there you can find the server vendors that have successfully implemented firmware management integration with vLCM, but the information is available only for vSAN ReadyNodes. I can imagine that, in the future, vLCM capability may or may not become available for standard servers as well and not only for vSAN ReadyNodes.

Tuesday, September 15, 2020

vSAN 7 Update 1 - What's new

vSAN 7 Update 1 has been announced, so let's look at what it brings to the table.


In the figure below, you will see what new features are available in this release.

New features in vSAN 7 Update 1

VMware HCI Mesh

HCI Mesh makes it possible to mount a remote vSAN datastore from an external (aka Server) vSAN Cluster on multiple Client vSAN Clusters. An example topology is depicted in the figure below.


HCI Mesh allows multiple client vSAN clusters to mount and share a remote datastore from a vSAN Server Cluster. A single datastore can be mounted by up to 64 hosts, including the server cluster's hosts. With such a topology, you can do vMotion (Compute only) across multiple vSphere/vSAN Clusters.

With HCI Mesh, you can also build a Full Mesh, where a vSAN Cluster acts as both Client and Server of HCI Mesh. Such a topology is depicted below, where all three clusters are both clients and servers.

Full Mesh

Such a Full Mesh topology is ideal for homogeneous clusters and equalizes storage consumption across clusters. The scalability of such a topology is limited to 5 remote datastores and 5 client clusters. In other words, a Client cluster can mount a maximum of 5 remote datastores, and a Server cluster can export its datastore to a maximum of 5 client clusters.
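
The limits above are easy to check for a planned topology. The following is a minimal sketch under my own assumptions: the function name and the two input dictionaries are hypothetical, each server cluster is assumed to export a single datastore, and only the three limits mentioned in this post (5 remote datastores per client, 5 client clusters per server, 64 hosts per datastore) are verified.

```python
MAX_REMOTE_DATASTORES_PER_CLIENT = 5
MAX_CLIENT_CLUSTERS_PER_SERVER = 5
MAX_HOSTS_PER_DATASTORE = 64  # includes the server cluster's own hosts


def check_mesh(hosts, mounts):
    """Return a list of limit violations (empty when the design fits).

    hosts:  cluster name -> number of hosts in the cluster
    mounts: client cluster -> list of server clusters it mounts from
    """
    problems = []
    clients_of = {}  # server cluster -> client clusters mounting its datastore
    for client, servers in mounts.items():
        if len(servers) > MAX_REMOTE_DATASTORES_PER_CLIENT:
            problems.append(f"{client} mounts {len(servers)} remote datastores")
        for server in servers:
            clients_of.setdefault(server, []).append(client)
    for server, clients in clients_of.items():
        if len(clients) > MAX_CLIENT_CLUSTERS_PER_SERVER:
            problems.append(f"{server} serves {len(clients)} client clusters")
        total_hosts = hosts[server] + sum(hosts[c] for c in clients)
        if total_hosts > MAX_HOSTS_PER_DATASTORE:
            problems.append(f"{server} datastore is mounted by {total_hosts} hosts")
    return problems


if __name__ == "__main__":
    # Full Mesh of three 8-host clusters, as in the topology described above.
    hosts = {"A": 8, "B": 8, "C": 8}
    mounts = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
    print(check_mesh(hosts, mounts) or "topology is within the limits")
```

In the three-cluster Full Mesh example, each datastore is mounted by 24 hosts, which is well within the 64-host limit.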

A few more notes about HCI Mesh

  • An HCI Mesh client or server is a vSAN Cluster, so the minimum configuration is a 2-node vSAN Cluster
  • A compute-only vSAN cluster technically works but is neither recommended nor supported at the moment.
  • Meshing Hybrid and All-Flash datastores is supported

vSAN Native File Services

VMware is extending vSAN File Services to SMB protocol. SMB is integrated with Microsoft Active Directory and supports Kerberos authentication. This means that vSAN now supports NFS (version 3 and 4.1) and SMB.


vSAN Data-in-Transit Encryption

vSAN 7 Update 1 increases overall security with native inter-node encryption of vSAN data traffic over TCP, which ensures data privacy, authentication, and integrity by leveraging the existing FIPS 140-2 validated crypto module. Interestingly, an external Key Management Server (KMS) is not required for this feature. However, please be aware that HCI Mesh and Data-in-Transit Encryption are not supported together in this release.

vSAN Data-in-Transit Encryption

SSD Secure Erase (Secure wipe method)

vSAN 7 Update 1 has the option to securely erase SSDs. In this release, it is available for supported Dell and HPE devices, so HPE and Dell vSAN Ready Nodes and DellEMC VxRail should be able to use this feature. Other hardware vendors will presumably follow in the future.


Overall performance optimization

Based on VMware internal performance tests, vSAN 7 Update 1 should be approximately 30% faster in comparison to vSAN 6.7 U3, which was the fastest vSAN release so far. I know this is a rather vague statement without further details, but I personally believe that vSAN performance, especially in the All-Flash model, was already good enough for the majority of traditional workloads. Of course, additional performance improvements are always nice to have, but I think there are other factors that are more important, at least for the customers I work with.


vSAN prior to 7 Update 1 supported compression only together with deduplication. vSAN 7 Update 1 decouples the compression feature from the deduplication feature to allow space efficiency without the performance overhead caused by the deduplication algorithm.

When both features (Dedup & Compress) are turned on, it works in the following way ... 

Deduplication

  • Per disk group
  • Occurs when destaging to the capacity tier
  • 4KB fixed blocks

Compression

  • Occurs after dedup, prior to data being destaged
  • If a block compresses to 2KB or less, it is stored compressed
  • Otherwise, the full 4KB block is stored
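
The compression rule above (store the compressed form only when a 4KB block shrinks to 2KB or less) can be sketched in a few lines. This is only an illustration of the decision logic: the function name is hypothetical and zlib is a stand-in I picked from the Python standard library, not the algorithm vSAN actually uses.

```python
import os
import zlib

BLOCK_SIZE = 4096        # fixed 4KB block
COMPRESSED_LIMIT = 2048  # keep the compressed form only if <= 2KB


def store_block(block):
    """Return (bytes_to_store, is_compressed) for one 4KB block."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("expected a 4KB block")
    compressed = zlib.compress(block)  # stand-in compressor, an assumption
    if len(compressed) <= COMPRESSED_LIMIT:
        return compressed, True
    # Not compressible enough: the full 4KB block is stored unchanged.
    return block, False


if __name__ == "__main__":
    samples = (("zeros", bytes(BLOCK_SIZE)), ("random", os.urandom(BLOCK_SIZE)))
    for name, data in samples:
        stored, is_compressed = store_block(data)
        print(name, len(stored), "compressed" if is_compressed else "raw")
```

A block of zeros compresses far below the 2KB limit, while random data does not compress at all and is therefore stored as the full 4KB block.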

When deduplication is turned on, the failure domain is the whole disk group.
In Compression-only mode, the failure domain is reduced to a single disk, which is another benefit: it reduces the amount of data to resync on failures and therefore improves availability and recoverability SLAs.

Compression-only has significantly higher performance than dedup, which is why OLTP workloads primarily benefit from this feature.


Enhanced Durability During Maintenance Operations

First of all, it is important to understand the difference between Durability and Availability. We are talking here about Durability during vSphere/vSAN Cluster node maintenance mode. The feature is nicely depicted in the figure below.

Enhanced Durability During Maintenance Mode

When an ESXi host enters Maintenance Mode, you can choose
  • Full data Evacuation <-- time-consuming operation but vSAN objects stay protected per Storage Policy intent
  • Ensure Availability <-- it checks that no vSAN object becomes unavailable, but objects can become unprotected
  • Nothing <-- this is very dangerous and can cause data loss
Now, with vSAN 7 Update 1, you can choose Ensure Availability and accept that some vSAN objects can become unavailable if another failure occurs, while delta writes are made to another Host or Failure Domain for vSAN objects that are left with only one active replica. This results in additional durability protection (it prevents loss of data) because, if Replica 1 fails, the vSAN object becomes unavailable, but you can recover the data very quickly from Replica 2 and the Delta Writes after the ESXi host returns from the planned maintenance mode.

Implementation details:
  • Only available when another Host or Failure Domain is available
  • Can be the same host as the witness component
  • Applies to both RAID Mirroring and Erasure Coding
Faster Host Reboots and Cluster Upgrades
  • Significant improvement in cluster upgrades due to faster host reboots
  • Host metadata is written to disk before a reboot and read back to memory after reboot. This is faster than rebuilding metadata.
  • Average 5x improvement in host reboot times

Shared Witness for 2 Node vSAN Deployments

This feature enables 2-Node vSAN deployments to share a common witness instance. vSAN 7 Update 1 supports up to 64 ROBO 2-Node clusters per shared witness. With vSAN witness consolidation, customers can reduce deployment cost and operational complexity.
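
For a quick estimate of the consolidation effect, the number of shared witnesses is a simple ceiling division. This sketch assumes the limit of 64 two-node clusters applies per shared witness and ignores any other witness sizing constraints; the function name is hypothetical.

```python
import math

MAX_CLUSTERS_PER_SHARED_WITNESS = 64  # limit quoted in this post


def witnesses_needed(two_node_clusters):
    """Return how many shared witnesses a number of 2-node clusters needs."""
    if two_node_clusters < 0:
        raise ValueError("cluster count cannot be negative")
    return math.ceil(two_node_clusters / MAX_CLUSTERS_PER_SHARED_WITNESS)


if __name__ == "__main__":
    for count in (1, 64, 65, 200):
        print(f"{count} ROBO clusters -> {witnesses_needed(count)} shared witness(es)")
```

Without a shared witness, each of those 2-node clusters would need a witness of its own.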

Slack Space optimized, operationalized, and renamed into Reserved Capacity

In the past, VMware recommended keeping 25% - 30% capacity as slack space. Now, the new Reserved Capacity is optimized to require less space. It depends on deployment variables and decreases as the number of hosts in the vSAN Cluster grows. Example deployments:
  • 12 node cluster = ~16%
  • 24 node cluster = ~12%
  • 48 node cluster = ~10%
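
To see what these percentages mean in terabytes, here is a small sketch that applies the example values to a given capacity. The function name is hypothetical, only the three cluster sizes listed above are covered (I do not interpolate between them), and real values depend on other deployment variables as well.

```python
# Approximate reserved capacity from the example deployments above.
EXAMPLE_RESERVE_PCT = {12: 16, 24: 12, 48: 10}  # hosts -> reserved percent


def usable_capacity_tb(capacity_tb, hosts):
    """Return capacity left after the example reserve for a listed cluster size."""
    if hosts not in EXAMPLE_RESERVE_PCT:
        raise KeyError(f"no example value for a {hosts}-node cluster")
    return capacity_tb * (100 - EXAMPLE_RESERVE_PCT[hosts]) / 100


if __name__ == "__main__":
    for hosts in sorted(EXAMPLE_RESERVE_PCT):
        usable = usable_capacity_tb(100, hosts)
        print(f"{hosts} nodes: {usable:.0f} TB usable out of 100 TB")
```

Compared with the old 25% - 30% slack space recommendation, the same 100 TB would have left only 70 to 75 TB for workloads.
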
Reserved capacity is required for:
  • Resync operations such as policy changes, rebalancing, and data movement
  • Rebuild activities due to failures
Slack space is Reserved Capacity

On top of the disk space optimization for reserved capacity, vSAN can now optionally prevent the consumption of reserved capacity by enabling capacity reserves, including
  • Operations reserve
  • Host rebuild reserve
Capacity reserves are soft thresholds that block new provisioning activities only; existing VM I/O is not blocked. Once again, this protection is completely optional: it is an opt-in setting and is not enabled by default. You can see the UI in the figure below.

Please note that the vSAN Reserved Capacity feature is not supported on stretched clusters and 2-node vSAN.

vLCM Enhancements

Dell and HPE have supported vLCM since the vSAN 7 release. Now, in vSAN 7 Update 1, Lenovo ReadyNode models are supported as well.

From the technical features point of view, vLCM has been enhanced in the following areas 
  • vSAN Fault Domains, 2-node, and Stretched Clusters awareness
  • hardware compatibility pre-checks
  • parallel cluster remediation of up to 64 clusters
  • support for environments running NSX-T 3.1
The technical features above are depicted in the following figure.


Simplified Routing for vSAN Network Topologies

Prior to this release, vSAN topologies with an external witness had to use static routing on each ESXi host, which was a significant management pain. In vSAN 7 Update 1, an alternate default gateway can be specified, so static routes are no longer needed. This is a very useful feature from an operational point of view, if you ask me.

vSAN alternate default gateway

vSAN I/O Insight

This is another very useful feature, to analyze storage workload I/O patterns. It is
  • Quick and easy tool in vSphere Client to capture workload IO characteristics on vSAN
  • Rich IO Pattern metrics and histograms to analyze R/W ratio, Seq/Random ratio, 4K aligned / unaligned ratio, IO size distribution
  • Finer granular IO performance metrics
The tool provides solid data points for the infrastructure team to triage issues with app users without the need for complex external tools. See the screenshot below.

vSAN IO Insight


The vSAN 7.0 U1 release is definitely a significant step forward for VMware software-defined storage, which is an important component of the full VMware SDDC stack. And this is very nice proof of a key SDDC benefit ... Quick response to customer and industry requirements ... This is the reason why I really like software-defined and hyper-converged infrastructure, leveraging commoditization, integration, and continuous product improvement. The vSphere administrator is the king of the house ... at least from the infrastructure point of view :-)