Friday, December 07, 2018

ESXi: This host is potentially vulnerable to issues described in CVE-2018-3646

This is a very short post in reaction to questions I have been asked recently.

When you update to the latest ESXi builds, you may see the warning message depicted in the screenshot below.

Warning message in ESXi Client User Interface (HTML5)
This message just informs you about the Intel CPU vulnerability described in VMware Security Advisory 2018-0020 (VMSA-2018-0020).

You have three choices:

  • eliminate the security vulnerability
  • accept the potential security risk and suppress the warning
  • keep everything as it is and simply ignore the warning in the user interface
Elimination of "L2 Terminal" security vulnerability is described in VMware KB 55806. It is configurable by ESXi advanced option VMkernel.Boot.hyperthreadingMitigation. If you set a value to TRUE or 1, ESXi will be protected.

The warning message suppression is configurable by another ESXi advanced option, UserVars.SuppressHyperthreadWarning. A value of 1 (TRUE) will suppress the warning message.
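If you prefer the command line, both advanced options can also be set from the ESXi shell. Below is a minimal sketch based on the commands documented in the related VMware KB articles; verify them against KB 55806 for your exact build, and note that the mitigation takes effect only after a host reboot.

# Enable the L1TF mitigation (ESXi Side-Channel-Aware Scheduler); requires a host reboot
esxcli system settings kernel set -s hyperthreadingMitigation -v TRUE
# Check the current value
esxcli system settings kernel list -o hyperthreadingMitigation
# Or, if you accept the risk, just suppress the warning in the user interface
esxcli system settings advanced set -o /UserVars/SuppressHyperthreadWarning -i 1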

Thursday, December 06, 2018

VMware Metro Storage Cluster - is it a DR solution?

Yesterday morning I had a design discussion with one of my customers about HA and DR solutions. The same afternoon, we discussed the VMware Metro Storage Cluster topic within our internal team, which inspired me to write this blog article and use it as a reference for future similar discussions. By the way, I presented this topic at a local VMUG meeting two years ago, so you can find the original slides here on SlideShare. In this blog post, I would like to document the topics, architectures, and conclusions I discussed today with several folks.

Stretched (aka active/active) clusters are very popular infrastructure architecture patterns nowadays. The VMware implementation of such an active/active cluster pattern is vMSC (VMware Metro Storage Cluster). The official VMware vSphere Metro Storage Cluster Recommended Practices can be found here. Let's start with a definition of what vMSC is and is not from the HA (High Availability), DA (Disaster Avoidance) and DR (Disaster Recovery) perspectives.

vMSC (VMware Metro Storage Cluster) is
  • High Availability solution extending infrastructure high availability across two availability zones (sites within metro distance)
  • Disaster Avoidance solution enabling live migration of VMs not only across ESXi hosts within a single availability zone (local cluster) but also to another availability zone (another site)
vMSC (VMware Metro Storage Cluster) is a great High Availability and Disaster Avoidance technology, but it is NOT a pure Disaster Recovery solution, even though it can help with two specific disaster scenarios (failure of one of the two storage systems, failure of a single site). Why is it not a pure DR solution? Here are a few reasons:
  • vMSC requires Storage Metro Cluster technology, which joins two storage systems into a single distributed storage system allowing stretched storage volumes (LUNs), but this creates a single fault zone for situations when LUNs are locked or badly served by the storage system. It is great for HA but not good for DR. Such a single fault zone can lead to a total cluster outage in situations like those described here - http://www-01.ibm.com/support/docview.wss?uid=ssg1S1005201 and https://kb.vmware.com/kb/2113956
  • The vMSC compute cluster (vSphere cluster) has to be stretched across two availability zones, which creates a single fault zone. Such a single fault zone can lead to a total cluster outage in situations like the one described here - https://kb.vmware.com/kb/56492
  • DR is not only about infrastructure but also about applications, people and processes.
  • DR should be business service oriented; therefore, from an IT perspective, DR is more about applications than infrastructure.
  • DR should be tested on a regular basis. Can you afford to power off the whole site and test that all VMs will be restarted on the other side? Are you sure the applications (or, more importantly, business services) will survive such a test? I know a few environments where they can afford it, but most enterprise customers cannot.
  • DR should allow going back into the past; therefore, the solution should be able to leverage old data recovery points. Recoverability from old recovery points should be possible per application group, not only for the whole infrastructure.
Combination of HA and DR solutions
Any HA solution should be combined with some DR solution. At a minimum, such a DR solution is a classic backup solution with local or even remote-site backup repositories. The typical challenge with any backup solution is RTO (Recovery Time Objective), because:
  • You must have the infrastructure hardware ready for workloads to be restored and powered on
  • Recovery from traditional backup repositories is usually very time-consuming and may or may not fulfill the RTO requirement
That's the reason why orchestrated DR with storage replication and snapshots is usually a better DR solution than classic backup. vMSC can be safely combined with storage-based DR solutions with lower RTO SLAs. VMware has a specific Disaster Recovery product called Site Recovery Manager (SRM) to orchestrate vSphere or storage replication and automate workload recovery. With such a combination, you can get Cross-Site High Availability and Cross-Site Disaster Avoidance provided by vMSC, and pure Disaster Recovery provided by SRM. Such a combination is not so common, at least in my region, because it is relatively expensive. That's the reason why customers usually have to decide on only one solution. Now, let's think about why vMSC is preferred by infrastructure guys over a pure DR solution like SRM. Here are the reasons:
  • It is "more simple" and much easier to implement and operate
  • No need to understand, configure and test application dependencies
  • Can be "wrongly" claimed as DR solution 
It is not very well known, but VMware SRM nowadays supports Disaster Recovery and Avoidance on top of stretched storage. It is described in the last architecture concept below.

So let's have a look at various architecture concepts for cross-site HA and DR with VMware products.

VMware Metro Storage Cluster (vMSC) - High Availability and Disaster Avoidance Solution

VMware Metro Storage Cluster (vMSC)
In the figure above, I have depicted a VMware Metro Storage Cluster consisting of:
  • Two availability zones (Site A, Site B)
  • Single centralized vSphere Management (vCenter A)
  • Single stretched storage volume(s) distributed across two storage systems, each in a different availability zone (Site A, Site B)
  • VMware vSphere Cluster stretched across two availability zones (Site A, Site B)
  • Third location (Site C) for the storage witness. If a third site is not available, the witness can be placed in Site A or B, but then the storage administrator is the real arbiter in case of potential split-brain scenarios
Advantages of such an architecture are:
  • Cross-site high availability (positive impact on Availability, thus Business Continuity)
  • Cross-site vMotion (good for Disaster Avoidance)
  • Protects against single storage system (storage in one site) failure scenario
  • Protects against single availability zone (one site) failure scenario
  • Self-initiated fail-over procedure.
Drawbacks
  • vMSC is a tightly integrated distributed cluster system spanning the vSphere HA Cluster and the Storage Metro Cluster; therefore, it is a potential single fault zone. The stretched LUN(s) are a single fault zone for issues caused by the distributed storage system or by bad behavior of the cluster filesystem (VMFS)
  • Typically, the third location is required for storage witness
  • It is usually very difficult to test HA
  • It is almost impossible to test DR
VMware Site Recovery Manager in Classic Architecture - Disaster Recovery Solution

VMware Site Recovery Manager - Classic Architecture
In the figure above, I have depicted the classic architecture of the VMware DR solution (Site Recovery Manager), consisting of:
  • Two availability zones (Site A, Site B)
  • Two independent vSphere Management servers (vCenter A, vCenter B)
  • Two independent DR orchestration servers (SRM A, SRM B)
  • Two independent vSphere Clusters
  • Two independent storage systems, one in Site A, the second in Site B
  • Synchronous or asynchronous data replication between the storage systems
  • Snapshots (multiple recovery points) on the backup site are optional but highly recommended if you take DR planning seriously.
Advantages of such an architecture are:
  • Cross-site disaster recoverability (positive impact on Recoverability, thus Business Continuity)
  • Maximal infrastructure independence, therefore we have two independent fault zones. The only connection between the two sites is storage (data) replication.
  • Human-driven and well-tested disaster recovery procedure.
  • Disaster Avoidance (migration of applications between sites) can be achieved, but only with business service downtime. A Protection Group has to be shut down on one site and restarted on the other site.
Drawbacks
  • Disaster Avoidance without service disruption is not available.
  • Usually, a huge level of effort goes into application dependency mapping, and application-specific recovery plans (automated or semi-automated run books) have to be planned, created and tested
VMware Site Recovery Manager in Stretched Storage Architecture - Disaster Recovery and Avoidance Solution

VMware Site Recovery Manager - Stretched Storage Architecture
In the last figure, I have depicted the newer architecture of the VMware DR solution (Site Recovery Manager). In this architecture, SRM supports stretched storage volumes, but everything else is independent and specific to each site. The solution consists of:
  • Two availability zones (Site A, Site B)
  • Two independent vSphere Management servers (vCenter A, vCenter B)
  • Two independent DR orchestration servers (SRM A, SRM B)
  • Two independent vSphere Clusters
  • A single distributed storage system with storage volumes stretched across Site A and Site B
  • Snapshots (multiple recovery points) on the backup site are optional but highly recommended if you take DR planning seriously.
Advantages of such an architecture are:
  • Cross-site disaster recoverability (positive impact on Recoverability, thus Business Continuity)
  • Maximal infrastructure independence, therefore we have two independent fault zones. The only connection between the two sites is storage (data) replication.
  • Human-driven and well-tested disaster recovery procedure.
  • Disaster Avoidance without service disruption leveraging cross vCenter vMotion technology.
Drawbacks
  • Usually, a huge level of effort goes into application dependency mapping, and application-specific recovery plans (automated or semi-automated run books) have to be planned, created and tested
  • The virtual machine internal identifier (MoRef ID) changes after cross vCenter vMotion; therefore, your supporting solutions (backup software, monitoring software, etc.) must not depend on this identifier.
CONCLUSION

Infrastructure availability and recoverability are two independent infrastructure qualities. Both of them have a positive impact on business continuity, but each solves a different situation. High Availability solutions increase the reliability of the system with more redundancy and self-healing, automated failover among redundant system components. Recoverability solutions back up data from one system and allow a full recovery in another, independent system. Both solutions can and should be combined in compliance with SLA/OLA requirements.

VMware Metro Storage Cluster is a great High Availability technology, but it should not be used as a replacement for disaster recovery technology. VMware Metro Storage Cluster is not a Disaster Recovery solution, even though it can protect the system against two specific disaster scenarios (single site failure, single storage system failure). You also would not call VMware "vSphere HA Cluster" a DR solution, even though it can protect you against a single ESXi host failure.

The final infrastructure architecture always depends on the specific use cases, requirements and expectations of the particular customer, but expectations should be set correctly, and we should know what the designed system does and does not do. It is always better to know potential risks than to have unknown risks. For known risks, a mitigation or contingency plan can be prepared and communicated to system users and business clients. You cannot do that for unknown risks.

Other resources
There are other posts in the blogosphere explaining what vMSC is and is NOT.

“VMware vMSC can give organizations many of the benefits that a local high-availability cluster provides, but with geographically separate sites. Stretched clustering, also called distributed clustering, allows an organization to move virtual machines (VMs) between two data centers for failover or proactive load balancing. VMs in a metro storage cluster can be live migrated between sites with vSphere vMotion and vSphere Storage vMotion. The configuration is designed for disaster avoidance in environments where downtime cannot be tolerated, but should not be used as an organization's primary disaster recovery approach.”
Another very nice technical write-up about vMSC is here - The dark side of stretched clusters

Tuesday, November 13, 2018

VCSA - This appliance cannot be used or repaired ...

I have just got an email from my customer describing a weird issue with the VMware vCenter Server Appliance (aka VCSA).

The customer does weekly native backups of VCSA manually via VAMI. He wanted to run a VCSA native backup again, but when he tried to log into the virtual appliance management interface (VAMI), he got the following error message:

Error message - This appliance cannot be used or repaired because a failure was encountered. You need to deploy a new appliance.
The error message includes a resolution: deploy a new appliance. That recommended solution is the last thing a typical vSphere admin would like to do to resolve such an issue. Fortunately, there is another solution/workaround.

To resolve this issue, stop and start all the services on the VCSA:
  • Putty/SSH to vCenter server appliance.
  • Log in to VCSA using the root credentials.
  • Enable the shell.
  • Restart VCSA services

To restart VCSA services run the following commands:
service-control --stop --all
service-control --start --all
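If you want to verify that all services came back up, the same utility can also report their status; this is just an optional sanity check, not part of the original procedure:

service-control --status --all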

In case a simple services restart does not help, you may have an issue with a recent backup job. In such a case, there is another resolution with an additional workaround (the corresponding shell commands are sketched after the list):
  • Putty/SSH to vCenter server appliance.
  • Log in to VCSA using the root credentials.
  • Enable the shell.
  • Move the /var/vmware/applmgmt/backupRestore-history.json file to /var/tmp/.
  • Restart the vCenter Server Appliance.
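For reference, here is a minimal sketch of the same workaround expressed as shell commands, assuming you are logged in as root with the shell enabled (the reboot can of course also be triggered from the vSphere Client instead):

mv /var/vmware/applmgmt/backupRestore-history.json /var/tmp/
reboot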
Hope this helps other folks in the VMware community.

Sunday, November 11, 2018

Intel Software Guard Extensions (SGX) in VMware VM

Yesterday, I got a very interesting question. I was asked by a colleague of mine whether Intel SGX can be leveraged within a VMware virtual machine. We both work for VMware as TAMs (Technical Account Managers); therefore, we are the first stop for similar technical questions from our customers.

I'm always curious about the business reason behind any technical question. The question comes from a customer of my colleague who is going to run infrastructure for some "blockchain" applications leveraging the Intel SGX CPU feature set. The customer would like to run these applications virtualized on top of VMware vSphere to:
  • simplify infrastructure management and capacity planning
  • increase server high availability 
  • optimize compute resource management

However, SGX support is mandatory for this type of application; therefore, if virtual machines do not support it, the customer is forced to run it on bare metal.

We, as VMware TAMs, can leverage a lot of internal resources; however, I personally believe that nothing compares to your own testing. After a few hours of testing in the home lab, I feel more confident discussing the subject with other folks internally within VMware or externally with my customers. By the way, that's the reason I have my own vSphere home lab, and it is a very nice example of a justification to me and my family for why I have invested pretty decent money into the home lab in our garage. But back to the topic.

Let's start with the terminology and testing method

Intel Software Guard Extensions (SGX) 
is a set of central processing unit (CPU) instruction codes from Intel that allows user-level code to allocate private regions of memory, called enclaves, that are protected from processes running at higher privilege levels. Intel designed SGX to be useful for implementing secure remote computation, secure web browsing, and digital rights management (DRM). [source]

The CPUID opcode is a processor supplementary instruction (its name derived from CPU IDentification) for the x86 architecture allowing software to discover details of the processor. It was introduced by Intel in 1993 when it introduced the Pentium and SL-enhanced 486 processors. [source]

It is worth reading the document "Properly Detecting Intel® Software Guard Extensions (Intel® SGX) in Your Applications" [source]

The most interesting part is ...
What about CPUID? The CPUID instruction is not sufficient to detect the usability of Intel SGX on a platform. It can report whether or not the processor supports the Intel SGX instructions, but Intel SGX usability depends on both the BIOS settings and the PSW. Applications that make decisions based solely on CPUID enumeration run the risk of generating a #GP or #UD fault at runtime. In addition, VMMs (for example, Hyper-V*) can mask CPUID results, and thus a system may support Intel SGX even though the results of the CPUID report that the Intel SGX feature flag is not set.
For our purpose, CPUID detection should be enough, as we can test it on a bare metal OS and later on a guest OS running inside a virtual machine. The rest of the testing is on the application itself, but that is out of the scope of this blog post.

Another article worth reading is "CPUID — CPU Identification" [source]. The most interesting part of this document is ...
INPUT EAX = 12H: Returns Intel SGX Enumeration Information. When CPUID executes with EAX set to 12H and ECX = 0H, the processor returns information about Intel SGX capabilities.
And the most useful resource is https://github.com/ayeks/SGX-hardware
There is GNU C source code to test SGX support and a clear explanation of how to identify support within the operating system. I have used my favorite OS, FreeBSD, and simply downloaded the code from GitHub

fetch https://raw.githubusercontent.com/ayeks/SGX-hardware/master/test-sgx.c

compile it

cc test-sgx.c -o test-sgx

and run the executable

./test-sgx

You can then see the application (test-sgx) output with information about SGX support. The output should be similar to this one.

 root@test-sgx-vmhw4:~/sgx # ./test-sgx  
 eax: 406f0 ebx: 10800 ecx: 2d82203 edx: fabfbff 
 stepping 0 
 model 15 
 family 6 
 processor type 0 
 extended model 4 
 extended family 0 
 smx: 0 
 Extended feature bits (EAX=07H, ECX=0H) 
 eax: 0 ebx: 0 ecx: 0 edx: 0 
 sgx available: 0 
 CPUID Leaf 12H, Sub-Leaf 0 of Intel SGX Capabilities (EAX=12H,ECX=0) 
 eax: 0 ebx: 440 ecx: 0 edx: 0 
 sgx 1 supported: 0 
 sgx 2 supported: 0 
 MaxEnclaveSize_Not64: 0 
 MaxEnclaveSize_64: 0 
 CPUID Leaf 12H, Sub-Leaf 1 of Intel SGX Capabilities (EAX=12H,ECX=1) 
 eax: 0 ebx: 3c0 ecx: 0 edx: 0 
 CPUID Leaf 12H, Sub-Leaf 2 of Intel SGX Capabilities (EAX=12H,ECX=2) 
 eax: 0 ebx: 0 ecx: 0 edx: 0 
 CPUID Leaf 12H, Sub-Leaf 3 of Intel SGX Capabilities (EAX=12H,ECX=3) 
 eax: 0 ebx: 0 ecx: 0 edx: 0 
 CPUID Leaf 12H, Sub-Leaf 4 of Intel SGX Capabilities (EAX=12H,ECX=4) 
 eax: 0 ebx: 0 ecx: 0 edx: 0 
 CPUID Leaf 12H, Sub-Leaf 5 of Intel SGX Capabilities (EAX=12H,ECX=5) 
 eax: 0 ebx: 0 ecx: 0 edx: 0 
 CPUID Leaf 12H, Sub-Leaf 6 of Intel SGX Capabilities (EAX=12H,ECX=6) 
 eax: 0 ebx: 0 ecx: 0 edx: 0 
 CPUID Leaf 12H, Sub-Leaf 7 of Intel SGX Capabilities (EAX=12H,ECX=7) 
 eax: 0 ebx: 0 ecx: 0 edx: 0 
 CPUID Leaf 12H, Sub-Leaf 8 of Intel SGX Capabilities (EAX=12H,ECX=8) 
 eax: 0 ebx: 0 ecx: 0 edx: 0 
 CPUID Leaf 12H, Sub-Leaf 9 of Intel SGX Capabilities (EAX=12H,ECX=9) 
 eax: 0 ebx: 0 ecx: 0 edx: 0 

Let's continue with testing various combinations

My home lab is based on an Intel NUC 6i3SYH, which supports SGX. SGX has to be enabled in the BIOS. There are three SGX options within the BIOS:
  • Software Controlled (default)
  • Disabled
  • Enabled

 
BIOS screenshot
First of all, let's do three tests of SGX support on bare metal (Intel NUC 6i3SYH). I have installed FreeBSD 11.0 on a USB disk and tested SGX with all three BIOS options related to SGX.


Physical hardware (Software Controlled SGX)

eax: 406e3 ebx: 100800 ecx: 7ffafbbf edx: bfebfbff
stepping 3
model 14
family 6
processor type 0
extended model 4
extended family 0
smx: 0

Extended feature bits (EAX=07H, ECX=0H)
eax: 0 ebx: 29c6fbf ecx: 0 edx: 0
sgx available: 1 (TRUE)

CPUID Leaf 12H, Sub-Leaf 0 of Intel SGX Capabilities (EAX=12H,ECX=0)
eax: 0 ebx: 0 ecx: 0 edx: 0
sgx 1 supported: 0 (FALSE)
sgx 2 supported: 0
MaxEnclaveSize_Not64: 0 (FALSE)
MaxEnclaveSize_64: 0 (FALSE)

Test result: SGX is available for CPU but not enabled in BIOS


Physical hardware (Disabled SGX)

eax: 406e3 ebx: 1100800 ecx: 7ffafbbf edx: bfebfbff
stepping 3
model 14
family 6
processor type 0
extended model 4
extended family 0
smx: 0

Extended feature bits (EAX=07H, ECX=0H)
eax: 0 ebx: 29c6fbf ecx: 0 edx: 0
sgx available: 1 (TRUE)

CPUID Leaf 12H, Sub-Leaf 0 of Intel SGX Capabilities (EAX=12H,ECX=0)
eax: 0 ebx: 0 ecx: 0 edx: 0
sgx 1 supported: 0 (FALSE)
sgx 2 supported: 0
MaxEnclaveSize_Not64: 0 (FALSE)
MaxEnclaveSize_64: 0 (FALSE)

Test result: SGX is available for CPU but not enabled in BIOS


Physical hardware (Enabled SGX)

eax: 406e3 ebx: 100800 ecx: 7ffafbbf edx: bfebfbff
stepping 3
model 14
family 6
processor type 0
extended model 4
extended family 0
smx: 0

Extended feature bits (EAX=07H, ECX=0H)
eax: 0 ebx: 29c6fbf ecx: 0 edx: 0
sgx available: 1 (TRUE)

CPUID Leaf 12H, Sub-Leaf 0 of Intel SGX Capabilities (EAX=12H,ECX=0)
eax: 1 ebx: 0 ecx: 0 edx: 241f
sgx 1 supported: 1 (TRUE)
sgx 2 supported: 0
MaxEnclaveSize_Not64: 1f (OK)
MaxEnclaveSize_64: 24 (OK)

Test result: SGX is available for CPU and enabled in BIOS



So, we have validated that SGX capabilities are available in the FreeBSD operating system running on bare metal when SGX is enabled in the BIOS.

The next step is to repeat the tests on virtual machines running on top of the VMware hypervisor (ESXi) installed on the same physical hardware (Intel NUC 6i3SYH). At the moment, I have vSphere 6.5 (ESXi build 7388607), which supports VM hardware up to version 13. Let's run the SGX tests on the very old VM hardware version 4 and on the fresh VM hardware version 13. All VM tests were executed on a physical system with SGX explicitly enabled in the BIOS.



VM hardware version 4

eax: 406f0 ebx: 10800 ecx: 2d82203 edx: fabfbff
stepping 0
model 15
family 6
processor type 0
extended model 4
extended family 0
smx: 0

Extended feature bits (EAX=07H, ECX=0H)
eax: 0 ebx: 0 ecx: 0 edx: 0
sgx available: 0 (FALSE)

CPUID Leaf 12H, Sub-Leaf 0 of Intel SGX Capabilities (EAX=12H,ECX=0)
eax: 0 ebx: 440 ecx: 0 edx: 0
sgx 1 supported: 0 (FALSE)
sgx 2 supported: 0
MaxEnclaveSize_Not64: 0 (FALSE)
MaxEnclaveSize_64: 0 (FALSE)

Test result: SGX is not available for CPU in VM hardware version 4


VM hardware version 13

eax: 406f0 ebx: 10800 ecx: fffa3203 edx: fabfbff
stepping 0
model 15
family 6
processor type 0
extended model 4
extended family 0
smx: 0

Extended feature bits (EAX=07H, ECX=0H)
eax: 0 ebx: 1c2fbb ecx: 0 edx: 0
sgx available: 0 (FALSE)

CPUID Leaf 12H, Sub-Leaf 0 of Intel SGX Capabilities (EAX=12H,ECX=0)
eax: 7 ebx: 340 ecx: 440 edx: 0
sgx 1 supported: 1 (TRUE)
sgx 2 supported: 1 (TRUE)
MaxEnclaveSize_Not64: 0 (FALSE)
MaxEnclaveSize_64: 0 (FALSE)

Test result: CPU SGX functions are deactivated or SGX is not supported

Conclusion

To leverage Intel SGX CPU capabilities in an application, the physical hardware must support SGX and SGX must be enabled in the BIOS.
Note: SGX explicitly enabled in the BIOS has been successfully tested with the FreeBSD 11 operating system running on bare metal (a physical server). It might also work with the "Software Controlled" BIOS option, but that would require software enablement within the guest OS. I did not test such a scenario; therefore, further testing would be required to prove such an assumption.
The FreeBSD 11 operating system has been tested on bare metal with SGX enabled in the BIOS, and in such a configuration, SGX CPU capabilities have been successfully identified within the operating system.
SGX support in virtual machines on top of the VMware hypervisor (ESXi 6.5) has been tested solely on physical hardware with SGX explicitly enabled in the BIOS.
Unfortunately, SGX has NOT been successfully identified even with the latest VM hardware for vSphere 6.5 (VM hardware version 13), even though the CPU capabilities identified in VM hardware 13 by the guest operating system are significantly extended in comparison to VM hardware 4.
I will try to upgrade my home lab to the latest vSphere 6.7 U1 and do additional testing with VM hardware version 14. In the meantime, I will open a discussion inside the VMware organization about SGX support because, at the moment, one large VMware customer cannot virtualize a specific type of application even though they would like to.


Thursday, October 11, 2018

VMware virtual disk (VMDK) in Multi-Writer Mode

VMFS is a clustered file system that, by default, prevents multiple virtual machines from opening and writing to the same virtual disk (VMDK file). This prevents more than one virtual machine from inadvertently accessing the same VMDK file and is a safety mechanism to avoid data corruption in cases where the applications in the virtual machine do not maintain consistency in the writes performed to the shared disk. However, you might have a third-party cluster-aware application, where the multi-writer option allows VMFS-backed disks to be shared by multiple virtual machines, leveraging a third-party OS/App cluster solution to share a single VMDK disk on the VMFS filesystem. Such third-party cluster-aware applications ensure that writes originating from multiple different virtual machines do not cause data loss. Examples of such third-party cluster-aware applications are Oracle RAC, Veritas Cluster Filesystem, etc.

There is a VMware KB, "Enabling or disabling simultaneous write protection provided by VMFS using the multi-writer flag (1034165)", available at https://kb.vmware.com/kb/1034165. The KB describes how to enable or disable the simultaneous write protection provided by VMFS using the multi-writer flag. It is the official resource on how to use the multi-writer flag, but the operational procedure is a little bit obsolete, as vSphere 6.x supports configuration from the Web Client (Flash) or vSphere Client (HTML5) GUI, as highlighted in the screenshot below. A command-line sketch of the procedure is included after the list of limitations below.


However, KB 1034165 contains several important limitations that should be considered and addressed in the solution design. The limitations of multi-writer mode are:
  • The virtual disk must be eager zeroed thick; it cannot be zeroed thick or thin provisioned.
  • Sharing is limited to 8 ESXi/ESX hosts with VMFS-3 (vSphere 4.x), VMFS-5 (vSphere 5.x) and VMFS-6 in multi-writer mode.
  • Hot adding a virtual disk removes the multi-writer flag.
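To make the procedure more tangible, here is a minimal command-line sketch of what KB 1034165 describes. The datastore path, disk size and SCSI address are hypothetical examples; in vSphere 6.x you would typically do the same in the vSphere Client GUI shown above, and the KB remains the authoritative procedure.

# Create the shared virtual disk as eager zeroed thick (a hard requirement for multi-writer mode)
vmkfstools -c 20G -d eagerzeroedthick /vmfs/volumes/Datastore01/shared/shared-disk.vmdk
# Then, for each VM sharing the disk (with the VM powered off), attach the disk and set the
# multi-writer flag, either in the GUI (Sharing: Multi-writer) or via .vmx entries such as:
#   scsi1:0.present = "TRUE"
#   scsi1:0.fileName = "/vmfs/volumes/Datastore01/shared/shared-disk.vmdk"
#   scsi1:0.sharing = "multi-writer"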

Let's focus on the 8 ESXi host limit. The statement above about scalability is a little bit unclear; that's the reason why one of my customers asked me what it really means. I did some research on internal VMware resources and, fortunately enough, I found an internal VMware discussion about this topic, so I think sharing the info will help the broader VMware community.

Here is the 8 host limit explained in other words ...

“The 8 host limit implies how many ESXi hosts can simultaneously open the same virtual disk (aka VMDK file). If the cluster-aware application is not going to have more than 8 nodes, it works and is supported. This limitation applies to a group of VMs sharing the same VMDK file for a particular instance of the cluster-aware application. In case you need to consolidate multiple application clusters into a single vSphere cluster, you can safely do so, and app nodes from one app cluster instance can run on different ESXi hosts than the app nodes from another app cluster instance. It means that if you have more than one app cluster instance, all app cluster instances together can leverage resources from more than 8 ESXi hosts in the vSphere cluster.”
   
The best way to fully understand specific behavior is to test it. That's why I have a pretty decent home lab. However, I do not have 10 physical ESXi hosts; therefore, I have created a nested vSphere environment with a vSphere cluster having 9 ESXi hosts. You can see the vSphere cluster with two App Cluster instances (App1, App2) in the screenshot below.

Application cluster instance App1 is composed of 9 nodes (9 VMs) and instance App2 of just 2 nodes. Each instance shares its own VMDK disk. The whole test infrastructure is conceptually depicted in the figures below.

Test Step 1: I have started 8 of the 9 VMs of the App1 cluster instance on 8 ESXi hosts (ESXi01-ESXi08). Such a setup works perfectly fine, as there is a 1:1 mapping between VMs and ESXi hosts, within the limit of 8 ESXi hosts having the shared VMDK1 open.

Test Step 2: The next step is to test the power-on operation of App1-VM9 on ESXi09. Such an operation fails. This is the expected result, because a 9th ESXi host cannot open the VMDK1 file on the VMFS datastore.



The error message is visible on the screenshot below.


Test Step 3: The next step is to power on App1-VM9 on ESXi01. This operation is successful, as two app cluster nodes (virtual machines App1-VM1 and App1-VM9) are running on a single ESXi host (ESXi01); therefore, only 8 ESXi hosts have the VMDK1 file open and we are within the supported limits.

Test Step 4: Let's test vMotion of App1-VM9 from ESXi01 to ESXi09. Such an operation fails. This is the expected result, for the same reason as with the power-on operation: the App1 cluster instance would be stretched across 9 ESXi hosts, but the 9th ESXi host cannot open the VMDK1 file on the VMFS datastore.




The error message is a little bit different but the root cause is the same.


Test Step 5: Let's test vMotion of App2-VM2 from ESXi08 to ESXi09. Such an operation works, because the App2 cluster instance is still stretched across only two ESXi hosts, so it is within the supported limit of 8 ESXi hosts.


Test Step 6: The last test is vMotion of App2-VM2 from the vSphere cluster (ESXi08) to a standalone ESXi host outside of the vSphere cluster (ESX01). Such an operation works, because the App2 cluster instance is still stretched across only two ESXi hosts, so it is within the supported limit of 8 ESXi hosts. The vSphere cluster is not the boundary for multi-writer VMDK mode.


FAQ

Q: What exactly does the limitation of 8 ESXi hosts mean?
A: The 8 ESXi host limit implies how many ESXi hosts can simultaneously open the same virtual disk (aka VMDK file). If the cluster-aware application is not going to have more than 8 nodes, it works and is supported. Details and various scenarios are described in this article.

Q: Where is the information about the locks from ESXi hosts stored?
A: The normal VMFS file locking mechanism is in use; therefore, there are VMFS file locks, which can be displayed with the ESXi command: vmkfstools -D
The only difference is that multi-writer VMDKs can have multiple locks, as shown in the screenshot below.
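For illustration, an example invocation could look like the command below; the datastore and VMDK path are hypothetical, and the screenshot that follows shows the kind of output you get.

vmkfstools -D /vmfs/volumes/Datastore01/App1-Shared/shared-disk.vmdk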


Q: Is it supported to use DRS rules for VMDK multi-writer mode in case there are more than 8 ESXi hosts in the cluster where the VMs with multi-writer VMDKs are running?
A: Yes, it is supported. DRS rules can be beneficial to keep all nodes of a particular App Cluster instance on specific ESXi hosts. This is not necessary nor required from the technical point of view, but it can be beneficial from a licensing point of view.

Q: How can the ESXi life cycle be handled with the limit of 8 ESXi hosts?
A: Let's discuss specific VM operations and the supportability of the multi-writer VMDK configuration. The source for the answers is VMware KB https://kb.vmware.com/kb/1034165
  • Power on, off, restart virtual machine – supported
  • Suspend VM – unsupported
  • Hot add virtual disks – only to existing adapters
  • Hot remove devices – supported
  • Hot extend virtual disk – unsupported
  • Connect and disconnect devices – supported
  • Snapshots – unsupported
  • Snapshots of VMs with independent-persistent disks – supported
  • Cloning – unsupported
  • Storage vMotion – unsupported
  • Changed Block Tracking (CBT) – unsupported
  • vSphere Flash Read Cache (vFRC) – unsupported
  • vMotion – supported by VMware for Oracle RAC only and limited to 8 ESX/ESXi hosts. Note: other cluster-aware applications are not supported by VMware but can be supported by partners. For example, Veritas products have supportability documented here: https://sort.veritas.com/public/documents/sfha/6.2/vmwareesx/productguides/html/sfhas_virtualization/ch01s05s01.htm Please verify current supportability directly with the specific partners.

Q: Is it possible to migrate VMs with multi-writer VMDKs to a different cluster when they are offline?
A: Yes. A VM can be shut down or powered off and then powered on on any ESXi host outside of the vSphere cluster. The only requirement is to have the same VMFS datastore available on the source and target ESXi hosts. Please keep in mind that the maximum supported number of ESXi hosts connected to a single VMFS datastore is 64.