Saturday, September 01, 2018

New with vSphere 6.7 U1 - Enhanced Load Balancing Path Selection Policy

With the release of vSphere 6.7 U1, there are now sub-policy options for VMW_PSP_RR that enable active monitoring of the paths. The policy considers path latency and pending IOs on each active path. This is accomplished with an algorithm that monitors active paths and calculates the average latency per path based on time and/or the number of IOs. When the module is loaded, the latency logic is triggered and the first 16 IOs per path are used to calculate the latency. Subsequent IOs are then directed to the path with the lowest latency, based on the results of the algorithm's calculations. When using the latency mechanism, the Round Robin policy can dynamically select the optimal path and achieve better load balancing results.

To use the latency-based sub-policy for VMW_PSP_RR, the user must first enable the configuration option:
esxcfg-advcfg -s 1 /Misc/EnablePSPLatencyPolicy
To switch a device to the latency-based sub-policy, use the following command (the device ID is a placeholder):
esxcli storage nmp psp roundrobin deviceconfig set -d <device_id> --type=latency
If you want to change the default evaluation time or the number of sampling IOs used to evaluate latency, use the following commands.

For latency evaluation time (the default is 15000 ms = 15 sec):
esxcli storage nmp psp roundrobin deviceconfig set -d <device_id> --type=latency --latency-eval-time=18000
For the number of sampling IOs:
esxcli storage nmp psp roundrobin deviceconfig set -d <device_id> --type=latency --num-sampling-cycles=32
To check the device configuration and sub-policy:
esxcli storage nmp device list -d <device_id>
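If you want to apply the latency sub-policy to every eligible device on a host rather than to a single device, a small shell loop like the sketch below can help. It is only a sketch: it assumes the devices are already claimed by VMW_PSP_RR and that the EnablePSPLatencyPolicy option from above is set; validate the change with your storage vendor before using it in production.

# Loop over all NMP devices and switch those using VMW_PSP_RR to the latency sub-policy
for dev in $(esxcli storage nmp device list | grep -E '^(naa|eui|t10|mpx)\.'); do
  psp=$(esxcli storage nmp device list -d "$dev" | grep "Path Selection Policy:" | awk '{print $NF}')
  if [ "$psp" = "VMW_PSP_RR" ]; then
    esxcli storage nmp psp roundrobin deviceconfig set -d "$dev" --type=latency
  fi
done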
The diagram below shows how sampling IOs are monitored on paths P1, P2, and P3 and how a path is eventually selected. At time "t" the sampling window starts. In the sampling window, IOs are issued on each path in Round Robin fashion and their round-trip time is monitored. Path P1 took 10ms in total to complete 16 sampling IOs. Similarly, path P2 took 20ms for the same number of sampling IOs and path P3 took 30ms. As path P1 has the lowest latency, path P1 will be selected more often for IOs. The sampling window then starts again after interval "T". Both "m" and "T" are tunable parameters, but we suggest not changing them, as their default values are based on experiments run internally during implementation.

Diagram: how sampling IOs are monitored and selected.
Legend: 
T = Interval after sampling should start again 
m = Sampling IOs per path

t1 < t2 < t3 ---------------> 10ms < 20ms < 30ms 
t1/m < t2/m < t3/m -----> 10/16 < 20/16 < 30/16

In testing, VMware found that with the new latency monitoring policy, even with up to 100ms of latency introduced on half the paths, the PSP sub-policy maintained almost full throughput.
Setting the values for the Round Robin sub-policy can be accomplished via the CLI or using host profiles.

VMworld US 2018 - VIN2416BU - Core Storage Best Practices

As in previous years, William Lam (www.virtuallyghetto.com) has published URLs to the VMworld US 2018 Breakout Sessions. William wrote a blog post about it and created the GitHub repo vmworld2018-session-urls, available at http://vmwa.re/vmworld2018. The direct link to the US sessions is here: https://github.com/lamw/vmworld2018-session-urls/blob/master/vmworld-us-playback-urls.md

I'm going to watch sessions from areas of my interest and write my thoughts and interesting findings on my blog. So stay tuned and come back to read future posts, if interested. Let's start with VMworld 2018 session VIN2416BU - Core Storage Best Practices (speakers: Jason Massae, Cody Hosterman). This technical session is about vSphere core storage topics. In the beginning, Jason shared with the audience the GSS top storage issues and customer challenges: PSA, iSCSI, VMFS, NFS, VVols, Trim/UNMAP, queueing, and troubleshooting. The general recommendation from Jason and Cody is to validate any vSphere storage change or adjustment with the particular storage vendor and VMware. Customers should change advanced settings only when recommended by the storage vendor or VMware GSS.

SATP and PSP
Cody follows with a basic explanation of how SATP and PSP work. Then he explains why some storage vendors recommend adjusting the default number of Round Robin I/Os sent down a single path before switching to the next path. The reason is not performance but faster failover in case of storage path issues. If you want to know how to change such a setting, read VMware KB 2069356.
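For reference, the setting from KB 2069356 is changed per device through the Round Robin device configuration. A hedged example follows; the device ID is a placeholder and the value (here 1 I/O per path) should come from your storage vendor's recommendation:

esxcli storage nmp psp roundrobin deviceconfig set -d <device_id> --type=iops --iops=1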

iSCSI
Jason continues with the iSCSI topic, which is, based on VMware GSS, the #1 problem. The first recommendation is to not expose LUNs used for the virtual environment to other external systems or functions. The only exception might be RDM LUNs used for OS-level clustering, but that is a special use case. Another topic is teaming and port binding. Some kind of teaming is highly recommended, and iSCSI port binding is preferred. Port binding will give you load balancing and failover not only on link failure but also on SCSI sense codes. This will help you in situations when the network path is OK but the target LUN is not available for whatever reason. I wrote a blog post about this topic here. It is about the advanced option enable_action_OnRetryErrors; as far as I know, it was not enabled by default in vSphere 6.5 but probably is in 6.7. I did not test it, so it should be validated before being implemented in production. Jason explained that in vSphere 6.0 and below, port binding did not allow L3 network routing, therefore NIC teaming was the only way to go in environments where initiators and targets (iSCSI portals) were in different IP subnets. However, since vSphere 6.5, iSCSI can leverage a dedicated TCP/IP stack, so a default gateway can be specified for the VMkernel port used for iSCSI. Therefore, from vSphere 6.5 on, port binding is highly recommended over NIC teaming. Afterward, Jason explains the differences among the software iSCSI adapter, dependent iSCSI adapters (iSCSI offloaded to the hardware), and iSER (iSCSI over RDMA).
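For completeness, the enable_action_OnRetryErrors option mentioned above is set per device through the SATP generic device configuration. This is only a sketch based on the VMware KB; the device ID is a placeholder, and because the default behavior differs between vSphere versions, check the current documentation before applying it:

# Check the current SATP device configuration for a device
esxcli storage nmp device list -d <device_id>
# Enable the action on retry errors for the device (use -c disable_action_OnRetryErrors to revert)
esxcli storage nmp satp generic deviceconfig set -c enable_action_OnRetryErrors -d <device_id>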

The mic is handed over to Cody, who shares with the audience information about the new enhanced Round Robin policy, available in vSphere 6.7 U1, which considers I/O latency for path selection. For more info read my blog post here.

VMFS
Cody moves to the VMFS topic. VMFS 6 has a lot of enhancements. You cannot upgrade VMFS 5 to VMFS 6, therefore you have to create a new datastore, format it as VMFS 6, and use Storage vMotion to migrate VMs from VMFS 5. Cody tries to answer typical customer questions: How big should my datastores be? How many VMs should I accommodate in a single datastore? Do not expect an exact answer. The answer is, it depends on your storage architecture, performance, and capabilities. Everything begins with the question of whether your array supports VAAI (VMware API for Array Integration), but there are other questions you have to answer yourself, like what granularity of recoverability you expect, etc. Cody joked a little bit at the beginning of the VMFS section of the presentation and shared his opinion that the best VMFS datastore is a VVols datastore :-)

NFS
Jason continued with NFS best practices. The interesting one was about the use of Jumbo Frames. Jason's Jumbo Frames recommendation is to use them only when they are already configured in your network. In other words, it is not worth enabling them just in the belief that you will get significantly better performance.

VVols
Another topic is VVols. Cody starts with a general explanation of what VVols are and are NOT. Cody highlights that with VVols you can finally use snapshots without any performance overhead, because they are offloaded to the storage hardware. Cody explains the basics of VVols terminology and architecture. VVols use one or more Protocol Endpoints. A Protocol Endpoint is nothing other than a LUN with ID 254, used as an Administrative Logical Unit (ALU). The Protocol Endpoint is used to handle the data, but within the storage there are Virtual Volumes, also known as Subsidiary Logical Units (SLUs), with VVol sub-LUN IDs (for example 254:7), where vSphere storage objects (VM home directory, VMDKs, swap files, snapshots) are stored. This session is not about VVols, therefore it is really just a brief overview.

Trim/UNMAP and Space Reclamation on Thin Provisioned storage
The mic is handed over to Jason, who starts another topic - Trim/UNMAP and Space Reclamation. Jason informs the audience that Trim/UNMAP functionality depends on the vSphere version and other circumstances.

In vSphere 6.0, Trim/UNMAP is a manual operation (esxcli storage vmfs unmap). In vSphere 6.5, it is automated with VMFS-6: it is enabled by default, can be configured to Low priority or Off, and it takes 12-24 hours to clean up space. vSphere 6.7 adds configurable throughput limits, where the default is Low priority.
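To illustrate, the manual and automatic variants look roughly like the sketch below; the datastore name is a placeholder and the reclaim config commands apply to VMFS-6 on vSphere 6.5 and later:

# vSphere 6.0+: manual space reclamation on a VMFS datastore
esxcli storage vmfs unmap -l MyDatastore
# vSphere 6.5+ with VMFS-6: check or change the automatic UNMAP priority
esxcli storage vmfs reclaim config get -l MyDatastore
esxcli storage vmfs reclaim config set -l MyDatastore --reclaim-priority=none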

Space reclamation also works differently in each vSphere version.

  • In vSphere 6.0, it works when virtual disks are thin, VM hardware is 11+, and only for MS Windows OS. It does NOT work with VMware snapshots, with CBT enabled, on Linux OS, or when UNMAPs are misaligned. 
  • In vSphere 6.5, it works when virtual disks are thin, for Windows and Linux OS, with VM hardware 11+ for MS Windows OS and VM hardware 13 for Linux OS; CBT can be enabled and UNMAPs can be misaligned. It does NOT work when the VM has VMware snapshots, with thick virtual disks, or when a Virtual NVMe adapter is used.
  • In vSphere 6.7, it works when virtual disks are thin, for Windows and Linux OS, with VM hardware 11+ for MS Windows OS and VM hardware 13 for Linux OS; CBT can be enabled, UNMAPs can be misaligned, VM snapshots are supported, and the Virtual NVMe adapter is supported. It does NOT work for thick provisioned virtual disks.
Queuing
The mic is handed over back to Cody to speak about queueing. At the beginning of this section, Cody explains how storage queuing works in vSphere. He shares the default values of HBA Device Queues (aka HBA LUN queues) for different HBA types:

  • QLogic - 64
  • Brocade - 32
  • Emulex - 32
  • Cisco UCS (VIC) - 32
  • Software iSCSI - 128
The HBA Device Queue is an HBA setting that controls how many I/Os may be queued on a device (aka LUN). The default values are configurable via esxcli; changing them requires a reboot. Details are documented in VMware KB 1267. After the explanation of the "HBA Device Queue", Cody explains DQLEN, which is the hypervisor-level device queue limit. The actual device queue depth is the minimum of the "HBA Device Queue" and DQLEN; as a formula, MIN("HBA Device Queue", DQLEN). Therefore, if you increase DQLEN, you also have to adjust the "HBA Device Queue" for any real effect. DQLEN defaults are:

  • VMFS - 32
  • RDM - 32
  • VVols Protocol Endpoints - 128 (Scsi.ScsiVVolPESNRO)
After this basic problem explanation, Cody does some quick math to stress that the default settings are OK for most environments and it usually does not make sense to change them. However, if you have specific storage performance requirements, you have to understand the whole end-to-end storage stack; then you can adjust the settings appropriately and do performance tuning. If you make any change, you should make it on all ESXi hosts in the cluster to keep performance consistent after a VM migrates from one ESXi host to another.
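A few hedged examples of where these values live follow. The device ID is a placeholder, and the module and parameter names vary per HBA driver, so treat the last command as an illustration only and consult VMware KB 1267 / KB 1268 and your HBA vendor before changing anything:

# Show DQLEN ("Device Max Queue Depth") and the outstanding IO limit for a device
esxcli storage core device list -d <device_id>
# Change the per-device outstanding IO limit (DSNRO), formerly Disk.SchedNumReqOutstanding
esxcli storage core device set -d <device_id> -O 64
# Change the HBA/driver LUN queue depth via a module parameter (example for a QLogic FC driver; names differ per driver, reboot required)
esxcli system module parameters set -m qlnativefc -p ql2xmaxqdepth=64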

Storage DRS (SDRS) and Storage I/O Control (SIOC)
Cody continues with SDRS and SIOC. SDRS and SIOC are two different technologies.

SDRS moves VMs around based on hitting a latency threshold. This is the VM-observed latency, which includes any latency induced by queueing.

SIOC throttles VMs based on hitting a datastore latency threshold. SIOC uses device (LUN) latency to throttle the device queue depth automatically when the device is stressed. If SIOC kicks in, it takes VMDK shares into consideration, therefore VMDKs with higher shares get more frequent access to the stressed (overloaded) storage device.

Jason notes that what Cody explained is how SIOC version 1 works, but there is also SIOC version 2, introduced in vSphere 6.5. SIOC v1 and v2 are different. SIOC v1 works at the datastore level; SIOC v2 is a policy-based setting. It is good to know that SIOC v1 and SIOC v2 can co-exist on vSphere 6.5+. SIOC v2 is considerably different from a user experience perspective when compared to v1. SIOC v2 is implemented using the IO Filter framework's Storage IO Control category and can be managed using SPBM policies. What this means is that you create a policy which contains your SIOC specifications, and these policies are then attached to virtual machines. One thing to note is that IO Filter-based IOPS limits do not look at the size of the IO. There is no normalization like in SIOC v1, so a 64K IO is not counted as 2 x 32K IOs; it is a fixed number of IOPS irrespective of the IO size. For more information about SIOC look here.

Troubleshooting 
The last section of the presentation is about troubleshooting. It is presented by Jason. When you have to do storage performance troubleshooting, start by reviewing the performance graphs and try to narrow down the issue. Look at the VM and the ESXi host and select only the problem component. You have to build your troubleshooting toolbox. It should include:

  • Performance Graph
  • ESXTOP
  • vRealize LogInsight
  • vSphere On-disk Metadata Analyzer (VOMA) - see the quick example after this list
  • Enable CEIP (Customer Experience Improvement Program), which shares troubleshooting data with VMware Support (GSS). CEIP is actually a call-home functionality that is opt-out in vSphere 6.5+.
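As an illustration of the VOMA item above, a metadata check of a VMFS datastore looks roughly like this. The device path and partition number are placeholders, and the datastore must not be in use (no running VMs, ideally unmounted) when you run the check:

# Find the device and partition backing the datastore
esxcli storage vmfs extent list
# Run a read-only VMFS metadata check against that partition
voma -m vmfs -f check -d /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxx:1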

Session evaluation 
This is a very good technical session for infrastructure folks responsible for VMware vSphere and storage interoperability. I highly recommend watching it.

Monday, August 27, 2018

VMworld 2018 announcements

In this post, I would like to summarize the coolest VMworld 2018 announcements.

Project Dimension
On-premises vSphere infrastructure managed in a cloud-like fashion. Project Dimension will extend VMware Cloud to deliver SDDC infrastructure and hardware as-a-service to on-premises locations. Because this will be a service, it means that VMware can take care of managing the infrastructure, troubleshooting issues, and performing patching and maintenance. For more info read this blog post.

Project Magna
Project Magna will make a self-driving data center possible, based on machine learning. It is focused on applying reinforcement learning to a data center environment to drive greater performance and efficiency. The demonstration illustrated how Project Magna can learn and understand application behavior to the point that it can model, test, and then reconfigure the network to make it more optimal and improve performance. Project Magna relies on artificial intelligence algorithms to help connect the dots across huge data sets and gain deep insights across applications and the stack, from application code to software to hardware infrastructure, to the public cloud and the edge.

vSphere Platinum
VMware vSphere Platinum is a new edition of vSphere that delivers advanced security capabilities fully integrated into the hypervisor. This new release combines the industry-leading capabilities of vSphere with VMware AppDefense, delivering purpose-built VMs to secure applications. For more info read this blog post.

VMware ESXi 64-bit Arm Support
ESXi will probably run on Cavium ThunderX2 servers. Cavium ThunderX2 has very interesting specifications.

vSphere 6.7 Update 1
VMware announced vSphere 6.7 Update 1, which includes some key new and enhanced capabilities. Here are some highlights:
  • Fully Featured HTML5-based vSphere Client
  • Enhanced support for NVIDIA Quadro vDWS powered VMs (vSphere vMotion with NVIDIA Quadro vDWS vGPU powered VMs)
  • Support for Intel FPGA 
  • New vCenter Server Convergence Tool (allows migration from an external PSC architecture to an embedded PSC architecture and also allows combining, merging, or separating vSphere SSO Domains)
  • Enhancements for HCI and vSAN
  • Enhanced vSphere Content Library (import of OVA templates from an HTTPS endpoint and local storage, native support of VM templates)
For more info read this blog post.

VMware vSAN 6.7 Update 1
vSAN 6.7 U1 will be available together with vSphere 6.7 U1. Here are some highlights:
  • Firmware Updates through VUM
  • Cluster Quickstart wizard
  • UNMAP support (capability of unmapping blocks when the Guest OS sends an unmap/trim command)
  • Mixed MTU support for vSAN Stretched Clusters (different MTU for Witness traffic than for vSAN traffic)
  • Historical capacity reporting
Amazon Relational Database Service (RDS) on VMware
AWS and VMware Announce Amazon Relational Database Service on VMware. It is a database as a service managed by Amazon. Amazon RDS on VMware will be generally available soon and will support Microsoft SQL Server, Oracle, PostgreSQL, MySQL, and MariaDB databases. Read announcement or register for preview here.

VVols support for SRM is now officially on the roadmap
It is not coming in the latest SRM version, but it is officially on the roadmap, so VMware announced a commitment to develop it soon. [Source]

VMware vCloud Director 9.5
The new vCloud Director 9.5 enhances easy and intuitive cloud provisioning and consumption by adding highly-requested capabilities including self-service data protection, disaster recovery, and container-orchestration for cloud consumers, along with multi-site management, multi-tenancy and cross-platform networking for Cloud Providers.  6 Key New Innovations in vCloud Director 9.5:
  • Cross-site networking improvements powered by deeper integration with NSX
  • Initial integration with NSX-T
  • Additional integration with NSX including e.g. the possibility to stretch networks across virtual datacenters on different vCenter Servers or vCD instances residing at different sites right from the UI.
  • Cross-platform networking for Cloud Providers. Makes it possible for NSX-T and NSX-V managers in same vCD instance to create isolated logical L2 networks with a directly connected network.
  • Full transition to an HTML5 UI for the cloud consumer
  • Improvements to role-based access control
  • Natively integrated data protection capabilities, powered by Dell-EMC Avamar
  • vCD virtual appliance deployment model
  • Container-orchestration for cloud consumers. Deploy both VMs and containers, consumed via Kubernetes.
  • Data protection capabilities. EMC Avamar is added to the vCD UI to make it easier for end consumers to manage these tasks. This is made possible based on the extensible tools available via vCD which means you (software vendor, cloud provider) can publish services to the vCD UI as needed. Hopefully, other vendors will follow.
For more information read VMware blog.

Introducing VMware Cloud Provider Pod: Custom Designs
VMware announced a new product that will revolutionize the deployment of Cloud Provider environments through the first flexible, validated VMware cloud stack with 1-click deployment: VMware Cloud Provider Pod. Cloud Provider environments are complex to deploy due to the interoperability, scalability, reliability, and performance issues that constantly plague cloud admins and architects. It is a time-consuming process that takes weeks to months. Yes, there are "one-click" deployment products out there, but these are rigid, have stringent hardware compatibility requirements, and end up creating yet another datacenter silo to manage. The Cloud Provider Pod has been designed to deliver three key capabilities:

  • Allows Cloud Providers to design a custom cloud environment of their choice
  • Automates the deployment of the designed cloud environment in adherence with VMware Validated Designs for Cloud Providers
  • Generates customized documentation and guidelines for their environment that radically simplifies operations
For more information read VMware blog.

VMworld US 2018 General Sessions & Breakout Sessions Playback
You can watch the general sessions and also the technical (breakout) sessions online. The general sessions from VMworld US 2018 are available at https://www.vmworld.com/en/us/learning/general-sessions.html

A nice summary list of all VMworld US 2018 technical (breakout) sessions with the respective video playback & download URLs is available at https://github.com/lamw/vmworld2018-session-urls/blob/master/vmworld-us-playback-urls.md

Friday, June 15, 2018

vRealize Orchestrator - useful tips and commands

This week I worked with one of my customers on a vRealize Orchestrator (vRO) Proof of Concept. vRealize Orchestrator is a pretty good tool for data center orchestration, but it is a very hidden tool and customers usually do not know they are entitled to use such a great way to automate and orchestrate not only their infrastructure but almost anything.

Here are some good vRO resources
And here are some of my technical notes from the PoC.

TCP Ports

  • The vRO Server service runs on port 8281. This is the port the vRO Client has to connect to. Direct access is https://[VRO-HOSTNAME]:8281/vco/
  • The vRO Server Control service runs on port 8283. This is the port where the vRO Orchestrator Control Center runs. Direct access is https://[VRO-HOSTNAME]:8283/vco-controlcenter/
  • The vRO VAMI (Virtual Appliance Management Interface) runs on port 5480. 
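A quick way to verify that the services respond is to probe them with curl from any machine that can reach the appliance. This is just a sketch, assuming the default ports and a self-signed certificate (hence -k); vro.example.com is a placeholder hostname:

curl -k -I https://vro.example.com:8281/vco/                  # vRO Server / vRO Client endpoint
curl -k -I https://vro.example.com:8283/vco-controlcenter/    # Control Center
curl -k -I https://vro.example.com:5480/                      # VAMI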

Network configuration from the console or over ssh session
/opt/vmware/share/vami/vami_config_net

Reset vRO Authentication
/var/lib/vco/tools/configuration-cli/bin/vro-configure.sh reset-authentication

Restart vRO Configurator service
service vco-configurator restart

Restart vRO Server service
service vco-server restart

vRO 7.4 configuration synchronization across cluster nodes
(source)
In vRO 7.4 there are changes with regard to the way configuration synchronization is done. Configuration changes made through the Control Center are "permanent" and are replicated to the other nodes as well. Changes made manually, for example by editing a file with a text editor, are overwritten with the latest configuration.
Therefore, if you make manual changes such as editing the js-io-rights file, you need to execute the following CLI command to apply the changes to the latest configuration, which is then replicated.
You should do the following (a consolidated sketch follows the list):
  1. stop the control center
  2. make manual changes
  3. execute /var/lib/vco/tools/configuration-cli/bin/vro-configure.sh sync-local
  4. start control center
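Put together, the workflow from the vRO appliance shell could look like the sketch below. The js-io-rights file path is an assumption on my side and may differ between vRO versions; the sync-local command is the one documented above:

service vco-configurator stop                                         # stop the Control Center
vi /etc/vmware/vco/app-server/js-io-rights.conf                       # make the manual change (path is an assumption)
/var/lib/vco/tools/configuration-cli/bin/vro-configure.sh sync-local  # push the change into the replicated configuration
service vco-configurator start                                        # start the Control Center again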
Time considerations
Time on vRO and the Authentication Provider (vCenter/PSC, vRealize Automation/vIDM) must be in sync, otherwise you will see an error message like "Server returned 'request expired' less than 0 seconds after request was issued". This issue occurs due to a time skew between the authentication provider and the vRO appliance.

You should use NTP or sync time with the host. If you want to synchronize vRO time with the host, use this command from the Guest OS:
vmware-toolbox-cmd timesync enable
To disable time sync with the host, use this command:
vmware-toolbox-cmd timesync disable

vRealize Orchestrator 7.x - Unlocking vRO Root Account after too many failed login attempts

When you make too many failed login attempts as the root account, your vRO root account will be locked. As SSH does not work, you need console access to the vRO server.

Step 1 - Gain access to the vRO server via the console.

Step 2 - Reboot server

Step 3 - When the GRUB bootloader appears, press the spacebar to disable autoboot.

Step 4 - Select VMware vRealize Orchestrator Appliance and type "e" to edit the boot commands. Then move down to the second line, showing the kernel boot parameters, and type "e" again.

Step 5 - Append init=/bin/bash to the kernel options.

Step 6 - Hit Enter and the GRUB menu will appear again. This time hit “b” to start the boot process.

Step 7 - Now you should be in the shell - ready to issue commands to unlock or reset the password.

Step 8 - To unlock the account, type the following command:
# pam_tally2 --user root --reset

Optional Step 9 - If you cannot remember the password, change it by using the passwd command:
# passwd root 

Optional Step 10 - Disabling the lockout altogether can come in handy. To do so, modify the /etc/pam.d/common-auth file. Use vi or any preferred editor to modify the common-auth file and comment out the line with "pam_tally2.so deny=3…." 
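If you prefer not to edit the file by hand, a one-liner such as the following sketch comments out every pam_tally2 line. Back the file up first; the exact line content differs per appliance version:

cp /etc/pam.d/common-auth /etc/pam.d/common-auth.bak
sed -i '/pam_tally2.so/s/^/#/' /etc/pam.d/common-auth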

Undocumented SDRS Advanced Options

Almost two years ago, I was challenged by one VMware customer who experienced failed VM provisioning in the case of parallel VM deployments. The SDRS default behavior is not optimized for fast, multiple parallel deployments, because it just returns SDRS recommendations (step 1) and later (step 2) these recommendations are applied by someone else who is executing the VM provisioning. Back in the day, when SDRS was designed and developed, it was optimized for provisioning a few VMs; however, nowadays in the cloud era, more parallel provisioning operations are common.

Disclaimer: Please note that the default behavior works best for most cases; use advanced options only when required. Below are some of the Storage DRS advanced options a user can configure on a Storage DRS cluster. Based on my research/digging, these options may or may not be properly documented and are most likely not supported by VMware. Please take caution if you decide to play with these advanced settings.

SDRS Advanced settings

VraInitPlacement
The reason behind this option is explained in this video - "VMware vSphere SDRS VM provisioning process". Based on my customer's feedback, VMware engineering has introduced 'VraInitialPlacement' - an SDRS advanced setting - which reconfigures SDRS to generate only one recommendation and reserve the resource till the "lease" expires. The lease time is defined in the API call as the property resourceLeaseDurationSec. It is by default 0 (no lease) but it can be reconfigured on the vRA side. It is used in conjunction with the vRA custom property "VirtualMachine.Admin.Datastore.Cluster.ResourceLeaseDurationSec" documented in the VMware vRA Documentation here - https://docs.vmware.com/en/vRealize-Automation/7.4/com.vmware.vra.prepare.use.doc/GUID-FA2ED665-4973-435C-A93B-8E4EAB5D1F8A.html
Custom property description: When provisioning to multiple VMs and using SDRS, specifies a value in seconds, in the range of 30 to 3600, for reserving storage resources during the RecommendDataStore API call. You can add this property to a business group or blueprint or when you request provisioning. The lease lock is only applied to the datastore that is used by the deployment, not all datastores in the storage cluster. The lease lock is released when provisioning either completes or fails. If not specified, no lock is applied to the storage resources at provisioning time.

Long story short, when 'VraInitPlacement' is set to "1", SDRS generates only one recommendation. This is a VMware-internal, undocumented option, but in theory it can be set by a vSphere admin. The lease time, however, is not configurable by the vSphere admin; it is a parameter (property) of the vSphere SOAP API call asking for SDRS recommendations. 

EnforceStorageProfiles
To configure Storage DRS interoperability with SPBM, this option needs to be set to one of the values below.
  • 0 – disabled (default)
  • 1 – soft enforcement
  • 2 – hard enforcement
percentIdleMBinSpaceDemand
The PercentIdleMBinSpaceDemand setting defines the percentage of IdleMB that is added to the allocated space of a VMDK during free space calculation of the datastore. The default value is set to 25%. This value can range from 0 to 100.

For more info look at http://frankdenneman.nl/2012/10/01/avoiding-vmdk-level-over-commitment-while-using-thin-disks-and-storage-drs/

EnforceCorrelationForAffinity
Use datastore correlation while enforcing/fixing anti-affinity rules
  • 0 – disabled (default)
  • 1 – soft enforcement
  • 2 – hard enforcement
Other VMware vSphere Advanced Settings
Back in the day, William Lam published the blog post "New vSphere 5 HA, DRS and SDRS Advanced/Hidden Options". It is worth reading.


Tuesday, May 22, 2018

VMware Response to Speculative Execution security issues, CVE-2018-3639 and CVE-2018-3640

This will be a relatively short blog post. The whole industry is aware of the Spectre/Meltdown security vulnerabilities. I recently wrote the blog post "VMware Response to Speculative Execution security issues, CVE-2017-5753, CVE-2017-5715, CVE-2017-5754 (aka Spectre and Meltdown)".

A few days ago, NCF announced additional CPU vulnerabilities (CVE-2018-3639 and CVE-2018-3640), and yesterday VMware released the official response in the following documents:


What does it mean for IT infrastructure practitioners / VMware vSphere administrators?

Well, actually nothing new. The update process is the same as for the previous Spectre/Meltdown remediations. The VMware vSphere administrator must apply the following update procedure.
  1. Update vCenter to apply patches to EVC. Note: The patches add new CPU features (IBRS, IBPB, STIBP) into the existing EVC baselines.
  2. (optional but recommended) Validate that EVC is enabled on vSphere clusters. Note: Without EVC you can experience vMotion issues for newly powered-on VMs within a vSphere cluster. 
  3. Update to the latest BIOS with patched CPU microcode. Note: VMware delivers an ESXi patch with updated CPU microcode, but the CPU microcode from the hardware vendor is recommended.
  4. Apply the appropriate ESXi security patches.
  5. Validate that the VM hardware is at least version 9 (PCID enabled), but for better performance VM hardware 11 is recommended because Virtual Hardware version 11 supports INVPCID. 
  6. Apply all applicable security patches for your Guest OS which have been made available by the OS vendor (a quick verification sketch for Linux guests follows this list).
  7. Power Off / Power On the VMs (a VM restart is not sufficient).
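On Linux guests with a kernel new enough to expose the sysfs vulnerabilities interface, you can roughly verify the resulting mitigation state from inside the Guest OS; a sketch, assuming that interface is present:

# Show the kernel's view of the Spectre/Meltdown mitigations; "Mitigation: ..." means a fix is active
grep . /sys/devices/system/cpu/vulnerabilities/*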
Why have a lot of VMware customers still not applied the patches?
  • The patches have a hard-to-predict negative performance impact which is workload specific, therefore application owners have to evaluate the specific impact on their applications.
  • IT management is afraid of the unpredictable performance impact, a lack of computing resources, and the tremendous impact on capacity planning. 
  • If a VM hardware upgrade is required, a maintenance window must be planned with application owners. Note: A virtual hardware upgrade can bring a certain risk because you are actually changing the motherboard and chipset.
  • A Power Off / Power On of the VMs is required, therefore a maintenance window must be planned by or with application owners.

Conclusion

So even though all patches exist and the update process is well known, it is definitely not a simple project, especially in large organizations where collaboration among multiple teams and departments is required.

It is obvious that the remediations have some negative performance impact on applications. However, all these remediations can be disabled in the operating system, therefore the hardware and vSphere layers can be patched and the application owner can decide between security and performance. Please note, though, that even if disabling the security remediation has a positive impact on performance, the final performance can still be worse than the original performance on unpatched systems.

Wednesday, April 18, 2018

What's new in vSphere 6.7

VMware vSphere 6.7 has been released and all the famous VMware bloggers have released blog posts about its new features and capabilities. It is worth reading all of these blog posts, as each blogger focuses on a different area of the SDDC, which can give you broader context on the newly available product features and capabilities. Anyway, industry veterans should start by reading the product Release Notes and official VMware blog posts first.

Please note, that this blog post is just an aggregation of information published in other places. All used sources are listed below.

Release Notes:
vSphere 6.7 Release Notes

VMware KB:
Important information before upgrading to vSphere 6.7 

VMware official blog posts:
Introducing VMware vSphere 6.7!
Introducing vCenter Server 6.7
Introducing Faster Lifecycle Management Operations in VMware vSphere 6.7
Introducing vSphere 6.7 Security
What’s new with vSphere 6.7 Core Storage
vSphere 6.7 Videos

Community blog posts:
Emad Younis : vCenter Server 6.7 What’s New Rundown
Duncan Epping : vSphere 6.7 announced!
Cormac Hogan : What's new in vSphere and vSAN 6.7 release?
Cody Hosterman : What's new in core storage in vSphere 6.7 part I: in-guest unmap and snapshots
Cody Hosterman :  What's new in core storage in vSphere 6.7 part V: Rate control for automatic VMFS unmap
William Lam : All vSphere 6.7 release notes & download links
Florian Greh (Virten) : VMware vSphere 6.7 introduces Skylake EVC Mode
Florian Greh (Virten) : New ESXCLI Commands in vSphere 6.7

So after reading all resources above let's aggregate and document interesting features area by area.

vSphere Management

vCenter with an embedded Platform Services Controller in enhanced linked mode. This is nice because you can leverage "vCenter Server High Availability" to achieve higher availability for the PSC without an external load balancer. All the benefits are listed below.
  • No load balancer required for high availability and fully supports native vCenter Server High Availability.
  • SSO Site boundary removal provides flexibility of placement.
  • Supports vSphere scale maximums.
  • Allows for 15 deployments in a vSphere Single Sign-On Domain.
  • Reduces the number of nodes to manage and maintain.

vSphere 6.7 introduces vCenter Server Hybrid Linked Mode, which makes it easy and simple for customers to have unified visibility and manageability across an on-premises vSphere environment running on one version and a vSphere-based public cloud environment, such as VMware Cloud on AWS, running on a different version of vSphere.

vSphere 6.7 also introduces Cross-Cloud Cold and Hot Migration, further enhancing the ease of management across and enabling a seamless and non-disruptive hybrid cloud experience for customers.

vSphere 6.7 enables customers to use different vCenter versions while allowing cross-vCenter, mixed-version provisioning operations (vMotion, Full Clone and cold migrate) to continue seamlessly.

vCenter Server Appliance (VCSA) Syslog now supports up to three syslog forwarding targets.

The HTML5-based vSphere Client provides a modern user interface experience that is both responsive and easy to use, and it includes 95% of the functionality available in the Flash Client. Some of the newer workflows in the updated vSphere Client release include:
  • vSphere Update Manager
  • Content Library
  • vSAN
  • Storage Policies
  • Host Profiles
  • vDS Topology Diagram
  • Licensing
PSC/SSO CLI (cmsso-util) has some improvements. Repointing an external vCenter Server Appliance across SSO Sites within a vSphere SSO domain is supported. Repoint of vCenter Server Appliance across vSphere SSO domains is also supported. This is huge! It seems that SSO domain consolidation is now possible. The domain repoint feature only supports external deployments running vSphere 6.7. The repoint tool can migrate licenses, tags, categories, and permissions from one vSphere SSO Domain to another.

Brand-new Update Manager interface that is part of the HTML5 Web Client. The new UI provides a much more streamlined remediation process. 


New vROps plugin for the vSphere Client. This plugin is available out-of-the-box and provides some great new functionality. When interacting with this plugin, you will be greeted with 6 vRealize Operations Manager (vROps) dashboards directly in the vSphere client! 


Compute

vSphere 6.7 delivers a new capability that is key for the hybrid cloud, called Per-VM EVC. Per-VM EVC enables the EVC (Enhanced vMotion Compatibility) mode to become an attribute of the VM rather than the specific processor generation it happens to be booted on in the cluster. This allows for seamless migration across different CPUs by persisting the EVC mode per-VM during migrations across clusters and during power cycles.

A new EVC mode (Intel Skylake Generation) has been introduced. Compared to the Intel "Broadwell" EVC mode, the Skylake EVC mode exposes the following additional CPU features:
  • Advanced Vector Extensions 512
  • Persistent Memory Support Instructions
  • Protection Key Rights
  • Save Processor Extended States with Compaction
  • Save Processor Extended States Supervisor

Single Reboot when updating ESXi hosts. This reduces maintenance time by eliminating one of the two reboots normally required for major version upgrades.

vSphere Quick Boot is a new innovation that restarts the ESXi hypervisor without rebooting the physical host, skipping time-consuming hardware initialization (aka POST, Power-On Self Tests).

New ESXCLI Commands. In vSphere 6.7 the command line interface esxcli has been extended with new features. vSphere 6.7 introduced 62 new ESXCLI commands including:
  • 3 Device
  • 6 Hardware
  • 1 iSCSI
  • 14 Network
  • 14 NVMe
  • 2 RDMA
  • 9 Storage
  • 6 System
  • 7 vSAN
for more information look here.

Fault Tolerance maximums increased. Up to 8 virtual CPUs per virtual machine and up to 128 GB of vRAM per FT VM. For more info look at https://configmax.vmware.com/ (ESXi Host Maximums)

Storage

Support for 4K native HDDs. Customers may now deploy ESXi on servers with 4Kn HDDs used for local storage (SSD and NVMe drives are currently not supported). ESXi provides a software read-modify-write layer within the storage stack, allowing the emulation of 512B sector drives. ESXi continues to expose 512B sector VMDKs to the guest OS. Servers with a UEFI BIOS can boot from 4Kn drives.

XCOPY enhancement. XCOPY is used to offload storage-intensive operations such as copying, cloning, and zeroing to the storage array instead of the ESXi host. With the release of vSphere 6.7, XCOPY will now work with specific vendor VAAI primitives and with any vendor supporting the SCSI T10 standard. Additionally, XCOPY segments and transfer sizes are now configurable. By default, the maximum transfer size of an XCOPY ranges between 4MB and 16MB. In vSphere 6.7, through the use of PSA claim rules, this functionality is extended to additional storage arrays. Further details should be documented by the particular storage vendor.

Configurable Automatic UNMAP. Automatic UNMAP was released with vSphere 6.5 with a selectable priority of none or low. Storage vendors and customers have requested higher, configurable rates rather than a fixed 25MBps. vSphere 6.7 adds a new method, "fixed", which allows you to configure an automatic UNMAP rate between 100MBps and 2000MBps, configurable in both the UI and the CLI. I recommend reading this blog post for details on how it works on Pure Storage.
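The CLI side of this looks roughly like the sketch below. The datastore name is a placeholder, the bandwidth value is just an example, and the flag names are as I recall them from the 6.7 CLI, so verify them with the built-in help and your array vendor's guidance:

# Switch automatic UNMAP on a VMFS-6 datastore to the fixed method at 500 MB/s
esxcli storage vmfs reclaim config set -l MyDatastore --reclaim-method fixed --reclaim-bandwidth 500
# Verify the current reclaim configuration
esxcli storage vmfs reclaim config get -l MyDatastore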

UNMAP for SESparse. SESparse is a sparse virtual disk format used for snapshots in vSphere and is the default for VMFS-6. This release provides automatic space reclamation for VMs with SESparse snapshots on VMFS-6. This only works when the VM is powered on and only affects the top-most snapshot.

VVols enhancements. As VMware continues the development of Virtual Volumes, this release adds support for IPv6 and SCSI-3 persistent reservations. End-to-end IPv6 support enables organizations, including government, to implement VVols using IPv6. SCSI-3 persistent reservations allow shared disks/volumes between virtual machines across nodes/hosts. Often used for Microsoft WSFC clusters, this substantial enhancement allows for the removal of RDMs!

Increased maximum number of LUNs/paths (1K/4K LUNs/paths). The maximum number of LUNs per host is now 1024 instead of 512 and the maximum number of paths per host is 4096 instead of 2048. Customers may now deploy virtual machines with up to 256 disks using PVSCSI adapters. Each PVSCSI adapter can support up to 64 devices. Devices can be virtual disks or RDMs. A major change in 6.7 is the increased number of LUNs supported for Microsoft WSFC clusters. The number increased from 15 disks to 64 disks per adapter, PVSCSI only. This changes the number of LUNs available for a VM running Microsoft WSFC from 45 to 192 LUNs.

Increased maximums for the virtual SCSI adapter (PVSCSI only): up to 64 virtual SCSI targets per virtual SCSI adapter and up to 256 virtual SCSI targets per virtual machine.

VMFS-3 EOL. Starting with vSphere 6.7, VMFS-3 will no longer be supported. Any volume/datastore still using VMFS-3 will automatically be upgraded to VMFS-5 during the installation or upgrade to vSphere 6.7. Any new volume/datastore created going forward will use VMFS-6 as the default.

Support for PMem/NVDIMMs. Persistent Memory, or PMem, is a type of non-volatile DRAM (NVDIMM) that has the speed of DRAM but retains its contents through power cycles. It is a new layer that sits between NAND flash and DRAM, providing faster performance, and it is non-volatile, unlike DRAM.

Intel VMD (Volume Management Device). With vSphere 6.7, there is now native support for Intel VMD technology to enable the management of NVMe drives. This technology was introduced as an installable option in vSphere 6.5. Intel VMD currently enables hot-swap management as well as NVMe drive LED control, allowing control similar to that used for SAS and SATA drives.

RDMA (Remote Direct Memory Access) over Converged Ethernet (RoCE). This release introduces RDMA using RoCE v2 support for ESXi hosts. RDMA provides low-latency, higher-throughput interconnects with CPU offloads between the endpoints. If a host has RoCE-capable network adapter(s), this feature is automatically enabled.

Para-virtualized RDMA (PV-RDMA). In this release, ESXi introduces the PV-RDMA for Linux guest OS with RoCE v2 support. PV-RDMA enables customers to run RDMA capable applications in the virtualized environments. PV-RDMA enabled VMs can also be live migrated.

iSER (iSCSI Extension for RDMA). Customers may now deploy ESXi with external storage systems supporting iSER targets. iSER takes advantage of faster interconnects and CPU offload using RDMA over Converged Ethernet (RoCE). VMware provides the iSER initiator function, which allows the ESXi storage stack to connect to iSER-capable target storage systems.

SW-FCoE (Software Fibre Channel over Ethernet). In this release, ESXi introduces a software-based FCoE (SW-FCoE) initiator that can create FCoE connections over Ethernet controllers. The VMware FCoE initiator works on a lossless Ethernet fabric using Priority-based Flow Control (PFC). It can work in Fabric and VN2VN modes. Please check the VMware Compatibility Guide (VCG) for supported NICs.

Performance

vSphere 6.7 VCSA delivers phenomenal performance improvements (all metrics compared at cluster scale limits, versus vSphere 6.5):
  • 2X faster performance in vCenter operations per second
  • 3X reduction in memory usage
  • 3X faster DRS-related operations (e.g. power-on virtual machine)

Security

vSphere 6.7 adds support for Trusted Platform Module (TPM) 2.0 hardware devices and also introduces Virtual TPM 2.0, significantly enhancing protection and assuring integrity for both the hypervisor and the guest operating system.

vSphere 6.7 introduces support for the entire range of Microsoft’s Virtualization Based Security technologies aka “Credential Guard” support.

Recoverability

vCenter Server Appliance (VCSA) File-Based Backup introduced in vSphere 6.5 now has a scheduler. Now customers can schedule the backups of their vCenter Server Appliances and select how many backups to retain. Another new section for File-Based backup is Activities. Once the backup job is complete it will be logged in the activity section with detailed information. The Restore workflow now includes a backup archive browser. The browser displays all your backups without having to know the entire backup path.

Conclusion

It seems that vSphere 6.7 is the continuous evolution of the best x86 virtualization platform, with a lot of interesting improvements, features, and capabilities. Keep in mind that this is just a list of features and capabilities, which have to be very carefully planned, designed, and tested before implementation in production.

Just FYI, I have not finished reading all the vSphere 6.7 documents, so I will update this blog post when I find something interesting.

Wednesday, April 11, 2018

How to disable Spectre and Meltdown mitigations?

Today, I have been asked again "How to disable Spectre and Meltdown mitigations on VMs running on top of ESXi". Recently I wrote about Spectre and Meltdown mitigations on VMware vSphere virtualized workloads here.

So, let's assume you have already applied patches and updates to ...
  • Guest OS (Windows, Linux, etc.)
  • Hypervisor - ESXi host (VMSA-2018-0004.3 and  VMSA-2018-0002)
  • BIOS (version having support for IBRS, IBPB, STIBP capabilities)
... therefore, you should be protected against Spectre and Meltdown vulnerabilities known as CVE-2017-5753 (Spectre - Variant 1), CVE-2017-5715 (Spectre - Variant 2), and CVE-2017-5754 (Meltdown - Variant 3).

These security mitigations do not come for free. They have a significant impact on performance. I did some testing in my lab and some results scared me. The biggest impact is on workloads with many system calls (calls from the OS userland to the OS kernel), such as memory, network, and storage I/O operations. The performance impact is the reason why some administrators and application owners are willing to disable the security mitigations in systems where interprocess communication is trusted and potential data leaks between processes are not a problem. 

So, let's answer the question. Spectre and Meltdown mitigations can be disabled at the Guest Operating System level. This is the preferred method.

RedHat

You can disable the security mitigations at runtime with the following three commands. The change is immediately active and does not require a reboot.

    # echo 0 > /sys/kernel/debug/x86/pti_enabled
    # echo 0 > /sys/kernel/debug/x86/ibpb_enabled
    # echo 0 > /sys/kernel/debug/x86/ibrs_enabled

Note that this runtime change is not persistent across reboots.
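To make the change persistent across reboots on RHEL/CentOS 7, the usual approach is kernel boot parameters. This is a sketch only: nopti/noibrs/noibpb are the parameters Red Hat documented for its kernels of that generation, and other distributions or kernel versions use different parameter names, so verify against your distribution's documentation:

# Append the mitigation-disabling parameters to all installed kernels (RHEL/CentOS 7 style), then reboot
grubby --update-kernel=ALL --args="nopti noibrs noibpb"
reboot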

MS Windows
In the Windows operating system, you can control it via the registry.

To enable the mitigations, change the following registry settings:
  • reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverride /t REG_DWORD /d 0 /f
  • reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverrideMask /t REG_DWORD /d 3 /f
  • reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Virtualization" /v MinVmVersionForCpuBasedMitigations /t REG_SZ /d "1.0" /f
  • Restart the server for changes to take effect.
and to disable the mitigations you have to change the registry settings as follows:
  • reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverride /t REG_DWORD /d 3 /f
  • Restart the server for changes to take effect.
Note: After any change, please test whether your system behaves as expected (secure or not secure).

ESXi - not recommended method

Another method is to disable the CPU features at the ESXi level. This is not recommended by VMware, but it was recently published as a workaround in VMware KB 52345.

Here is the procedure for masking CPU capabilities at the ESXi level.

Step 1/ Login to each ESXi host via SSH.

Step 2/ Add the following line in the /etc/vmware/config file:
cpuid.7.edx = "----:00--:----:----:----:----:----:----"

Step 3/ Run the command /sbin/auto-backup.sh to back up the config file and keep the configuration change persistent across ESXi reboots.

Step 4/ Power-cycle VMs running on top of ESXi host 

This hides the speculative-execution control mechanism from virtual machines that are power-cycled afterward on the ESXi host. So, you have to power-cycle the virtual machines on the ESXi host; rebooting the ESXi host is not required. The effect is that the speculative execution control mechanism is no longer available to virtual machines even if the server firmware independently provides the same microcode.

Conclusion

It is important to mention that the Guest Operating System inside a VM may or may not use the CPU capabilities IBRS, IBPB, and STIBP provided by the CPU microcode to mitigate the security issues. As far as I'm aware, these instructions are leveraged by Guest OSes only to mitigate Spectre Variant 2 (CVE-2017-5715). In some cases, the Guest OS can use other mitigation methods even for Spectre Variant 2. For example, the Linux kernel is currently trying to leverage "Retpoline" code sequences to decrease the performance impact, but "Retpoline" is not applicable to all CPU models. So, there is no single recommendation which would fit all situations.

That's the reason why performance tuning by disabling security enhancements should always be done at the Guest Operating System level and not at the ESXi level. The ESXi workaround is just that, a workaround, which can be useful in case some new bug in the CPU microcode is discovered, but performance is always handled by the Guest OS.

Monday, April 09, 2018

What is vCenter PNID?

Today I got the question of what PNID is in vCenter.

Well, PNID (primary network identifier) is a VMware internal term and it is officially called "system name".

But my question back to the questioner was why he needed to know anything about PNID. I got the expected answer: the questioner had been researching how to change the vCenter IP address and hostname.

So let's discuss these two requests independently.

First things first, the vCenter hostname cannot be changed, at least for vCenter 6.0 and 6.5. It may or may not change in the future.

On the other hand, the vCenter IP address can be changed. However, the system name (aka PNID) is very important when you are trying to change the vCenter IP address. The vCenter IP address can be changed only when you entered an FQDN during the vCenter installation; in that case, the PNID is the hostname. If you did not enter an FQDN during the vCenter installation, the IP address is used as the PNID, which results in the inability to change the vCenter IP address.

Below is the command to check in the VCSA what your vCenter PNID is.

root@vc01 [ ~ ]# /usr/lib/vmware-vmafd/bin/vmafd-cli get-pnid --server-name localhost
vc01.home.uw.cz

root@vc01 [ ~ ]# 

In the case above, the PNID is the hostname (vc01.home.uw.cz), so I would be able to change the IP address.