Friday, June 15, 2018

vRealize Orchestrator - useful tips and commands

This week I worked with one of my customers on a vRealize Orchestrator (vRO) Proof of Concept. vRealize Orchestrator is a pretty good tool for data center orchestration, but it is a well-hidden tool and customers usually do not know they are entitled to such a great way to automate and orchestrate not only their infrastructure but almost anything.

Here are some good vRO resources and some of my technical notes from the PoC.

TCP Ports

  • The vRO Server service runs on port 8281. This is the port the vRO Client connects to. Direct access is https://[VRO-HOSTNAME]:8281/vco/
  • The vRO Server Control service runs on port 8283. This is the port where the vRO Control Center runs. Direct access is https://[VRO-HOSTNAME]:8283/vco-controlcenter/
  • The vRO VAMI (Virtual Appliance Management Interface) runs on port 5480. A quick reachability check is sketched below.
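If you want to quickly verify that the vRO endpoints are reachable, a minimal check from any machine with curl could look like this (the hostname is hypothetical; -k skips certificate validation, acceptable in a lab):

curl -k -I https://vro01.example.com:8281/vco/api
curl -k -I https://vro01.example.com:8283/vco-controlcenter/

Any HTTP response (even 401 Unauthorized from the API endpoint) confirms the service is listening on that port.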

Network configuration from the console or over an SSH session
/opt/vmware/share/vami/vami_config_net

Reset vRO Authentication
/var/lib/vco/tools/configuration-cli/bin/vro-configure.sh reset-authentication

Restart vRO Configurator service
service vco-configurator restart

Restart vRO Server service
service vco-server restart

vRO 7.4 configuration synchronization across cluster nodes
(source)
vRO 7.4 changes the way configuration synchronization is done. Configuration changes made through the Control Center are "permanent" and are replicated to the other nodes as well. Changes made manually, for example by editing a file with a text editor, are overwritten with the latest configuration.
Therefore, if you make manual changes such as editing the js-io-rights file, you need to execute the sync-local CLI command to merge the changes into the latest configuration, which is then replicated. The procedure is as follows (a consolidated sketch follows the list):
  1. Stop the Control Center
  2. Make the manual changes
  3. Execute /var/lib/vco/tools/configuration-cli/bin/vro-configure.sh sync-local
  4. Start the Control Center
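Here is a consolidated sketch of that procedure, assuming the js-io-rights file lives at /etc/vco/app-server/js-io-rights.conf (verify the path on your appliance):

# Stop the Control Center service
service vco-configurator stop
# Make the manual change, e.g. edit the file-system access rights
vi /etc/vco/app-server/js-io-rights.conf
# Merge the manual change into the replicated configuration
/var/lib/vco/tools/configuration-cli/bin/vro-configure.sh sync-local
# Start the Control Center service again
service vco-configurator start
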
Time considerations
Time on vRO and the authentication provider (vCenter/PSC, vRealize Automation/vIDM) must be in sync, otherwise you will see an error message like "Server returned 'request expired' less than 0 seconds after request was issued". This error occurs due to a time skew between the authentication provider and the vRO appliance.

You should use NTP or sync time with the ESXi host. If you want to synchronize vRO time with the host, use this command from the Guest OS:
vmware-toolbox-cmd timesync enable
To disable time sync with the host, use this command:
vmware-toolbox-cmd timesync disable
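To check the current synchronization state, there is also a status subcommand:
vmware-toolbox-cmd timesync status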

vRealize Orchestrator 7.x - Unlocking vRO Root Account after too many failed login attempts

After too many failed login attempts as root, the vRO root account gets locked. As SSH no longer works, you need console access to the vRO server.

Step 1 - Gain access to the vRO server via the virtual machine console.

Step 2 - Reboot server

Step 3 - When the GRUB bootloader appears, press the spacebar to disable autoboot.

Step 4 - Select VMware vRealize Orchestrator Appliance and type “e” to edit the boot commands. Then move down to the second line showing the kernel boot parameters and type “e” again.

Step 5 - Append init=/bin/bash to the kernel options.

Step 6 - Hit Enter and the GRUB menu will appear again. This time hit “b” to start the boot process.

Step 7 - Now you should be in the shell - ready to issue commands to unlock or reset the password.

Step 8 - To unlock the account, type the following command:
# pam_tally2 --user root --reset
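If you first want to see how many failed attempts are recorded without resetting the counter, pam_tally2 can be run without the --reset switch:
# pam_tally2 --user root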

Optional Step 9 - If you cannot remember the password, change it by using the passwd command:
# passwd root 

Optional Step 10 - Disabling the lockout altogether can come in handy. To do so, modify the /etc/pam.d/common-auth file with vi or any preferred editor and comment out the line containing “pam_tally2.so deny=3….” An illustrative example is shown below.
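The commented-out line could look like this (the exact pam_tally2 options on your appliance may differ, so treat this as an illustration):

# auth required pam_tally2.so deny=3 onerr=fail even_deny_root unlock_time=86400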

Undocumented SDRS Advanced Options

Almost two years ago, I was challenged by a VMware customer who experienced failed VM provisioning during parallel VM deployments. SDRS default behavior is not optimized for fast parallel deployments because it only returns SDRS recommendations (step 1), and later (step 2) these recommendations are applied by someone else who is executing the VM provisioning. Back in the day, when SDRS was designed and developed, it was optimized for provisioning a few VMs at a time; nowadays, in the cloud era, many parallel provisioning operations are common.

Disclaimer: Please note that the default behavior works best for most cases; use advanced options only when required. Below are some of the Storage DRS advanced options a user can configure on a Storage DRS cluster. Based on my research/digging, these options may or may not be properly documented, and they are most likely not supported by VMware. Please use caution if you decide to play with these advanced settings.

SDRS Advanced settings

VraInitPlacement
The reason behind this option is explained in this video - "VMware vSphere SDRS VM provisioning process". Based on my customer's feedback, VMware engineering introduced 'VraInitPlacement', an SDRS advanced setting which reconfigures SDRS to generate only one recommendation and reserve the resource until the "lease" expires. The lease time is defined in the API call as the property resourceLeaseDurationSec. By default it is 0 (no lease), but it can be reconfigured on the vRA side. It is used in conjunction with the vRA custom property "VirtualMachine.Admin.Datastore.Cluster.ResourceLeaseDurationSec", documented in the VMware vRA documentation here - https://docs.vmware.com/en/vRealize-Automation/7.4/com.vmware.vra.prepare.use.doc/GUID-FA2ED665-4973-435C-A93B-8E4EAB5D1F8A.html
Custom property description: When provisioning multiple VMs with SDRS, this property specifies a value in seconds, in the range of 30 to 3600, for reserving storage resources during the RecommendDatastores API call. You can add this property to a business group or blueprint, or when you request provisioning. The lease lock is applied only to the datastore that is used by the deployment, not to all datastores in the storage cluster. The lease lock is released when provisioning either completes or fails. If not specified, no lock is applied to the storage resources at provisioning time.

Long story short, when 'VraInitPlacement' is set to "1", SDRS generates only one recommendation. This is a VMware internal, undocumented option, but in theory it can be set by a vSphere admin. The lease time, however, is not configurable by the vSphere admin; it is a parameter (property) of the vSphere SOAP API call asking for SDRS recommendations.

EnforceStorageProfiles
To configure Storage DRS interoperability with SPBM, this option needs to be set to one of the values below.
  • 0 – disabled (default)
  • 1 – soft enforcement
  • 2 – hard enforcement
PercentIdleMBinSpaceDemand
The PercentIdleMBinSpaceDemand setting defines the percentage of IdleMB that is added to the allocated space of a VMDK during the free-space calculation of the datastore. The default value is 25%, and the value can range from 0 to 100. A worked example follows.
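As a worked example of how I understand the calculation: for a 100 GB thin-provisioned VMDK with 40 GB allocated, IdleMB is the remaining 60 GB, so with the default 25% SDRS counts

40 GB + 0.25 x 60 GB = 55 GB

against the datastore's free space instead of just the 40 GB actually allocated.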

For more info look at http://frankdenneman.nl/2012/10/01/avoiding-vmdk-level-over-commitment-while-using-thin-disks-and-storage-drs/

EnforceCorrelationForAffinity
Use datastore correlation while enforcing/fixing anti-affinity rules
  • 0 – disabled (default)
  • 1 – soft enforcement
  • 2 – hard enforcement
Other VMware vSphere Advanced Settings
Back in the day, William Lam published the blog post "New vSphere 5 HA, DRS and SDRS Advanced/Hidden Options". It is worth reading.


Tuesday, May 22, 2018

VMware Response to Speculative Execution security issues, CVE-2018-3639 and CVE-2018-3640

This will be a relatively short blog post. The whole industry is aware of the Spectre/Meltdown security vulnerabilities. I recently wrote the blog post "VMware Response to Speculative Execution security issues, CVE-2017-5753, CVE-2017-5715, CVE-2017-5754 (aka Spectre and Meltdown)".

A few days ago, NCF announced additional CPU vulnerabilities (CVE-2018-3639 and CVE-2018-3640), and yesterday VMware released the official response in the following documents:


What does it mean for IT infrastructure practitioners / VMware vSphere administrators?

Well, actually nothing new. The update process is the same as for the previous Spectre/Meltdown remediations. A VMware vSphere administrator must apply the following update procedure (a guest-side verification sketch follows the list).
  1. Update vCenter to apply patches to EVC. Note: The patches add new CPU features (IBRS, IBPB, STIBP) into existing EVC baselines.
  2. (Optional but recommended) Validate that EVC is enabled on vSphere Clusters. Note: Without EVC you can experience vMotion issues with newly powered-on VMs within a vSphere Cluster.
  3. Update to the latest BIOS with patched CPU microcode. Note: VMware delivers an ESXi patch with updated CPU microcode, but the CPU microcode from the hardware vendor is recommended.
  4. Apply the appropriate ESXi security patches.
  5. Validate that VM hardware is at least version 9 (PCID enabled); for better performance, VM hardware 11 is recommended because Virtual Hardware version 11 supports INVPCID.
  6. Apply all applicable security patches for your Guest OS which have been made available by the OS vendor.
  7. Power Off / Power On the VMs (a VM restart is not sufficient).
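After all the steps are done, a quick guest-side sanity check is possible on patched/newer Linux kernels, which expose the mitigation status via sysfs (a hedged example; these files are not present on older kernels):

grep . /sys/devices/system/cpu/vulnerabilities/*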
Why have a lot of VMware customers still not applied the patches?
  • The patches have a hard-to-predict negative performance impact which is workload specific, so application owners have to evaluate the specific impact on their applications.
  • IT management is afraid of the unpredictable performance impact, the lack of computing resources, and the tremendous impact on capacity planning.
  • If a VM hardware upgrade is required, a maintenance window must be planned with application owners. Note: A virtual hardware upgrade carries a certain risk because you are effectively changing the motherboard and chipset.
  • A Power Off / Power On of the VMs is required, so a maintenance window must be planned by or with application owners.

Conclusion

So even though all the patches exist and the update process is well known, it is definitely not a simple project, especially in large organizations where collaboration among multiple teams and departments is required.

It is obvious that the remediations have some negative performance impact on applications. However, all these remediations can be disabled in the operating system, so the hardware and vSphere layers can be patched while the application owner decides between security and performance. Please note that even though disabling a security remediation has a positive impact on performance, the final performance can still be worse than the original performance on unpatched systems.

Wednesday, April 18, 2018

What's new in vSphere 6.7

VMware vSphere 6.7 has been released and all the famous VMware bloggers have published their posts about the new features and capabilities. It is worth reading all of these blog posts, as each blogger focuses on a different area of the SDDC, which gives you a broader context for the newly available product features and capabilities. Anyway, industry veterans should start by reading the product Release Notes and the official VMware blog posts first.

Please note that this blog post is just an aggregation of information published in other places. All used sources are listed below.

Release Notes:
vSphere 6.7 Release Notes

VMware KB:
Important information before upgrading to vSphere 6.7 

VMware official blog posts:
Introducing VMware vSphere 6.7!
Introducing vCenter Server 6.7
Introducing Faster Lifecycle Management Operations in VMware vSphere 6.7
Introducing vSphere 6.7 Security
What’s new with vSphere 6.7 Core Storage
vSphere 6.7 Videos

Community blog posts:
Emad Younis : vCenter Server 6.7 What’s New Rundown
Duncan Epping : vSphere 6.7 announced!
Cormac Hogan : What's new in vSphere and vSAN 6.7 release?
Cody Hosterman : What's new in core storage in vSphere 6.7 part I: in-guest unmap and snapshots
Cody Hosterman :  What's new in core storage in vSphere 6.7 part V: Rate control for automatic VMFS unmap
William Lam : All vSphere 6.7 release notes & download links
Florian Grehl (Virten) : VMware vSphere 6.7 introduces Skylake EVC Mode
Florian Grehl (Virten) : New ESXCLI Commands in vSphere 6.7

So, after reading all the resources above, let's aggregate and document the interesting features area by area.

vSphere Management

vCenter with embedded Platform Services Controller in Enhanced Linked Mode. This is nice because you can leverage "vCenter Server High Availability" to achieve higher availability of the PSC without an external load balancer. All benefits are listed below.
  • No load balancer required for high availability and fully supports native vCenter Server High Availability.
  • SSO Site boundary removal provides flexibility of placement.
  • Supports vSphere scale maximums.
  • Allows for 15 deployments in a vSphere Single Sign-On Domain.
  • Reduces the number of nodes to manage and maintain.

vSphere 6.7 introduces vCenter Server Hybrid Linked Mode, which makes it easy and simple for customers to have unified visibility and manageability across an on-premises vSphere environment running on one version and a vSphere-based public cloud environment, such as VMware Cloud on AWS, running on a different version of vSphere.

vSphere 6.7 also introduces Cross-Cloud Cold and Hot Migration, further enhancing the ease of management across environments and enabling a seamless and non-disruptive hybrid cloud experience for customers.

vSphere 6.7 enables customers to use different vCenter versions while allowing cross-vCenter, mixed-version provisioning operations (vMotion, Full Clone and cold migrate) to continue seamlessly.

vCenter Server Appliance (VCSA) Syslog now supports up to three syslog forwarding targets.

The HTML5-based vSphere Client provides a modern user interface experience that is both responsive and easy to use, and it includes 95% of the functionality available in the Flash client. Some of the newer workflows in the updated vSphere Client release include:
  • vSphere Update Manager
  • Content Library
  • vSAN
  • Storage Policies
  • Host Profiles
  • vDS Topology Diagram
  • Licensing
The PSC/SSO CLI (cmsso-util) has some improvements. Repointing an external vCenter Server Appliance across SSO sites within a vSphere SSO domain is supported. Repointing a vCenter Server Appliance across vSphere SSO domains is also supported. This is huge! It seems that SSO domain consolidation is now possible. The domain repoint feature only supports external deployments running vSphere 6.7. The repoint tool can migrate licenses, tags, categories, and permissions from one vSphere SSO domain to another.

Brand-new Update Manager interface that is part of the HTML5 Web Client. The new UI provides a much more streamlined remediation process. 


New vROps plugin for the vSphere Client. This plugin is available out-of-the-box and provides some great new functionality. When interacting with this plugin, you will be greeted with 6 vRealize Operations Manager (vROps) dashboards directly in the vSphere client! 


Compute

vSphere 6.7 delivers a new capability that is key for the hybrid cloud, called Per-VM EVC. Per-VM EVC enables the EVC (Enhanced vMotion Compatibility) mode to become an attribute of the VM rather than the specific processor generation it happens to be booted on in the cluster. This allows for seamless migration across different CPUs by persisting the EVC mode per-VM during migrations across clusters and during power cycles.

A new EVC mode (Intel "Skylake" Generation) has been introduced. Compared to the Intel "Broadwell" EVC mode, the Skylake EVC mode exposes the following additional CPU features:
  • Advanced Vector Extensions 512
  • Persistent Memory Support Instructions
  • Protection Key Rights
  • Save Processor Extended States with Compaction
  • Save Processor Extended States Supervisor

Single Reboot when updating ESXi hosts. This reduces maintenance time by eliminating one of the two reboots normally required for major version upgrades.

vSphere Quick Boot is a new innovation that restarts the ESXi hypervisor without rebooting the physical host, skipping time-consuming hardware initialization (aka POST, Power-On Self Tests).

New ESXCLI commands. In vSphere 6.7, the esxcli command line interface has been extended with new features. vSphere 6.7 introduces 62 new ESXCLI commands, including:
  • 3 Device
  • 6 Hardware
  • 1 iSCSI
  • 14 Network
  • 14 NVMe
  • 2 RDMA
  • 9 Storage
  • 6 System
  • 7 vSAN
For more information look here.

Fault Tolerance maximums have increased: up to 8 virtual CPUs per virtual machine and up to 128 GB vRAM per FT VM. For more info see https://configmax.vmware.com/ (ESXi Host Maximums).

Storage

Support for 4K native HDDs. Customers may now deploy ESXi on servers with 4Kn HDDs used for local storage (SSD and NVMe drives are currently not supported). ESXi provides a software read-modify-write layer within the storage stack allowing the emulation of 512B sector drives; ESXi continues to expose 512B sector VMDKs to the guest OS. Servers with UEFI BIOS can boot from 4Kn drives.

XCOPY enhancement. XCOPY is used to offload storage-intensive operations such as copying, cloning, and zeroing to the storage array instead of the ESXi host. With the release of vSphere 6.7, XCOPY will now work with specific vendor VAAI primitives and with any vendor supporting the SCSI T10 standard. Additionally, XCOPY segments and transfer sizes are now configurable. By default, the maximum transfer size of an XCOPY ranges between 4MB and 16MB. In vSphere 6.7, through the use of PSA claim rules, this functionality is extended to additional storage arrays. Further details should be documented by the particular storage vendor.

Configurable Automatic UNMAP. Automatic UNMAP was released with vSphere 6.5 with a selectable priority of none or low. Storage vendors and customers have requested higher, configurable rates rather than the fixed 25 MBps. vSphere 6.7 adds a new method, "fixed", which allows you to configure an automatic UNMAP rate between 100 MBps and 2000 MBps, configurable both in the UI and the CLI. I recommend reading this blog post for details on how it works on Pure Storage. A hedged CLI example follows.
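Based on the referenced article, the configuration should be possible with esxcli roughly like this (the datastore label and rate are illustrative; verify the exact syntax on your build):

esxcli storage vmfs reclaim config set --volume-label DS01 --reclaim-method fixed -b 500
esxcli storage vmfs reclaim config get --volume-label DS01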

UNMAP for SESparse. SESparse is a sparse virtual disk format used for snapshots in vSphere, and it is the default for VMFS-6. This release provides automatic space reclamation for VMs with SESparse snapshots on VMFS-6. This only works when the VM is powered on and only affects the top-most snapshot.

VVols enhancements. As VMware continues the development of Virtual Volumes, this release adds support for IPv6 and SCSI-3 persistent reservations. End-to-end support of IPv6 enables organizations, including governments, to implement VVols using IPv6. SCSI-3 persistent reservations are a substantial feature that allows disks/volumes to be shared between virtual machines across nodes/hosts. They are often used for Microsoft WSFC clusters, and with this new enhancement RDMs can be removed!

Increased maximum number of LUNs/paths (1K/4K LUNs/paths). The maximum number of LUNs per host is now 1024 instead of 512, and the maximum number of paths per host is 4096 instead of 2048. Customers may now deploy virtual machines with up to 256 disks using PVSCSI adapters. Each PVSCSI adapter can support up to 64 devices; devices can be virtual disks or RDMs. A major change in 6.7 is the increased number of LUNs supported for Microsoft WSFC clusters: the number increased from 15 to 64 disks per adapter, PVSCSI only. This changes the number of LUNs available to a VM running Microsoft WSFC from 45 to 192.

Increased maximums for the virtual SCSI adapter (PVSCSI only): up to 64 virtual SCSI targets per virtual SCSI adapter and up to 256 virtual SCSI targets per virtual machine.

VMFS-3 EOL. Starting with vSphere 6.7, VMFS-3 will no longer be supported. Any volume/datastore still using VMFS-3 will automatically be upgraded to VMFS-5 during the installation or upgrade to vSphere 6.7. Any new volume/datastore created going forward will use VMFS-6 as the default.

Support for PMem/NVDIMMs. Persistent Memory, or PMem, is a type of non-volatile DRAM (NVDIMM) that has the speed of DRAM but retains its contents through power cycles. It is a new layer that sits between NAND flash and DRAM, providing faster performance, and it is non-volatile, unlike DRAM.

Intel VMD (Volume Management Device). vSphere 6.7 now has native support for Intel VMD technology to enable the management of NVMe drives. This technology was introduced as an installable option in vSphere 6.5. Intel VMD currently enables hot-swap management as well as NVMe drive LED control, allowing control similar to that available for SAS and SATA drives.

RDMA (Remote Direct Memory Access) over Converged Ethernet (RoCE). This release introduces RDMA support for ESXi hosts using RoCE v2. RDMA provides low-latency, higher-throughput interconnects with CPU offload between the endpoints. If a host has RoCE-capable network adapter(s), this feature is automatically enabled.

Para-virtualized RDMA (PV-RDMA). In this release, ESXi introduces PV-RDMA for Linux guest OSes with RoCE v2 support. PV-RDMA enables customers to run RDMA-capable applications in virtualized environments. PV-RDMA-enabled VMs can also be live migrated.

iSER (iSCSI Extension for RDMA). Customers may now deploy ESXi with external storage systems supporting iSER targets. iSER takes advantage of faster interconnects and CPU offload using RDMA over Converged Ethernet (RoCE). VMware provides the iSER initiator function, which allows the ESXi storage stack to connect to iSER-capable target storage systems.

SW-FCoE (Software Fibre Channel over Ethernet). In this release, ESXi introduces a software-based FCoE (SW-FCoE) initiator that can create FCoE connections over Ethernet controllers. The VMware FCoE initiator works on a lossless Ethernet fabric using Priority-based Flow Control (PFC), and it can work in Fabric and VN2VN modes. Please check the VMware Compatibility Guide (VCG) for supported NICs.

Performance

vSphere 6.7 VCSA delivers phenomenal performance improvements (all metrics compared at cluster scale limits, versus vSphere 6.5):
  • 2X faster performance in vCenter operations per second
  • 3X reduction in memory usage
  • 3X faster DRS-related operations (e.g. power-on virtual machine)

Security

vSphere 6.7 adds support for Trusted Platform Module (TPM) 2.0 hardware devices and also introduces Virtual TPM 2.0, significantly enhancing protection and assuring integrity for both the hypervisor and the guest operating system.

vSphere 6.7 introduces support for the entire range of Microsoft’s Virtualization Based Security technologies aka “Credential Guard” support.

Recoverability

vCenter Server Appliance (VCSA) File-Based Backup introduced in vSphere 6.5 now has a scheduler. Now customers can schedule the backups of their vCenter Server Appliances and select how many backups to retain. Another new section for File-Based backup is Activities. Once the backup job is complete it will be logged in the activity section with detailed information. The Restore workflow now includes a backup archive browser. The browser displays all your backups without having to know the entire backup path.

Conclusion

It seems that vSphere 6.7 is the continuing evolution of the best x86 virtualization platform, with a lot of interesting improvements, features, and capabilities. Keep in mind that this is just a list of features and capabilities, which have to be very carefully planned, designed, and tested before implementation in production.

Just FYI, I have not finished reading all the vSphere 6.7 documents, so I will update this blog post when I find something interesting.

Wednesday, April 11, 2018

How to disable Spectre and Meltdown mitigations?

Today, I was asked again "How to disable Spectre and Meltdown mitigations on VMs running on top of ESXi?". I recently wrote about Spectre and Meltdown mitigations on VMware vSphere virtualized workloads here.

So, let's assume you have already applied patches and updates to ...
  • Guest OS (Windows, Linux, etc.)
  • Hypervisor - ESXi host (VMSA-2018-0004.3 and  VMSA-2018-0002)
  • BIOS (version having support for IBRS, IBPB, STIBP capabilities)
... therefore, you should be protected against Spectre and Meltdown vulnerabilities known as CVE-2017-5753 (Spectre - Variant 1), CVE-2017-5715 (Spectre - Variant 2), and CVE-2017-5754 (Meltdown - Variant 3).

These security mitigations do not come for free. They have a significant impact on performance. I did some testing in my lab and some of the results scared me. The biggest impact is on workloads with many system calls (calls from OS userland to the OS kernel), such as memory, network, and storage I/O operations. The performance impact is the reason why some administrators and application owners are willing to disable the security mitigations on systems where interprocess communication is trusted and potential data leaks between processes are not a problem.

So, let's answer the question. Spectre and Meltdown mitigations can be disabled at the Guest Operating System level. This is the preferred method.

RedHat

You can disable the security mitigations at runtime with the following three commands. The change is immediately active and does not require a reboot.

    # echo 0 > /sys/kernel/debug/x86/pti_enabled
    # echo 0 > /sys/kernel/debug/x86/ibpb_enabled
    # echo 0 > /sys/kernel/debug/x86/ibrs_enabled

Note that this change is not persistent across reboots. A hedged sketch of a persistent variant follows.
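To make the disablement persistent on RHEL, the documented kernel boot options can be used instead (a sketch; verify the exact option names against the Red Hat documentation for your kernel version):

    # grubby --update-kernel=ALL --args="nopti noibrs noibpb"
    # reboot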

MS Windows
In the Windows operating system, you can control it via the registry.

To enable the mitigations, change the registry settings as follows:
  • reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverride /t REG_DWORD /d 0 /f
  • reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverrideMask /t REG_DWORD /d 3 /f
  • reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Virtualization" /v MinVmVersionForCpuBasedMitigations /t REG_SZ /d "1.0" /f
  • Restart the server for changes to take effect.
To disable the mitigations, change the registry settings as follows:
  • reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverride /t REG_DWORD /d 3 /f
  • Restart the server for changes to take effect.
Note: After any change, please test whether your system behaves as expected (secure or not secure).

ESXi - not recommended method

Another method is to disable the CPU features at the ESXi level. This is not recommended by VMware, but it was recently published as a workaround in VMware KB 52345.

Here is the procedure for masking CPU capabilities at the ESXi level.

Step 1/ Login to each ESXi host via SSH.

Step 2/ Add the following line to the /etc/vmware/config file:
cpuid.7.edx = "----:00--:----:----:----:----:----:----"

Step 3/ Run the command /sbin/auto-backup.sh to back up the config file and keep the configuration change persistent across ESXi reboots.

Step 4/ Power-cycle the VMs running on top of the ESXi host.

This will hide the speculative-execution control mechanism from virtual machines which are power-cycled afterward on the ESXi host. So, you have to power-cycle the virtual machines on the ESXi host; rebooting the ESXi host is not required. The effect is that the speculative-execution control mechanism is no longer available to virtual machines, even if the server firmware provides the same microcode independently. A hedged guest-side check follows.
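From a Linux guest, after the power-cycle, the relevant CPU flags should no longer be visible in /proc/cpuinfo (flag names vary by kernel and distribution, so treat this as an illustration):

grep -o 'ibrs\|ibpb\|stibp\|spec_ctrl' /proc/cpuinfo | sort -u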

Conclusion

It is important to mention that the Guest Operating System inside a VM may or may not use the CPU capabilities IBRS, IBPB, and STIBP provided by the CPU microcode to mitigate the security issues. As far as I'm aware, these instructions are leveraged by Guest OSes only to mitigate Spectre Variant 2 (CVE-2017-5715). In some cases, the Guest OS can use other mitigation methods even for Spectre Variant 2. For example, the Linux kernel currently tries to leverage "Retpoline" code sequences to decrease the performance impact, but "Retpoline" is not applicable to all CPU models. So, there is no single recommendation which would fit all situations.

That's the reason why performance tuning by disabling security enhancements should always be done at the Guest Operating System level and not at the ESXi level. The ESXi workaround is just that, a workaround, which can be useful in case some new bug in the CPU microcode is discovered; performance is always handled by the Guest OS.

Monday, April 09, 2018

What is vCenter PNID?

Today I got the question of what PNID is in vCenter.

Well, PNID (primary network identifier) is a VMware internal term, and it is officially called the "system name".

But my question back to the questioner was why he needed to know anything about PNID. I got the expected answer: the questioner had been researching how to change the vCenter IP address and hostname.

So let's discuss these two requests independently.

First things first: the vCenter hostname cannot be changed, at least in vCenter 6.0 and 6.5. This may or may not change in the future.

On the other hand, the vCenter IP address can be changed. However, the system name (aka PNID) is very important when you are trying to change the vCenter IP address. The vCenter IP address can be changed only when you entered an FQDN during the vCenter installation; in that case, the PNID is the hostname. If you did not enter an FQDN during the vCenter installation, the IP address is used as the PNID, which results in the inability to change the vCenter IP address.

Below is the command to check on the VCSA what your vCenter PNID is.

root@vc01 [ ~ ]# /usr/lib/vmware-vmafd/bin/vmafd-cli get-pnid --server-name localhost
vc01.home.uw.cz

root@vc01 [ ~ ]# 

In the case above, the PNID is the hostname (vc01.home.uw.cz), so I would be able to change the IP address.

Sunday, April 08, 2018

Blog.iGICS.com moved to www.vcdx200.com

This Friday, I got a very nice e-mail from one of my long-time readers. I'm not going to publish the e-mail, but the reader mentioned that he was shocked and panicked when he realized that blog.iGICS.com no longer exists. Fortunately, this particular reader found the new address of my blog, which is www.VCDX200.com. The e-mail forced me to stop for a moment and think about my blogging history.

I have started blogging back in June 2006 on the address http://davidpasek.blogspot.com.

By the way, this address is still available because I still use Google's blogging platform, which is free of charge and good enough. My first blog post was very short and simple. It was written in my native (Czech) language; the translation follows ...
May 2006: For a long time I have been reading in various magazines and on websites that writing blogs is in and trendy. Write blogs, they say! It is quite easy to start, but writing something on a regular basis, something with a head and a tail, will not be that simple. And what do I actually want to write about? It will mainly be IT, which has been my hobby and passion for years, I could even say a decade. So let's see how this blog eventually turns out.
Only the first four blog posts were written in my native language, because I very soon realized that a blog about information technologies must be written in English to address the global IT community. My English is far from perfect, but it is the only global language, so there is no other choice.

Since the beginning, I have been thinking about the blog name. The first blog name, iGICS, was chosen back when I worked for Dell GICS (Global Infrastructure Consulting Services); because the domain GICS.com was not available, I registered the domain igics.com and moved the blog to the address
http://blog.iGICS.com

Over time I realized that the domain igics.com is not easy for my readers to remember, and because I moved from Dell to VMware and currently blog mainly about VMware topics, I decided to change the internet domain to VCDX200.com; all VMware enthusiasts hopefully know what VCDX is, so it is easier to remember.

So this blog has become available at
http://www.vcdx200.com 

For some time there was an automated redirect from *.igics.com to www.vcdx200.com, but I recently decided not to pay for the igics.com domain anymore. However, all content is preserved and available on the new domain.

After reading the e-mail from my reader and double-checking the blog view statistics, I realized that some of my long-time readers had probably lost access to the content I have produced over the years. Therefore, I have decided to continue paying for the domain iGICS.com and keep the redirect to www.vcdx200.com.

It is fair to say that I blog mainly for myself, to document the knowledge I would like to preserve and share with my customers, who pay my bills. The public content sharing is just a side effect :-) but it is nice to contribute to the community, so this is what I do here as well.

This weekend I stopped for a minute (actually two or three hours) and did some statistics. Over the years I have published 567 posts, but to be honest, for the first few years I was doing "microblogging", publishing interesting links to other resources on the internet just to find them quickly when necessary. Sometime in 2013, I started publishing bigger posts which qualify as full articles.

To name some blog posts which are, IMHO, full articles with some value, see the list below:

2013 - High latency on vSphere datastore backed by NFS (9,069 views)
2013 - Two (2) or four (4) socket servers for vSphere infrastructure? (1,949 views)
2014 - Disk queue depth in an ESXi environment (5,660 views)
2014 - vSphere HA Cluster Redundancy (271 views) + 2018 - Admission Control "Dedicated fail-over hosts" (631 views)
2014 ... 2016 - DELL Force10 : Series (more than 27,000 views)
2015 - End to End QoS solution for Vmware vSphere with NSX on top of Cisco UCS (2,959 views)
2015 - VMware Tools 10 and "shared productLocker" (1,380 views)
2016 - PowerCLI script to report VMtools version(s) (2,415 views)
2016 - Leveraging VMware LogInsight for VM hardware inventory (1,796 views)
2016 - ESXi host vCPU/pCPU reporting via PowerCLI to LogInsight (2,594 views)
2017 - Back to the basics - VMware vSphere networking (3,260 views)
2017 - vSphere Switch Independent Teaming or LACP? (950 views)
2018 - No Storage, No vSphere, No Datacenter (1,506 views)
2018 - Storage QoS with vSphere virtual disk IOPS limits (775 views)
2018 - vSphere 6.5 - DRS CPU Over-Commitment (990 views)
2018 - VMware Response to Speculative Execution security issues, CVE-2017-5753, CVE-2017-5715, CVE-2017-5754 (aka Spectre and Meltdown) (2,586 views)

Anyway, the most popular (most visited) posts in the blog's history are:

2014 - How to clear all jobs on DELL Lifecycle Controller via iDRAC (10,422 views)
2013 - High latency on vSphere datastore backed by NFS (9,069 views)
2013 - Calculating optimal segment size and stripe size for storage LUN backing vSphere VMFS Datastore (8,981 views)
2014 - DELL Force10 : VLT - Virtual Link Trunking (8,974 views)
2014 - Heads Up! VMware virtual disk IOPS limit bad behavior in VMware ESX 5.5 (5,114 views)
2017 - How to install VMware tools on FreeBSD server - (4,828 views)

The majority of my readers are from the United States. I personally think the US is the leader in information technologies, so I'm proud that my articles are interesting to the US IT community. Almost 10 times fewer readers are from Russia, Germany, Israel, and France, which are the top IT technology countries in the EMEA region.

Pageviews by Countries
So Michael (the US reader who sent me the e-mail), thank you again for your nice e-mail, and I hope that not only you but also others will appreciate that the address blog.iGICS.com is back. Paying $10 per year for another internet domain will probably not ruin me :-) so I hope it helps the global IT community at least a little bit.

Cheers.

Friday, March 23, 2018

Deploying vCenter High Availability with network addresses in separate subnets

VMware vCenter High Availability (vCenter HA) is a very interesting feature included in vSphere 6.5. Generally, it provides higher availability of the vCenter service by having three vCenter nodes (active/passive/witness), all serving a single vCenter service.

This is what the official vCenter HA documentation says:
vCenter High Availability (vCenter HA) protects vCenter Server Appliance against host and hardware failures. The active-passive architecture of the solution can also help you reduce downtime significantly when you patch vCenter Server Appliance.
After some network configuration, you create a three-node cluster that contains Active, Passive, and Witness nodes. Different configuration paths are available.   
The last sentence is very true. The simplest VCHA deployment is within the same SSO domain and within a single datacenter, with two Layer 2 networks, one for management and a second for the heartbeat. Such a design can be deployed in a fully automated manner; you just need to provide a dedicated network (portgroup/VLAN) for the heartbeat and use 3 IP addresses from a separate heartbeat subnet. Easy. But is that what you expect from vCenter HA? To be honest, the much more attractive use case is to spread the vCenter HA nodes across three datacenters to keep vSphere management up and running even when one of the datacenters experiences an issue. Conceptually, it is depicted in the figure below.

Conceptual vCenter HA Design
In this particular concept, I have used embedded PSC controllers for simplicity, and vCenter HA can thus increase the availability of the PSC services as well. The most interesting challenge in this concept is networking, so let's look at the intended network logical design.

vCenter HA - networking logical design
Networking logical design:

  • Each vCenter Server Appliance node has two NICs.
  • One NIC is connected to the management network and the second NIC to the heartbeat network.
  • The Layer 2 management network (VLAN 4) is stretched across datacenters A and B because the vCenter IP address must work in datacenter B without human intervention after a VCHA fail-over.
  • In each datacenter we have an independent heartbeat network (VCHA-HB-A, VCHA-HB-B, VCHA-HB-C) with different IP subnets, so Layer 2 is not stretched across datacenters, especially not to datacenter C where the witness resides. This requires specific static routes in each vCenter Server Appliance node to achieve IP reachability over the heartbeat network (a sketch follows this list).
  • Specific VCHA TCP/UDP ports must be allowed among the VCHA nodes across the heartbeat network.
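As a minimal sketch of such static routes, assuming a Photon-based VCSA where the heartbeat NIC is eth1 and the heartbeat subnets are 192.168.20.0/24 (A), 192.168.21.0/24 (B), and 192.168.22.0/24 (C) - all addresses and file names here are illustrative - the active node in datacenter A could get [Route] sections in its systemd-networkd configuration:

# /etc/systemd/network/10-eth1.network on the Active node (illustrative)
[Route]
Destination=192.168.21.0/24
Gateway=192.168.20.1

[Route]
Destination=192.168.22.0/24
Gateway=192.168.20.1

# apply the change
systemctl restart systemd-networkd
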
Helpful documents:

Implementation Notes: 

Note 1:
VMware KB 2148442 (Deploying vCenter High Availability with network addresses in separate subnets) is very important for deploying such a design, but one piece of information is missing there. After cloning the vCenter Server Appliances, you have to go to the passive node and configure on eth0 the same IP address you use on the active node. The configuration is in the file /etc/systemd/network/10-eth0.network.manual
Note 2:
In case of a badly destroyed VCHA cluster, use the following commands to destroy VCHA from the command line:
cd /etc/systemd/network
mv 10-eth0.network.manual 20-eth0.network
destroy-vcha
reboot
The solution was found at https://communities.vmware.com/thread/552084
Link to the official documentation (Resolving Failover Failures) - https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.avail.doc/GUID-FE5106A8-5FE7-4C38-91AA-D7140944002D.html