Saturday, September 01, 2018

VMworld US 2018 - VIN2416BU - Core Storage Best Practices

As in previous years, William Lam ( has published URLs to VMworld US 2018 Breakout Sessions. William wrote the blog post about it and created GitHub repo vmworld2018-session-urls available at Direct link to US sessions is here

I'm going to watch sessions from areas of my interest and write my thoughts and interesting findings in my blog. So stay tuned and come back to read future posts, if interested. Let's start with VMworld 2018 session VIN2416BU - Core Storage Best Practices Speakers: Jason Massae, Cody Hosterman.  This technical session is about vSphere core storage topics. In the beginning, Jason shared with the audience GSS top storage issues and customers challenges. These are PSA, iSCSI, VMFS, NFS, VVols, Trim/Unmap, Queueing, Troubleshooting. General recommendation by Jason and Cody is to validate any vSphere storage change and adjustment with particular storage vendor and VMware. Customers should change advanced settings only when recommended by storage vendor or VMware GSS.

Cody follows with the basic explanation of how SATP and PSP works. Then he explains why some storage vendors recommend adjusting the default Round Robin I/Os quantity per single path until switching to the next path. The reason is not the performance but faster failover in case of some storage paths issues. If you want to know how to change such setting, read VMware KB 2069356.

Jason continues with the iSCSI topic which is, based on VMware GSS, the #1 problem. The first recommendation is to not expose LUNs used for the virtual environment to some other external systems or functions. The only exception might be RDM LUNs used for OS level clustering, but this is the special use case. Another topic is the teaming and port binding. Some kind of the teaming is highly recommended. The iSCSI port binding is preferred. The port binding will give you load balancing and fail-over not only on link failure but also on SCSI sense code. This will help you in situations when the network path is OK but the target LUN is not available for whatever reasons. I wrote the blog post about this topic here. It is about the advanced option enable_action_OnRetryErrors and as far as I know, it was not enabled by default in vSphere 6.5 but probably is in 6.7. I did not test it so it should be validated before implemented into production. Jason explained, that in vSphere 6.0 and below, port binding did not allow you to do network L3 routing, therefore NIC teaming was the only way to go in environments where initiators and targets (iSCSI portals) were in different IP subnets. However, since vSphere 6.5, iSCSI can leverage dedicated TCP/IP stack, therefore default gateway can be specified for VMkernel port used for iSCSI. So from vSphere 6.5 port-binding is highly recommended over NIC teaming. After, Jason explains the difference among software iSCSI adapter, dependent iSCSI adapters (iSCSI offloaded to the hardware), iSER (iSCSI over RDMA).

The mic is handed over to Cody, and Cody is sharing with the audience the information about the new Round Robin enhanced policy, available in vSphere 6.7 U1 which considers I/O latency for path selection. For more info read my blog post here.

Cody moves to VMFS topic. VMFS 6 has a lot of enhancements. You cannot upgrade VMFS 5 to VMFS 6, therefore you have to create new datastore, format it to VMFS 6 and use storage vMotion to migrate VMs from VMFS 5. Cody is trying to answer typical customers questions. How big datastores I should do? How many VMs I should accommodate in a single datastore? Do not expect the exact answer. The answer is, it depends on your storage architecture, performance, and capabilities. Everything begins with the question if your array supports VAAI (VMware API for Array Integration) but there are other questions you have to answer your self, like what granularity of recoverability do you expect, etc. Cody joked a little bit and at the beginning of VMFS section of the presentation, he shared his opinion, the best VMFS datastore is VVols datastore :-)

Jason continued with NFS best practices. The interesting one was about the usage of Jumbo Frames. Jason's Jumbo Frames recommendation is to use it only when it is already configured in your network. In other words, it is not worth to enable it and believe you will get significantly better performance.

Another topic is VVols. Cody starts with general explanation what VVols are and are NOT. Cody highlight that with VVols you can finally use VVols snapshots without any performance overhead because it is offloaded to storage hardware. Cody explains the basic of VVols terminology and architecture. VVols use one or more Protocol Endpoints. Protocol Endpoint is nothing else then LUN with ID 254 and used as Administrative Logical Unit (ALU). Protocol Endpoint is used to handle the data but within the storage, there are Virtual Volumes also known as Subsidiary Logical Units (SLU) having VVol Sub-LUN IDs (for example 254:7) where vSphere storage objects (VM home directory, VMDKs, SWAPs, Snapshots) are stored. This session is not about VVols, therefore, it is really just a brief overview.

Trim/UNMAP and Space Reclamation on Thin Provisioned storage
The mic is handed over to Jason who starts another topic - Trim/UNMAP and Space Reclamation. Jason informs auditorium that Trim/UNMAP functionality depends on vSphere version and other circumstances.

In vSphere 6.0, Trim/UNMAP is a manual operation (esxcli storage vmfs unmap). In vSphere 6.5, it is automated with VMFS-6, it is enabled by default, can be configured to Low Priority or Off and it takes 12-24 hours to clean up space. In vSphere 6.7 adds configurable throughput limits where the default is Low Priority.

Space reclamation also works differently on each vSphere edition.

  • In vSphere 6.0 it works when virtual disks are thin, VM hardware is 11+, and only for MS Windows OS. It does NOT work with VMware snapshots, CBT enabled, Linux OS, UNMAPs are misaligned. 
  • In vSphere 6.5, it works when virtual disks are thin, it works for Windows and Linux OS, VM hardware is 11+ for MS Windows OS, VM hardware 13 for Linux OS, CBT can be enabled, UNMAPs can be misaligned. It does NOT work when VM has VMware snapshots, with thick virtual disks, when Virtual NVMe adapter is used.
  • In vSphere 6.7it works when virtual disks are thin, it works for Windows and Linux OS, VM hardware is 11+ for MS Windows OS, VM hardware 13 for Linux OS, CBT can be enabled, UNMAPs can be misaligned, VM snapshots supported, Virtual NVMe adapter is supported. It does NOT work for thick provisioned virtual disks.
The mic is handed over back to Cody to speak about queueing. At the beginning of this section, Cody explains how storage queuing works in vSphere. He shares the default values of HBA Device Queues (aka HBA LUN queues) for different HBA types:

  • QLogic - 64
  • Brocade - 32
  • Emulex - 32
  • Cisco UCS (VIC) - 32
  • Software iSCSI - 128
HBA Device Queue is an HBA setting which controls how many I/Os may be queued on a device (aka LUN). Default values are configurable via esxcli. Changing requires reboot. Details are documented in VMware KB 1267. After the explanation of "HBA Device Queue", Cody explains DQLEN which is Hypervisor level device queue limit. The actual device queue depth is a minimum from "HBA Device Queue" and DQLEN. In a mathematical formula, it is a MIN("HBA Device Queue", DQLEN). Therefore, if you increase DQLEN you have to also adjust "HBA Device Queue" for some real effect. VMFS DQLEN defaults are:

  • VMFS - 32
  • RDM - 32
  • VVols Protocol Endpoints - 128 (Scsi.ScsiVVolPESNRO)
After basic problem explanation, Cody does some quick math to stress that default settings are ok for most environments and it usually does not make sense to change it. However, if you have specific storage performance requirements, you have to understand the whole end-to-end storage stack and then you can adjust it appropriately and do a performance tunning. If you do any change, you should do it on all ESXi hosts in the cluster to keep performance consistent after migration from one ESXi host to another.

Storage DRS (SDRS) and Storage I/O Control (SIOC)
Cody continues with SDRS and SIOC. SDRS and SIOC are two different technologies.

SDRS moves VMS around based on hitting a latency threshold. This is the VM observed latency which includes any latency induced by queueing.

SIOC controls throttles VMs based on hitting a datastore latency threshold. SIOC is using a device (LUN) latency to throttle device queue depth automatically when the device is stressed. If SIOC kicks in, it takes into consideration VMDK shares, therefore VMDKs with higher shares have more frequent access to stressed (overloaded) storage device.

Jason notes that what Cody explained is how SIOC version 1 works but there is also SIOC version 2 introduced in vSphere 6.5. SIOC v1 and v2 are different. SIOC v1 looks at datastore level, SIOC v2 is policy based setting. It is good to know that SIOC v1 and SIOC v2 can co-exist on vSphere 6.5+. SIOC V2 is considerably different from a user experience perspective when compared to V1. SIOCv2 is implemented using IO Filter framework Storage IO Control category. SIOC V2 can be managed using SPBM Policies. What this means is that you create a policy which contains your SIOC specifications, and these policies are then attached to virtual machines. One thing to note is that IO Filter based IOPS does not look at the size of the IO. For example, there is no normalization like in SIOC v1 so that a 64K IOP is not equal to 2 x 32K IOPS. It is a fixed value of IOPS irrespective of the size of the IO. For more information about SIOC look here.

The last section of the presentation is about troubleshooting. It is presented by Jason. When you have to do a storage performance troubleshooting, start with reviewing performance graph and try to narrow down the issue. Looks at VM and ESXi host. Select only problem component. You have to build your troubleshooting toolbox. It should include

  • Performance Graph
  • vRealize LogInsight
  • vSphere On-disk Metadata Analyzer (VOMA)
  • Enable CEIP (Customer Experience Improvement Program) which shares troubleshooting data with VMware Support (GSS). CEIP is actually call-home functionality which is opt-out in vSphere 6.5+

Session evaluation 
This is a very good technical session for infrastructure folks responsible for VMware vSphere and storage interoperability. I highly recommend to watch it.

No comments: