Wednesday, February 07, 2018

Storage QoS with vSphere virtual disk IOPS limits

I'm a long-time proponent of storage QoS applied per VM virtual disk (aka vDisk). In the past, vSphere virtual disk shares and IOPS limits were the only solutions. Nowadays, there are new architectural options - vSphere virtual disk reservations and VVols QoS. Whatever option you decide to use, the reason to use QoS (IOPS limits) lies in the architecture of all modern shared storage systems. Let's dig a little bit deeper. In modern storage systems, you usually have a single pool of a large number of disks, with volumes (aka LUNs) created on top of this pool. This modern storage array approach has a lot of advantages, the key one being performance: a single LUN is spread across a large number of disks, which multiplies the available performance. However, you usually have multiple LUNs, and because all these LUNs reside on the same disk pool, they sit on the same disks, and therefore all these LUNs interfere with each other from the performance point of view. Even if you had a single LUN, you would have multiple VMs per LUN (aka VMFS datastore), so the VMs would still affect each other.

Design consideration

What does it all mean? Well, if you have a typical vSphere and storage configuration as depicted in the figure below, a single VM can consume all the performance of the underlying storage system (disk pool), and I/O workload from one VM impacts other VMs. We call this situation "noisy neighbor", where one VM generates significantly higher storage traffic than the others.

Modern Storage System Architecture - Disk Pool
The problem with such a design is not only performance interference but also the fact that performance is unpredictable. VMs get excellent performance when the storage is NOT overloaded but very poor performance during peak times. From the user experience point of view, it is better to limit VMs to decrease the difference between BEST and WORST performance.

How do you design your infrastructure to keep your storage performance predictable? The short answer is to implement QoS. Of course, there is also a longer answer.

Storage QoS

vSphere Storage QoS is called Storage I/O Control, or simply SIOC. Storage-based per-LUN QoS will not help you on VMFS because multiple VM virtual disks (vDisks) are accommodated on a single LUN (datastore). However, all modern storage systems support Virtual Volumes (aka VVols), VMware's framework for managing vDisks directly on the storage; the implementation is always specific to the particular storage vendor. Nevertheless, in general, VVols can do QoS per vDisk (aka VMDK), because each vDisk is represented on the storage as an independent volume (aka LUN, or better said sub-LUN or micro-LUN).

vSphere Storage QoS (SIOC) supports:

  • IOPS Limits
  • IOPS Reservations
  • IOPS Shares
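These three controls interact in a predictable way: shares determine each VM's proportional slice of the available IOPS under contention, a reservation acts as a floor, and a limit acts as a ceiling. The following is a simplified Python sketch of that interaction, purely to illustrate the concept; it is not the actual ESXi I/O scheduler, and the numbers are made up for the example.

```python
def allocate_iops(capacity, vms):
    """Toy proportional-share allocator: shares split the available
    capacity, reservation is a floor, limit is a ceiling.
    A simplified model, NOT the real ESXi scheduler."""
    total_shares = sum(vm["shares"] for vm in vms)
    alloc = {}
    for vm in vms:
        fair = capacity * vm["shares"] / total_shares  # share-proportional slice
        fair = max(fair, vm["reservation"])            # floor: reservation
        if vm["limit"] is not None:
            fair = min(fair, vm["limit"])              # ceiling: limit
        alloc[vm["name"]] = round(fair)
    return alloc

# Two VMs contending for a datastore capable of 3000 IOPS:
print(allocate_iops(3000, [
    {"name": "vm1", "shares": 1000, "reservation": 0, "limit": None},
    {"name": "vm2", "shares": 2000, "reservation": 0, "limit": 1500},
]))
# vm2 would be entitled to 2000 IOPS by shares, but its limit caps it at 1500
```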
Storage I/O Control (SIOC) was initially introduced in vSphere 4.1. Nowadays it is called SIOC V1. vSphere 6.5 has introduced SIOC V2.

vSphere SIOC V1

In case you want to use vSphere SIOC V1 storage QoS, it is good to know at least some SIOC V1 "IOPS limit" implementation details:
  • I/Os are normalized at 32KB, so a 128KB I/O is counted as 4 I/Os. This is the vSphere default, which can be changed via the advanced parameter Disk.SchedCostUnit (default value 32768 bytes) to satisfy specific customer requirements. Disk.SchedCostUnit is configurable, with allowable values ranging from 4K to 256K.
  • In the latest ESXi version (ESXi 6.5 build 7526125) the IOPS limit does not interfere with svMotion (Storage vMotion). However, this is different behavior than in previous ESXi versions, where Storage vMotion I/O was billed to the VM.
  • If a VM has multiple disks with IOPS limits on the same datastore (LUN), then all limits are aggregated and enforced on that datastore for the whole VM. If vDisks of a single VM are located on different datastores, then limits are enforced independently per datastore. The behavior is different on NFS datastores. All this behavior is explained in the VMware KB "Limiting disk I/O from a specific virtual machine".
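The 32KB normalization in the first bullet is simple arithmetic and can be sketched as a small Python function. This is my reading of the documented behavior (large I/Os are charged as multiples of Disk.SchedCostUnit, small I/Os count as one), not a reimplementation of the ESXi scheduler.

```python
import math

def normalized_ios(io_size_bytes, sched_cost_unit=32768):
    """Number of I/Os SIOC V1 accounts for a single request.
    I/Os larger than Disk.SchedCostUnit (default 32KB) are charged
    as multiple I/Os; smaller I/Os are charged as one."""
    return max(1, math.ceil(io_size_bytes / sched_cost_unit))

print(normalized_ios(128 * 1024))                          # 128KB I/O -> 4 I/Os
print(normalized_ios(4 * 1024))                            # 4KB I/O -> 1 I/O
print(normalized_ios(128 * 1024, sched_cost_unit=65536))   # with a 64KB cost unit -> 2
```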

vSphere SIOC V2

It is worth mentioning that SIOC V1 and SIOC V2 can co-exist. SIOC V2 is very different compared to V1.

SIOC V2 implementation details:

  • I/Os are NOT normalized at a static I/O size like in SIOC V1. In other words, SIOC V2 does not implement Disk.SchedCostUnit.
  • SIOC V2 is implemented using the I/O Filter framework and is managed using SPBM Policies.
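The practical consequence of the missing normalization is easy to see with a concrete I/O size. The sketch below contrasts the two accounting models for a single 256KB request, again as a simplified illustration of the behavior described above, not the actual scheduler code.

```python
import math

def v1_cost(io_size_bytes, sched_cost_unit=32768):
    # SIOC V1: I/O size is normalized against Disk.SchedCostUnit (default 32KB)
    return max(1, math.ceil(io_size_bytes / sched_cost_unit))

def v2_cost(io_size_bytes):
    # SIOC V2: no Disk.SchedCostUnit, so every request counts as one I/O
    return 1

io = 256 * 1024
print(v1_cost(io))  # SIOC V1 charges a 256KB I/O as 8 I/Os
print(v2_cost(io))  # SIOC V2 charges it as 1 I/O
```

This means the same workload consumes its IOPS limit much faster under V1 when it issues large I/Os, which is worth keeping in mind when migrating limits from V1 to V2.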

VVols QoS

VVols QoS is out of the scope of this article.

Hope it is informative but as always do not hesitate to contact me for further details or discussions.


Unknown said...

Hi David!

Some comments below

1. IOs are normalized at 32KB so 128KB I/O would be counted as 4 I/O

You can change this behaviour using advanced parameter
esxcfg-advcfg -g /Disk/SchedCostUnit

see KB

2. IOPS limit does not interfere with SVmotion, XVmotion, and cloning

That's not true.
When a VM is powered on and you svMotion it, the IOPS limit affects this operation.
It's easy to check. Apply a 16 IOPS limit on a powered-on VM's vmdk and try to svMotion it.
You'll notice that it goes very-very slowly.

David Pasek said...

Hi Dmitry,

thanks for your comments.

Ad 1/ You are absolutely right I/O size for accounting can be changed by ESXi Advanced System Setting Disk.SchedCostUnit = 32768. Generally, I would not recommend changing default settings unless there is very good justification, but I agree it is good to know it is changeable.

Ad 2/ I have been asked about this behavior by one of my customers, so I did the research a few days ago and found this information in internal materials. I did a quick test in my lab and svMotion was NOT limited by IOPS limits. However, I also remember different behavior in the past. It seems that the latest vSphere version changed this behavior. Someone else reported the same new behavior here ... See the latest comments in the thread; Duncan Epping documented the old behavior there a few years back.

I have double-checked that I have ESXi 6.5 build 7526125 in my lab, where the new behavior applies. I have just finished a quick test on another ESXi host with ESXi 6.5.0 build 7388607, and there the old behavior applies. So I will do additional research and will update the blog post with further details.

By the way, I'm also investigating the VADP (image backup) behavior, because if I recall correctly it was counted into the IOPS limit in the past.

Stay tuned.

Unknown said...

Thanks for the link on community thread.
Everything I've tested before with limits and svmotion was on ESXi 6.0 and earlier release.

Good to know that it's changing now.