IOmeter is a good storage testing tool. However, you have to understand basic storage principles to plan and interpret a storage performance test properly. Storage is the most crucial component of any vSphere infrastructure, so I have some experience with IOmeter and storage performance testing in general. Here are my thoughts on the following question:

"I am running a test with my customer to see how many IOPS we can get from a single VM working with an HDS all-flash array. The best I could get with IOmeter was 32K IOPS at 3 ms latency with 8 KB blocks. No matter what other block size or number of outstanding I/Os I choose, I am unable to get more than 32K. On the other hand, I can't find any bottleneck across the paths or the storage. I use the PVSCSI storage controller. Latency and queues look to be OK."
First things first: every shared storage system does specific I/O scheduling to NOT give the whole performance to a single worker. A storage worker is the compute process or thread sending storage I/Os down the storage subsystem. If you think about it, this makes perfect sense, as it mitigates the noisy neighbor problem. When you invest a lot of money in a shared storage system, you most probably want to use it for multiple servers, right? It does not matter whether these servers are physical (ESXi hosts) or virtual (VMs). To get the most performance from shared storage, you must use multiple workers and, optimally, spread them across multiple servers and multiple storage devices (aka LUNs, volumes, datastores).
IOmeter allows you to use:
- Multiple workers on a single server (aka Manager)
- Outstanding I/Os within a single worker (asynchronous I/Os issued into the disk queue without waiting for an acknowledgment)
- Multiple Managers – a Manager is a server generating storage workload (via multiple workers) and reporting results to the central IOmeter GUI. This is where IOmeter Dynamos come into play. (A toy sketch of workers and outstanding I/Os follows this list.)
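To make workers and outstanding I/Os concrete, here is a minimal toy sketch in Python. The file name, sizes, and counts are illustrative assumptions, it requires a POSIX OS (os.pread), and because it reads a sparse, cached file it says nothing about real storage performance; it only shows which knobs multiply into the number of I/Os in flight. It is not how IOmeter/Dynamo is implemented.

```python
import os
import random
import threading
import time

PATH = "testfile.bin"            # hypothetical test file name
FILE_SIZE = 256 * 1024 * 1024    # 256 MB test file
BLOCK = 8 * 1024                 # 8 KB blocks, as in the question
WORKERS = 4                      # IOmeter: workers on one Manager
OUTSTANDING = 16                 # IOmeter: "# of Outstanding I/Os" per worker
DURATION = 10                    # seconds to run

ops = 0
ops_lock = threading.Lock()

def io_loop(fd, deadline):
    """Issue random 8 KB reads back-to-back until the deadline."""
    global ops
    done = 0
    blocks = FILE_SIZE // BLOCK
    while time.monotonic() < deadline:
        os.pread(fd, BLOCK, random.randrange(blocks) * BLOCK)
        done += 1
    with ops_lock:
        ops += done

# Create a sparse test file (not part of the measurement).
with open(PATH, "wb") as f:
    f.truncate(FILE_SIZE)

fd = os.open(PATH, os.O_RDONLY)
deadline = time.monotonic() + DURATION
# Each worker keeps OUTSTANDING synchronous reads in flight by running
# OUTSTANDING threads; real tools use asynchronous I/O for this instead.
threads = [threading.Thread(target=io_loop, args=(fd, deadline))
           for _ in range(WORKERS * OUTSTANDING)]
for t in threads:
    t.start()
for t in threads:
    t.join()
os.close(fd)

print(f"~{ops / DURATION:,.0f} IOPS with {WORKERS * OUTSTANDING} I/Os in flight")
```

The total concurrency is WORKERS × OUTSTANDING, and that is exactly the quantity that has to fit into the queues discussed next.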
Now let's do the math for the specific case from the question above:
- We have a single vDisk queue with a default queue depth of 64 (we use the Paravirtual SCSI adapter; non-paravirtualized SCSI adapters have a queue depth of 32)
- We have a QLogic HBA with a default queue depth of 64 (other HBA vendors, like Emulex, have a default queue depth of 32, which would be another bottleneck on the storage path)
- The storage has an average service time (response time) of around 3 ms
- IOPS is the number of I/O operations per second
- A queue depth of 64 = 64 I/O operations in parallel = 64 slots for I/O operations
- Each of these 64 I/Os stays in the vDisk queue until the SCSI response comes back from the LUN
- All other I/Os have to wait until a slot in the queue is free
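Putting these bullets together gives the theoretical ceiling of a single queue. By Little's Law, sustained IOPS = number of outstanding I/Os / average response time. Here is that arithmetic as a minimal sketch, using the numbers from the question:

```python
# Little's Law for a storage queue: IOPS = outstanding I/Os / response time.
queue_depth = 64       # vDisk (PVSCSI) default queue depth
service_time = 0.003   # 3 ms average response time, in seconds

max_iops = queue_depth / service_time
print(f"Theoretical ceiling of one queue: {max_iops:,.0f} IOPS")  # ~21,333

# Sanity check against the question: sustaining 32K IOPS through a
# 64-deep queue implies an effective latency of 64 / 32000 s = 2 ms,
# i.e. the array responded faster than 3 ms on average, or more than
# 64 I/Os were actually in flight end-to-end.
print(f"Implied latency at 32K IOPS: {queue_depth / 32000 * 1000:.1f} ms")
```

In other words, with a 3 ms service time each of the 64 slots can complete roughly 333 I/Os per second, and 64 × 333 ≈ 21,300 IOPS is the most one vDisk queue can deliver.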
So, what can you do to get more IOPS from a single VM? You can (each option is quantified in the sketch after this list):
- Increase the queue depth, not only on the vDisk itself but END-2-END. THIS IS GENERALLY NOT RECOMMENDED, as you really must know what you are doing, and it can have a negative impact on the overall shared storage. However, if you need it and have the justification for it, you can try to tune the system.
- Use a storage system with a low service time (response time). For example, a sub-millisecond storage system (say, 0.5 ms) will give you more IOPS for the same queue depth than a storage system with a higher service time (for example, 3 ms).
- Leverage multiple vDisks spread across multiple vSCSI controllers and datastores (LUNs). This gives you more total queue depth in a distributed fashion. However, it imposes additional requirements on your real application, which needs a filesystem or another mechanism supporting multiple storage devices (vDisks).
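The same Little's Law arithmetic puts rough numbers on each option. The queue depths, service times, and vDisk count below are illustrative assumptions, not recommendations, and real results are also capped by the HBA queue, the LUN queue, and the array itself:

```python
# Little's Law (IOPS = total outstanding I/Os / response time) applied
# to the options above. All values are illustrative assumptions.
def max_iops(queue_depth, service_time_s, vdisks=1):
    # Total concurrency = queue depth per vDisk * number of vDisks.
    return vdisks * queue_depth / service_time_s

print(f"Baseline, 1 vDisk, QD 64 @ 3 ms: {max_iops(64, 0.003):>9,.0f} IOPS")    # ~21K
print(f"Deeper queue, QD 128 @ 3 ms:     {max_iops(128, 0.003):>9,.0f} IOPS")   # ~43K
print(f"Faster array, QD 64 @ 0.5 ms:    {max_iops(64, 0.0005):>9,.0f} IOPS")   # ~128K
print(f"4 vDisks, QD 64 each @ 3 ms:     {max_iops(64, 0.003, vdisks=4):>9,.0f} IOPS")  # ~85K
```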
"There are only two storage performance types - good enough and not good enough."