Tuesday, January 24, 2017

VMware vSphere 6.0 PSC and SSO Domain useful resources

I do not have hard numbers, but it seems logical that SMB and midrange customers adopt the latest VMware software much more quickly than large enterprise customers. To be more precise, they are probably already running vSphere 6.0 and planning to upgrade to 6.5 now or soon. Some of them are just waiting for 6.5 U1, which is expected soon.

On the other hand, the largest VMware customers are understandably more conservative and are only now starting migrations from vSphere 5.5 to 6.0, at the time of writing this article (beginning of 2017). These large customers operate at a significantly larger scale, so their PSC/SSO topology is much more complex.

During the last few weeks I have discussed several vSphere 6 PSC/vCenter topology design decision points with these customers, so I decided to write a blog post listing a few useful, publicly available resources and documents for such discussions.

First and foremost, the FAQ below is the most comprehensive VMware KB article on this topic.

FAQ: VMware Platform Services Controller in vSphere 6.0 (2113115)

The most surprising information, even for long-time VMware customers, is in the following two Q&As from the FAQ above.

Q: Can I merge two vSphere Domains together?
A: No, there is no way to merge two vSphere domains together.

Q: Can I get Enhanced Linked Mode (ELM) between two, separate vSphere domains?
A: No, Enhanced Linked Mode requires that all PSCs be in the same domain and replicating. Since two separate vSphere Domains do not have a means of replicating, the new APIs that provide ELM cannot display the contents of both domains.

What does it mean?
Well, if you have multiple independent vSphere 5.5 SSO domains and you want to merge them, you have to do it in vSphere 5.5 before upgrading to 6.0, because you will not be able to do so in vSphere 6 and later.
Note: I do not know whether this will change in the longer term, but it is true even for vSphere 6.5, the latest version at the time of writing this blog post.

Q: One of my customers asked me whether the same vSphere SSO domain name (vsphere.local) in their two separate datacenters means that it is the same vSphere domain.
A: No. If there is no replication between them, they are not the same domain, even though they have the same name.
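To make the "same name is not the same domain" point concrete, here is a minimal conceptual sketch. It models PSCs as nodes and replication agreements as edges, so a vSphere domain corresponds to a connected component of the replication graph and the domain name is only a label. The `find_domains` helper and the PSC names are hypothetical illustrations, not a VMware API or tool.

```python
from collections import defaultdict


def find_domains(pscs, replication_links):
    """Group PSCs into vSphere domains (conceptual model only, not a VMware API).

    pscs: dict mapping PSC name -> SSO domain name (e.g. "vsphere.local")
    replication_links: iterable of (psc_a, psc_b) replication agreements
    Returns a list of sets; each set is one replicating domain.
    """
    neighbours = defaultdict(set)
    for a, b in replication_links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    domains = []
    for start in pscs:
        if start in seen:
            continue
        # Collect every PSC reachable from `start` through replication.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        domains.append(component)
    return domains


# Hypothetical example: two datacenters, both named "vsphere.local",
# with replication only inside the first datacenter.
pscs = {
    "psc-dc1-a": "vsphere.local",
    "psc-dc1-b": "vsphere.local",
    "psc-dc2-a": "vsphere.local",
}
links = [("psc-dc1-a", "psc-dc1-b")]

domains = find_domains(pscs, links)
print(len(domains))  # 2: separate domains despite the identical name
```

In the model, adding a link between `psc-dc1-b` and `psc-dc2-a` would produce a single domain; in real vSphere 6.x, per the FAQ above, there is no supported way to join two existing domains like that.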

Another good question you have to ask yourself is whether you should merge your vSphere domains at all. The typical reason for a single vSphere domain is a requirement for Enhanced Linked Mode (ELM). What will Enhanced Linked Mode give you? Below are several benefits of ELM:
  • You can log in to all linked vCenter Server systems simultaneously with a single user name and password.
  • With Enhanced Linked Mode, you can view and search across all linked vCenter Server systems. This mode replicates roles, permissions, licenses, and other key data across systems.
  • You can view and search the inventories of all linked vCenter Server systems within the vSphere Web Client.
  • Roles, permissions, licenses, tags, and policies are replicated across linked vCenter Server systems.
  • You can use the vSphere Web Client GUI to perform cross-vCenter vMotion.
However, every technology has limits. In the case of vSphere, we should always check the vSphere Configuration Maximums. The relevant values from the configuration maximums are:

  • Maximum PSCs per vSphere Domain - 8
  • Maximum PSCs per site, behind a load balancer - 4
  • Maximum number of VMware Solutions connected to a single PSC - 4
  • Maximum number of VMware Solutions in a vSphere Domain - 10
What are VMware Solutions?
A VMware Solution is defined as a product that creates a Machine Account and one or more Solution Users (a collection of vSphere services) within the VMware Directory Service when the product is joined to the PSC, thus the vSphere Domain. The Machine Account and Solution User(s) are used to broker and secure communication between other Solutions available within the vSphere environment. In order to count against these maximums, the Machine Account and Solution Users must be fully integrated with all of the PSC's available feature sets (Identity Management and Authentication Brokering, Certificate Management, Licensing, etc.) such that the product makes full use of the PSC. At this time, only vCenter Server is defined as a fully integrated solution and counts against these maximums. Partially integrated solutions, such as vCenter Site Recovery Manager, vCloud Director, vRealize Orchestrator, vRealize Automation Center, and vRealize Operations, do not count against these defined maximums.
In other words, vCenter Server instances are currently the only solutions that count toward the maximum of 10 VMware Solutions per vSphere domain.
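A planned topology can be checked against these limits with simple arithmetic. The sketch below assumes the four vSphere 6.0 maximums listed above and, following the rule just quoted, counts only vCenter Server instances as solutions. The `Psc` class, `check_domain` function, and the example topology are hypothetical, written only for this illustration.

```python
from dataclasses import dataclass, field

# vSphere 6.0 PSC maximums as listed in this post.
MAX_PSCS_PER_DOMAIN = 8
MAX_PSCS_PER_SITE_BEHIND_LB = 4
MAX_SOLUTIONS_PER_PSC = 4
MAX_SOLUTIONS_PER_DOMAIN = 10


@dataclass
class Psc:
    name: str
    site: str
    behind_load_balancer: bool = False
    # Only fully integrated solutions count; currently that means vCenter Server.
    vcenters: list = field(default_factory=list)


def check_domain(pscs):
    """Return human-readable limit violations for one vSphere domain (empty = OK)."""
    problems = []
    if len(pscs) > MAX_PSCS_PER_DOMAIN:
        problems.append(f"{len(pscs)} PSCs in domain (max {MAX_PSCS_PER_DOMAIN})")

    lb_per_site = {}
    for psc in pscs:
        if psc.behind_load_balancer:
            lb_per_site[psc.site] = lb_per_site.get(psc.site, 0) + 1
        if len(psc.vcenters) > MAX_SOLUTIONS_PER_PSC:
            problems.append(
                f"{psc.name}: {len(psc.vcenters)} vCenters (max {MAX_SOLUTIONS_PER_PSC})"
            )
    for site, count in lb_per_site.items():
        if count > MAX_PSCS_PER_SITE_BEHIND_LB:
            problems.append(
                f"site {site}: {count} PSCs behind LB (max {MAX_PSCS_PER_SITE_BEHIND_LB})"
            )

    total_vcenters = sum(len(p.vcenters) for p in pscs)
    if total_vcenters > MAX_SOLUTIONS_PER_DOMAIN:
        problems.append(
            f"{total_vcenters} vCenters in domain (max {MAX_SOLUTIONS_PER_DOMAIN})"
        )
    return problems


# Hypothetical topology: every PSC is within its per-PSC limit,
# but the domain as a whole has too many vCenters.
topology = [
    Psc("psc-a", "site-1", vcenters=["vc1", "vc2", "vc3", "vc4"]),
    Psc("psc-b", "site-1", vcenters=["vc5", "vc6", "vc7", "vc8"]),
    Psc("psc-c", "site-2", vcenters=["vc9", "vc10", "vc11", "vc12"]),
]
for problem in check_domain(topology):
    print(problem)  # prints: 12 vCenters in domain (max 10)
```

The example shows why the per-domain limit, not the per-PSC limit, is usually the one that bites in large environments: each PSC looks fine on its own.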

Now, once you know whether you really need and want to merge vSphere domains, remember that it must be done in vSphere 5.5, because it is not possible in vSphere 6.

One of my customers asked me where it is documented that vSphere domain merging is supported and how it can be done.

Below are two blog posts written by blogger Thom Greene ...

Merging SSO Domains in vCenter 5.5 part 1: Why?

Merging SSO Domains in vCenter Server 5.5 pt 2: How?

and a very detailed blog post by Andreas Peetz, referenced by Thom in his posts.

Re-pointing vCenter Server 5.5: A Survival Guide to KB2033620

... but the resources above are not official VMware documents, so where are the official ones? Andreas' blog post refers to the following VMware KBs:

Migrating two VMware vCenter Single Sign-On embedded VMware vCenter Servers in the same VMware vCenter Single Sign-On domain (2130433)

How to repoint and re-register vCenter Server 5.1 / 5.5 and components (2033620)

VMware vCenter Server 5.1/5.5 fails to start after re-registering with vCenter Single Sign-On (2048753)

Old but still informative blog post ... vSphere Datacenter Design – vCenter Architecture Changes in vSphere 6.0 – Part 1

Additional VMware resources:

Platform Services Controller Topology Decision Tree

vCenter Server Topology Considerations

Reconfigure a Standalone vCenter Server with an Embedded Platform Services Controller to a vCenter Server with an External Platform Services Controller

How to repoint vCenter Server 6.x between External PSC within a site (2113917)

Using the cmsso command to unregister vCenter Server from Single Sign-On (2106736)

and another related blog post from William Lam:
How to split vCenter Servers configured in an Enhanced Linked Mode (ELM)?

Understanding the Impacts of Mixed-Version vCenter Server Deployments

Useful VMware KB articles before upgrading to vSphere 6.5

I have just found the following very useful VMware KB articles and blog posts, which should be read before any vSphere 6.5 upgrade or design refresh.

Update sequence for vSphere 6.5 and its compatible VMware products (2147289)

Important information before upgrading to vSphere 6.5 (2147548)

Best practices for upgrading to vCenter Server 6.5 (2147686)

Platform Services Controller Topology Decision Tree

Reconfigure a Standalone vCenter Server with an Embedded Platform Services Controller to a vCenter Server with an External Platform Services Controller

How to repoint vCenter Server 6.x between External PSC within a site (2113917)

Wednesday, January 11, 2017

Using esxtop to identify storage performance issues for ESX / ESXi

ESXi performance data is exposed to administrators through the vSphere Client. Real-time performance statistics are sampled every 20 seconds, and fifteen of these 20-second samples make up one 5-minute interval. Obviously, a 20-second sample is pretty coarse for storage performance, where we work on a millisecond or even microsecond scale: 20 seconds is 20,000 milliseconds.
Let's be clear: we will never have full visibility, but a smaller monitoring sample gives us a better clue about what is really happening inside the system. It works like a microscope.
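The effect of sample size can be shown with a small averaging experiment. The sketch below builds a purely synthetic latency trace (hypothetical numbers, one value per millisecond over 20 seconds, with a short 50 ms burst) and averages it over one 20-second window and over ten 2-second windows, the esxtop interval used later in this post. It is an illustration of averaging, not real measurement data.

```python
# Synthetic latency trace (hypothetical): one value per millisecond for 20 s.
# Baseline 1 ms, plus a 200 ms burst where latency jumps to 50 ms.
TRACE_MS = 20_000
trace = [1.0] * TRACE_MS
for i in range(5_000, 5_200):
    trace[i] = 50.0


def window_averages(values, window):
    """Average `values` over consecutive, non-overlapping windows of `window` points."""
    return [sum(values[i:i + window]) / window for i in range(0, len(values), window)]


avg_20s = window_averages(trace, 20_000)  # one 20-second, vSphere Client-sized sample
avg_2s = window_averages(trace, 2_000)    # ten 2-second samples

print(avg_20s)      # [1.49]
print(max(avg_2s))  # 5.9
```

The burst barely moves the 20-second average (1.49 ms), while the 2-second window that contains it shows 5.9 ms, which is the "microscope" effect in numbers.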

The smallest monitoring samples can be achieved with the ESXi utility esxtop. The default esxtop delay between monitoring points (samples) is 5 seconds. However, it can be lowered to as little as 2 seconds with the parameter -d 2.

For real analysis, the esxtop data must be exported to an external file. In esxtop terminology this is batch mode, enabled with the parameter -b.

Another important factor is which statistics (metrics) we are going to collect. It is best to collect all statistics, because during performance analysis you have to correlate multiple values against each other. This is done with the parameter -a.

The last parameter is -n, which defines how many iterations you want to perform in batch mode. In the example below we run 30 iterations with a 2-second delay between them, so the total monitoring time is 60 seconds.

esxtop -b -a -d 2 -n 30 > esxtop-data.csv
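Since the total monitoring time is simply delay multiplied by iterations, the -n value for a longer capture is easy to derive. The sketch below is a hypothetical helper, `esxtop_batch_command`, that only builds the command string from the flags described above; it does not run esxtop, and its 2-second lower bound simply mirrors the minimum delay mentioned earlier.

```python
import math
import shlex


def esxtop_batch_command(duration_s, delay_s=2, outfile="esxtop-data.csv"):
    """Build an esxtop batch-mode command line covering `duration_s` seconds.

    -b batch mode, -a all statistics, -d delay between samples,
    -n number of iterations; stdout is redirected to a CSV file.
    """
    if delay_s < 2:
        raise ValueError("esxtop delay cannot be lower than 2 seconds")
    # Round up so the capture covers at least the requested duration.
    iterations = math.ceil(duration_s / delay_s)
    args = ["esxtop", "-b", "-a", "-d", str(delay_s), "-n", str(iterations)]
    return " ".join(shlex.quote(a) for a in args) + " > " + shlex.quote(outfile)


print(esxtop_batch_command(60))
# esxtop -b -a -d 2 -n 30 > esxtop-data.csv
print(esxtop_batch_command(24 * 3600))
# esxtop -b -a -d 2 -n 43200 > esxtop-data.csv
```

The first call reproduces the 60-second example above; the second shows that a 24-hour capture at a 2-second delay needs 43,200 iterations.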

For all esxtop parameters, see the help output below.

 [root@esx11:~] esxtop -h  
 usage: esxtop [-h] [-v] [-b] [-l] [-s] [-a] [-c config file] [-R vm-support-dir-path]   
         [-d delay] [-n iterations]  
        [-export-entity entity-file] [-import-entity entity-file]   
        -h prints this help menu.  
        -v prints version.  
        -b enables batch mode.  
        -l locks the esxtop objects to those available in the first snapshot.  
        -s enables secure mode.  
        -a show all statistics.  
        -c sets the esxtop configuration file, which by default is .esxtop60rc  
        -R enables replay mode.  
        -d sets the delay between updates in seconds.  
        -n runs esxtop for only n iterations. Use "-n infinity" to run esxtop forever.  
        -----Experimental Features-------------  
        -export-entity writes the entity ids into a file, which can be modified  
         to select interesting entities.  
        -import-entity reads the file of selected entities. If this opion   
         is used, esxtop only shows the data for the selected entities.  

It is important to know that esxtop gives you significantly more statistics than you can see at the vSphere Client level. That is another important benefit of esxtop. But every benefit has some drawback. Here, the drawback is that a single esxtop output line can contain several thousand statistic counters. For example, an ESXi 6.0 host with just 2 running VMs in my home lab has 27,314 counters, and my customer's production ESXi host has over 330,000 counters! So the output file can get pretty large if you run esxtop for 24 hours. Count on it.
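Before starting a 24-hour capture, it can help to run a short one (for example with -n 5), count its counters, and extrapolate the file size. The sketch below assumes the perfmon-style CSV layout written by `esxtop -b`, where the first header field is a timestamp column and every other field is one counter. The helpers `profile_capture` and `estimate_bytes` are hypothetical, and the tiny sample uses placeholder column names rather than real esxtop counters.

```python
import csv


def profile_capture(text):
    """Return (counter_count, avg_bytes_per_data_row) for a short esxtop batch capture.

    Assumes the perfmon-style layout of `esxtop -b` output: the first header
    field is the timestamp column and every other field is one counter.
    """
    lines = text.splitlines()
    header = next(csv.reader([lines[0]]))
    data_rows = lines[1:]
    if not data_rows:
        raise ValueError("capture contains no data rows")
    # +1 per row for the newline character (assumes "\n" line endings).
    avg_row_bytes = sum(len(row.encode("utf-8")) + 1 for row in data_rows) / len(data_rows)
    return len(header) - 1, avg_row_bytes


def estimate_bytes(avg_row_bytes, duration_s, delay_s=2):
    """Extrapolate file size: one data row per esxtop iteration (header ignored)."""
    return avg_row_bytes * (duration_s // delay_s)


# Tiny illustrative capture; column names are placeholders, not real esxtop counters.
sample = (
    '"(PDH-CSV 4.0) (UTC)(0)","\\\\esx11\\Group A\\Counter 1","\\\\esx11\\Group B\\Counter 2"\n'
    '"01/11/2017 10:00:00","1024","12.5"\n'
)
counters, row_bytes = profile_capture(sample)
print(counters)  # 2
print(estimate_bytes(row_bytes, 24 * 3600))  # bytes for a 24-hour run at -d 2
```

With hundreds of thousands of counters per row, the same arithmetic quickly lands in the multi-gigabyte range, so check free space on the target datastore before a long run.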

The file contains many interesting counters. For physical disk devices, the following counters are the most interesting:
### Response times
Average Guest MilliSec/Command
Average Kernel MilliSec/Command
Average Queue MilliSec/Command
Average Queue MilliSec/Read
Average Driver MilliSec/Command
Average Driver MilliSec/Write
### Queue
Adapter Q Depth
### IOPS
Commands/sec
Reads/sec
Writes/sec
### MB/s
MBytes Read/sec
MBytes Written/sec
### Split commands
Split Commands/sec
### SCSI Reservations
Failed Reserves/sec
### Failures
Failed Commands/sec
Failed Reads/sec
Failed Writes/sec
Failed Bytes Read/sec
Failed Bytes Written/sec
Some of the counters above are not available in the vSphere Client, and the big benefit is that esxtop gives you data at a 2-second interval, which is much better granularity.
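Picking these counters out of a header with tens or hundreds of thousands of columns is the first practical step. The sketch below assumes esxtop batch column names follow the perfmon pattern `\\host\Object(instance)\Counter` and that physical disk objects start with "Physical Disk"; both are assumptions to verify against your own file. `select_disk_columns` is a hypothetical helper, and the device ID and the non-disk column in the example are made up.

```python
import re

# Assumed perfmon-style column name: \\host\Object(instance)\Counter
# (the "(instance)" part is optional for objects without instances).
COLUMN_RE = re.compile(
    r"^\\\\(?P<host>[^\\]+)\\(?P<object>[^\\(]+)(?:\((?P<instance>.*)\))?\\(?P<counter>[^\\]+)$"
)

# A subset of the counters listed above.
INTERESTING_COUNTERS = {
    "Average Guest MilliSec/Command",
    "Average Kernel MilliSec/Command",
    "Average Queue MilliSec/Command",
    "Average Driver MilliSec/Command",
    "MBytes Read/sec",
    "MBytes Written/sec",
    "Failed Commands/sec",
}


def select_disk_columns(header, wanted=INTERESTING_COUNTERS):
    """Return {column_index: (object, instance, counter)} for wanted disk counters."""
    selected = {}
    for idx, name in enumerate(header):
        m = COLUMN_RE.match(name)
        if not m:
            continue  # e.g. the timestamp column
        if m["object"].startswith("Physical Disk") and m["counter"] in wanted:
            selected[idx] = (m["object"], m["instance"], m["counter"])
    return selected


# Illustrative header; device ID and the memory column are hypothetical.
header = [
    "(PDH-CSV 4.0) (UTC)(0)",
    r"\\esx11\Physical Disk SCSI Device(naa.600a0b80001234)\Average Guest MilliSec/Command",
    r"\\esx11\Physical Disk SCSI Device(naa.600a0b80001234)\MBytes Read/sec",
    r"\\esx11\Memory\Some Memory Counter",
]
for idx, (obj, instance, counter) in select_disk_columns(header).items():
    print(idx, instance, counter)
```

Keeping only the selected column indexes means later processing touches a handful of values per row instead of the whole line.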

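I hear your questions: So what now? How do I analyze the esxtop output file?
Well, esxtop itself has a replay mode (-R), although it works with a vm-support snapshot directory rather than the batch CSV file. For the CSV file, you can use any of the following tools: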

  • VisualEsxtop
  • Windows Performance Monitor (perfmon)
  • Microsoft Excel
  • esxplot
To be honest, none of the tools above fulfilled my requirements, so I'm writing my own Python script for esxtop output analysis.
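Until that script is published, here is a minimal sketch of the kind of analysis involved. It is not the author's script. It reads an esxtop batch CSV, picks one physical disk counter per device, and reports average, maximum, and the timestamps where the value exceeds a threshold. The column-name pattern, the "Physical Disk" object prefix, the `summarize` function, and the 20 ms default threshold are all assumptions for illustration; tune the threshold to your own storage.

```python
import csv
import io
import re
from statistics import mean

# Assumed perfmon-style column name: \\host\Object(instance)\Counter
COLUMN_RE = re.compile(
    r"^\\\\(?P<host>[^\\]+)\\(?P<object>[^\\(]+)(?:\((?P<instance>.*)\))?\\(?P<counter>[^\\]+)$"
)


def summarize(csv_file, counter="Average Guest MilliSec/Command", threshold_ms=20.0):
    """Per-device avg/max of one disk counter plus timestamps above a threshold."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = {}  # column index -> (object, instance)
    for idx, name in enumerate(header):
        m = COLUMN_RE.match(name)
        if m and m["object"].startswith("Physical Disk") and m["counter"] == counter:
            columns[idx] = (m["object"], m["instance"])

    values = {key: [] for key in columns.values()}
    over = {key: [] for key in columns.values()}
    for row in reader:
        for idx, key in columns.items():
            try:
                value = float(row[idx])
            except (IndexError, ValueError):
                continue  # skip short rows and empty or non-numeric cells
            values[key].append(value)
            if value > threshold_ms:
                over[key].append(row[0])  # first column holds the timestamp
    return {
        key: {"avg": mean(v), "max": max(v), "over": over[key]}
        for key, v in values.items()
        if v
    }


# Tiny synthetic capture with one hypothetical device.
sample = (
    '"(PDH-CSV 4.0) (UTC)(0)","\\\\esx11\\Physical Disk SCSI Device(naa.01)\\Average Guest MilliSec/Command"\n'
    '"01/11/2017 10:00:00","1.5"\n'
    '"01/11/2017 10:00:02","35.0"\n'
    '"01/11/2017 10:00:04","2.5"\n'
)
for (obj, instance), stats in summarize(io.StringIO(sample)).items():
    print(f"{instance}: avg {stats['avg']:.1f} ms, max {stats['max']:.1f} ms, over: {stats['over']}")
# naa.01: avg 13.0 ms, max 35.0 ms, over: ['01/11/2017 10:00:02']
```

For a real capture, pass an open file instead of `io.StringIO`, e.g. `open("esxtop-data.csv", newline="")`. The reader streams row by row, so even very wide files are not loaded into memory at once.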

I will blog about it in my next post, once the script is good enough for public use and published on GitHub.

Stay tuned.