Tuesday, September 16, 2014

Compellent Storage Center Live Volume and vSphere Metro Cluster

Are you interested in metro clusters (aka stretched clusters)?

Watch this video which introduces the new Synchronous Live Volume features available in Dell Compellent Storage Center 6.5.

If you need a deeper technical dive, use this guide, which focuses on two main data protection and mobility features available in Dell Compellent Storage Center: synchronous replication and Live Volume. In the paper, each feature is discussed and sample use cases are highlighted where these technologies fit independently or together.

Compellent Live Volume currently doesn't support automated fail-over based on an arbiter at a third site, which is why it is not certified as VMware vSphere Metro Cluster storage. Certification is just a matter of time. However, you can already leverage Compellent Live Volume with vSphere. The only drawback is that fail-over of a whole storage node has to be done manually, which can be sufficient, or even the preferred method, in some environments.

Wednesday, September 10, 2014

Tool for Network Assessment and Documentation

Do you need a tool for automated network assessment and documentation? Try NetBrain and let me know how you like it. I'm adding this tool to my to-do list of things to test in my lab, so I'll write another blog post after the test.

NetBrain's deep network discovery will build a rich mathematical model of the network’s topology and underlying design. The data collected by the system is automatically embedded within every diagram and exportable to MS Visio, Word, or Excel.

NetBrain Personal Edition is the totally free version of NetBrain. It will let you discover up to 20 network devices and will never expire.

iSCSI and Ethernet

Each Ethernet switch manufacturer may implement features unique to a specific model. Below are some general tips to look for when implementing an iSCSI network infrastructure. Each tip may or may not apply to a specific installation. Be aware that this list is inspired by DELL Compellent iSCSI best practices and is not an all-inclusive list.
  • Bi-Directional Flow Control enabled for all Switch Ports that carry iSCSI traffic, including any inter switch links.
  • Separate networks or VLANs for iSCSI, apart from regular data traffic.
  • Separate networks or VLANs for iSCSI multi-path traffic as well.
  • Unicast storm control disabled on every switch that handles iSCSI traffic.
  • Multicast disabled at the switch level for any iSCSI VLANs - Multicast storm control enabled (if available) when multicast cannot be disabled.
  • Broadcast disabled at the switch level for any iSCSI VLANs - Broadcast storm control enabled (if available) when broadcast cannot be disabled.
  • Routing disabled between regular network and iSCSI VLANs - Use extreme caution if routing any storage traffic, performance of the network can be severely affected. This should only be done under controlled and monitored conditions.
  • Disable Spanning Tree (STP or RSTP) on ports which connect directly to end nodes (the server or Dell Compellent controller's iSCSI ports). You can do this by enabling the PortFast or EdgePort option on these ports so that they are configured as edge ports.
  • Ensure that any switches used for iSCSI are of a non-blocking design.
  • Hard set all switch ports and server ports to Gigabit Full Duplex, if applicable.
  • When deciding which switches to use, remember that you are running SCSI traffic over them. Be sure to use quality, managed, enterprise-class networking equipment. It is not recommended to use SBHO (small business/home office) class equipment outside of lab/test environments.
Do you want configuration examples for DELL PowerConnect and DELL Force10 switches? Leave a comment with the particular switch model and firmware version and I'll do my best to prepare them for you.

Tuesday, September 09, 2014

DELL Force10 switch and NIC Teaming

NIC teaming is a feature that allows multiple network interface cards in a server to be represented by one MAC address and one IP address in order to provide transparent redundancy, balancing, and to fully utilize network adapter resources. If the primary NIC fails, traffic switches to the secondary NIC because they are represented by the same set of addresses.

Let's assume we have a host with two NICs, where the primary NIC is connected to Force10 switch port 0/1 and the secondary NIC to switch port 0/5. When you use NIC teaming, consider that the server MAC address is originally learned on Port 0/1 of the switch and Port 0/5 is the failover port. When the NIC fails, the system automatically sends an ARP request for the gateway or host NIC to resolve the ARP and refresh the egress interface. When the ARP is resolved, the same MAC address is learned on the port where the ARP is resolved (in this example, Port 0/5 of the switch). To ensure that the MAC address is disassociated from one port and re-associated with another port in the ARP table, configure the
mac-address-table station-move refresh-arp 
command on the Dell Networking switch at the time that NIC teaming is being configured on the server.

! NOTE: If you do not configure the mac-address-table station-move refresh-arp command, traffic continues to be forwarded to the failed NIC until the ARP entry on the switch times out.

UPDATE 2015-03-16:
I have just discovered another FTOS command ...
arp learn-enable
   Enable ARP learning using gratuitous ARP.

NIC teaming solutions can leverage gratuitous ARP, so in my opinion it is worth enabling.

This command should also be beneficial in VMware environments, where the VMware vSwitch notifies the physical network after a VM is vMotioned from one ESXi host to another.
Strictly speaking, the ESXi host doesn't use gratuitous ARP for this notification but reverse ARP (aka RARP). Anyway, these two commands are beneficial for VMware vMotion.

Monday, September 08, 2014

Redirect ESXi syslog and coredump over network

Let's assume we have a syslog server at IP address [SYSLOG-SERVER] and a coredump server at [COREDUMP-SERVER]. Here are the CLI commands to quickly and effectively configure network redirection.

esxcli system syslog config set --loghost=udp://[SYSLOG-SERVER]
esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true
esxcli network firewall refresh
esxcli system syslog reload

esxcli system syslog config get

esxcli system coredump network set --interface-name vmk0 --server-ipv4 [COREDUMP-SERVER] --server-port 6500
esxcli system coredump network set --enable true
esxcli system coredump network check
By the way, did you know that the VMware vCenter Server Appliance also works as a syslog and coredump server? So why not use it? It is free of charge.

vCenter Log Insight is a much better syslog server because you can easily search centralized logs and do some advanced analytics; however, that's another topic.

Sunday, September 07, 2014

vSphere HA Cluster Redundancy

All vSphere administrators and implementers know how easily a vSphere HA Cluster can be configured. However, a quick and simple configuration sometimes doesn't do exactly what is expected. You can, and typically you should, enable Admission Control in the vSphere HA Cluster configuration settings. VMware vSphere HA Admission Control is a control mechanism that checks whether another VM can be powered on in an HA-enabled cluster while still satisfying the redundancy requirement. So far so good; however, the complexity starts here, because you have several options for the algorithm used to fulfill your spare capacity redundancy requirement. So what options do you have?

Admission Control can be configured with one of the following three algorithms:
  1. Define fail-over capacity by static number of hosts
  2. Define fail-over capacity by reserving a percentage of cluster resources
  3. Use dedicated fail-over hosts
Let's deep dive into each option ...

Algorithm 1 is generally N+X host redundancy 
When N+X redundancy is required, most vSphere designers go with this option because it looks like the most suitable choice. However, it is important to know that this particular algorithm works with the HA Slot Size. The HA Slot Size is calculated from the reservations defined on powered-on VMs. If you don't use CPU/MEM reservations per VM, then default reservation values (32 MHz, memory virtualization overhead) are used for the HA Slot Size calculation. By the way, VMware recommends setting reservations on resource pools and not per VM, so there is a relatively high probability that you don't have VM reservations and will therefore have a very small HA Slot Size. That means Admission Control will allow a lot of VMs to be powered on, which introduces high resource over-allocation, and your N+1 redundancy can suffer significantly. On the other hand, if you have just one VM with huge CPU/MEM reservations, it can significantly skew the HA Slot Size, with a negative impact on your VM consolidation ratio.

How can we solve this problem? One solution is HA Cluster Advanced Options described below.

The maximum HA Slot Size can be limited by the following two advanced options.
  • das.slotcpuinmhz - Defines the maximum bound on the CPU slot size. If this option is used, the slot size is the smaller of this value or the maximum CPU reservation of any powered-on virtual machine in the cluster.
  • das.slotmeminmb - Defines the maximum bound on the memory slot size. If this option is used, the slot size is the smaller of this value or the maximum memory reservation plus memory overhead of any powered-on virtual machine in the cluster.
This helps in a situation where you have one VM with high CPU or RAM reservations. Such a VM will not increase the HA Slot Size; instead, it consumes multiple smaller HA Slots.

Default VM reservation values for HA slot calculation can be defined by another two advanced options.
  • das.vmcpuminmhz - Defines the default CPU resource value assigned to a virtual machine if its CPU reservation is not specified or zero. This is used for the Host Failures Cluster Tolerates admission control policy. If no value is specified, the default is 32MHz.
  • das.vmmemoryminmb - Defines the default memory resource value assigned to a virtual machine if its memory reservation is not specified or zero. This is used for the Host Failures Cluster Tolerates admission control policy. If no value is specified, the default is 0 MB.
Default VM reservation values can help you define the HA Slot Size you want, but this doesn't automatically correspond to the required overbooking and planned spare fail-over capacity, because the HA Slot Size is not proportional to VM sizes on a particular cluster. If you really want one real spare host of fail-over capacity, you have to go with option 3 (Use dedicated fail-over hosts).
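
To make the slot arithmetic above more tangible, here is a minimal Python sketch. It is a simplified model of the slot logic described in this post, not VMware's actual implementation. The VM class, the 100 MB memory overhead, and the host size (16 cores at 2.1 GHz, 128 GB RAM) are assumptions made up for illustration; only the advanced option names and their defaults come from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VM:
    cpu_reservation_mhz: int = 0  # 0 = no CPU reservation configured
    mem_reservation_mb: int = 0   # 0 = no memory reservation configured
    mem_overhead_mb: int = 100    # illustrative overhead, not a measured value


def slot_size(vms: List[VM],
              das_vmcpuminmhz: int = 32,
              das_vmmemoryminmb: int = 0,
              das_slotcpuinmhz: Optional[int] = None,
              das_slotmeminmb: Optional[int] = None) -> Tuple[int, int]:
    """Return the (CPU MHz, memory MB) size of one HA slot (simplified model)."""
    # Largest reservation among powered-on VMs; defaults apply when unset.
    cpu = max(vm.cpu_reservation_mhz or das_vmcpuminmhz for vm in vms)
    mem = max((vm.mem_reservation_mb or das_vmmemoryminmb) + vm.mem_overhead_mb
              for vm in vms)
    # Optional upper bounds on the slot size.
    if das_slotcpuinmhz is not None:
        cpu = min(cpu, das_slotcpuinmhz)
    if das_slotmeminmb is not None:
        mem = min(mem, das_slotmeminmb)
    return cpu, mem


def slots_per_host(host_cpu_mhz: int, host_mem_mb: int,
                   slot: Tuple[int, int]) -> int:
    """Number of slots a host can hold: the tighter of CPU and memory."""
    cpu, mem = slot
    return min(host_cpu_mhz // max(cpu, 1), host_mem_mb // max(mem, 1))


# Hypothetical host: 16 cores x 2100 MHz, 128 GB RAM.
HOST_CPU_MHZ, HOST_MEM_MB = 16 * 2100, 128 * 1024

small_vms = [VM() for _ in range(9)]            # no reservations at all
big_vm = VM(cpu_reservation_mhz=4000, mem_reservation_mb=16384,
            mem_overhead_mb=200)                # one heavily reserved VM

tiny = slot_size(small_vms)
skewed = slot_size(small_vms + [big_vm])
capped = slot_size(small_vms + [big_vm],
                   das_slotcpuinmhz=500, das_slotmeminmb=1024)

for name, slot in (("tiny", tiny), ("skewed", skewed), ("capped", capped)):
    print(name, slot, slots_per_host(HOST_CPU_MHZ, HOST_MEM_MB, slot))
```

In this model, the cluster without reservations ends up with a tiny slot and more than a thousand slots per host, a single heavily reserved VM shrinks that to a handful of slots, and the two das.slot* caps bring the number back to something in between.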

Algorithm 2 : percentage cluster spare capacity
This algorithm doesn't use the HA Slot Size; it simply calculates total cluster CPU/MEM resources and decreases them by the spare capacity defined as a percentage. The remaining cluster resources are further decreased by the reservations of powered-on VMs, and new VMs can be powered on only while some cluster resources are still available. Quite clear and simple, right? However, it also requires VM reservations; otherwise you will end up with an over-allocated cluster, and your overbooking ratio will be too high, which can introduce performance issues. So once again, if you really want one real spare host of fail-over capacity without dealing with VM reservations, the best way is to go with option 3 (Use dedicated fail-over hosts).
Note that algorithm 2 doesn't use HA Cluster Advanced Options related to HA Slot mentioned above. However das.vmCpuMinMHz and das.vmMemoryMinMB can be used  to set default reservations. For more details read this.
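
The percentage-based check can be sketched in a few lines of Python. Again, this is a simplified model under stated assumptions, not VMware's code: the function names, the 4-host cluster (each host 33600 MHz and 128 GB), the 25% spare capacity, and the 100 MB memory overhead are all hypothetical values chosen for illustration. The 32 MHz and 0 MB defaults are the ones mentioned above.

```python
from typing import Tuple

DEFAULT_CPU_MHZ = 32  # das.vmcpuminmhz default
DEFAULT_MEM_MB = 0    # das.vmmemoryminmb default


def effective_reservation(cpu_res_mhz: int, mem_res_mb: int,
                          mem_overhead_mb: int) -> Tuple[int, int]:
    """Resources a VM counts for in admission control (simplified model)."""
    cpu = cpu_res_mhz or DEFAULT_CPU_MHZ
    mem = (mem_res_mb or DEFAULT_MEM_MB) + mem_overhead_mb
    return cpu, mem


def max_identical_vms(total_cpu_mhz: int, total_mem_mb: int,
                      failover_pct: int, vm: Tuple[int, int]) -> int:
    """How many identical VMs can be powered on before admission control
    refuses, given the percentage of cluster resources kept as spare."""
    cpu, mem = vm
    usable_cpu = total_cpu_mhz * (100 - failover_pct) // 100
    usable_mem = total_mem_mb * (100 - failover_pct) // 100
    limits = [usable_cpu // cpu]
    if mem > 0:
        limits.append(usable_mem // mem)
    return min(limits)


# Hypothetical cluster: 4 hosts, each 33600 MHz and 128 GB RAM, 25% spare.
TOTAL_CPU_MHZ = 4 * 33600
TOTAL_MEM_MB = 4 * 128 * 1024

no_reservation = effective_reservation(0, 0, 100)
half_reserved = effective_reservation(2100, 3072, 100)  # 2 vCPU / 6 GB VM at 50%

print(max_identical_vms(TOTAL_CPU_MHZ, TOTAL_MEM_MB, 25, no_reservation))
print(max_identical_vms(TOTAL_CPU_MHZ, TOTAL_MEM_MB, 25, half_reserved))
```

Without reservations the model admits thousands of VMs on four hosts, which is exactly the over-allocation problem described above. With 50% reservations per VM, the limit drops to a few dozen.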

Algorithm 3 : dedicated fail-over hosts
This algorithm simply dedicates the specified hosts to remain unused during normal conditions and to be used only in case of an ESXi host failure. Multiple dedicated fail-over hosts are supported since vSphere 5.0. This algorithm keeps your capacity and performance absolutely predictable and independent of VM reservations. You'll get exactly what you configure.

UPDATE 2018-01-09: for some additional details about dedicated fail-over hosts read the blog post Admission Control - Dedicated fail-over hosts.

So what option to use? The correct answer is, as usual, ... it depends :-)

However, if VM reservations are not used and absolutely predictable N+X redundancy is required, I currently recommend Option 3.

If you have a problem with leaving some ESXi hosts unused while the cluster is in a non-degraded state (isn't that exactly what is required?), I recommend Option 1, but VM reservations must be used to get a realistic HA Slot Size. With this option, an artificial HA Slot Size can be designed using the advanced options.

If you don't want to deal with HA Slot sizing and you want to use all ESXi hosts in the cluster, you can use Option 2, but VM reservations must be used for some capacity guarantee to avoid a high overbooking ratio.

It would be great if VMware vSphere had some kind of Cluster Reservation policy for VMs. For example, if you wanted to guarantee a cluster resource overbooking of 2:1, you would set up 50% CPU and 50% RAM reservations for each VM running in the HA Cluster. This policy should be dynamic, so if someone changes the VM size from a CPU or RAM perspective, the reservations would be recalculated automatically.

Let's break down our example above. We assume the following HA CLUSTER RESERVATION POLICY => CPU 50%, RAM 50% assigned to our HA Cluster. Let's power on a VM with 2 vCPUs and 6GB RAM. The dynamic reservation calculation is quite easy from the RAM perspective because the memory reservation would be 3GB (50% of 6GB). It is a little more complicated from the CPU perspective. The dynamic CPU reservation has to be calculated based on the physical CPU the VM is running on. So let's assume we have an Intel Xeon E5-2450 @ 2.1GHz. 50% of 2.1GHz is 1.05GHz, but we have 2 vCPUs, so we have to multiply it by 2. Therefore the dynamic CPU reservation for our VM is 2.1GHz. I believe that with such a dynamic reservation policy we would be able to guarantee the overbooking ratio and define cluster redundancy more predictably from the overbooking and performance degradation point of view.
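
The arithmetic of this wished-for policy is easy to express in code. The sketch below is purely hypothetical: no such Cluster Reservation policy exists in vSphere as far as this post is concerned, and the function name and parameters are made up to mirror the example above.

```python
from typing import Tuple


def dynamic_reservation(vcpus: int, ram_mb: int, pcpu_mhz: int,
                        cpu_pct: int = 50, ram_pct: int = 50) -> Tuple[int, int]:
    """Return (CPU MHz, memory MB) reservations for a VM under a
    hypothetical cluster-wide reservation policy."""
    # Each vCPU is assumed to map to one physical core at pcpu_mhz.
    cpu_reservation_mhz = vcpus * pcpu_mhz * cpu_pct // 100
    mem_reservation_mb = ram_mb * ram_pct // 100
    return cpu_reservation_mhz, mem_reservation_mb


# The VM from the example: 2 vCPUs, 6 GB RAM, on a 2.1 GHz physical CPU.
cpu_mhz, mem_mb = dynamic_reservation(vcpus=2, ram_mb=6 * 1024, pcpu_mhz=2100)
print(cpu_mhz, "MHz CPU reservation,", mem_mb, "MB memory reservation")
```

If the VM were later resized, calling the same function with the new vCPU count or RAM size would yield the recalculated reservations, which is the dynamic behavior the policy would need.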

I would like to know what is your preferred HA Cluster Admission Control setting. So, don't hesitate to leave a comment and share your thoughts with the community. Any feedback is very welcome and highly appreciated. 

Friday, September 05, 2014

EVO:RAIL Introduction Video

The EVO:RAIL introduction video is quite impressive. Check it out yourself at


I'm really looking forward to the first EVO:RAIL implementation.