Friday, April 05, 2019

vSAN : Number of required ESXi hosts

Since you have found this article, I assume that you know what vSAN is. For those who are new to vSAN, below is a commonly quoted definition:
VMware vSAN (formerly Virtual SAN) is a hyper-converged, software-defined storage (SDS) product developed by VMware that pools together direct-attached storage devices across a VMware vSphere cluster to create a distributed, shared data store. The user defines the storage requirements, such as performance and availability, for virtual machines (VMs) on a VMware vSAN cluster and vSAN ensures that these policies are administered and maintained.
VMware vSAN aggregates local or direct-attached data storage devices to create a single storage pool shared across all ESXi hosts in the vSAN (aka vSphere) cluster. vSAN eliminates the need for external shared storage and simplifies storage configuration and virtual machine provisioning. Data are protected across ESXi hosts. To be more accurate, data are protected across failure domains, but let's stick with the vSAN default failure domain, which is the ESXi host.

vSAN is policy-based storage, and the policy dictates how data will be made redundant, distributed, reserved, etc. You can treat a policy as a set of requirements you define; the storage system will try to deploy and operate the storage object in compliance with these requirements. If it cannot satisfy the requirements defined in a policy, the object cannot be deployed or, if it is already deployed, it becomes non-compliant and is therefore at risk.

vSAN is object storage, and each object is composed of multiple components.

Let's start with RAID-1. For RAID-1, components can be replicas or witnesses.
Replicas are components containing the data.
Witnesses are components containing just metadata, used to avoid a split-brain scenario.

Object components are depicted in the screenshot below, where you can see three objects:
  1. VM Home 
  2. VM Swap
  3. VM Disk
where each object has two components (data replicas) and one witness (component containing just metadata). 
vSAN Components
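
To illustrate why the witness matters, here is a minimal sketch of a quorum check for one object with two replicas and one witness. It assumes the commonly described vSAN rule that an object stays accessible only while a full data replica and more than 50% of the votes are available. The `Component` class, the `is_accessible` function, the host names, and the one-vote-per-component weighting are all hypothetical simplifications for illustration, not vSAN's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Component:
    kind: str        # "replica" (data) or "witness" (metadata only)
    host: str        # hypothetical ESXi host name
    votes: int = 1   # assumption: every component carries one vote


def is_accessible(components, failed_hosts):
    """Return True if a replica survives and a vote majority remains."""
    alive = [c for c in components if c.host not in failed_hosts]
    total_votes = sum(c.votes for c in components)
    alive_votes = sum(c.votes for c in alive)
    has_data = any(c.kind == "replica" for c in alive)
    # Strict majority: exactly 50% is not enough, which prevents split-brain.
    return has_data and alive_votes * 2 > total_votes


vm_disk = [
    Component("replica", "esx01"),
    Component("replica", "esx02"),
    Component("witness", "esx03"),
]

print(is_accessible(vm_disk, {"esx01"}))           # one replica lost
print(is_accessible(vm_disk, {"esx02", "esx03"}))  # replica and witness lost
```

With a single host down, the surviving replica and the remaining component still hold two of three votes. With two hosts down, the lone survivor cannot form a majority, so neither side of a network partition can claim the object on its own.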

The key concept of data redundancy is FTT, the number of failures to tolerate. To tolerate failures, vSAN supports two methods of data distribution across vSAN nodes (actually ESXi hosts). The method is often referred to as FTM (Failure Tolerance Method). FTM can be
  • RAID-1 (aka Mirroring)
  • RAID-5/6 (aka Erasure Coding)
As data are distributed across nodes, not disks, to achieve redundancy, I'd rather call it RAIN than RAID. Anyway, vSAN terminology uses RAID, so let's stick with RAID.

In the table below, you can see how many hosts you need to achieve particular FTT for FTM RAID-1 (Mirroring):

FTT | Replicas | Witness components | Minimum # of hosts
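
The mirroring numbers follow a simple rule that is commonly documented for vSAN: tolerating n failures requires n+1 replicas and 2n+1 hosts. The sketch below computes these values for FTT 0 to 3. The function name `raid1_requirements` is hypothetical, and the witness count assumes the simplest layout, in which every host holds exactly one component of the object.

```python
def raid1_requirements(ftt: int) -> dict:
    """Replica, witness, and host counts for FTM RAID-1 (mirroring)."""
    if ftt < 0:
        raise ValueError("FTT must be non-negative")
    replicas = ftt + 1        # n + 1 full copies of the data
    min_hosts = 2 * ftt + 1   # 2n + 1 hosts, so a majority always survives
    # Assumption: one component per host, so the rest are witnesses.
    witnesses = min_hosts - replicas
    return {
        "ftt": ftt,
        "replicas": replicas,
        "witnesses": witnesses,
        "min_hosts": min_hosts,
    }


for ftt in range(4):
    print(raid1_requirements(ftt))
```

For FTT=1 this gives two replicas, one witness, and three hosts, which matches the two replicas and one witness per object described earlier.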

In the table below, you can see how many hosts you need to achieve particular FTT for FTM RAID-5/6 (Erasure Coding):
FTT | Erasure coding | Redundancy | Minimum # of hosts
0 | None | No redundancy | 1
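
For erasure coding, the host count is fixed by the stripe layout rather than by a formula. The lookup sketch below assumes the layouts commonly documented for vSAN at the time of writing: RAID-5 as 3 data + 1 parity for FTT=1, and RAID-6 as 4 data + 2 parity for FTT=2. The names `ERASURE_CODING` and `erasure_coding_min_hosts` are hypothetical.

```python
# FTT -> (erasure coding, redundancy, minimum number of hosts)
# Assumption: RAID-5 is 3+1 and RAID-6 is 4+2, one component per host.
ERASURE_CODING = {
    0: ("None", "No redundancy", 1),
    1: ("RAID-5", "3 data + 1 parity", 4),
    2: ("RAID-6", "4 data + 2 parity", 6),
}


def erasure_coding_min_hosts(ftt: int) -> int:
    """Minimum number of hosts for FTM RAID-5/6 at the given FTT."""
    if ftt not in ERASURE_CODING:
        raise ValueError(f"FTT={ftt} is not covered by this sketch")
    return ERASURE_CODING[ftt][2]


for ftt, (coding, redundancy, hosts) in ERASURE_CODING.items():
    print(ftt, coding, redundancy, hosts)
```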

Design consideration: 
The numbers of ESXi hosts above are minimums. What does that mean? During longer ESXi host maintenance or a long-lasting server failure, vSAN will not be able to rebuild the components from the affected ESXi node anywhere else. That's why at least one additional ESXi host is highly recommended. Without it, there can be situations in which your data are not redundant and therefore unprotected.
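
The reasoning can be expressed as a small check: a rebuild is only possible if the hosts that remain available still meet the minimum for the policy. This is a simplified sketch with hypothetical function names, and it assumes the remaining hosts have enough free capacity.

```python
def recommended_hosts(min_hosts: int, spare_hosts: int = 1) -> int:
    """Minimum host count plus spare hosts reserved for rebuilds."""
    return min_hosts + spare_hosts


def can_rebuild(cluster_hosts: int, unavailable_hosts: int, min_hosts: int) -> bool:
    """True if the remaining hosts still satisfy the policy minimum."""
    return cluster_hosts - unavailable_hosts >= min_hosts


# FTT=1 with RAID-1 needs at least 3 hosts.
print(can_rebuild(cluster_hosts=3, unavailable_hosts=1, min_hosts=3))  # False
print(can_rebuild(cluster_hosts=4, unavailable_hosts=1, min_hosts=3))  # True
print(recommended_hosts(3))                                            # 4
```

In a three-host cluster, losing one host leaves only two, so the missing component has nowhere to go. With a fourth host, the object can be brought back into compliance while the failed host is still down.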

I have written this article mainly for myself to use it as a quick reference during conversations with customers. Hope you will find it useful as well.
