VMware vSAN (formerly Virtual SAN) is a hyper-converged, software-defined storage (SDS) product developed by VMware that pools together direct-attached storage devices across a VMware vSphere cluster to create a distributed, shared data store. The user defines the storage requirements, such as performance and availability, for virtual machines (VMs) on a VMware vSAN cluster and vSAN ensures that these policies are administered and maintained.VMware vSAN aggregates local or direct-attached data storage devices to create a single storage pool shared across all ESXi hosts in the vSAN (aka vSphere) cluster. vSAN eliminates the need for external shared storage and simplifies storage configuration and virtual machine provisioning. Data are protected across ESXi hosts. To be more accurate across failure domains, but let's assume we stick with the vSAN default failure domain, which is ESXi host.
vSAN is policy-based storage and policy dictates how data will be redundant, distributed, reserved, etc. You can treat a policy as a set of requirements you can define and storage system will try to deploy and operate the storage object in compliance with these requirements. If it cannot satisfy requirements defined in a policy, the object cannot be deployed or, if already deployed, it becomes in the non-compliant state, therefore at risk.
vSAN is object storage, therefore each object is composed of multiple components.
Let's start with RAID-1. For RAID-1, components can be replicas or witnesses.
Replicas are components containing the data.
Witnesses are components containing just metadata used to avoid split-brain scenario.
Objects components are depicted on the screenshot below where you can see three objects
- VM Home
- VM Swap
- VM Disk
The key concept of data redundancy is FTT. FTT is the number of failures to tolerate. To tolerate failures, vSAN supports two methods of data distribution across vSAN nodes (actually ESXi hosts). It is often referenced as an FTM (Failure Tolerance Method). FTM can be
- RAID-1 (aka Mirroring)
- RAID-5/6 (aka Erasure Coding)
In the table below, you can see how many hosts you need to achieve particular FTT for FTM RAID-1 (Mirroring):
|FTT||Replicas||Witness components||Minimum # of hosts|
In the table below, you can see how many hosts you need to achieve particular FTT for FTM RAID-5/6 (Erasure Coding):
|FTT||Erasure coding||Redundancy||Minimum # of hosts|
The above number of ESXi hosts are minimal. What does it mean? In case of longer ESXi host maintenance or long-time server failure, vSAN will not be able to rebuild components from affected ESXi node somewhere else. That's the reason why at least one additional ESXi host is highly recommended. Without one additional ESXi host, there can be situations, your data are not redundant, therefore unprotected.
I have written this article mainly for myself to use it as a quick reference during conversations with customers. Hope you will find it useful as well.