The customer has vSphere 6.5 U2 (build 9298722) and IBM TSM VE 18.104.22.168. They observed the problem only on VMs whose VM hardware had been upgraded to version 13. The customer opened a support case with VMware GSS and IBM support.
IBM Support observed that the VADP/VDDK API function QueryChangedDiskAreas was failing with a TSM log message similar to ...
10/19/2018 12:04:26.230   : ..\..\common\vm\vmvisdk.cpp(2436): ANS9385W Error returned from VMware vStorage API for virtual machine '
"Error caused by file /vmfs/volumes/583eb2d3-4345fd68-0c28-3464a9908b34/
VMware Support (GSS) instructed my customer to reset CBT - https://kb.vmware.com/kb/2139574 or disable and re-enable CBT - https://kb.vmware.com/kb/1031873 and observe if it solves the problem.
A few days after the CBT reset, the backup problem occurred again, so the reset was not a resolution.
I did some research and found another KB - CBT reports larger area of changed blocks than expected if guest OS performed unmap on a disk (59608). We believe that this is the root cause, and the KB contains both a workaround and the final resolution.
The root cause mentioned in VMware KB 59608 ...
When an unmap is triggered in the guest, the OS issues UNMAP requests to underlying storage. However, the requested blocks include not only unmapped blocks but also unallocated blocks. And all those blocks are captured by CBT and considered as changed blocks then returned to backup software upon calling the vSphere API queryChangedDiskAreas(changeId).
Workaround recommended in KB ...
Disable unmap in guest VM. For example, in MS Windows operating systems UNMAP can be disabled by the command
fsutil behavior set DisableDeleteNotify 1
and re-enabled by the command
fsutil behavior set DisableDeleteNotify 0
Warning! Disabling UNMAP in the guest OS can have a severe negative impact on storage space reclamation, so fixing a space issue on secondary (backup) storage can cause a space issue on your primary storage. Check your specific design before making the final decision on how to work around this issue.
Anyway, the final problem resolution has to be done by the backup software vendor ...
If you have VDDK 6.7 or later libraries, take the intersection of VixDiskLib_QueryAllocatedBlocks() and queryChangedDiskAreas(changeId) to calculate the actually changed blocks.
In other words, the backup software should not rely on the API function QueryChangedDiskAreas alone. It should also call QueryAllocatedBlocks and use both results to calculate the disk blocks for incremental backups. Based on the VDDK 6.7 Release Notes, VDDK 6.7 can be used even with vSphere 6.5 and 6.0. For more info, see the VDDK 6.7 Release Notes.
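To illustrate what "take the intersection" means, here is a minimal sketch in Python. It does not call VDDK or the vSphere API. It assumes the results of queryChangedDiskAreas and VixDiskLib_QueryAllocatedBlocks have already been converted into lists of (offset, length) tuples in bytes, and that the extents within each list do not overlap. The function name intersect_extents and the tuple representation are my own, for illustration only.

```python
from typing import List, Tuple

# Hypothetical representation: (start offset in bytes, length in bytes).
Extent = Tuple[int, int]


def intersect_extents(changed: List[Extent], allocated: List[Extent]) -> List[Extent]:
    """Return the byte ranges that are both reported as changed and allocated.

    Assumes the extents inside each input list do not overlap each other.
    """
    # Convert to sorted half-open intervals [start, end).
    a = sorted((s, s + n) for s, n in changed if n > 0)
    b = sorted((s, s + n) for s, n in allocated if n > 0)

    result: List[Extent] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end - start))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


if __name__ == "__main__":
    # After a guest unmap, CBT may report one large changed area ...
    changed = [(0, 1000)]
    # ... while only two small regions are actually allocated.
    allocated = [(0, 100), (500, 50)]
    print(intersect_extents(changed, allocated))
```

With the sample values above, only the two allocated regions remain as candidates for the incremental backup, instead of the whole area reported by CBT.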
I believe the problem occurs only under the following conditions:
- The virtual disk must be thin-provisioned.
- VM hardware must be version 11 or later - older VM hardware versions do not pass UNMAP SCSI commands through
- The guest operating system must be able to identify the virtual disk as thin and issue UNMAP SCSI commands down to the storage system