Thursday, May 28, 2015

How large is my ESXi core dump partition?

Today I was asked to check the core dump partition size on an ESXi 5.1 host, because that host experienced a PSOD (Purple Screen of Death) with a message that the core dump was not saved completely because it ran out of space.

To be honest, it took me some time to find out how to determine the core dump partition size, so I have documented it here.

All commands and outputs are from my home lab, where I have ESXi 6 booted from USB, but the principle should be the same.

To run these commands you have to log in to the ESXi shell, for example over SSH or through the ESXi troubleshooting console.

The first step is to find out which disk partition is used for the core dump.
[root@esx01:~] esxcli system coredump partition get
   Active: mpx.vmhba32:C0:T0:L0:9
   Configured: mpx.vmhba32:C0:T0:L0:9
Now we know that the core dump is configured on disk mpx.vmhba32:C0:T0:L0, partition 9.
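
If you want to split the device name and the partition number out of that output in a script, plain shell string handling is enough. This is only a sketch: the function name split_coredump_partition is my own, and it assumes the Active line looks like the one above, with the partition number after the last colon.

```shell
# Hypothetical helper (not part of ESXi): reads the output of
#   esxcli system coredump partition get
# on stdin and prints "<device> <partition number>" for the Active entry.
split_coredump_partition() {
    while read -r key value; do
        if [ "$key" = "Active:" ]; then
            # Everything before the last colon is the device,
            # everything after it is the partition number.
            echo "${value%:*} ${value##*:}"
            return 0
        fi
    done
    # No Active line found in the input.
    return 1
}

# Usage in the ESXi shell (assumption: output format as shown above):
#   esxcli system coredump partition get | split_coredump_partition
```

With the output from my lab this should print the disk mpx.vmhba32:C0:T0:L0 followed by the partition number 9.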

The second step is to list the disks and disk partitions together with their sizes.
[root@esx01:~] ls -lh /dev/disks/
total 241892188
-rw-------    1 root     root        3.7G May 28 11:25 mpx.vmhba32:C0:T0:L0
-rw-------    1 root     root        4.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:1
-rw-------    1 root     root      250.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:5
-rw-------    1 root     root      250.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:6
-rw-------    1 root     root      110.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:7
-rw-------    1 root     root      286.0M May 28 11:25 mpx.vmhba32:C0:T0:L0:8
-rw-------    1 root     root        2.5G May 28 11:25 mpx.vmhba32:C0:T0:L0:9

You can get the same information with partedUtil.
[root@esx01:~] partedUtil get /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0:9
326 255 63 5242880
Here you can see that the partition has 5,242,880 sectors, where each sector is 512 bytes. That means 5,242,880 * 512 / 1024 / 1024 / 1024 = 2.5 GB.
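
The same arithmetic can be done directly in the shell. This is a minimal sketch assuming 512-byte sectors, as in the calculation above; the function name sectors_to_mib is hypothetical. Shell arithmetic is integer-only, so the result is given in MiB rather than as a fractional GB value.

```shell
# Hypothetical helper: converts a sector count to MiB,
# assuming 512-byte sectors.
sectors_to_mib() {
    echo $(( $1 * 512 / 1024 / 1024 ))
}

# Sector count taken from the partedUtil output above.
sectors_to_mib 5242880   # prints 2560

# Possible usage on the host (assumption: partedUtil prints a single
# line whose fourth field is the sector count, as shown above):
#   set -- $(partedUtil get /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0:9)
#   sectors_to_mib "$4"
```

2560 MiB is the same 2.5 GB as calculated by hand.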

Note: It is 2.5 GB because ESXi is installed on a 4 GB USB device. If you have a regular hard drive, the core dump partition should be 4 GB.

BUT all above information is not valid if you have changed your Scratch Location (there is a VMware KB article describing how to do it). If your Scratch Location has been changed, you can display the current scratch location, which is stored in /etc/vmware/locker.conf

[root@esx01:~] cat /etc/vmware/locker.conf
/vmfs/volumes/02c3c6c5-53c72a35/scratch/ 0

and you can list the subdirectories in your custom scratch location
[root@esx01:~] ls -la /vmfs/volumes/02c3c6c5-53c72a35/scratch/esx01.home.uw.cz
total 28
d---------    7 root     root          4096 May 12 21:45 .
d---------    4 root     root          4096 May  3 20:47 ..
d---------    2 root     root          4096 May  3 21:17 core
d---------    2 root     root          4096 May  3 21:17 downloads
d---------    2 root     root          4096 May 28 09:30 log
d---------    3 root     root          4096 May  3 21:17 var
d---------    2 root     root          4096 May 12 21:45 vsantraces
Please note that the new scratch location contains its own core dump subdirectory (core) and also a log subdirectory (log).
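
If you need the scratch path in a script, it can be read from locker.conf without copying it by hand. This is a sketch under one assumption taken from my lab output: the path is the first whitespace-separated field on the first line of the file. The function name scratch_path is hypothetical.

```shell
# Hypothetical helper: prints the scratch path from a locker.conf-style
# file, assuming the path is the first field on the first line.
scratch_path() {
    path=""
    # read returns non-zero on a missing trailing newline but still
    # fills $path, so accept the value in that case as well.
    read -r path _rest < "$1" || [ -n "$path" ] || return 1
    echo "$path"
}

# Possible usage on the host:
#   ls -la "$(scratch_path /etc/vmware/locker.conf)"
```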

Other considerations
I usually move the ESXi core dump and log directory locations to a shared datastore. This is done with the following ESXi host advanced settings, which are fully described in a VMware KB article:
  • CORE DUMP Location: ScratchConfig.ConfiguredScratchLocation
  • Log Location: and optionally if you want to redirect all ESXi hosts to the same directory
I also recommend sending logs to a remote syslog server over the network, which is done with the advanced setting
  • Remote Syslog Server(s):
ESXi core dumps can also be transferred over the network to a central Core Dump Server. This has to be configured with the following esxcli commands.
esxcli system coredump network set --interface-name vmk0 --server-ipv4 [Core_Dump_Server_IP] --server-port 6500
esxcli system coredump network set --enable true
esxcli system coredump network check
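
When several hosts need the same configuration, it can help to generate these three commands for review before running them. The sketch below only prints the commands shown above with the placeholders filled in; the function name print_netdump_commands is hypothetical, and 192.0.2.10 is a documentation address standing in for your Core Dump Server.

```shell
# Hypothetical helper: prints the esxcli commands from the text for a
# given Core Dump Server so they can be reviewed before being run.
# Arguments: server IPv4 address, optional vmkernel interface
# (default vmk0), optional port (default 6500, as used above).
print_netdump_commands() {
    server="$1"
    iface="${2:-vmk0}"
    port="${3:-6500}"
    if [ -z "$server" ]; then
        echo "usage: print_netdump_commands SERVER_IPV4 [INTERFACE] [PORT]" >&2
        return 1
    fi
    echo "esxcli system coredump network set --interface-name $iface --server-ipv4 $server --server-port $port"
    echo "esxcli system coredump network set --enable true"
    echo "esxcli system coredump network check"
}

# Placeholder address; replace it with the IP of your Core Dump Server.
print_netdump_commands 192.0.2.10
```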


polishpaul said...

"BUT all above information is not valid if you have changed your Scratch Location" - why? Where is this explained?

David Pasek said...

I do not know why. I just tested it in my home lab. So it works like that, or at least it worked like that at the time I tested it. Unfortunately, not everything is documented; that's why a lab is essential to really understand how things work.