Recreating a missing VMFS datastore partition in VMware vSphere 5.x and 6.x

Symptoms

A datastore has become inaccessible.
A VMFS partition table is missing.

Purpose

The partition table is required only during a rescan. This means that the datastore may become inaccessible on a host during a rescan if the VMFS partition was deleted after the last rescan. The partition table is physically located on the LUN, so all vSphere hosts that have access to this LUN can see the change has taken place. However, only the hosts that do a rescan will be affected.

This article provides information on:

Determining whether this is the same problem
Resolving the problem

Cause

This issue occurs when the VMFS partition is deleted. Deleting the datastore from the vSphere Client deletes the partition, although the software prevents this if the datastore is in use. It can also happen if a physical server that has access to the LUN on the SAN performs an installation on it, for example.

Resolution

To resolve this issue:

Run the partedUtil getptbl command on the affected host:

# partedUtil getptbl /vmfs/devices/disks/naa.6006016045502500c20a2b3ccecfe011

Verify if the output of the command is similar to:

gpt
52216 255 63 838860800
1 2048 838850039 AA31E02A400F11DB9590000C2911D1B8 vmfs 0

If your output appears similar to the following, it indicates the partition is missing:

gpt
52216 255 63 838860800
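
The comparison between the two outputs above can be scripted. The following is a minimal sketch, not part of the original procedure: `count_partitions` is a hypothetical helper, and it assumes that the first line of `partedUtil getptbl` output is the label, the second is the geometry, and each further non-empty line describes one partition, as in the samples shown here.

```shell
# Hypothetical helper: count partition entries in `partedUtil getptbl` output
# read from stdin. Assumes line 1 = label, line 2 = geometry, and every later
# non-empty line = one partition (the layout of the samples in this article).
count_partitions() {
    awk 'NR > 2 && NF > 0 { n++ } END { print n + 0 }'
}

# Sample text copied from this article; on a host you would pipe the real
# command output into the function instead.
printf 'gpt\n52216 255 63 838860800\n' | count_partitions
```

A count of 0 corresponds to the second output above, where only the label and geometry lines are present.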

In this case, you must recreate the partition. To recreate the partition:

Find the beginning and end blocks of the VMFS partition. To find the beginning of the partition, run this command (one line script) on the host:

# offset="128 2048"; for dev in `esxcfg-scsidevs -l | grep "Console Device:" | awk {'print $3'}`; do disk=$dev; echo $disk; partedUtil getptbl $disk; { for i in `echo $offset`; do echo "Checking offset found at $i:"; hexdump -n4 -s $((0x100000+(512*$i))) $disk; hexdump -n4 -s $((0x1300000+(512*$i))) $disk; hexdump -C -n 128 -s $((0x130001d + (512*$i))) $disk; done; } | grep -B 1 -A 5 d00d; echo "---------------------"; done

Note: The preceding script checks all of the storage devices and the list may be lengthy. This script is not applicable for local disks.

You see output similar to:

/vmfs/devices/disks/naa.60060160455025009839a9ed4cfee011
msdos
78325 255 63 1258291200
1 128 1258291124 251 0
Checking offset found at 128:
0110000 d00d c001
0110004
1310000 f15e 2fab
1310004
0131001d 46 43 5f 53 68 61 72 65 64 00 45 76 65 72 5f 47 |old_VMFS3.......|
0131002d 65 74 74 69 6e 67 5f 55 70 00 00 00 00 00 00 00 |................|
---------------------
/vmfs/devices/disks/naa.6006016045502500c20a2b3ccecfe011
gpt
52216 255 63 838860800
Checking offset found at 2048:
0200000 d00d c001
0200004
1400000 f15e 2fab
1400004
0140001d 4a 55 50 48 41 4d 5f 53 52 4d 35 00 00 00 00 00 |new_VMFS5.......|
0140002d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
---------------------

The preceding output has two example storage devices. The first example was created on an ESXi host prior to version 5 and it reports:

Checking offset found at 128.

Where 128 is the beginning block.

The second storage device was created on vSphere 5 or later and reports:

Checking offset found at 2048.

Note: In this example, you are using the second device, so the beginning of the partition is 2048.
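
The addresses printed under each "Checking offset found at" line follow directly from the arithmetic in the one-line script, which treats each candidate start block as a 512-byte sector. The sketch below only reproduces that arithmetic so the addresses in the sample output can be cross-checked; `probe_offsets` is a hypothetical name, and nothing is read from disk.

```shell
# Hypothetical helper: print the three byte offsets that the one-line script
# passes to hexdump for a candidate start sector (512-byte sectors).
probe_offsets() {
    start=$1
    printf '%07x %07x %08x\n' \
        $((0x100000 + 512 * start)) \
        $((0x1300000 + 512 * start)) \
        $((0x130001d + 512 * start))
}

probe_offsets 128    # compare with the addresses in the first sample device
probe_offsets 2048   # compare with the addresses in the second sample device
```

For 128 this prints 0110000 1310000 0131001d, and for 2048 it prints 0200000 1400000 0140001d, which are the addresses shown in the sample output for the two devices.
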
To get the end block for the partition, run this command:

# partedUtil getUsableSectors /vmfs/devices/disks/naa.6006016045502500c20a2b3ccecfe011

You see this output:

34 838860766

Notes:
- If you do not see this output and you get an Unknown partition table on disk error, run this command to label the table as a GPT partition table:
  
  # partedUtil mklabel /vmfs/devices/disks/naa.6006016045502500c20a2b3ccecfe011 gpt
  
  Rerun the partedUtil getUsableSectors command. If you do not get the expected output of 2 numbers, run the partition type identification commands in the next bullet also.
- If you do not see the specified output and instead receive an error message such as "partition table invalid, unable to satisfy all constraints on the partition", run this command:
  
  # partedUtil setptbl /vmfs/devices/disks/naa.6006016045502500c20a2b3ccecfe011 gpt "1 2048 4123456 AA31E02A400F11DB9590000C2911D1B8 0"
  
  This creates a temporary partition so that the disk information can be read. Rerun the partedUtil getUsableSectors command. It should now return two numbers, and the second one is the last usable block.
  
  The partition type identifies the purpose of a partition, and may be represented by either a decimal identifier (for example, 251) or a GUID (for example, AA31E02A400F11DB9590000C2911D1B8). Partitions created on ESXi 5.x and higher with the gpt disklabel must be specified using the GUID.
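
The choice between the decimal and GUID forms of the partition type can be expressed as a small lookup. This is a sketch only: `vmfs_type_for_label` is a hypothetical helper, and the two values are simply the ones that appear in this article's sample outputs for msdos and gpt labels.

```shell
# Hypothetical helper: return the VMFS partition type to pass to
# `partedUtil setptbl` for a given disk label. The values are the ones
# shown in this article's samples.
vmfs_type_for_label() {
    case "$1" in
        gpt)   echo "AA31E02A400F11DB9590000C2911D1B8" ;;  # GUID form
        msdos) echo "251" ;;                               # decimal form
        *)     echo "unsupported label: $1" >&2; return 1 ;;
    esac
}

vmfs_type_for_label gpt
```
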
Run this command to temporarily turn off Storage IO Control:

# /etc/init.d/storageRM stop

Run this command to set the correct values for the partition table:

Note: Ensure that you use values appropriate to your environment in this command.

# partedUtil setptbl /vmfs/devices/disks/naa.6006016045502500c20a2b3ccecfe011 gpt "1 2048 838860766 AA31E02A400F11DB9590000C2911D1B8 0"

The end block in this command (838860766 in this example) is the last usable block, so the end of the partition cannot be any higher. It is not known whether this was the value used when the datastore was created, so try it and adjust if necessary.
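
Before writing a partition table, it can help to confirm that the chosen start and end blocks fall inside the range reported by partedUtil getUsableSectors. The following is a minimal sketch under that assumption; `range_is_usable` is a hypothetical helper and takes the two-number output as a single string.

```shell
# Hypothetical helper: succeed only if start..end lies within the usable range.
# $1 = start block, $2 = end block,
# $3 = the "first last" pair printed by `partedUtil getUsableSectors`
range_is_usable() {
    start=$1
    end=$2
    first=${3%% *}   # text before the first space
    last=${3##* }    # text after the last space
    [ "$start" -ge "$first" ] && [ "$end" -le "$last" ] && [ "$start" -lt "$end" ]
}

# Values taken from the example in this article.
range_is_usable 2048 838860766 "34 838860766" && echo "range fits"
```
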
Run this command to attempt to mount the VMFS datastore:

# vmkfstools -V

Note: If the datastore mounts, the numbers are correct and you need not adjust the value.
If the datastore does not mount, you may see a message in /var/log/vmkernel.log similar to:

... cpu0:44828)LVM: 2891: [naa.6006016045502500c20a2b3ccecfe011:1] Device expanded (actual size 838858719 blocks, stored size 838847992 blocks)

In this case, add the offset value, minus one, to the stored size to get the actual end block.

For example:

838847992 + 2047 = 838850039

Run the command with the new end value:

# partedUtil setptbl /vmfs/devices/disks/naa.6006016045502500c20a2b3ccecfe011 gpt "1 2048 838850039 AA31E02A400F11DB9590000C2911D1B8 0"
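
The end-block arithmetic used for this command can be sketched as follows. Both function names are hypothetical, and the log line is the sample from this article; the sketch assumes the vmkernel message keeps the "stored size N blocks" wording shown there.

```shell
# Hypothetical helper: extract the stored size from a "Device expanded"
# message read on stdin. Assumes the "stored size N blocks" wording.
stored_size_from_log() {
    sed -n 's/.*stored size \([0-9][0-9]*\) blocks.*/\1/p'
}

# Hypothetical helper: end block = stored size + start offset - 1.
# $1 = stored size in blocks, $2 = start block of the partition
end_block() {
    echo $(( $1 + $2 - 1 ))
}

# Sample message copied from this article.
log='... cpu0:44828)LVM: 2891: [naa.6006016045502500c20a2b3ccecfe011:1] Device expanded (actual size 838858719 blocks, stored size 838847992 blocks)'
stored=$(printf '%s\n' "$log" | stored_size_from_log)
end_block "$stored" 2048
```

With the sample message and a start block of 2048, this prints 838850039, the end value used in the command above.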

Now you have the correct partition. Run the VMFS rescan again:

# vmkfstools -V

Run this command to turn Storage IO Control back on:

# /etc/init.d/storageRM start

After the datastore is successfully mounted on one host, you can expect that the same VMFS rescan command will mount the VMFS datastore when run on other hosts that have access to this LUN.

Alternatively, you can run a full cluster rescan from the vCenter Server using the vSphere Client.

Related Information

In VMware vSphere 5.x and later, newly created VMFS datastores use GPT partition tables instead of MBR partition tables.

The benefit of using GPT partition tables is that more than one copy of the partition table is kept on the LUN. If a physical Windows host has access to the LUN on the SAN, it automatically assigns a drive letter to the LUN by default, which destroys an MBR partition table. This type of problem does not occur with GPT, since vSphere uses the backup partition table.

To avoid lengthy delays, use this additional script to interrogate a single device:

disk="/vmfs/devices/disks/naa.....xxxxx"; offset="128 2048"; echo $disk; partedUtil getptbl $disk; { for i in `echo $offset`; do echo "Checking offset found at $i:"; hexdump -n4 -s $((0x100000+(512*$i))) $disk; hexdump -n4 -s $((0x1300000+(512*$i))) $disk; hexdump -C -n 128 -s $((0x130001d + (512*$i))) $disk; done; } | grep -B 1 -A 5 d00d; echo "---------------------"
