Notes on Handling a Storage Disk Failure
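Today I ran into an odd problem: on VCSA, one LUN mapped from the storage array would not accept new virtual machines, although the VMs already running on it showed no faults. After repeated comparison I found that, in the partition details under Storage Devices, the problem LUN's partition format was shown as unknown and its capacity was blank, whereas a healthy LUN shows GPT. I therefore suspected that the partition table had been lost. Fortunately the VMs on the problem LUN could still be migrated off, so after migrating all of them I started working on the LUN. I first tried deleting the problem LUN and repartitioning it (the simplest and most convenient approach), but the delete failed with a system error.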
So I had to find another way. I logged in over SSH to an arbitrary host in the cluster and ran esxcfg-scsidevs -m to look up the details of the problem LUN.
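As a rough sketch of that lookup, the filter below picks the device identifier for a given datastore name out of the esxcfg-scsidevs -m listing. It assumes the listing has the device:partition in the first column and the volume name in the last column, which should be checked against the real output on the host. The function name device_for_volume is made up for illustration, and volume names containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: read an `esxcfg-scsidevs -m` listing on stdin and
# print the device identifier whose volume name equals $1.
# Assumed column layout (verify on your host):
#   <device>:<partition>  <device path>  <VMFS UUID>  <index>  <volume name>
device_for_volume() {
    awk -v vol="$1" '
        $NF == vol {
            sub(/:[0-9]+$/, "", $1)   # drop the trailing ":<partition>"
            print $1
        }'
}

# Intended use on an ESXi host (not executed here):
#   esxcfg-scsidevs -m | device_for_volume "HF40-VMWARE-3"
```

The printed identifier is what goes after /vmfs/devices/disks/ in the partedUtil commands.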
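Once I had the GUID for the problem LUN, HF40-VMWARE-3, I could try to recover the partition table from the command line. That turned up yet another surprise: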
[root@node1:/dev] partedUtil getptbl "/vmfs/devices/disks/eui.6d2a97cfbe5501076c9ce900f04beca5"
Error: Function not implemented during read on /dev/disks/eui.6d2a97cfbe5501076c9ce900f04beca5
Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? This will also fix the last usable sector appropriately as per the new reduced size.
diskPath (/dev/disks/eui.6d2a97cfbe5501076c9ce900f04beca5) diskSize (216895848448) AlternateLBA (227633266687) LastUsableLBA (227633266654)
Warning: The available space to /dev/disks/eui.6d2a97cfbe5501076c9ce900f04beca5 appears to have shrunk. This may happen if the disk size has reduced. The space has been reduced by (10737418240 blocks). You can fix the GPT to correct the available space or continue with the current settings ? This will also move the backup table at the end if it is not at the end already.
diskSize (216895848448) AlternateLBA (227633266687) LastUsableLBA (227633266654) NewLastUsableLBA (216895848414)
Error: Can't have a partition outside the disk!
Unable to read partition table for device /vmfs/devices/disks/eui.6d2a97cfbe5501076c9ce900f04beca5
[root@node1:/dev] partedUtil fixGpt "/vmfs/devices/disks/eui.6d2a97cfbe5501076c9ce900f04beca5"
FixGpt tries to fix any problems detected in GPT table.
Please ensure that you don't run this on any RDM (Raw Device Mapping) disk.
Are you sure you want to continue (Y/N): y
Error: Function not implemented during read on /dev/disks/eui.6d2a97cfbe5501076c9ce900f04beca5
Retry/Ignore/Cancel? ignore
Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? This will also fix the last usable sector appropriately as per the new reduced size.
diskPath (/dev/disks/eui.6d2a97cfbe5501076c9ce900f04beca5) diskSize (216895848448) AlternateLBA (227633266687) LastUsableLBA (227633266654)
Fix/Ignore/Cancel? fix
Error: Can't have a partition outside the disk!
Unable to read partition table on device /vmfs/devices/disks/eui.6d2a97cfbe5501076c9ce900f04beca5
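Following the hints in the output, I compared the LUN size on the storage array with what VCSA showed, and they really did differ: the array had allocated only 101 T, yet the partition seen in VCSA was 106 T. I don't know whether this is a bug or something else. The partition had not shrunk; it had grown. The fixes I found online did not work either, so I had to find a different route.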
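The 101 T and 106 T figures can be cross-checked against the numbers partedUtil printed. The sketch below assumes partedUtil counts 512-byte sectors (the output itself only says "blocks") and that "T" means TiB. The helper name sectors_to_tib is made up for illustration.

```shell
#!/bin/sh
# Assumption: partedUtil reports sizes in 512-byte sectors.
SECTOR_BYTES=512

# Convert a sector count to whole TiB (integer division; 1 TiB = 2^40 bytes).
sectors_to_tib() {
    echo $(( $1 * SECTOR_BYTES / 1099511627776 ))
}

disk_size=216895848448        # diskSize from the partedUtil output
alternate_lba=227633266687    # AlternateLBA: where the GPT says its backup header is

# The backup GPT header normally occupies the last sector of the disk,
# so the GPT was written for a disk of alternate_lba + 1 sectors.
gpt_disk_sectors=$(( alternate_lba + 1 ))

echo "device reports: $(sectors_to_tib "$disk_size") TiB"
echo "GPT expects:    $(sectors_to_tib "$gpt_disk_sectors") TiB"
echo "difference:     $(( gpt_disk_sectors - disk_size )) sectors"
```

Under those assumptions the three values come out as 101, 106 and 10737418240, which lines up with the size allocated on the array, the size seen in VCSA, and the "reduced by" figure in the warning. That would suggest the on-disk GPT describes a 106 TiB device while the device currently reports 101 TiB, though it does not explain how the two came to disagree.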
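Using partedUtil, I first changed the problem LUN's partition table to msdos format and then changed it back to GPT. To my surprise, this put the storage device into the unconsumed state, so it could be repartitioned and formatted again.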
[root@node1:/dev] partedUtil setptbl /vmfs/devices/disks/eui.6d2a97cfbe5501076c9ce900f04beca5 msdos
[root@node1:/dev] partedUtil setptbl /vmfs/devices/disks/eui.6d2a97cfbe5501076c9ce900f04beca5 gpt
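Because these two commands overwrite the partition table, they are only safe once every VM has been migrated off the LUN, as was done here. A minimal sketch of a wrapper that prints the commands by default and only runs them when explicitly asked is shown below. The function name reset_ptbl and the DRY_RUN variable are made up for illustration, and the path check is just a simple guard against typos.

```shell
#!/bin/sh
# Hypothetical wrapper around the two setptbl commands shown above.
# DRY_RUN defaults to 1, so nothing is written unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

reset_ptbl() {
    disk="$1"
    # Refuse anything that is not a device path under /vmfs/devices/disks/.
    case "$disk" in
        /vmfs/devices/disks/*) ;;
        *) echo "refusing: not a /vmfs/devices/disks path: $disk" >&2
           return 1 ;;
    esac
    # Write an empty msdos label first, then an empty GPT label.
    for label in msdos gpt; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "partedUtil setptbl $disk $label"
        else
            partedUtil setptbl "$disk" "$label" || return 1
        fi
    done
}

reset_ptbl "/vmfs/devices/disks/eui.6d2a97cfbe5501076c9ce900f04beca5"
```

Run as is, it only echoes the two commands so the device path can be reviewed. Setting DRY_RUN=0 on the ESXi host would execute them.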
With that the problem was solved, although the root cause of the failure remains unclear.