Incident notes: a failed disk left an LVM logical volume unusable
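A disk failed in production. Instead of first removing the failed PV from the volume group in the OS (for example with vgreduce --removemissing), we hot-swapped the disk directly and configured RAID for it on the controller.
After that the system would not boot. In rescue mode I commented out the affected mount point in /etc/fstab, rebooted into the system, and then did the following: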
# parted /dev/sdd mklabel gpt
First, create a GPT partition label on the new disk.
# parted /dev/sdd mkpart primary 2048s 100%
Then create a single primary partition, laid out the same way as on the other disks in production.
# partprobe /dev/sdd
# partprobe /dev/sdd1
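Have the kernel re-read the partition table of /dev/sdd without rebooting the server. partprobe works on whole disks, so running it on /dev/sdd is enough and the second call on /dev/sdd1 is not needed.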
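The partitioning steps above can also be run non-interactively. This is a hedged sketch, not the exact commands used in this incident. prepare_pv_disk and the run wrapper are hypothetical helpers. With DRY_RUN=1, which is the default in this sketch, the commands are only printed so they can be reviewed before anything touches a disk.

```shell
#!/usr/bin/env bash
# Sketch: partition a replacement disk like the other PVs in vg1
# (GPT label, one partition from sector 2048 to the end of the disk).

# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_pv_disk() {
    local disk="$1"
    run parted -s "$disk" mklabel gpt                # -s: never prompt
    run parted -s "$disk" mkpart primary 2048s 100%
    run partprobe "$disk"                            # re-read the whole disk's table
}
```

Usage: DRY_RUN=1 prepare_pv_disk /dev/sdd prints the three commands. DRY_RUN=0 prepare_pv_disk /dev/sdd actually runs them.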
# cat /etc/lvm/backup/vg1
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Oct 30 13:58:05 2018
contents = "Text Format Volume Group"
version = 1
description = "Created *after* executing 'lvcreate -l 100%FREE -n lv1 vg1'"
creation_host = "msw8b0201"    # Linux msw8b0201 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017 x86_64
creation_time = 1540879085    # Tue Oct 30 13:58:05 2018
vg1 {
    id = "TxN32k-8J83-thCH-UwnD-uKkH-phn8-3rCaKs"
    seqno = 2
    format = "lvm2"    # informational
    status = ["RESIZEABLE", "READ", "WRITE"]
    flags = []
    extent_size = 8192    # 4 Megabytes
    max_lv = 0
    max_pv = 0
    metadata_copies = 0
    physical_volumes {
        pv0 {
            id = "UnQm8F-10fW-erxa-O4Lv-I0RA-obiI-XiZKC2"
            device = "/dev/sdb1"    # Hint only
            status = ["ALLOCATABLE"]
            flags = []
            dev_size = 1873041408    # 893.136 Gigabytes
            pe_start = 2048
            pe_count = 228642    # 893.133 Gigabytes
        }
        pv1 {
            id = "brRjwc-g3lm-esHD-PA6s-oDPa-fzfN-n2vN4D"
            device = "/dev/sdc1"    # Hint only
            status = ["ALLOCATABLE"]
            flags = []
            dev_size = 1873041408    # 893.136 Gigabytes
            pe_start = 2048
            pe_count = 228642    # 893.133 Gigabytes
        }
        pv2 {
            id = "d3vWoV-zMwG-U6bg-X1Yz-T929-jVFd-2QTifo"
            device = "/dev/sdd1"    # Hint only
            status = ["ALLOCATABLE"]
            flags = []
            dev_size = 1873041408    # 893.136 Gigabytes
            pe_start = 2048
            pe_count = 228642    # 893.133 Gigabytes
        }
        pv3 {
            id = "84rd7v-Bblg-Xl3Y-4HD1-YvXS-Rd8L-100T90"
            device = "/dev/sde1"    # Hint only
            status = ["ALLOCATABLE"]
            flags = []
            dev_size = 1873041408    # 893.136 Gigabytes
            pe_start = 2048
            pe_count = 228642    # 893.133 Gigabytes
        }
    }
    logical_volumes {
        lv1 {
            id = "7m4Q3c-Gysj-MpxU-NeDt-76iI-sqiW-jF2qBr"
            status = ["READ", "WRITE", "VISIBLE"]
            flags = []
            creation_time = 1540879085    # 2018-10-30 13:58:05 +0800
            creation_host = "msw8b0201"
            segment_count = 4
            segment1 {
                start_extent = 0
                extent_count = 228642    # 893.133 Gigabytes
                type = "striped"
                stripe_count = 1    # linear
                stripes = [
                    "pv0", 0
                ]
            }
            segment2 {
                start_extent = 228642
                extent_count = 228642    # 893.133 Gigabytes
                type = "striped"
                stripe_count = 1    # linear
                stripes = [
                    "pv1", 0
                ]
            }
            segment3 {
                start_extent = 457284
                extent_count = 228642    # 893.133 Gigabytes
                type = "striped"
                stripe_count = 1    # linear
                stripes = [
                    "pv2", 0
                ]
            }
            segment4 {
                start_extent = 685926
                extent_count = 228642    # 893.133 Gigabytes
                type = "striped"
                stripe_count = 1    # linear
                stripes = [
                    "pv3", 0
                ]
            }
        }
    }
}
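This step records the details of the volume group so they are not lost when the VG and LV are recreated. That includes the PV and LV UUIDs and the original layout. With the default lvm.conf settings, LVM rewrites this file automatically after every metadata change.
Note that lv1 is a linear (not striped) concatenation of the four PVs (segment_count = 4, stripe_count = 1). The failed /dev/sdd1 therefore held logical extents 457284 to 685925, roughly the third quarter of the LV. Restoring the metadata can bring the LV back, but not the data that was stored on the dead disk.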
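To see exactly which part of lv1 depended on the failed disk, the backup file can be read mechanically. This is a minimal sketch with two assumptions. First, every segment is linear (stripe_count = 1), as in the vg1 file above. Second, the file uses the multi-line layout LVM writes for these files. lv_extent_map is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: map each linear segment of an LV to the PV (and device hint) that
# holds it, by reading an LVM text metadata backup such as /etc/lvm/backup/vg1.
# Assumption: every segment has stripe_count = 1, as in the vg1 file above.

lv_extent_map() {
    awk '
        # physical_volumes section: remember the device hint of each pvN.
        /^[ \t]*pv[0-9]+[ \t]*[{]/ { pv = $1 }
        /^[ \t]*device = / { split($0, q, "\""); dev[pv] = q[2] }

        # Segment: collect start and length, then report the PV it names.
        /^[ \t]*start_extent = / { start = $3; inseg = 1 }
        /^[ \t]*extent_count = / { count = $3 }
        inseg && match($0, /"pv[0-9]+"/) {
            name = substr($0, RSTART + 1, RLENGTH - 2)
            printf "extents %d-%d on %s (%s)\n", start, start + count - 1, name, dev[name]
            inseg = 0
        }
    ' "$1"
}
```

For the vg1 backup above, lv_extent_map /etc/lvm/backup/vg1 should report four ranges. The one on /dev/sdd1 is "extents 457284-685925 on pv2 (/dev/sdd1)", which is the part of the filesystem that could not be recovered.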
# parted /dev/sdd
GNU Parted 3.1
Using /dev/sdd
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt
Warning: The existing disk label on /dev/sdd will be destroyed and all data on this disk will be lost. Do you want to continue?
Yes/No? y
(parted) mkpart primary 2048s 100%
(parted) print
Model: AVAGO HW-SAS3508 (scsi)
Disk /dev/sdd: 959GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:
Number  Start   End    Size   File system  Name     Flags
 1      1049kB  959GB  959GB  ext4         primary
(parted) rm
align-check  help     mktable  quit    select  unit
disk_set     mklabel  name     rescue  set     version
disk_toggle  mkpart   print    rm      toggle
(parted) rm
Partition number? 1
(parted) print
Model: AVAGO HW-SAS3508 (scsi)
Disk /dev/sdd: 959GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:
Number  Start  End  Size  File system  Name  Flags
(parted) mkpart primary 2048s 100%
(parted) print
Model: AVAGO HW-SAS3508 (scsi)
Disk /dev/sdd: 959GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:
Number  Start   End    Size   File system  Name     Flags
 1      1049kB  959GB  959GB  ext4         primary
(parted) quit
Information: You may need to update /etc/fstab.
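At this point /dev/sdd looked wrong, because a freshly created partition should not show ext4 in the File system column. I deleted and recreated the partition, but ext4 still showed up. That is expected: parted only rewrites the partition table and does not touch the data area. An old ext4 superblock left on the disk is therefore detected again whenever a partition starts at the same sector. I then decided to force-create the PV and see whether that worked (next step).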
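In this incident pvcreate caught the leftover ext4 signature. It can also be checked and cleared explicitly before the PV is created. This is a hedged sketch using wipefs from util-linux. clear_stale_signatures and the DRY_RUN switch are hypothetical, and the sketch only erases anything when --erase is passed.

```shell
#!/usr/bin/env bash
# Sketch: list, and optionally erase, stale filesystem signatures on a new
# partition before turning it into a PV.

# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

clear_stale_signatures() {
    local part="$1" mode="${2:-}"
    # Without options, wipefs only lists the signatures it finds (read-only).
    run wipefs "$part"
    # -a erases every detected signature; done only when explicitly requested.
    if [ "$mode" = "--erase" ]; then
        run wipefs -a "$part"
    fi
}
```

Run it without --erase first and check that the only signature listed is the stale ext4 one. Then rerun it with --erase.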
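# pvcreate /dev/sdd1 --uuid 'd3vWoV-zMwG-U6bg-X1Yz-T929-jVFd-2QTifo' --restorefile /etc/lvm/backup/vg1
Couldn't find device with uuid d3vWoV-zMwG-U6bg-X1Yz-T929-jVFd-2QTifo.
WARNING: Device for PV d3vWoV-zMwG-U6bg-X1Yz-T929-jVFd-2QTifo not found or rejected by a filter.
WARNING: ext4 signature detected on /dev/sdd1 at offset 1080. Wipe it? [y/n]: y
Wiping ext4 signature on /dev/sdd1.
Physical volume "/dev/sdd1" successfully created.
Answering y wiped the leftover ext4 signature, so the command succeeded. --uuid gives the new PV the UUID that pv2 has in the backup file. --restorefile makes the new PV's layout consistent with the VG described in that same file. As a result, the restored VG metadata recognizes /dev/sdd1 as the missing PV.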
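Copying the UUID by hand is error-prone. This minimal sketch looks up the PV id by its device hint in the backup file and builds the same pvcreate command from it. pv_uuid_for_device and recreate_pv are hypothetical helpers. The sketch assumes the device hint in the file still matches the replacement partition, and the file itself marks it as "Hint only".

```shell
#!/usr/bin/env bash
# Sketch: recreate a lost PV with its original UUID, taken from an LVM
# metadata backup (e.g. /etc/lvm/backup/vg1) by matching the device hint.

# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Print the id of the PV whose device hint equals $2 in backup file $1.
# Inside each pvN block the id line comes before the device line.
pv_uuid_for_device() {
    awk -v want="$2" '
        /^[ \t]*id = / { split($0, q, "\""); id = q[2] }
        /^[ \t]*device = / {
            split($0, q, "\"")
            if (q[2] == want) { print id; exit }
        }
    ' "$1"
}

recreate_pv() {
    local backup="$1" part="$2" uuid
    uuid="$(pv_uuid_for_device "$backup" "$part")"
    if [ -z "$uuid" ]; then
        echo "no PV with device hint $part in $backup" >&2
        return 1
    fi
    run pvcreate --uuid "$uuid" --restorefile "$backup" "$part"
}
```

Usage: DRY_RUN=1 recreate_pv /etc/lvm/backup/vg1 /dev/sdd1 should print the pvcreate command shown above, with the UUID copied exactly from the file.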
# vgcfgrestore -f /etc/lvm/backup/vg1 vg1
Restored volume group vg1
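Restore vg1's metadata from the backup file. This rewrites only the LVM metadata and does not touch the data inside the LV.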
# vgscan
Reading volume groups from cache.
Found volume group "sys" using metadata type lvm2
Found volume group "vg1" using metadata type lvm2
This time there are no errors.
# vgscan
Reading volume groups from cache.
Found volume group "sys" using metadata type lvm2
WARNING: Device for PV d3vWoV-zMwG-U6bg-X1Yz-T929-jVFd-2QTifo not found or rejected by a filter.
Found volume group "vg1" using metadata type lvm2
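For comparison, this was the output before the fix, with a warning about the missing PV. Now that the warning is gone, we can move on.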
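Comparing the two vgscan outputs by eye works, but the check can also be scripted. In this sketch, lvm_reports_missing_pv is a hypothetical helper. It only greps for the two missing-device messages seen in this incident, so other LVM warnings would not be caught.

```shell
#!/usr/bin/env bash
# Sketch: succeed if LVM command output (read from stdin) contains one of
# the missing-device messages seen in the vgscan/pvcreate output above.
lvm_reports_missing_pv() {
    grep -q -e 'not found or rejected by a filter' \
            -e "Couldn't find device with uuid"
}
```

Usage: if vgscan 2>&1 | lvm_reports_missing_pv; then echo "vg1 still has a missing PV"; fi. The warnings go to stderr, hence the 2>&1.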
# vgchange -ay vg1
1 logical volume(s) in volume group "vg1" now active
Activate volume group vg1, which makes lv1 available as /dev/vg1/lv1.
# mkfs.xfs -m uuid=7m4Q3c-Gysj-MpxU-NeDt-76iI-sqiW-jF2qBr /dev/vg1/lv1
Illegal value uuid=7m4Q3c-Gysj-MpxU-NeDt-76iI-sqiW-jF2qBr for -m uuid option
Usage: mkfs.xfs
/* blocksize */         [-b log=n|size=num]
/* metadata */          [-m crc=0|1,finobt=0|1,uuid=xxx]
/* data subvol */       [-d agcount=n,agsize=n,file,name=xxx,size=num,
                            (sunit=value,swidth=value|su=num,sw=num|noalign),
                            sectlog=n|sectsize=num
/* force overwrite */   [-f]
/* inode size */        [-i log=n|perblock=n|size=num,maxpct=n,attr=0|1|2,
                            projid32bit=0|1]
/* no discard */        [-K]
/* log subvol */        [-l agnum=n,internal,size=num,logdev=xxx,version=n
                            sunit=value|su=num,sectlog=n|sectsize=num,
                            lazy-count=0|1]
/* label */             [-L label (maximum 12 characters)]
/* naming */            [-n log=n|size=num,version=2|ci,ftype=0|1]
/* no-op info only */   [-N]
/* prototype file */    [-p fname]
/* quiet */             [-q]
/* realtime subvol */   [-r extsize=num,size=num,rtdev=xxx]
/* sectorsize */        [-s log=n|size=num]
/* version */           [-V]
                        devicename
<devicename> is required unless -d name=xxx is given.
<num> is xxx (bytes), xxxs (sectors), xxxb (fs blocks), xxxk (xxx KiB),
      xxxm (xxx MiB), xxxg (xxx GiB), xxxt (xxx TiB) or xxxp (xxx PiB).
<value> is xxx (512 byte blocks).
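The plan was to recreate lv1 according to the metadata file, giving the filesystem the LV's UUID, and mount it afterwards. mkfs.xfs rejected the value because it expects a standard UUID (8-4-4-4-12 hexadecimal digits, as printed by uuidgen), and an LVM ID uses a different format. The LV ID is also unrelated to the filesystem UUID.
The failure was actually lucky. lv1 already existed after vgcfgrestore, and a successful mkfs.xfs would have replaced the filesystem that still held the data on the three healthy disks. Without -f, mkfs.xfs also refuses to overwrite an existing filesystem it detects.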
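The mkfs.xfs error is easy to reproduce with a format check. This is a minimal sketch. is_std_uuid is a hypothetical helper that accepts only the 8-4-4-4-12 hexadecimal form, which the sketch assumes mkfs.xfs -m uuid expects. It rejects the LVM ID of lv1.

```shell
#!/usr/bin/env bash
# Sketch: succeed only for a standard textual UUID (8-4-4-4-12 hex digits).
# LVM IDs such as 7m4Q3c-Gysj-MpxU-NeDt-76iI-sqiW-jF2qBr use a different
# format and are rejected.
is_std_uuid() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}
```

Usage: is_std_uuid "$(uuidgen)" succeeds, while is_std_uuid 7m4Q3c-Gysj-MpxU-NeDt-76iI-sqiW-jF2qBr fails. To inspect an existing filesystem UUID, blkid is enough. Changing the UUID of an unmounted XFS filesystem is the job of xfs_admin -U, not mkfs.xfs.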
# mount /dev/vg1/lv1 /data/
mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg1-lv1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
The mount failed, most likely because the filesystem was damaged, so I tried to repair it:
# xfs_repair /dev/mapper/vg1-lv1
Phase 1 - find and verify superblock...
- reporting progress in intervals of 15 minutes
Phase 2 - using internal log
- zero log...
Log inconsistent or not a log (last==0, first!=1)
empty log check failed
zero_log: cannot find log head/tail (xlog_find_tail=22)
fatal error -- ERROR: The log head and/or tail cannot be discovered. Attempt to mount the
filesystem to replay the log or use the -L option to destroy the log and attempt a repair.
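The repair failed, so I reran it with -L. Be aware that -L zeroes the log and discards any metadata changes that had not yet been written back, which can cause further damage. xfs_repair suggests it only as the alternative to mounting the filesystem to replay the log, and here the mount had already failed.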
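A more cautious order than going straight to -L is sketched below. The assumptions are labeled: cautious_xfs_repair is a hypothetical helper, and DRY_RUN=1 only prints the commands. -L is deliberately left as a manual decision, ideally made after taking an image of the LV.

```shell
#!/usr/bin/env bash
# Sketch: a cautious XFS repair order that never runs xfs_repair -L by itself.

# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

cautious_xfs_repair() {
    local dev="$1" mnt="$2"
    # 1. A successful mount replays the XFS log; unmount again before repairing.
    if run mount "$dev" "$mnt"; then
        run umount "$mnt"
    fi
    # 2. No-modify mode: only report what xfs_repair would change.
    run xfs_repair -n "$dev"
    # 3. Normal repair. -L is NOT attempted automatically: it zeroes the log
    #    and can lose recent metadata changes.
    if ! run xfs_repair "$dev"; then
        echo "xfs_repair failed on $dev; image the device before trying -L" >&2
        return 1
    fi
}
```

Usage: DRY_RUN=1 cautious_xfs_repair /dev/mapper/vg1-lv1 /data prints the mount, umount, xfs_repair -n and xfs_repair steps in that order.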
# xfs_repair -L /dev/mapper/vg1-lv1
Phase 1 - find and verify superblock...
- reporting progress in intervals of 15 minutes
Phase 2 - using internal log
- zero log...
Log inconsistent or not a log (last==0, first!=1)
empty log check failed
zero_log: cannot find log head/tail (xlog_find_tail=22)
- scan filesystem freespace and inode maps...
bad magic number
bad magic number
bad magic number
bad magic number
bad magic number
bad magic number
bad magic number
bad magic number
Metadata CRC error detected at xfs_agf block 0x14eec9008/0x1000
Metadata CRC error detected at xfs_agf block 0x140f80a08/0x1000
Metadata CRC error detected at xfs_agf block 0x1250efe08/0x1000
Metadata CRC error detected at xfs_agf block 0x133038408/0x1000
Metadata CRC error detected at xfs_agf block 0x1171a7808/0x1000
Metadata CRC error detected at xfs_agf block 0x10925f208/0x1000
Metadata CRC error detected at xfs_agf block 0xed3ce608/0x1000
Metadata CRC error detected at xfs_agf block 0xfb316c08/0x1000
Metadata CRC error detected at xfs_agi block 0x140f80a10/0x1000
Metadata CRC error detected at xfs_agi block 0x14eec9010/0x1000
Metadata CRC error detected at xfs_agi block 0x133038410/0x1000
Metadata CRC error detected at xfs_agi block 0x1250efe10/0x1000
bad on-disk superblock 23 - bad magic number
primary/secondary superblock 23 conflict - AG superblock geometry info conflicts with filesystem geometry
...
- 05:00:46: verify and correct link counts - 32 of 32 allocation groups done
Metadata corruption detected at xfs_dir3_block block 0xdf49148/0x1000
libxfs_writebufr: write verifer failed on xfs_dir3_block bno 0xdf49148/0x1000
Metadata corruption detected at xfs_dir3_block block 0x15ce11860/0x1000
libxfs_writebufr: write verifer failed on xfs_dir3_block bno 0x15ce11860/0x1000
Metadata corruption detected at xfs_dir3_block block 0xc35f5dc0/0x1000
libxfs_writebufr: write verifer failed on xfs_dir3_block bno 0xc35f5dc0/0x1000
Metadata corruption detected at xfs_dir3_block block 0x29dd93f8/0x1000
libxfs_writebufr: write verifer failed on xfs_dir3_block bno 0x29dd93f8/0x1000
Maximum metadata LSN (8:59832) is ahead of log (1:8).
Format log to cycle 11.
r (bulk) to free list!
done
Check the filesystem again:
# xfs_repair /dev/mapper/vg1-lv1
Phase 1 - find and verify superblock...
- reporting progress in intervals of 15 minutes
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- 05:00:59: scanning filesystem freespace - 32 of 32 allocation groups done
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- 05:00:59: scanning agi unlinked lists - 32 of 32 allocation groups done
- process known inodes and perform inode discovery...
- agno = 30
- agno = 0
- agno = 15
- agno = 31
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 1
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- 05:00:59: process known inodes and inode discovery - 7488 of 7488 inodes done
- process newly discovered inodes...
- 05:00:59: process newly discovered inodes - 32 of 32 allocation groups done
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- 05:00:59: setting up duplicate extent list - 32 of 32 allocation groups done
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 5
- agno = 6
- agno = 10
- agno = 12
- agno = 15
- agno = 17
- agno = 18
- agno = 23
- agno = 26
- agno = 29
- agno = 3
- agno = 14
- agno = 8
- agno = 16
- agno = 9
- agno = 19
- agno = 21
- agno = 20
- agno = 2
- agno = 22
- agno = 24
- agno = 25
- agno = 11
- agno = 30
- agno = 4
- agno = 27
- agno = 28
- agno = 31
- agno = 13
- agno = 7
- 05:00:59: check for inodes claiming duplicate blocks - 7488 of 7488 inodes done
Phase 5 - rebuild AG headers and trees...
- 05:00:59: rebuild AG headers and trees - 32 of 32 allocation groups done
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
- 05:00:59: verify and correct link counts - 32 of 32 allocation groups done
done
This time it finished without errors.
# mount /dev/mapper/vg1-lv1 /data/
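The mount also succeeded, and finally I removed the # from the commented-out line in /etc/fstab. However, the data was lost and had to be restored from the standby node.
Lesson: when a disk in a logical volume fails, first remove the failed PV from the volume group with LVM commands. Then replace the disk, create a new PV on it, and add it to the volume group. I have not tested this yet. If a similar failure happens again, I will follow the normal procedure and record what happens.
Keep in mind that with a linear LV and no redundancy underneath, the data on the failed disk is gone either way. vgreduce --removemissing refuses to run while an LV still has extents on the missing PV. Adding --force removes such LVs completely, including their parts on the surviving disks. Either way, the data still has to come from a backup such as the standby node.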
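The normal procedure described above has not yet been tested by the author. It might look like this dry-run sketch. replace_failed_pv is a hypothetical helper. vgreduce --removemissing stops while lv1 still uses the missing PV, so the sketch stops there as well rather than adding --force.

```shell
#!/usr/bin/env bash
# Sketch (untested): drop a missing PV from a VG, then add a replacement PV.

# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

replace_failed_pv() {
    local vg="$1" newpart="$2"
    # Remove the missing PV from the VG metadata. Without --force this refuses
    # to run while an LV still has extents on the missing PV.
    run vgreduce --removemissing "$vg" || return 1
    # Turn the new partition into a PV and add it to the VG.
    run pvcreate "$newpart"
    run vgextend "$vg" "$newpart"
}
```

Usage: DRY_RUN=1 replace_failed_pv vg1 /dev/sdd1 prints the vgreduce, pvcreate and vgextend steps in order. Any LV that lived partly on the failed disk still has to be recreated and its data restored from backup.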
Things to do next time
1. First, back up the LVM metadata file /etc/lvm/backup/vg1.
   After the procedure, check how the UUIDs and the metadata file have changed.
2. Commands that may be needed next time
vgreduce: remove physical volumes from a volume group
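From 《Linux指令范例速查手册》 (Linux Command Examples Quick Reference), Chapter 11, Disk Management. The chapter covers commands for disk partitioning, boot management, and LVM logical volume management. This section introduces vgreduce, which removes physical volumes from a volume group.
Author: 黄照鹤; publisher: 清华大学出版社 (Tsinghua University Press)
11.28 vgreduce: remove physical volumes from a volume group
[Syntax] vgreduce [options] [arguments]
[Description] vgreduce reduces the capacity of an LVM volume group by removing physical volumes from it.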
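[Options]
Option          | Function
-a              | If no physical volume is given on the command line, remove all empty physical volumes
--removemissing | Remove missing physical volumes from the volume group so that it returns to a normal state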
[Arguments]
Argument              | Function
Volume group          | Name of the volume group to operate on
Physical volume list  | Physical volumes to remove
[Tips] The last remaining physical volume in a volume group cannot be removed.
[Example 362] Remove a physical volume. Steps:
Use vgreduce to remove the physical volume "/dev/sdb2" from the volume group "vg2000". Enter the following command:
[root@hn ~]# vgreduce vg2000 /dev/sdb2    # remove PV "/dev/sdb2" from VG "vg2000"
Output:
Removed "/dev/sdb2" from volume group "vg2000"
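Item 1 of the checklist above says to back up the metadata first and compare UUIDs afterwards. This minimal sketch covers both. backup_vg_metadata writes a timestamped copy with vgcfgbackup -f, and list_lvm_ids prints every id in a metadata file so that two copies can be diffed. Both names and the backup directory are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: keep a timestamped copy of a VG's metadata and list the UUIDs in
# any LVM metadata file so that before/after copies can be compared.

# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

backup_vg_metadata() {
    local vg="$1" dir="$2"
    # vgcfgbackup -f writes the VG's current metadata to the named file.
    run vgcfgbackup -f "$dir/$vg-$(date +%Y%m%d-%H%M%S).vg" "$vg"
}

# Print every id = "..." value (VG, PV and LV UUIDs), one per line.
list_lvm_ids() {
    awk '/^[ \t]*id = / { split($0, q, "\""); print q[2] }' "$1"
}
```

Usage: run DRY_RUN=0 backup_vg_metadata vg1 /root/lvm-backup before the procedure. Afterwards, compare the copies with diff <(list_lvm_ids OLD_COPY) <(list_lvm_ids /etc/lvm/backup/vg1), where OLD_COPY stands for the file written earlier.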