Notes on a disk failure that made an LVM logical volume unusable

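A disk failed in production. Instead of first removing the failed PV inside the OS (with vgreduce --removemissing), we hot-swapped the disk directly and configured RAID on the new one.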

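The system then failed to boot. I entered rescue mode, commented out the affected mount point in /etc/fstab, rebooted into the system, and then did the following.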
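
Commenting out the entry, and restoring it later, can be scripted. A minimal sketch, assuming GNU sed and that the affected mount point is /data (the one mounted later in this post); comment_out_mount and restore_mount are hypothetical helper names. Only active lines whose second field is exactly the given mount point are touched, and sed keeps a .bak copy of the file.

```shell
#!/bin/sh
# Hypothetical helpers: comment out / restore the fstab entry of one mount point.
# Assumes GNU sed (-i with a backup suffix, -E for extended regexes).

# comment_out_mount MOUNTPOINT [FILE]: prefix '#' to active entries for MOUNTPOINT.
comment_out_mount() {
    mnt=$1 file=${2:-/etc/fstab}
    sed -i.bak -E "s|^([^#[:space:]][^[:space:]]*[[:space:]]+${mnt}[[:space:]])|#\1|" "$file"
}

# restore_mount MOUNTPOINT [FILE]: remove the leading '#' from that entry again.
restore_mount() {
    mnt=$1 file=${2:-/etc/fstab}
    sed -i.bak -E "s|^#([^[:space:]]+[[:space:]]+${mnt}[[:space:]])|\1|" "$file"
}
```

For example, comment_out_mount /data /mnt/sysimage/etc/fstab from the rescue environment (RHEL 7 rescue mode normally mounts the installed system under /mnt/sysimage), and restore_mount /data once the volume is usable again.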

# parted /dev/sdd mklabel gpt

First, give the new disk a GPT partition label.

# parted /dev/sdd mkpart primary 2048s 100%

Then create one primary partition, matching the layout used in production.

# partprobe /dev/sdd

# partprobe /dev/sdd1

Re-read the partition table of /dev/sdd so the kernel sees the changes without rebooting the server.
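
The start value 2048s is a sector number. With 512-byte logical sectors it places the partition at 1 MiB, which parted later prints as 1049kB because it reports decimal kilobytes, and 1 MiB is a multiple of this disk's 4096-byte physical sector. A small sketch of that arithmetic (the function names are illustrative only):

```shell
#!/bin/sh
# Arithmetic behind "mkpart primary 2048s 100%" and the "Start 1049kB" column.

# start_bytes START_SECTOR [LOGICAL_SECTOR_BYTES]: byte offset of a partition start.
start_bytes() {
    echo $(( $1 * ${2:-512} ))
}

# decimal_kb BYTES: the same offset in decimal kB, rounded to the nearest kB.
decimal_kb() {
    echo $(( ($1 + 500) / 1000 ))
}

# is_aligned START_SECTOR LOGICAL_BYTES PHYSICAL_BYTES: true if the partition
# starts on a physical-sector boundary (parted reports 512B/4096B for this disk).
is_aligned() {
    [ $(( $1 * $2 % $3 )) -eq 0 ]
}
```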

# cat /etc/lvm/backup/vg1 
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Oct 30 13:58:05 2018

contents = "Text Format Volume Group"
version = 1

description = "Created *after* executing 'lvcreate -l 100%FREE -n lv1 vg1'"

creation_host = "msw8b0201"    # Linux msw8b0201 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017 x86_64
creation_time = 1540879085    # Tue Oct 30 13:58:05 2018

vg1 {
    id = "TxN32k-8J83-thCH-UwnD-uKkH-phn8-3rCaKs"
    seqno = 2
    format = "lvm2"            # informational
    status = ["RESIZEABLE", "READ", "WRITE"]
    flags = []
    extent_size = 8192        # 4 Megabytes
    max_lv = 0
    max_pv = 0
    metadata_copies = 0

    physical_volumes {

        pv0 {
            id = "UnQm8F-10fW-erxa-O4Lv-I0RA-obiI-XiZKC2"
            device = "/dev/sdb1"    # Hint only

            status = ["ALLOCATABLE"]
            flags = []
            dev_size = 1873041408    # 893.136 Gigabytes
            pe_start = 2048
            pe_count = 228642    # 893.133 Gigabytes
        }

        pv1 {
            id = "brRjwc-g3lm-esHD-PA6s-oDPa-fzfN-n2vN4D"
            device = "/dev/sdc1"    # Hint only

            status = ["ALLOCATABLE"]
            flags = []
            dev_size = 1873041408    # 893.136 Gigabytes
            pe_start = 2048
            pe_count = 228642    # 893.133 Gigabytes
        }

        pv2 {
            id = "d3vWoV-zMwG-U6bg-X1Yz-T929-jVFd-2QTifo"
            device = "/dev/sdd1"    # Hint only

            status = ["ALLOCATABLE"]
            flags = []
            dev_size = 1873041408    # 893.136 Gigabytes
            pe_start = 2048
            pe_count = 228642    # 893.133 Gigabytes
        }

        pv3 {
            id = "84rd7v-Bblg-Xl3Y-4HD1-YvXS-Rd8L-100T90"
            device = "/dev/sde1"    # Hint only

            status = ["ALLOCATABLE"]
            flags = []
            dev_size = 1873041408    # 893.136 Gigabytes
            pe_start = 2048
            pe_count = 228642    # 893.133 Gigabytes
        }
    }

    logical_volumes {

        lv1 {
            id = "7m4Q3c-Gysj-MpxU-NeDt-76iI-sqiW-jF2qBr"
            status = ["READ", "WRITE", "VISIBLE"]
            flags = []
            creation_time = 1540879085    # 2018-10-30 13:58:05 +0800
            creation_host = "msw8b0201"
            segment_count = 4

            segment1 {
                start_extent = 0
                extent_count = 228642    # 893.133 Gigabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv0", 0
                ]
            }
            segment2 {
                start_extent = 228642
                extent_count = 228642    # 893.133 Gigabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv1", 0
                ]
            }
            segment3 {
                start_extent = 457284
                extent_count = 228642    # 893.133 Gigabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv2", 0
                ]
            }
            segment4 {
                start_extent = 685926
                extent_count = 228642    # 893.133 Gigabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv3", 0
                ]
            }
        }
    }

}

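This step records the volume group's details, such as the UUIDs and the original state of the volumes, so that creating a new VG or LV cannot overwrite the old information.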
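
The PV UUID needed later for pvcreate --uuid can be pulled out of the backup file instead of being copied by hand. A minimal sketch with awk, assuming the text layout shown above (an id line before the device line in each pvN block); list_pvs is a hypothetical helper. The device field is only a hint, so the UUID is what identifies the PV.

```shell
#!/bin/sh
# list_pvs FILE: print "<pv-name> <uuid> <device-hint>" for every PV recorded
# in an LVM2 text-format backup such as /etc/lvm/backup/vg1.
list_pvs() {
    awk '
        /^[ \t]*physical_volumes[ \t]*[{]/ { in_pvs = 1; next }
        /^[ \t]*logical_volumes[ \t]*[{]/  { in_pvs = 0; next }
        # Start of a pvN block: remember its name.
        in_pvs && /^[ \t]*pv[0-9]+[ \t]*[{]/ { name = $1; next }
        in_pvs && name != "" && $1 == "id" { id = $3; gsub(/"/, "", id); next }
        # The device line closes the record; the path is only a hint.
        in_pvs && name != "" && $1 == "device" {
            dev = $3; gsub(/"/, "", dev)
            print name, id, dev
            name = ""
        }
    ' "$1"
}
```

list_pvs /etc/lvm/backup/vg1 prints one line per PV (pv0 to pv3 here). Because LVM rewrites this file after every metadata change and keeps earlier versions under /etc/lvm/archive, it is worth copying it somewhere safe before starting.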

# parted /dev/sdd
GNU Parted 3.1
Using /dev/sdd
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt                                                      
Warning: The existing disk label on /dev/sdd will be destroyed and all data on
this disk will be lost. Do you want to continue?
Yes/No? y                                                                 
(parted) mkpart primary 2048s 100%
(parted) print                                                            
Model: AVAGO HW-SAS3508 (scsi)
Disk /dev/sdd: 959GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End    Size   File system  Name     Flags
 1      1049kB  959GB  959GB  ext4         primary

(parted) rm                                                               
align-check  help         mktable      quit         select       unit
disk_set     mklabel      name         rescue       set          version
disk_toggle  mkpart       print        rm           toggle       
(parted) rm 
Partition number? 1                                                       
(parted) print                                                            
Model: AVAGO HW-SAS3508 (scsi)
Disk /dev/sdd: 959GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start  End  Size  File system  Name  Flags

(parted) mkpart primary 2048s 100%
(parted) print                                                            
Model: AVAGO HW-SAS3508 (scsi)
Disk /dev/sdd: 959GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End    Size   File system  Name     Flags
 1      1049kB  959GB  959GB  ext4         primary

(parted) quit                                                             
Information: You may need to update /etc/fstab.

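At this point /dev/sdd did not look right: a freshly created partition should not show ext4 in the File system column. parted was detecting an old ext4 signature still present on the disk. Deleting and recreating the partition did not help, because that only rewrites the partition table, not the data behind it. I then decided to force the creation of the PV and see whether that worked; see the next step.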

# pvcreate /dev/sdd1 --uuid 'd3vWoV-zMwG-U6bg-X1Yz-T929-jVFd-2QTifo' --restorefile /etc/lvm/backup/vg1
  Couldn't find device with uuid d3vWoV-zMwG-U6bg-X1Yz-T929-jVFd-2QTifo.
  WARNING: Device for PV d3vWoV-zMwG-U6bg-X1Yz-T929-jVFd-2QTifo not found or rejected by a filter.
WARNING: ext4 signature detected on /dev/sdd1 at offset 1080. Wipe it? [y/n]: y
  Wiping ext4 signature on /dev/sdd1.
  Physical volume "/dev/sdd1" successfully created.

Here pvcreate wiped the ext4 signature shown earlier, and the PV was created successfully.
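
Offset 1080 in the pvcreate warning is the ext2/3/4 magic number: the superblock starts at byte 1024 and its s_magic field (0xEF53, stored little-endian) sits 56 bytes into it. A leftover superblock like this is also what made parted report ext4 for the new partition. A read-only sketch that checks for it, assuming od is available; has_ext_magic is a hypothetical name.

```shell
#!/bin/sh
# has_ext_magic DEVICE_OR_FILE: true if the ext2/3/4 magic 0xEF53 is present at
# byte offset 1080 (superblock at 1024 + s_magic at 0x38). Reads 2 bytes only.
has_ext_magic() {
    magic=$(od -An -tx1 -j 1080 -N 2 "$1" | tr -d ' \n')
    # Little-endian storage: 0xEF53 appears on disk as the bytes 53 ef.
    [ "$magic" = "53ef" ]
}
```

For instance has_ext_magic /dev/sdd1 (run as root). wipefs /dev/sdd1 without options is another read-only way to list such signatures before deciding to wipe them.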

# vgcfgrestore -f /etc/lvm/backup/vg1 vg1
Restored volume group vg1

Restore vg1's metadata from the LVM backup file.

# vgscan
Reading volume groups from cache.
Found volume group "sys" using metadata type lvm2
Found volume group "vg1" using metadata type lvm2

There are no errors any more.

# vgscan
Reading volume groups from cache.
Found volume group "sys" using metadata type lvm2
WARNING: Device for PV d3vWoV-zMwG-U6bg-X1Yz-T929-jVFd-2QTifo not found or rejected by a filter.
Found volume group "vg1" using metadata type lvm2

For comparison, this is the earlier output, with the warning about the missing PV. The warning is now gone, so I could continue.
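
The comparison above can be automated by filtering the warning out of the command output. A minimal sketch; missing_pv_uuids is a hypothetical helper that matches only the exact message text shown above, so other LVM versions may word it differently.

```shell
#!/bin/sh
# missing_pv_uuids: read LVM command output on stdin and print, once each, the
# UUID of every PV reported as "not found or rejected by a filter".
missing_pv_uuids() {
    sed -n 's/.*Device for PV \([A-Za-z0-9-]*\) not found or rejected by a filter.*/\1/p' | sort -u
}
```

Typical use: vgscan 2>&1 | missing_pv_uuids. LVM prints its warnings on stderr, hence the redirection; empty output means no PV is reported missing.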

# vgchange -ay vg1
1 logical volume(s) in volume group "vg1" now active

Activate volume group vg1.
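
Taken together, the restore steps above run in this order. The sketch only prints the commands unless RUN is set to an empty value; the variable and function names are assumptions, and the values are the ones from this post. It restores LVM metadata only: the PV gets its old UUID back, but the extents that lived on the failed disk are not recovered.

```shell
#!/bin/sh
# Dry run by default: each command is only printed. Run with RUN= (empty) to execute.
RUN=${RUN-echo}

BACKUP=/etc/lvm/backup/vg1
VG=vg1
PV_DEV=/dev/sdd1                                 # partition on the replacement disk
PV_UUID=d3vWoV-zMwG-U6bg-X1Yz-T929-jVFd-2QTifo   # pv2's id in the backup file

restore_pv_metadata() {
    # Write a PV label carrying the old UUID, using the layout recorded in the backup.
    $RUN pvcreate --uuid "$PV_UUID" --restorefile "$BACKUP" "$PV_DEV" || return
    # Put the saved VG metadata (PV list, LV segments) back onto the PVs.
    $RUN vgcfgrestore -f "$BACKUP" "$VG" || return
    # Activate the logical volumes of the VG.
    $RUN vgchange -ay "$VG"
}
```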

# mkfs.xfs -m uuid=7m4Q3c-Gysj-MpxU-NeDt-76iI-sqiW-jF2qBr /dev/vg1/lv1 
Illegal value uuid=7m4Q3c-Gysj-MpxU-NeDt-76iI-sqiW-jF2qBr for -m uuid option
Usage: mkfs.xfs
/* blocksize */        [-b log=n|size=num]
/* metadata */        [-m crc=0|1,finobt=0|1,uuid=xxx]
/* data subvol */    [-d agcount=n,agsize=n,file,name=xxx,size=num,
                (sunit=value,swidth=value|su=num,sw=num|noalign),
                sectlog=n|sectsize=num
/* force overwrite */    [-f]
/* inode size */    [-i log=n|perblock=n|size=num,maxpct=n,attr=0|1|2,
                projid32bit=0|1]
/* no discard */    [-K]
/* log subvol */    [-l agnum=n,internal,size=num,logdev=xxx,version=n
                sunit=value|su=num,sectlog=n|sectsize=num,
                lazy-count=0|1]
/* label */        [-L label (maximum 12 characters)]
/* naming */        [-n log=n|size=num,version=2|ci,ftype=0|1]
/* no-op info only */    [-N]
/* prototype file */    [-p fname]
/* quiet */        [-q]
/* realtime subvol */    [-r extsize=num,size=num,rtdev=xxx]
/* sectorsize */    [-s log=n|size=num]
/* version */        [-V]
            devicename
<devicename> is required unless -d name=xxx is given.
<num> is xxx (bytes), xxxs (sectors), xxxb (fs blocks), xxxk (xxx KiB),
      xxxm (xxx MiB), xxxg (xxx GiB), xxxt (xxx TiB) or xxxp (xxx PiB).
<value> is xxx (512 byte blocks).

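mkfs.xfs rejected the UUID, so nothing was written. lv1 itself already existed again, restored from the backup by vgcfgrestore, so the next step was to mount it.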
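
The rejected value is lv1's LVM id from the backup file. LVM ids are 32 alphanumeric characters grouped 6-4-4-4-4-4-6, while mkfs.xfs -m uuid= expects the hexadecimal 8-4-4-4-12 form that blkid and fstab UUID= entries use; the two identifiers are unrelated. Also, mkfs.xfs creates a new filesystem rather than repairing one (it normally refuses to overwrite a detected filesystem unless -f is given). A small format check, with is_fs_uuid as a hypothetical name:

```shell
#!/bin/sh
# is_fs_uuid STRING: true if STRING has the 8-4-4-4-12 hexadecimal UUID form
# expected by mkfs.xfs -m uuid= (LVM's own ids do not have this form).
is_fs_uuid() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}
```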

# mount /dev/vg1/lv1 /data/
mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg1-lv1,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

The mount failed, so the filesystem was presumably damaged; try to repair it.

# xfs_repair /dev/mapper/vg1-lv1 
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
Log inconsistent or not a log (last==0, first!=1)
empty log check failed
zero_log: cannot find log head/tail (xlog_find_tail=22)

fatal error -- ERROR: The log head and/or tail cannot be discovered. Attempt to mount the
filesystem to replay the log or use the -L option to destroy the log and
attempt a repair.

The repair failed, so I reran it with -L, which zeroes the log (any metadata changes still in it are lost).
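
xfs_repair's own message suggests mounting first so the log can be replayed, and -L only as a fallback. A more cautious order, sketched as a dry run (commands are printed unless RUN is set to an empty value; the function name is an assumption), ideally after taking a block-level copy of the LV if space allows:

```shell
#!/bin/sh
# Dry run by default: commands are only printed. Run with RUN= (empty) to execute.
RUN=${RUN-echo}

DEV=/dev/mapper/vg1-lv1
MNT=/data

xfs_check_before_zeroing_log() {
    # 1. A successful mount replays the XFS log, as the error message asks.
    if $RUN mount "$DEV" "$MNT"; then
        $RUN umount "$MNT"
    fi
    # 2. No-modify mode: report what would be repaired without writing anything.
    $RUN xfs_repair -n "$DEV"
    # 3. Only after reviewing that output, as a last resort: xfs_repair -L "$DEV"
}
```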

# xfs_repair -L /dev/mapper/vg1-lv1 
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
Log inconsistent or not a log (last==0, first!=1)
empty log check failed
zero_log: cannot find log head/tail (xlog_find_tail=22)
        - scan filesystem freespace and inode maps...
bad magic number
bad magic number
bad magic number
bad magic number
bad magic number
bad magic number
bad magic number
bad magic number
Metadata CRC error detected at xfs_agf block 0x14eec9008/0x1000
Metadata CRC error detected at xfs_agf block 0x140f80a08/0x1000
Metadata CRC error detected at xfs_agf block 0x1250efe08/0x1000
Metadata CRC error detected at xfs_agf block 0x133038408/0x1000
Metadata CRC error detected at xfs_agf block 0x1171a7808/0x1000
Metadata CRC error detected at xfs_agf block 0x10925f208/0x1000
Metadata CRC error detected at xfs_agf block 0xed3ce608/0x1000
Metadata CRC error detected at xfs_agf block 0xfb316c08/0x1000
Metadata CRC error detected at xfs_agi block 0x140f80a10/0x1000
Metadata CRC error detected at xfs_agi block 0x14eec9010/0x1000
Metadata CRC error detected at xfs_agi block 0x133038410/0x1000
Metadata CRC error detected at xfs_agi block 0x1250efe10/0x1000
bad on-disk superblock 23 - bad magic number
primary/secondary superblock 23 conflict - AG superblock geometry info conflicts with filesystem geometry

...

        - 05:00:46: verify and correct link counts - 32 of 32 allocation groups done
Metadata corruption detected at xfs_dir3_block block 0xdf49148/0x1000
libxfs_writebufr: write verifer failed on xfs_dir3_block bno 0xdf49148/0x1000
Metadata corruption detected at xfs_dir3_block block 0x15ce11860/0x1000
libxfs_writebufr: write verifer failed on xfs_dir3_block bno 0x15ce11860/0x1000
Metadata corruption detected at xfs_dir3_block block 0xc35f5dc0/0x1000
libxfs_writebufr: write verifer failed on xfs_dir3_block bno 0xc35f5dc0/0x1000
Metadata corruption detected at xfs_dir3_block block 0x29dd93f8/0x1000
libxfs_writebufr: write verifer failed on xfs_dir3_block bno 0x29dd93f8/0x1000
Maximum metadata LSN (8:59832) is ahead of log (1:8).
Format log to cycle 11.
r (bulk) to free list!done

Check the filesystem again.

# xfs_repair /dev/mapper/vg1-lv1
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - 05:00:59: scanning filesystem freespace - 32 of 32 allocation groups done
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - 05:00:59: scanning agi unlinked lists - 32 of 32 allocation groups done
        - process known inodes and perform inode discovery...
        - agno = 30
        - agno = 0
        - agno = 15
        - agno = 31
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 1
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - 05:00:59: process known inodes and inode discovery - 7488 of 7488 inodes done
        - process newly discovered inodes...
        - 05:00:59: process newly discovered inodes - 32 of 32 allocation groups done
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - 05:00:59: setting up duplicate extent list - 32 of 32 allocation groups done
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 5
        - agno = 6
        - agno = 10
        - agno = 12
        - agno = 15
        - agno = 17
        - agno = 18
        - agno = 23
        - agno = 26
        - agno = 29
        - agno = 3
        - agno = 14
        - agno = 8
        - agno = 16
        - agno = 9
        - agno = 19
        - agno = 21
        - agno = 20
        - agno = 2
        - agno = 22
        - agno = 24
        - agno = 25
        - agno = 11
        - agno = 30
        - agno = 4
        - agno = 27
        - agno = 28
        - agno = 31
        - agno = 13
        - agno = 7
        - 05:00:59: check for inodes claiming duplicate blocks - 7488 of 7488 inodes done
Phase 5 - rebuild AG headers and trees...
        - 05:00:59: rebuild AG headers and trees - 32 of 32 allocation groups done
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
        - 05:00:59: verify and correct link counts - 32 of 32 allocation groups done
done

No errors this time.

# mount /dev/mapper/vg1-lv1 /data/

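The mount also succeeded. Finally I removed the # from the entry I had commented out in /etc/fstab. However, the data was lost, and it had to be restored from the standby node.

Lesson learned: when a disk in a logical volume fails, first remove the failed PV from LVM with the appropriate commands, then replace the disk, recreate the PV, and add it back to the volume group. I have not tested this yet; the next time a similar failure occurs I will follow these normal steps and record what happens.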
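
Why the data could not be saved follows from the segment table in /etc/lvm/backup/vg1: lv1 is linear, and each of its four segments occupies one whole PV. pv2 (/dev/sdd1) held extents 457284 to 685925, i.e. 228642 extents of 4 MiB, about 893 GiB or a quarter of the LV. pvcreate --uuid and vgcfgrestore rebuild the mapping, so the LV activates, but that range now points at the replacement disk, which does not hold the old data. With 32 allocation groups spread across the LV, roughly 8 of them lay on the failed disk, consistent with the damage xfs_repair reported. The arithmetic as a small sketch (function names are illustrative):

```shell
#!/bin/sh
# lv1's layout, copied from /etc/lvm/backup/vg1: four linear segments
# (stripe_count = 1), each occupying one whole PV.
EXTENT_MIB=4        # extent_size = 8192 sectors * 512 B = 4 MiB
SEG_COUNT=228642    # extent_count of every segment (= pe_count of every PV)

# segment_start PV: start_extent of the lv1 segment stored on PV.
segment_start() {
    case $1 in
        pv0) echo 0 ;;
        pv1) echo 228642 ;;
        pv2) echo 457284 ;;
        pv3) echo 685926 ;;
        *)   return 1 ;;
    esac
}

# lost_range PV: "<first-extent> <last-extent> <MiB>" of lv1 that lived only on
# PV; restoring the metadata cannot bring these blocks back.
lost_range() {
    first=$(segment_start "$1") || return 1
    echo "$first $(( first + SEG_COUNT - 1 )) $(( SEG_COUNT * EXTENT_MIB ))"
}
```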

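Notes for next time

1. First back up the logical volume configuration file /etc/lvm/backup/vg1.

After the operations, check how the UUIDs and the configuration file have changed.

2. Commands that may be needed next time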

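vgreduce: remove physical volumes from a volume group

From Linux Command Examples Quick Reference (《Linux指令范例速查手册》), Chapter 11, Disk Management. The disk-management commands covered in this chapter include disk partitioning, disk booting, and LVM logical volume management. This section introduces the vgreduce command, which removes physical volumes from a volume group.

Author: Huang Zhaohe (黄照鹤). Source: Tsinghua University Press.

11.28 vgreduce: remove physical volumes from a volume group

[Syntax] vgreduce [options] [arguments]

[Description] vgreduce shrinks an LVM volume group by removing physical volumes from it.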

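[Options]

Option	Description
-a	If no physical volume is given on the command line, remove all empty physical volumes
--removemissing	Remove the missing physical volumes from the volume group, returning it to a normal state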

[Arguments]

Argument	Description
Volume group	Name of the volume group to operate on
Physical volume list	Physical volumes to remove

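[Tip] The last remaining physical volume in a volume group cannot be removed.

[Example 362] Removing a physical volume. Steps:

Use vgreduce to remove the physical volume "/dev/sdb2" from the volume group "vg2000". Enter the following command:

[root@hn ~]# vgreduce vg2000 /dev/sdb2
# Remove the physical volume "/dev/sdb2" from the volume group "vg2000"

The output is: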

Removed "/dev/sdb2" from volume group "vg2000"
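
The procedure from the lessons above can be sketched as a dry run (commands are printed unless RUN is set to an empty value). Like the plan itself it is untested; the disk name, the /root backup path and the function name are assumptions. One caveat from the vgreduce man page: --removemissing fails while an LV still has extents on the missing PV, and adding --force deletes such LVs entirely. For a linear LV like lv1 the procedure therefore keeps the VG consistent but does not bring back the data on the failed disk; that still has to come from a backup or, as here, the standby node.

```shell
#!/bin/sh
# Dry run by default: each command is only printed. Run with RUN= (empty) to execute.
RUN=${RUN-echo}

VG=vg1
NEW_DISK=/dev/sdd        # assumption: the replacement disk appears under the old name
NEW_PV=${NEW_DISK}1

replace_missing_pv() {
    # Save the current metadata where later LVM commands will not overwrite it.
    $RUN vgcfgbackup -f "/root/$VG.before-replace.vg" "$VG" || return
    # Drop the missing PV. Fails while an LV still uses it; --force would delete such LVs.
    $RUN vgreduce --removemissing "$VG" || return
    # Partition the new disk, make the partition a PV and add it to the VG.
    $RUN parted -s "$NEW_DISK" mklabel gpt mkpart primary 2048s 100% || return
    $RUN partprobe "$NEW_DISK" || return
    $RUN pvcreate "$NEW_PV" || return
    $RUN vgextend "$VG" "$NEW_PV"
}
```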

Posted @ 2020-07-06 16:19 by augusite


augusite

Knowledge ought to be pure and true: not lost amid confusion, nor hidden out of taboo.
