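Contact: mobile (+86 13429648788), QQ (107644445)
Title: Recovering from ORA-15042: ASM disk "N" is missing from group number "M"
Author: 惜分飞 © All rights reserved. [Reproduction in any form without the author's consent is prohibited; the author reserves the right to pursue legal action.]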
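A friend asked for help with a recovery. An ASM disk group was built on 19 LUNs, and one of them developed a fault. To replace it, they added a new LUN and dropped the old one in a single operation, but the operation hung partway through: the bad LUN was damaged at the storage layer, so the rebalance could not complete. The storage engineers kept working on the faulty LUN and, very luckily, managed to repair it. In the excitement, however, they deleted the newly added LUN directly on the storage array, even though the rebalance had already moved part of the data onto it. At that point the ASM disk group was completely down and could no longer be mounted, so they asked for recovery support. For certain reasons the LUN could not be recovered at the storage level, so the only option was for us to recover at the database level.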
Mon Sep 21 19:52:35 2015
SQL> alter diskgroup dg_XFF add disk '/dev/rhdisk116' size 716800M drop disk dg_XFF_0012
NOTE: Assigning number (1,20) to disk (/dev/rhdisk116)
NOTE: requesting all-instance membership refresh for group=1
NOTE: initializing header on grp 1 disk DG_XFF_0020
NOTE: requesting all-instance disk validation for group=1
Mon Sep 21 19:52:44 2015
NOTE: skipping rediscovery for group 1/0xb94738f1 (DG_XFF) on local instance.
NOTE: requesting all-instance disk validation for group=1
NOTE: skipping rediscovery for group 1/0xb94738f1 (DG_XFF) on local instance.
NOTE: initiating PST update: grp = 1
Mon Sep 21 19:52:44 2015
GMON updating group 1 at 25 for pid 27, osid 12124486
NOTE: PST update grp = 1 completed successfully
NOTE: membership refresh pending for group 1/0xb94738f1 (DG_XFF)
GMON querying group 1 at 26 for pid 18, osid 10092734
NOTE: cache opening disk 20 of grp 1: DG_XFF_0020 path:/dev/rhdisk116
GMON querying group 1 at 27 for pid 18, osid 10092734
SUCCESS: refreshed membership for 1/0xb94738f1 (DG_XFF)
Mon Sep 21 19:52:47 2015
SUCCESS: alter diskgroup dg_XFF add disk '/dev/rhdisk116' size 716800M drop disk dg_XFF_0012
NOTE: starting rebalance of group 1/0xb94738f1 (DG_XFF) at power 1
Starting background process ARB0
Mon Sep 21 19:52:47 2015
ARB0 started with pid=28, OS id=10944804
NOTE: assigning ARB0 to group 1/0xb94738f1 (DG_XFF) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DG_XFF
Mon Sep 21 20:35:06 2015
SQL> ALTER DISKGROUP DG_XFF MOUNT /* asm agent *//* {1:51107:7083} */
NOTE: cache registered group DG_XFF number=1 incarn=0xdd6f975a
NOTE: cache began mount (first) of group DG_XFF number=1 incarn=0xdd6f975a
NOTE: Assigning number (1,0) to disk (/dev/rhdisk10)
NOTE: Assigning number (1,1) to disk (/dev/rhdisk11)
NOTE: Assigning number (1,2) to disk (/dev/rhdisk16)
NOTE: Assigning number (1,3) to disk (/dev/rhdisk17)
NOTE: Assigning number (1,4) to disk (/dev/rhdisk22)
NOTE: Assigning number (1,5) to disk (/dev/rhdisk23)
NOTE: Assigning number (1,6) to disk (/dev/rhdisk28)
NOTE: Assigning number (1,7) to disk (/dev/rhdisk29)
NOTE: Assigning number (1,8) to disk (/dev/rhdisk33)
NOTE: Assigning number (1,9) to disk (/dev/rhdisk34)
NOTE: Assigning number (1,10) to disk (/dev/rhdisk4)
NOTE: Assigning number (1,11) to disk (/dev/rhdisk40)
NOTE: Assigning number (1,12) to disk (/dev/rhdisk41)
NOTE: Assigning number (1,13) to disk (/dev/rhdisk45)
NOTE: Assigning number (1,14) to disk (/dev/rhdisk46)
NOTE: Assigning number (1,15) to disk (/dev/rhdisk5)
NOTE: Assigning number (1,16) to disk (/dev/rhdisk52)
NOTE: Assigning number (1,17) to disk (/dev/rhdisk53)
NOTE: Assigning number (1,18) to disk (/dev/rhdisk57)
NOTE: Assigning number (1,19) to disk (/dev/rhdisk58)
Wed Sep 30 11:08:07 2015
NOTE: start heartbeating (grp 1)
GMON querying group 1 at 33 for pid 35, osid 4194488
NOTE: Assigning number (1,20) to disk ()
GMON querying group 1 at 34 for pid 35, osid 4194488
NOTE: cache dismounting (clean) group 1/0xDD6F975A (DG_XFF)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xDD6F975A (DG_XFF)
NOTE: cache ending mount (fail) of group DG_XFF number=1 incarn=0xdd6f975a
NOTE: cache deleting context for group DG_XFF 1/0xdd6f975a
GMON dismounting group 1 at 35 for pid 35, osid 4194488
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
ERROR: diskgroup DG_XFF was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "20" is missing from group number "1"
ERROR: ALTER DISKGROUP DG_XFF MOUNT /* asm agent *//* {1:51107:7083} */
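The missing disk can also be spotted mechanically: in the mount attempt above, the disk numbered (1,20) is assigned with an empty path. The following is a minimal Python sketch of that check, not part of the original recovery. The function name `find_pathless_disks` and the sample text are illustrative assumptions; the only thing taken from the log is the shape of the "Assigning number" lines.

```python
import re

# Matches alert-log lines such as
#   NOTE: Assigning number (1,20) to disk (/dev/rhdisk116)
# and tolerates optional spaces inside the parentheses.
ASSIGN_RE = re.compile(
    r"Assigning number \((\d+),(\d+)\) to disk \(\s*([^)]*?)\s*\)"
)


def find_pathless_disks(log_text):
    """Return (group, disk) pairs whose assignment line has no device path.

    Hypothetical helper: an empty path in the mount log is treated as a
    hint that ASM could not discover a device for that disk number.
    """
    missing = []
    for match in ASSIGN_RE.finditer(log_text):
        group = int(match.group(1))
        disk = int(match.group(2))
        path = match.group(3)
        if not path:
            missing.append((group, disk))
    return missing


# Three lines copied from the mount attempt shown above.
sample = """\
NOTE: Assigning number (1,18) to disk (/dev/rhdisk57)
NOTE: Assigning number (1,19) to disk (/dev/rhdisk58)
NOTE: Assigning number (1,20) to disk ()
"""

print(find_pathless_disks(sample))  # [(1, 20)]
```

The reported pair agrees with the ORA-15042 message, which names disk "20" in group "1".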
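The cause is fairly clear: because the storage engineers deleted the LUN outright, disk group DG_XFF lost ASM disk 20 and can no longer be mounted directly. The rebalance had been running for quite a long time, so the lost disk already held a large amount of data, including metadata. Even if we patched the PST (Partnership and Status Table) to get the disk group mounted, which might not succeed, a large amount of data would still be lost, and the data inside could not necessarily be extracted directly. Had the disk only been added, with no rebalance having run for whatever reason, we could simply have patched the PST and mounted the disk group.

In a case like this one, all we can do is scan the disks at a low level to reconstruct the data files, then unload the data from them to complete the recovery. Copying files out or unloading data directly on the basis of the surviving metadata would lose a large amount of data, because the metadata for some files lived on the lost LUN. The disks to be recovered are: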
grp# dsk# bsize ausize disksize diskname        groupname       path
---- ---- ----- ------ -------- --------------- --------------- -------------
   1    0  4096  4096K   179200 DG_XFF_0000     DG_XFF          /dev/rhdisk10
   1    1  4096  4096K   179200 DG_XFF_0001     DG_XFF          /dev/rhdisk11
   1    2  4096  4096K   179200 DG_XFF_0002     DG_XFF          /dev/rhdisk16
   1    3  4096  4096K   179200 DG_XFF_0003     DG_XFF          /dev/rhdisk17
   1    4  4096  4096K   179200 DG_XFF_0004     DG_XFF          /dev/rhdisk22
   1    5  4096  4096K   179200 DG_XFF_0005     DG_XFF          /dev/rhdisk23
   1    6  4096  4096K   179200 DG_XFF_0006     DG_XFF          /dev/rhdisk28
   1    7  4096  4096K   179200 DG_XFF_0007     DG_XFF          /dev/rhdisk29
   1    8  4096  4096K   179200 DG_XFF_0008     DG_XFF          /dev/rhdisk33
   1    9  4096  4096K   179200 DG_XFF_0009     DG_XFF          /dev/rhdisk34
   1   10  4096  4096K   179200 DG_XFF_0010     DG_XFF          /dev/rhdisk4
   1   11  4096  4096K   179200 DG_XFF_0011     DG_XFF          /dev/rhdisk40
   1   12  4096  4096K   179200 DG_XFF_0012     DG_XFF          /dev/rhdisk41
   1   13  4096  4096K   179200 DG_XFF_0013     DG_XFF          /dev/rhdisk45
   1   14  4096  4096K   179200 DG_XFF_0014     DG_XFF          /dev/rhdisk46
   1   15  4096  4096K   179200 DG_XFF_0015     DG_XFF          /dev/rhdisk5
   1   16  4096  4096K   179200 DG_XFF_0016     DG_XFF          /dev/rhdisk52
   1   17  4096  4096K   179200 DG_XFF_0017     DG_XFF          /dev/rhdisk53
   1   18  4096  4096K   179200 DG_XFF_0018     DG_XFF          /dev/rhdisk57
   1   19  4096  4096K   179200 DG_XFF_0019     DG_XFF          /dev/rhdisk58
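As a sanity check on the listing, the per-disk size can be worked out from the ausize and disksize columns. The sketch below assumes that disksize counts allocation units; the source does not state this, but under that reading each disk comes to 716800 MB, the same size given in the `add disk ... size 716800M` command in the log. The constant and function names are illustrative only.

```python
# Values read from the disk listing; the interpretation of DISK_SIZE_AUS
# as "number of allocation units" is an assumption, not stated in the source.
AU_SIZE_KB = 4096        # ausize column: 4096K per allocation unit
DISK_SIZE_AUS = 179200   # disksize column, assumed to be a count of AUs
DISK_COUNT = 20          # rows in the listing (dsk# 0 through 19)


def disk_size_mb(au_count, au_size_kb):
    """Convert an allocation-unit count into megabytes."""
    return au_count * au_size_kb // 1024


per_disk_mb = disk_size_mb(DISK_SIZE_AUS, AU_SIZE_KB)
total_mb = per_disk_mb * DISK_COUNT

print(per_disk_mb)  # 716800
print(total_mb)     # 14336000
```

Under the same assumption, the 20 surviving disks together represent 14336000 MB of raw space that the low-level scan has to cover.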
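We were fairly lucky this time: the lost disk group was only a business-data disk group, holding just 19 tablespaces and 10 partitioned tables. With the data dictionary intact, we recovered the data of those 10 partitioned tables (6443 partitions in total). The overall result was as follows:
Measured by total data volume, the recovery ratio is 6003.26953 / 6027.26935 × 100% = 99.6018127%.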
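The ratio can be reproduced with a few lines of Python. This is only a check of the arithmetic on the two figures quoted above; the source does not give their unit, so none is assumed here, and the variable names are illustrative.

```python
# Figures quoted in the text; the unit is not stated in the source.
recovered = 6003.26953
total = 6027.26935

recovery_pct = recovered / total * 100
lost_pct = 100 - recovery_pct

print(f"recovered: {recovery_pct:.4f}%")  # recovered: 99.6018%
print(f"lost:      {lost_pct:.4f}%")      # lost:      0.3982%
```

Roughly 0.4% of the data volume could not be recovered.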