Recovering a Corrupted Voting Disk (votedisk) and OCR
Source: https://blog.csdn.net/share120/article/details/52312321
Preface:
OCR: stores cluster and node configuration, including the mapping of instances to nodes. votedisk: arbitrates cluster membership when the cluster partitions.
ASM disk groups come in three redundancy types: external, normal, and high. This walkthrough recovers a normal-redundancy disk group, simulating the case where the OCR disks or voting disks become unavailable.
The OCR disk group usually also holds the ASM parameter file (ASMPARAMETERFILE), so back it up before doing anything: create pfile='/home/oracle/11' from spfile;
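A minimal sketch of that backup, run as the grid user against the ASM instance (the pfile path is the one this walkthrough uses; any local path outside ASM works):

[grid@node1 ~]$ sqlplus / as sysasm
SQL> show parameter spfile;
SQL> create pfile='/home/oracle/11' from spfile;

Keep the resulting pfile on a local filesystem so it survives the loss of the disk group.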
Redundancy:
Voting disks (never use an even number):
External redundancy requires at least 1 voting disk (or 1 failure group)
Normal redundancy requires at least 3 voting disks (or 3 failure groups)
High redundancy requires at least 5 voting disks (or 5 failure groups)
Too few failure groups makes voting disk creation fail, e.g. ORA-15274: Not enough failgroups (3) to create voting files.
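As an illustration of how that surfaces (disk group name and paths below are hypothetical): a normal-redundancy group built from only two disks can be created, but cannot take the voting files, because each voting file copy needs its own failure group:

SQL> create diskgroup DG_TEST normal redundancy
     disk '/dev/mapper/diskAp1','/dev/mapper/diskBp1';   -- only 2 failure groups
[root@node1 ~]# crsctl replace votedisk +DG_TEST
ORA-15274: Not enough failgroups (3) to create voting files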
OCR:
In 10.2 and 11.1, at most 2 OCR devices: OCR and OCRMIRROR.
In 11.2+, at most 5 OCR locations.
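For reference, extra OCR locations on 11.2 are added and removed with ocrconfig as root (+OCR_MIRROR below is just a placeholder disk group name):

[root@node1 ~]# /u01/app/11.2.0/grid/bin/ocrconfig -add +OCR_MIRROR      # register an additional OCR location
[root@node1 ~]# /u01/app/11.2.0/grid/bin/ocrconfig -delete +OCR_MIRROR   # deregister it again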
Procedure:
I. Current OCR and voting disk status
1. List the existing OCR backups:
[root@node1 ~]# /u01/app/11.2.0/grid/bin/ocrconfig -showbackup
node1 2014/12/10 13:08:14 /u01/app/11.2.0/grid/cdata/lxxhscan/backup00.ocr
node1 2014/12/10 09:08:13 /u01/app/11.2.0/grid/cdata/lxxhscan/backup01.ocr
node1 2014/12/10 05:08:13 /u01/app/11.2.0/grid/cdata/lxxhscan/backup02.ocr
node1 2014/12/09 09:08:07 /u01/app/11.2.0/grid/cdata/lxxhscan/day.ocr
node2 2014/12/02 07:20:17 /u01/app/11.2.0/grid/cdata/lxxhscan/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available
[root@node1 ~]#
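The automatic backups listed above are written every 4 hours to $GRID_HOME/cdata/<cluster_name>/ on one node (the 4-hour spacing is visible in the timestamps). The backup directory can be relocated, and a manual backup taken, with (a sketch using this cluster's paths):

[root@node1 ~]# /u01/app/11.2.0/grid/bin/ocrconfig -backuploc /u01/app/11.2.0/grid/cdata/lxxhscan
[root@node1 ~]# /u01/app/11.2.0/grid/bin/ocrconfig -manualbackup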
2. Query the voting disk configuration:
[root@node1 ~]# /u01/app/11.2.0/grid/bin/crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   4c0a1622b1a04f48bf1425fadfbcd461 (/dev/mapper/ocr1p1) [OCR_VOTE]
 2. ONLINE   627259113bb24f88bf15ed81694c8f17 (/dev/mapper/ocr2p1) [OCR_VOTE]
 3. ONLINE   1086d74f5f5f4fb5bfe89bbb27054baf (/dev/mapper/ocr3p1) [OCR_VOTE]
Located 3 voting disk(s).
[root@node1 ~]#
3. Run an OCR integrity check:
[root@node1 ~]# /u01/app/11.2.0/grid/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3084
         Available space (kbytes) :     259036
         ID                       :  995290881
         Device/File Name         :  +OCR_VOTE
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded
[root@node1 ~]#
II. Add new storage on the hosts (3 LUNs, 2 GB each, for the NORMAL-redundancy disk group)
Newly added disks:
1. Rescan the SCSI hosts so the OS discovers the new LUN wwids:
for i in `ls /sys/class/scsi_host/`; do echo "- - -" >> /sys/class/scsi_host/$i/scan; done
[root@node1 home]# for i in `ls /sys/class/scsi_host/`; do echo "- - -" >> /sys/class/scsi_host/$i/scan; done
[root@node1 home]# ls -l /dev/mapper/
total 0
lrwxrwxrwx 1 root root      8 Dec 10 17:09 36000d3100082db000000000000000018 -> ../dm-12
lrwxrwxrwx 1 root root      8 Dec 10 17:09 36000d3100082db000000000000000019 -> ../dm-14
lrwxrwxrwx 1 root root      8 Dec 10 17:09 36000d3100082db00000000000000001a -> ../dm-13
lrwxrwxrwx 1 root root      7 Nov 12 15:42 archlog -> ../dm-2
lrwxrwxrwx 1 root root      7 Nov 12 15:42 archlogp1 -> ../dm-7
crw-rw---- 1 root root 10, 58 Nov 12 15:42 control
lrwxrwxrwx 1 root root      7 Nov 12 15:42 fra -> ../dm-9
lrwxrwxrwx 1 root root      8 Nov 12 15:42 frap1 -> ../dm-11
lrwxrwxrwx 1 root root      7 Nov 12 15:42 ocr1 -> ../dm-1
lrwxrwxrwx 1 root root      7 Nov 12 15:42 ocr1p1 -> ../dm-5
lrwxrwxrwx 1 root root      7 Nov 12 15:42 ocr2 -> ../dm-6
lrwxrwxrwx 1 root root      8 Nov 12 15:42 ocr2p1 -> ../dm-10
lrwxrwxrwx 1 root root      7 Nov 12 15:42 ocr3 -> ../dm-0
lrwxrwxrwx 1 root root      7 Nov 12 15:42 ocr3p1 -> ../dm-4
lrwxrwxrwx 1 root root      7 Nov 12 15:42 racdata -> ../dm-3
lrwxrwxrwx 1 root root      7 Nov 12 15:42 racdatap1 -> ../dm-8
[root@node1 home]#
[root@node2 ~]# for i in `ls /sys/class/scsi_host/`; do echo "- - -" >> /sys/class/scsi_host/$i/scan; done
[root@node2 ~]# ls -l /dev/mapper/
total 0
lrwxrwxrwx 1 root root      8 Dec 10 17:09 36000d3100082db000000000000000018 -> ../dm-12
lrwxrwxrwx 1 root root      8 Dec 10 17:09 36000d3100082db000000000000000019 -> ../dm-14
lrwxrwxrwx 1 root root      8 Dec 10 17:09 36000d3100082db00000000000000001a -> ../dm-13
lrwxrwxrwx 1 root root      7 Nov 12 15:01 archlog -> ../dm-1
lrwxrwxrwx 1 root root      7 Nov 12 15:01 archlogp1 -> ../dm-5
crw-rw---- 1 root root 10, 58 Nov 12 15:01 control
lrwxrwxrwx 1 root root      7 Nov 12 15:01 fra -> ../dm-3
lrwxrwxrwx 1 root root      7 Nov 12 15:01 frap1 -> ../dm-9
lrwxrwxrwx 1 root root      7 Nov 12 15:01 ocr1 -> ../dm-2
lrwxrwxrwx 1 root root      7 Nov 12 15:01 ocr1p1 -> ../dm-7
lrwxrwxrwx 1 root root      7 Nov 12 15:01 ocr2 -> ../dm-8
lrwxrwxrwx 1 root root      8 Nov 12 15:01 ocr2p1 -> ../dm-11
lrwxrwxrwx 1 root root      7 Nov 12 15:01 ocr3 -> ../dm-0
lrwxrwxrwx 1 root root      7 Nov 12 15:01 ocr3p1 -> ../dm-4
lrwxrwxrwx 1 root root      7 Nov 12 15:01 racdata -> ../dm-6
lrwxrwxrwx 1 root root      8 Nov 12 15:01 racdatap1 -> ../dm-10
[root@node2 ~]#
2. vi /etc/multipath.conf
Append the following stanzas (identically on both nodes):
multipath {
    wwid  36000d3100082db000000000000000018
    alias ocrnew1
    path_grouping_policy multibus
    # path_checker readsector0
    path_selector "round-robin 0"
    failback manual
    rr_weight priorities
    no_path_retry 5
    rr_min_io 10
}
multipath {
    wwid  36000d3100082db00000000000000001a
    alias ocrnew2
    path_grouping_policy multibus
    # path_checker readsector0
    path_selector "round-robin 0"
    failback manual
    rr_weight priorities
    no_path_retry 5
    rr_min_io 10
}
multipath {
    wwid  36000d3100082db000000000000000019
    alias ocrnew3
    path_grouping_policy multibus
    # path_checker readsector0
    path_selector "round-robin 0"
    failback manual
    rr_weight priorities
    no_path_retry 5
    rr_min_io 10
}
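The wwid in each stanza must match the new LUN. Assuming a RHEL 6-style layout, the ids can be read from the multipath maps themselves or with scsi_id (replace sdX with one path of a new LUN; the binary's location varies by distro):

[root@node1 ~]# multipath -ll | grep 36000d31                         # the wwid opens each map's first line
[root@node1 ~]# /lib/udev/scsi_id --whitelisted --device=/dev/sdX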
3. Restart the service: service multipathd restart
4. Verify with ls -l /dev/mapper/ and multipath -ll:
[root@node11 etc]# service multipathd restart
-- Note: restarting multipathd drops the disk permissions if they were granted with chown/chmod; permissions applied through udev rules survive the restart.
ok
Stopping multipathd daemon: [ OK ]
Starting multipathd daemon: [ OK ]
[root@node11 etc]# ls -l /dev/mapper/
total 0
lrwxrwxrwx 1 root root      7 Dec 10 17:19 archlog -> ../dm-2
lrwxrwxrwx 1 root root      7 Dec 10 17:19 archlogp1 -> ../dm-7
crw-rw---- 1 root root 10, 58 Nov 12 15:42 control
lrwxrwxrwx 1 root root      7 Dec 10 17:19 fra -> ../dm-9
lrwxrwxrwx 1 root root      8 Dec 10 17:19 frap1 -> ../dm-11
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr1 -> ../dm-1
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr1p1 -> ../dm-5
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr2 -> ../dm-6
lrwxrwxrwx 1 root root      8 Dec 10 17:19 ocr2p1 -> ../dm-10
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr3 -> ../dm-0
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr3p1 -> ../dm-4
lrwxrwxrwx 1 root root      8 Dec 10 17:19 ocrnew1 -> ../dm-12   ########## new alias is present
lrwxrwxrwx 1 root root      8 Dec 10 17:19 ocrnew2 -> ../dm-13   ########## new alias is present
lrwxrwxrwx 1 root root      8 Dec 10 17:19 ocrnew3 -> ../dm-14   ########## new alias is present
lrwxrwxrwx 1 root root      7 Dec 10 17:19 racdata -> ../dm-3
lrwxrwxrwx 1 root root      7 Dec 10 17:19 racdatap1 -> ../dm-8
[root@node1 etc]#
[root@node2 ~]# service multipathd restart
ok
Stopping multipathd daemon: [ OK ]
Starting multipathd daemon: [ OK ]
[root@node2 ~]# ls -l /dev/mapper/
total 0
lrwxrwxrwx 1 root root      7 Dec 10 17:19 archlog -> ../dm-1
lrwxrwxrwx 1 root root      7 Dec 10 17:19 archlogp1 -> ../dm-5
crw-rw---- 1 root root 10, 58 Nov 12 15:01 control
lrwxrwxrwx 1 root root      7 Dec 10 17:19 fra -> ../dm-3
lrwxrwxrwx 1 root root      7 Dec 10 17:19 frap1 -> ../dm-9
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr1 -> ../dm-2
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr1p1 -> ../dm-7
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr2 -> ../dm-8
lrwxrwxrwx 1 root root      8 Dec 10 17:19 ocr2p1 -> ../dm-11
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr3 -> ../dm-0
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr3p1 -> ../dm-4
lrwxrwxrwx 1 root root      8 Dec 10 17:19 ocrnew1 -> ../dm-12
lrwxrwxrwx 1 root root      8 Dec 10 17:19 ocrnew2 -> ../dm-13
lrwxrwxrwx 1 root root      8 Dec 10 17:19 ocrnew3 -> ../dm-14
lrwxrwxrwx 1 root root      7 Dec 10 17:19 racdata -> ../dm-6
lrwxrwxrwx 1 root root      8 Dec 10 17:19 racdatap1 -> ../dm-10
[root@node2 ~]#
5. Partition the disks with fdisk (whether to partition at all is your choice). Running partprobe afterwards lets the new partitions created by fdisk be recognized without rebooting the system.
fdisk /dev/mapper/ocrnew1
fdisk /dev/mapper/ocrnew2
fdisk /dev/mapper/ocrnew3
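A hedged sketch of scripting the same thing on one node (one primary partition per disk at default sizes; on a real system, verify interactively). kpartx publishes the ocrnewNp1 partition mappings for multipath devices without a reboot:

for d in ocrnew1 ocrnew2 ocrnew3; do
    printf 'n\np\n1\n\n\nw\n' | fdisk /dev/mapper/$d   # new primary partition 1, full size, write
    partprobe /dev/mapper/$d                           # re-read the partition table
    kpartx -a /dev/mapper/$d                           # create the ocrnew*p1 device-mapper entries
done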
6. Re-apply ownership and permissions (they must be granted again after the multipathd restart):
chown grid:asmadmin /dev/mapper/ocrnew1*
chown grid:asmadmin /dev/mapper/ocrnew2*
chown grid:asmadmin /dev/mapper/ocrnew3*
chmod 660 /dev/mapper/ocrnew1*
chmod 660 /dev/mapper/ocrnew2*
chmod 660 /dev/mapper/ocrnew3*
chown grid:asmadmin /dev/mapper/archlog*
chown grid:asmadmin /dev/mapper/fra*
chown grid:asmadmin /dev/mapper/ocr1*
chown grid:asmadmin /dev/mapper/ocr2*
chown grid:asmadmin /dev/mapper/ocr3*
chown grid:asmadmin /dev/mapper/racdata*
chmod 660 /dev/mapper/archlog*
chmod 660 /dev/mapper/fra*
chmod 660 /dev/mapper/ocr1*
chmod 660 /dev/mapper/ocr2*
chmod 660 /dev/mapper/ocr3*
chmod 660 /dev/mapper/racdata*
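As noted in step 4, chown/chmod settings are lost whenever multipathd restarts; a udev rule makes them persistent. A minimal sketch (the rule file name and match patterns are assumptions; adjust the DM_NAME patterns to your aliases):

# /etc/udev/rules.d/12-dm-permissions.rules
ENV{DM_NAME}=="ocrnew?p1", OWNER:="grid", GROUP:="asmadmin", MODE:="0660"
ENV{DM_NAME}=="ocr?p1",    OWNER:="grid", GROUP:="asmadmin", MODE:="0660"

Reload without rebooting: udevadm control --reload-rules && udevadm trigger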
III. The recovery
Confirm that the OCR, voting disks, and ASM spfile all live in one dedicated ASM disk group:
[grid@node1 ~]$ /u01/app/11.2.0/grid/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3084
         Available space (kbytes) :     259036
         ID                       :  995290881
         Device/File Name         :  +OCR_VOTE
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check bypassed due to non-privileged user
As the grid user, connect to the ASM instance as SYSASM:
SQL> show parameter spfile;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +OCR_VOTE/lxxhscan/asmparamete
                                                 rfile/registry.253.862151267
SQL> create pfile='/home/oracle/11' from spfile;

[grid@node1 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   4c0a1622b1a04f48bf1425fadfbcd461 (/dev/mapper/ocr1p1) [OCR_VOTE]
 2. ONLINE   627259113bb24f88bf15ed81694c8f17 (/dev/mapper/ocr2p1) [OCR_VOTE]
 3. ONLINE   1086d74f5f5f4fb5bfe89bbb27054baf (/dev/mapper/ocr3p1) [OCR_VOTE]
Located 3 voting disk(s).
[grid@node1 ~]$

ASMCMD> lsdsk -t -G dg_sys
ASMCMD-8001: diskgroup 'dg_sys' does not exist or is not mounted
ASMCMD> lsdsk -t -G OCR_VOTE
Create_Date          Mount_Date           Repair_Timer  Path
2014-10-28 14:27:39  2014-11-12 15:03:21  0             /dev/mapper/ocr1p1
2014-10-28 14:27:39  2014-11-12 15:03:21  0             /dev/mapper/ocr2p1
2014-10-28 14:27:39  2014-11-12 15:03:21  0             /dev/mapper/ocr3p1
ASMCMD>
Check the current RAC status:
[grid@node11 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCHLOG.dg
               ONLINE  ONLINE       node11
ora.DATA.dg
               ONLINE  ONLINE       node11
ora.FRA.dg
               ONLINE  ONLINE       node11
ora.LISTENER.lsnr
               ONLINE  ONLINE       node11
ora.OCR_VOTE.dg
               ONLINE  ONLINE       node11
ora.asm
               ONLINE  ONLINE       node11                   Started
ora.gsd
               OFFLINE OFFLINE      node11
ora.net1.network
               ONLINE  ONLINE       node11
ora.ons
               ONLINE  ONLINE       node11
ora.registry.acfs
               ONLINE  ONLINE       node11
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       node11
ora.cvu
      1        ONLINE  ONLINE       node11
ora.gnnt.db
      1        ONLINE  ONLINE       node11                   Open
      2        ONLINE  OFFLINE
ora.node11.vip
      1        ONLINE  ONLINE       node11
ora.node12.vip
      1        ONLINE  INTERMEDIATE node11                   FAILED OVER
ora.oc4j
      1        ONLINE  ONLINE       node11
ora.scan1.vip
      1        ONLINE  ONLINE       node11
Back up the OCR:
[root@node1 ~]# /u01/app/11.2.0/grid/bin/ocrconfig -manualbackup
node1 2014/12/11 11:15:01 /u01/app/11.2.0/grid/cdata/lxxhscan/backup_20141211_111501.ocr
[root@node1 ~]# /u01/app/11.2.0/grid/bin/ocrconfig -showbackup
node1 2014/12/11 09:08:19 /u01/app/11.2.0/grid/cdata/lxxhscan/backup00.ocr
node1 2014/12/11 05:08:18 /u01/app/11.2.0/grid/cdata/lxxhscan/backup01.ocr
node1 2014/12/11 01:08:15 /u01/app/11.2.0/grid/cdata/lxxhscan/backup02.ocr
node1 2014/12/10 09:08:13 /u01/app/11.2.0/grid/cdata/lxxhscan/day.ocr
node2 2014/12/02 07:20:17 /u01/app/11.2.0/grid/cdata/lxxhscan/week.ocr
node1 2014/12/11 11:15:01 /u01/app/11.2.0/grid/cdata/lxxhscan/backup_20141211_111501.ocr
[root@node1 ~]#
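Alongside the physical backups, a logical export is cheap extra insurance before a destructive test (the export file path is illustrative):

[root@node1 ~]# /u01/app/11.2.0/grid/bin/ocrconfig -export /home/oracle/ocr_20141211.exp

Note that a logical export is restored with ocrconfig -import, not -restore.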
1. Corrupt the OCR disks (fault injection):
dd if=/dev/zero of=/dev/mapper/ocr1p1 bs=1024K count=1
dd if=/dev/zero of=/dev/mapper/ocr2p1 bs=1024K count=1
dd if=/dev/zero of=/dev/mapper/ocr3p1 bs=1024K count=1

[root@node1 ~]# dd if=/dev/zero of=/dev/mapper/ocr2p1 bs=1024K count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00242945 s, 432 MB/s
[root@node1 ~]# dd if=/dev/zero of=/dev/mapper/ocr3p1 bs=1024K count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0030372 s, 345 MB/s
[root@node1 ~]#
2. Stop CRS:
crsctl stop crs
3. Start CRS again (it fails, because the OCR disk group can no longer mount):
[grid@node1 ~]$ crsctl start crs
CRS-4563: Insufficient user privileges.
CRS-4000: Command Start failed, or completed with errors.
[grid@node1 ~]$ crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  INTERMEDIATE node11                   OCR not started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       node11
ora.crf
      1        ONLINE  ONLINE       node11
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  ONLINE       node11
ora.cssdmonitor
      1        ONLINE  ONLINE       node11
ora.ctssd
      1        ONLINE  ONLINE       node11                   ACTIVE:0
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       node11
ora.evmd
      1        ONLINE  INTERMEDIATE node11
ora.gipcd
      1        ONLINE  ONLINE       node11
ora.gpnpd
      1        ONLINE  ONLINE       node11
ora.mdnsd
      1        ONLINE  ONLINE       node11
[grid@node1 ~]$
[grid@node1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager
[root@node1 ~]# /u01/app/11.2.0/grid/bin/crsctl start cluster -all
CRS-2800: Cannot start resource 'ora.asm' as it is already in the INTERMEDIATE state on server 'node11'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node12'
CRS-2676: Start of 'ora.cssdmonitor' on 'node12' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node12'
CRS-2672: Attempting to start 'ora.diskmon' on 'node12'
CRS-2676: Start of 'ora.diskmon' on 'node12' succeeded
CRS-2676: Start of 'ora.cssd' on 'node12' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'node12'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'node12'
CRS-2676: Start of 'ora.ctssd' on 'node12' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'node12'
CRS-2676: Start of 'ora.evmd' on 'node12' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'node12' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'node12'
CRS-2674: Start of 'ora.asm' on 'node12' failed
CRS-4705: Start of Clusterware failed on node node11.
CRS-4705: Start of Clusterware failed on node node12.
CRS-4000: Command Start failed, or completed with errors.
ocrcheck now fails:
[root@node1 ~]# ocrcheck
-bash: ocrcheck: command not found
[root@node1 ~]# /u01/app/11.2.0/grid/bin/ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage
[root@node1 ~]#
Relevant Grid Infrastructure logs
-- alert log
2014-12-11 14:51:33.092:
[/u01/app/11.2.0/grid/bin/oraagent.bin(7298)]CRS-5019:All OCR locations are on ASM disk groups [OCR_VOTE], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/11.2.0/grid/log/node11/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-12-11 14:52:03.098:
[/u01/app/11.2.0/grid/bin/oraagent.bin(7298)]CRS-5019:All OCR locations are on ASM disk groups [OCR_VOTE], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/11.2.0/grid/log/node11/agent/ohasd/oraagent_grid/oraagent_grid.log"
-- oraagent / crsd logs
/u01/app/11.2.0/grid/log/node11/agent/ohasd/oraagent_grid/oraagent_grid.log
[root@node1 ~]# tail -f /u01/app/11.2.0/grid/log/node11/crsd/crsd.log
2014-12-11 13:21:51.881: [    CRSD][1824413440]{1:23871:1029} Done.
2014-12-11 13:21:51.979: [ CRSCOMM][1843324672] IpcL: connection to member 1 has been removed
2014-12-11 13:21:51.979: [CLSFRAME][1843324672] Removing IPC Member:{Relative|Node:0|Process:1|Type:3}
2014-12-11 13:21:51.979: [CLSFRAME][1843324672] Disconnected from AGENT process: {Relative|Node:0|Process:1|Type:3}
2014-12-11 13:21:51.979: [    AGFW][1837020928]{1:23871:1052} Agfw Proxy Server received process disconnected notification, count=1
2014-12-11 13:21:51.979: [   CRSPE][1826514688]{1:23871:1051} Disconnected from server:
2014-12-11 13:21:51.979: [    AGFW][1837020928]{1:23871:1052} /u01/app/11.2.0/grid/bin/oraagent_grid disconnected.
2014-12-11 13:21:51.979: [    AGFW][1837020928]{1:23871:1052} Agent /u01/app/11.2.0/grid/bin/oraagent_grid[77864] stopped!
2014-12-11 13:21:51.979: [ CRSCOMM][1837020928]{1:23871:1052} IpcL: removeConnection: Member 1 does not exist in pending connections.
Having corrupted the ASM disks that held the OCR, starting CRS clearly shows it can no longer find the voting disk information.
################### The actual recovery starts here ###################
Start the cluster with -excl -nocrs: this brings up the ASM instance but not CRS.
1. Force-stop CRS on both nodes:
[root@node1 ~]# /u01/app/11.2.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node11'
CRS-2673: Attempting to stop 'ora.crf' on 'node11'
CRS-2673: Attempting to stop 'ora.ctssd' on 'node11'
CRS-2673: Attempting to stop 'ora.evmd' on 'node11'
CRS-2673: Attempting to stop 'ora.asm' on 'node11'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'node11'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node11'
CRS-2677: Stop of 'ora.crf' on 'node11' succeeded
CRS-2677: Stop of 'ora.evmd' on 'node11' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'node11' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node11' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'node11' succeeded
CRS-2677: Stop of 'ora.asm' on 'node11' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node11'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node11' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node11'
CRS-2677: Stop of 'ora.cssd' on 'node11' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node11'
CRS-2677: Stop of 'ora.gipcd' on 'node11' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node11'
CRS-2677: Stop of 'ora.gpnpd' on 'node11' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node11' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@node2 ~]# /u01/app/11.2.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node12'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'node12'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node12'
CRS-2673: Attempting to stop 'ora.ctssd' on 'node12'
CRS-2673: Attempting to stop 'ora.evmd' on 'node12'
CRS-2673: Attempting to stop 'ora.asm' on 'node12'
CRS-2677: Stop of 'ora.evmd' on 'node12' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'node12' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'node12' succeeded
CRS-2677: Stop of 'ora.asm' on 'node12' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node12'
CRS-2677: Stop of 'ora.ctssd' on 'node12' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node12' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node12'
CRS-2677: Stop of 'ora.cssd' on 'node12' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'node12'
CRS-2677: Stop of 'ora.crf' on 'node12' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node12'
CRS-2677: Stop of 'ora.gipcd' on 'node12' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node12'
CRS-2677: Stop of 'ora.gpnpd' on 'node12' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node12' has completed
CRS-4133: Oracle High Availability Services has been stopped.
2. Start CRS in exclusive mode with -excl -nocrs, i.e. exclusive mode with the ASM instance up but without starting ora.crsd:
crsctl start crs -excl -nocrs
[root@node1 ~]# /u01/app/11.2.0/grid/bin/crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'node11'
CRS-2676: Start of 'ora.mdnsd' on 'node11' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'node11'
CRS-2676: Start of 'ora.gpnpd' on 'node11' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node11'
CRS-2672: Attempting to start 'ora.gipcd' on 'node11'
CRS-2676: Start of 'ora.cssdmonitor' on 'node11' succeeded
CRS-2676: Start of 'ora.gipcd' on 'node11' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node11'
CRS-2672: Attempting to start 'ora.diskmon' on 'node11'
CRS-2676: Start of 'ora.diskmon' on 'node11' succeeded
CRS-2676: Start of 'ora.cssd' on 'node11' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'node11'
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'node11'
CRS-2672: Attempting to start 'ora.ctssd' on 'node11'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'node11' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'node11'
CRS-2676: Start of 'ora.drivers.acfs' on 'node11' succeeded
CRS-2676: Start of 'ora.ctssd' on 'node11' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'node11' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'node11'
CRS-2676: Start of 'ora.asm' on 'node11' succeeded
[grid@node1 ~]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  INTERMEDIATE node11                   OCR not started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       node11
ora.crf
      1        OFFLINE OFFLINE
ora.crsd
      1        OFFLINE OFFLINE
ora.cssd
      1        ONLINE  ONLINE       node11
ora.cssdmonitor
      1        ONLINE  ONLINE       node11
ora.ctssd
      1        ONLINE  ONLINE       node11                   ACTIVE:0
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       node11
ora.evmd
      1        OFFLINE OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       node11
ora.gpnpd
      1        ONLINE  ONLINE       node11
ora.mdnsd
      1        ONLINE  ONLINE       node11
3. Recreate the disk group that held the OCR and voting disks:
Note: run the SQL below as the grid user.
Create a new disk group to hold the OCR and voting disks, using the same name as the original; keeping the name identical to the failed disk group is the simplest option. (To place them somewhere else instead, /etc/oracle/ocr.loc must be edited.)
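For reference, on an 11.2 cluster with the OCR in ASM, /etc/oracle/ocr.loc is just a two-line file along these lines (a sketch matching this cluster):

ocrconfig_loc=+OCR_VOTE
local_only=FALSE

Since the recreated disk group keeps the OCR_VOTE name, the file needs no change here; proceed to recreate the disk group: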
SQL> create diskgroup OCR_VOTE normal redundancy
     disk '/dev/mapper/ocrnew1p1','/dev/mapper/ocrnew2p1','/dev/mapper/ocrnew3p1'
     attribute 'compatible.asm'='11.2.0.4.0', 'compatible.rdbms'='11.2.0.4.0';

Diskgroup created.
4. Restore the OCR:
[root@node1 ~]# cd /u01/app/11.2.0/grid/cdata/lxxhscan/
[root@node1 lxxhscan]# ll
total 58112
-rw------- 1 root root 7438336 Dec 11 09:08 backup00.ocr
-rw------- 1 root root 7438336 Dec 11 05:08 backup01.ocr
-rw------- 1 root root 7438336 Dec 11 01:08 backup02.ocr
-rw------- 1 root root 7438336 Dec 11 11:15 backup_20141211_111501.ocr
-rw------- 1 root root 7438336 Dec 11 09:08 day_.ocr
-rw------- 1 root root 7438336 Dec 10 09:08 day.ocr
-rw------- 1 root root 7438336 Dec  9 09:08 week_.ocr
-rw------- 1 root root 7438336 Nov 11 03:39 week.ocr
[root@node1 lxxhscan]# /u01/app/11.2.0/grid/bin/ocrconfig -restore backup_20141211_111501.ocr
Preparation for restoring the voting disks:
SQL> show parameter asm_diskstring;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string
If asm_diskstring is empty, ASM is using its default disk discovery path.
Set it to the actual ASM disk discovery path, otherwise the replace fails:
[root@node1 lxxhscan]# /u01/app/11.2.0/grid/bin/crsctl replace votedisk +OCR_VOTE
CRS-4602: Failed 27 to add voting file 0814f7a60de74f23bfb78c437c74110c.
CRS-4602: Failed 27 to add voting file 90a17219640c4f40bf25d416d99c58ce.
CRS-4602: Failed 27 to add voting file 7177431c8aaf4f9cbffeab86e069b130.
Failed to replace voting disk group with +OCR_VOTE.
CRS-4000: Command Replace failed, or completed with errors.

SQL> alter system set asm_diskstring='/dev/mapper/';

System altered.

SQL> show parameter asm_diskstring;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string      /dev/mapper/
SQL>
5. Replace the voting disks and verify:
/u01/app/11.2.0/grid/bin/crsctl replace votedisk +OCR_VOTE
[root@node11 lxxhscan]# /u01/app/11.2.0/grid/bin/crsctl replace votedisk +OCR_VOTE
Successful addition of voting disk 248e98950ba54f37bff0c1a143e8adf3.
Successful addition of voting disk 4b2eb63628424fd3bff050ad53c6a3de.
Successful addition of voting disk 594b9b8769744f78bf74ce3ad737cbd8.
Successfully replaced voting disk group with +OCR_VOTE.
CRS-4266: Voting file(s) successfully replaced
[grid@node11 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   248e98950ba54f37bff0c1a143e8adf3 (/dev/mapper/ocrnew1p1) [OCR_VOTE]
 2. ONLINE   4b2eb63628424fd3bff050ad53c6a3de (/dev/mapper/ocrnew2p1) [OCR_VOTE]
 3. ONLINE   594b9b8769744f78bf74ce3ad737cbd8 (/dev/mapper/ocrnew3p1) [OCR_VOTE]
Located 3 voting disk(s).
[grid@node11 ~]$
6. Recreate the ASM spfile (it existed before, but is gone after the disk group was rebuilt; this is why the spfile was backed up to a pfile at the start):
SQL> show parameter spfile;        -- before the failure

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +OCR_DATA/racscan/asmparameter
                                                 file/registry.253.861465185

SQL> show parameter spfile;        -- after the rebuild: empty

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string

SQL> create spfile='+OCR_VOTE' FROM pfile='/home/grid/11';
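A quick cross-check that the recreated spfile is registered (in 11.2, asmcmd spget reads the spfile location from the GPnP profile; it should print a registry.* path under +OCR_VOTE, with a file name different from the original):

[grid@node1 ~]$ asmcmd spget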
7. Restart the cluster services and check that everything has recovered:
[root@node1 lxxhscan]# /u01/app/11.2.0/grid/bin/crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node11'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node11'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'node11'
CRS-2673: Attempting to stop 'ora.ctssd' on 'node11'
CRS-2673: Attempting to stop 'ora.asm' on 'node11'
CRS-2677: Stop of 'ora.mdnsd' on 'node11' succeeded
CRS-2677: Stop of 'ora.asm' on 'node11' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node11'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node11' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node11' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node11'
CRS-2677: Stop of 'ora.drivers.acfs' on 'node11' succeeded
CRS-2677: Stop of 'ora.cssd' on 'node11' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node11'
CRS-2677: Stop of 'ora.gipcd' on 'node11' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node11'
CRS-2677: Stop of 'ora.gpnpd' on 'node11' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node11' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@node1 lxxhscan]# /u01/app/11.2.0/grid/bin/crsctl start crs    -- this takes a while; be patient
[root@node1 ~]# /u01/app/11.2.0/grid/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@node1 ~]#
8. CRS is healthy again; double-check the OCR, voting disks, and ASM spfile:
[root@node1 ~]# /u01/app/11.2.0/grid/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3084
         Available space (kbytes) :     259036
         ID                       :  995290881
         Device/File Name         :  +OCR_VOTE
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded
[root@node1 ~]#

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> show parameter spfile;

[grid@node1 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   248e98950ba54f37bff0c1a143e8adf3 (/dev/mapper/ocrnew1p1) [OCR_VOTE]
 2. ONLINE   4b2eb63628424fd3bff050ad53c6a3de (/dev/mapper/ocrnew2p1) [OCR_VOTE]
 3. ONLINE   594b9b8769744f78bf74ce3ad737cbd8 (/dev/mapper/ocrnew3p1) [OCR_VOTE]
Located 3 voting disk(s).
[grid@node1 ~]$
Use CVU to verify OCR integrity across all RAC nodes:
$ cluvfy comp ocr -n all -verbose
[grid@node1 ~]$ cluvfy comp ocr -n all -verbose
Verifying OCR integrity

Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations
Check "ASM Running" passed. ASM is running on all specified nodes
Checking OCR config file "/etc/oracle/ocr.loc"...
OCR config file "/etc/oracle/ocr.loc" check successful
Disk group for ocr location "+OCR_VOTE" available on all the nodes

NOTE:
This check does not verify the integrity of the OCR contents. Execute 'ocrcheck' as a privileged user to verify the contents of OCR.

OCR integrity check passed

Verification of OCR integrity was successful.
[grid@node1 ~]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   1048570  1011974                0         1011974              0             N  ARCHLOG/
MOUNTED  EXTERN  N         512   4096  1048576   1048570  1005645                0         1005645              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576    511993   511887                0          511887              0             N  FRA/
MOUNTED  NORMAL  N         512   4096  1048576      6141     5341             2047            1647              0             Y  OCR_VOTE/
[grid@node11 ~]$
[grid@node1 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.ARCHLOG.dg ora....up.type ONLINE    ONLINE    node11
ora.DATA.dg    ora....up.type ONLINE    ONLINE    node11
ora.FRA.dg     ora....up.type ONLINE    ONLINE    node11
ora....ER.lsnr ora....er.type ONLINE    ONLINE    node11
ora....N1.lsnr ora....er.type ONLINE    ONLINE    node11
ora....VOTE.dg ora....up.type ONLINE    ONLINE    node11
ora.asm        ora.asm.type   ONLINE    ONLINE    node11
ora.cvu        ora.cvu.type   ONLINE    ONLINE    node11
ora.gnnt.db    ora....se.type ONLINE    ONLINE    node11
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
ora....network ora....rk.type ONLINE    ONLINE    node11
ora....SM1.asm application    ONLINE    ONLINE    node11
ora....11.lsnr application    ONLINE    ONLINE    node11
ora.node11.gsd application    OFFLINE   OFFLINE
ora.node11.ons application    ONLINE    ONLINE    node11
ora.node11.vip ora....t1.type ONLINE    ONLINE    node11
ora.node12.vip ora....t1.type ONLINE    ONLINE    node11
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    node11
ora.ons        ora.ons.type   ONLINE    ONLINE    node11
ora....ry.acfs ora....fs.type ONLINE    ONLINE    node11
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    node11
[grid@node1 ~]$
9. That completes the recovery of the voting disks and OCR.