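Oracle 11.2.0.1 RAC Grid fails to start: Oracle High Availability Services startup failed
I installed 11.2.0.1 RAC on virtual machines. I chose 11.2.0.1 because of an issue with the public IP and private network segments. While the database instance was being installed, the computer froze, and after the reboot CRS would not start.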
[root@rac1 bin]# ./crsctl start crs
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
[root@rac1 bin]# ps -ef|grep has
root 8081 1 0 03:14 ? 00:00:00 /u01/app/grid/11.2.0/bin/ohasd.bin reboot
root 8137 4230 1 03:23 pts/0 00:00:00 grep has
[root@rac1 bin]# kill -9 8081
[root@rac1 bin]# ./crsctl start crs
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
Check the logs:
[grid@rac2 rac2]$ ll
total 72
drwxr-x--- 2 grid oinstall 4096 Nov 21 00:38 admin
drwxrwxr-t 4 root oinstall 4096 Nov 21 00:38 agent
-rw-rw-r-- 1 root root 9693 Nov 21 02:26 alertrac2.log
drwxr-x--- 2 grid oinstall 4096 Nov 21 00:43 client
drwxr-x--- 2 root oinstall 4096 Nov 21 00:42 crsd
drwxr-x--- 2 grid oinstall 4096 Nov 21 00:39 cssd
drwxr-x--- 2 root oinstall 4096 Nov 21 00:41 ctssd
drwxr-x--- 2 grid oinstall 4096 Nov 21 00:39 diskmon
drwxr-x--- 2 grid oinstall 4096 Nov 21 00:42 evmd
drwxr-x--- 2 grid oinstall 4096 Nov 21 00:38 gipcd
drwxr-x--- 2 root oinstall 4096 Nov 21 00:38 gnsd
drwxr-x--- 2 grid oinstall 4096 Nov 21 00:40 gpnpd
drwxr-x--- 2 grid oinstall 4096 Nov 21 00:38 mdnsd
drwxr-x--- 2 root oinstall 4096 Nov 21 00:39 ohasd
drwxrwxr-t 5 grid oinstall 4096 Nov 21 00:38 racg
drwxr-x--- 2 grid oinstall 4096 Nov 21 00:42 srvm
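Apart from alertrac2.log, which was updated when the machine went down, none of the other files had been updated. I went to node 1 and restarted it: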
[root@rac1 client]# ll
total 124
-rw-r--r-- 1 root root 193 Nov 21 00:31 clscfg.log
-rw-rw-rw- 1 root root 28635 Nov 21 00:32 crsctl.log
-rw-r--r-- 1 root root 114 Nov 21 00:32 crsctl.trc
-rw-r--r-- 1 grid oinstall 663 Nov 21 03:08 css.log
-rw-r--r-- 1 grid oinstall 1051 Nov 21 00:28 gpnptool_11653.log
-rw-r--r-- 1 grid oinstall 114 Nov 21 00:28 gpnptool_11653.trc
-rw-r--r-- 1 grid oinstall 1461 Nov 21 00:28 gpnptool_11660.log
-rw-r--r-- 1 grid oinstall 114 Nov 21 00:28 gpnptool_11660.trc
-rw-r--r-- 1 grid oinstall 551 Nov 21 00:35 oclskd.log
-rw-r----- 1 root root 6100 Nov 21 00:27 ocrconfig_11312.log
-rw-r--r-- 1 root root 3170 Nov 21 00:31 ocrconfig_12191.log
-rw-r----- 1 root root 342 Nov 21 00:37 ocrconfig_13798.log
-rw-r--r-- 1 grid oinstall 33862 Nov 21 00:45 oifcfg.log
-rw-r--r-- 1 grid oinstall 114 Nov 21 00:45 oifcfg.trc
-rw-r--r-- 1 root root 1067 Nov 21 00:36 olsnodes.log
-rw-r--r-- 1 grid oinstall 114 Nov 21 00:37 olsnodes.trc
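Comparing timestamps by eye in listings like the ones above is easy to get wrong. As a rough sketch, something like the following could list only the files touched recently. The function name recent_logs is made up for illustration, and it assumes a find that supports -mmin (GNU find on Linux does).

```shell
#!/bin/sh
# recent_logs DIR MINUTES  (hypothetical helper, not part of Grid Infrastructure)
# Print regular files under DIR that were modified less than MINUTES minutes ago.
recent_logs() {
    # -mmin -N matches files whose modification time is under N minutes old
    find "$1" -type f -mmin "-$2" | sort
}
```

For example, recent_logs . 60 run inside the client directory would show whether anything other than css.log changed in the last hour.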
-- css.log contains nothing but the following errors:
[root@rac1 client]# cat css.log
Oracle Database 11g Clusterware Release 11.2.0.1.0 - Production Copyright 1996, 2009 Oracle. All rights reserved.
2012-11-21 03:08:22.764: [ CSSCLNT][4171966208]clssscConnect: gipc request failed with 29 (0x13)
2012-11-21 03:08:22.764: [ CSSCLNT][4171966208]clsssInitNative: connect failed, rc 29
2012-11-21 03:08:28.140: [ CSSCLNT][4171966208]clssscConnect: gipc request failed with 29 (0x13)
2012-11-21 03:08:28.140: [ CSSCLNT][4171966208]clsssInitNative: connect failed, rc 29
2012-11-21 03:08:37.908: [ CSSCLNT][4171966208]clssscConnect: gipc request failed with 29 (0x13)
2012-11-21 03:08:37.908: [ CSSCLNT][4171966208]clsssInitNative: connect failed, rc 29
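To see how often the client failed to reach CSS, the "connect failed, rc 29" lines can simply be counted. This is only a sketch: count_connect_failures is a name I made up, and the pattern is taken from the log lines shown above.

```shell
#!/bin/sh
# count_connect_failures FILE  (hypothetical helper)
# Print how many lines in FILE report the clsssInitNative connect failure.
count_connect_failures() {
    # " *" tolerates a missing or extra space after the colon and the comma.
    # grep -c exits non-zero when the count is 0, so "|| true" keeps the
    # function's exit status at 0 in that case.
    grep -c 'clsssInitNative: *connect failed, *rc 29' "$1" || true
}
```

Run against the css.log above, count_connect_failures css.log should report 3, one for each connection attempt.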
According to the MOS note:
How to Troubleshoot Grid Infrastructure Startup Issues [ID 1050908.1]
http://blog.csdn.net/tianlesoftware/article/details/6013763
1. ocssd is fully up
If ocssd.bin is not fully up, crsd.log will show messages like following:
2010-02-03 22:37:51.638: [ CSSCLNT][1548456880]clssscConnect: gipc request failed with 29 (0x16)
2010-02-03 22:37:51.638: [ CSSCLNT][1548456880]clsssInitNative: connect failed, rc 29
2010-02-03 22:37:51.639: [ CRSRTI][1548456880] CSS is not ready. Received status 3 from CSS. Waiting for good status ..
So the OCSSD process cannot start. But why can't it start? Let's run strace against the ohasd process:
[root@rac1 client]# ps -ef|grep has
root 12192 1 0 12:44 ? 00:00:00 /u01/app/grid/11.2.0/bin/ohasd.bin reboot
root 12281 8085 0 13:05 pts/2 00:00:00 grep has
[root@rac1 client]# strace -p 12192 -o dave.log
Process 12192 attached - interrupt to quit
quit
Process 12192 detached
[root@rac1 client]#
[root@rac1 client]# ls
clscfg.log dave.log gpnptool_11660.trc ocrconfig_13798.log olsnodes.trc
crsctl.log gpnptool_11653.log oclskd.log oifcfg.log
crsctl.trc gpnptool_11653.trc ocrconfig_11312.log oifcfg.trc
css.log gpnptool_11660.log ocrconfig_12191.log olsnodes.log
[root@rac1 client]# cat dave.log
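open("/var/tmp/.oracle/npohasd", O_WRONLY <unfinished ...>
This is an important clue: ohasd is stuck opening the file /var/tmp/.oracle/npohasd for writing. The same file causes trouble when installing 11.2.0.1 RAC, and it is arguably a bug in 11.2.0.1.
Reference:
Oracle 11g RAC: ohasd failed to start at /u01/app/11.2.0/grid/crs/install/rootcrs.pl line 443, and how to fix it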
http://blog.csdn.net/tianlesoftware/article/details/7697366
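The unfinished open() in the strace output is what you would expect if npohasd is a named pipe (FIFO). That is my assumption here, based on the name and the O_WRONLY open that never returns. On Linux, opening a FIFO for writing blocks until some process opens it for reading. The sketch below reproduces that behaviour on a throwaway FIFO. try_write is a made-up helper name, and it relies on timeout from GNU coreutils.

```shell
#!/bin/sh
# try_write FIFO SECONDS  (hypothetical helper for demonstration only)
# Try to open FIFO for writing and send one line, giving up after SECONDS.
# GNU timeout exits with status 124 when the time limit is hit, which here
# means the open() was still blocked because no reader had the FIFO open.
try_write() {
    timeout "$2" sh -c 'echo ping > "$1"' sh "$1"
}

# Usage sketch on a scratch FIFO (never on the real npohasd):
#   mkfifo /tmp/demo.fifo
#   try_write /tmp/demo.fifo 1; echo $?     # 124: nobody is reading
#   cat /tmp/demo.fifo &                    # start a reader
#   try_write /tmp/demo.fifo 1; echo $?     # 0: the open now succeeds
```

If the assumption holds, ohasd.bin is in the same position as the writer in the first call: it waits for a reader that never shows up.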
So before starting CRS, first run the following dd command on both nodes:
[root@rac1 client]# /bin/dd if=/var/tmp/.oracle/npohasd of=/dev/null bs=1024 count=1
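If you prefer not to type the dd line by hand on every node, it could be wrapped roughly as follows. This is a sketch under assumptions: unblock_pipe is a made-up name, the check that the path is a named pipe and the 10-second default timeout are my additions, and timeout comes from GNU coreutils. The dd arguments are the ones used above.

```shell
#!/bin/sh
# unblock_pipe [PIPE] [SECONDS]  (hypothetical wrapper around the dd workaround)
# Read one block from PIPE so that a writer blocked on it can proceed.
# PIPE defaults to the path seen in the strace output.
unblock_pipe() {
    pipe=${1:-/var/tmp/.oracle/npohasd}
    if [ ! -p "$pipe" ]; then
        # Refuse to run dd against anything that is not a named pipe
        echo "not a named pipe: $pipe" >&2
        return 1
    fi
    # timeout stops dd from hanging forever if no writer ever appears;
    # dd's transfer statistics on stderr are discarded
    timeout "${2:-10}" dd if="$pipe" of=/dev/null bs=1024 count=1 2>/dev/null
}
```

On the cluster nodes this would be run as root with no arguments. An exit status of 124 would mean no writer showed up within the time limit.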
Then start CRS. This time it works:
[root@rac1 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@rac2 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@rac2 bin]# ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[root@rac1 bin]# ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[root@rac1 bin]# ./crsctl start cluster -all
CRS-5702: Resource 'ora.crsd' is already running on 'rac1'
CRS-5702: Resource 'ora.crsd' is already running on 'rac2'
[root@rac1 bin]# ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@rac2 bin]# ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
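The daemons come online one after another, so instead of rerunning crsctl check crs by hand you could poll it. The sketch below only relies on the message codes that appear in the transcripts above: CRS-4638 for "HAS is online", and CRS-4535, CRS-4530 and CRS-4534 for the failures. The function names, retry count and interval are illustrative assumptions.

```shell
#!/bin/sh
# stack_is_healthy OUTPUT  (hypothetical helper)
# Succeed if OUTPUT reports HAS online and none of the failure codes.
stack_is_healthy() {
    printf '%s\n' "$1" | grep -q 'CRS-4638' || return 1
    if printf '%s\n' "$1" | grep -Eq 'CRS-(4535|4530|4534)'; then
        return 1
    fi
    return 0
}

# wait_for_stack TRIES INTERVAL COMMAND [ARGS...]  (hypothetical helper)
# Run COMMAND up to TRIES times, sleeping INTERVAL seconds between attempts,
# and succeed as soon as its output looks healthy.
wait_for_stack() {
    tries=$1
    interval=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        if stack_is_healthy "$("$@" 2>&1)"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}
```

A call such as wait_for_stack 30 10 ./crsctl check crs would retry for about five minutes before giving up.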
-- Check the resources: they have all come up. Note that the 11g processes are somewhat slow to start, so give them a while.
[root@rac2 u01]# sh crs_stat.sh
Name                           Target     State      Host
------------------------------ ---------- ---------  -------
ora.DATA.dg                    ONLINE     ONLINE     rac1
ora.FRA.dg                     ONLINE     ONLINE     rac1
ora.LISTENER.lsnr              ONLINE     ONLINE     rac1
ora.LISTENER_SCAN1.lsnr        ONLINE     ONLINE     rac2
ora.OCRVOTING.dg               ONLINE     ONLINE     rac1
ora.asm                        ONLINE     ONLINE     rac1
ora.dave.db                    OFFLINE    OFFLINE
ora.eons                       ONLINE     ONLINE     rac1
ora.gsd                        OFFLINE    OFFLINE
ora.net1.network               ONLINE     ONLINE     rac1
ora.oc4j                       OFFLINE    OFFLINE
ora.ons                        ONLINE     ONLINE     rac1
ora.rac1.ASM1.asm              ONLINE     ONLINE     rac1
ora.rac1.LISTENER_RAC1.lsnr    ONLINE     ONLINE     rac1
ora.rac1.gsd                   OFFLINE    OFFLINE
ora.rac1.ons                   ONLINE     ONLINE     rac1
ora.rac1.vip                   ONLINE     ONLINE     rac1
ora.rac2.ASM2.asm              ONLINE     ONLINE     rac2
ora.rac2.LISTENER_RAC2.lsnr    ONLINE     ONLINE     rac2
ora.rac2.gsd                   OFFLINE    OFFLINE
ora.rac2.ons                   ONLINE     ONLINE     rac2
ora.rac2.vip                   ONLINE     ONLINE     rac2
ora.scan1.vip                  ONLINE     ONLINE     rac2
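With this many resources it is easy to overlook one that is still down. A small filter like the one below could pull out the OFFLINE entries. It is only a sketch: offline_resources is a made-up name, and it assumes the four-column Name/Target/State/Host layout that my crs_stat.sh prints above.

```shell
#!/bin/sh
# offline_resources  (hypothetical helper)
# Read a Name/Target/State/Host listing on stdin and print the names of
# resources whose State column (third field) is OFFLINE.
offline_resources() {
    # The ^ora\. test skips the header and separator lines
    awk '$1 ~ /^ora\./ && $3 == "OFFLINE" { print $1 }'
}
```

Used as sh crs_stat.sh | offline_resources, it would print ora.dave.db, ora.gsd, ora.oc4j, ora.rac1.gsd and ora.rac2.gsd for the listing above. The database resource ora.dave.db is the one I still have to deal with.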
Now I can deal with the instance. Once that is sorted out, I will upgrade to 11.2.0.3.4 so that I don't hit this problem every time.
---------------------------------------------------------------------------------------
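All rights reserved. This article may be reposted, but the source address must be credited with a link; otherwise legal action may be taken.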
Blog: http://blog.csdn.net/tianlesoftware