CRS fails to start after a permission change on its socket files
CRS can neither start normally nor be stopped.
[root@rac101 ~]# crsctl stop crs
Stopping resources. This could take several minutes.
Error while stopping resources. Possible cause: CRSD is down.
[root@rac101 ~]# crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
The ocssd process, however, is running.
[root@rac101 cssd]# ps -ef|grep ocssd
oracle 1214 670 0 01:58 ? 00:00:14 /db/oracle/product/10.2.0/crs/bin/ocssd.bin
root 10399 25837 0 02:19 pts/2 00:00:19 less ocssd.log
root 30373 25837 0 03:02 pts/2 00:00:00 grep ocssd
The crsd process, on the other hand, is stuck in a restarting state.
[root@rac101 cssd]# ps -ef|grep crsd
root 17385 1 0 02:34 ? 00:00:00 /bin/sh /etc/init.d/init.crsd run
root 17707 17385 0 02:34 ? 00:00:11 /db/oracle/product/10.2.0/crs/bin/crsd.bin restart
root 30851 25837 0 03:03 pts/2 00:00:00 grep crsd
The crsd log shows that it is waiting for the ocssd process:
2012-11-29 03:05:07.585: [ CRSRTI][1639632]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2012-11-29 03:05:08.785: [ COMMCRS][100719504]clsc_connect: (0x98f9bb0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac101_))
2012-11-29 03:05:08.786: [ CSSCLNT][1639632]clsssInitNative: connect failed, rc 9
2012-11-29 03:05:08.787: [ CRSRTI][1639632]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2012-11-29 03:05:09.990: [ COMMCRS][100719504]clsc_connect: (0x98f9bb0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac101_))
2012-11-29 03:05:09.991: [ CSSCLNT][1639632]clsssInitNative: connect failed, rc 9
2012-11-29 03:05:09.991: [ CRSRTI][1639632]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2012-11-29 03:05:11.196: [ COMMCRS][100719504]clsc_connect: (0x98f9bb0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac101_))
2012-11-29 03:05:11.196: [ CSSCLNT][1639632]clsssInitNative: connect failed, rc 9
2012-11-29 03:05:11.197: [ CRSRTI][1639632]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2012-11-29 03:05:12.392: [ COMMCRS][100719504]clsc_connect: (0x98f9bb0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac101_))
2012-11-29 03:05:12.392: [ CSSCLNT][1639632]clsssInitNative: connect failed, rc 9
2012-11-29 03:05:12.392: [ CRSRTI][1639632]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
The ocssd log keeps reporting clssnmvReadDskHeartbeat: read ALL for Joining:
[ CSSD]2012-11-29 01:58:38.269 [90823568] >TRACE: clssgmclientlsnr: Spawned
[ CSSD]2012-11-29 01:58:38.292 [90823568] >TRACE: clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
[ CSSD]2012-11-29 01:58:38.292 [90823568] >ERROR: clssgmclientlsnr: listening failed for (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1)) (3)
[ CSSD]2012-11-29 01:58:38.292 [90823568] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
[ CSSD]2012-11-29 01:58:38.292 [90823568] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac101_crs))
[ CSSD]2012-11-29 01:58:38.292 [90823568] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac101_))
[ CSSD]2012-11-29 01:58:38.498 [50715536] >TRACE: clssnmvKillBlockThread: spawned for disk 0 (/dev/raw/raw2) initial sleep interval (1000)ms
[ CSSD]2012-11-29 01:58:39.366 [40201104] >TRACE: clssnmvReadDskHeartbeat: read ALL for Joining
[ CSSD]2012-11-29 01:58:40.432 [40201104] >TRACE: clssnmvReadDskHeartbeat: read ALL for Joining
[ CSSD]2012-11-29 01:58:41.543 [40201104] >TRACE: clssnmvReadDskHeartbeat: read ALL for Joining
…..
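The line TRACE: clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1)) points to a permission problem. I checked the permissions on /tmp, the OCR, and the voting disk, and all of them were normal.
In fact, every time the server restarts, the CSS process creates a set of sockets in /tmp/.oracle or /var/tmp/.oracle. If sockets left over from an earlier run cannot be reused or removed automatically, the service cannot start. My guess was that the permissions had changed so that the socket files could no longer be written.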
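Because the ocssd log is dominated by repeated TRACE heartbeat lines, the one relevant message is easy to miss. A minimal sketch of a filter that keeps only error-like lines follows. The function name css_errors is hypothetical, and the two patterns are simply the strings seen in the log excerpt above, not an exhaustive list of CSS error markers.

```shell
# Show only error-like lines (prefixed with their line numbers) from a CSS log.
# With no file arguments it reads standard input.
css_errors() {
    grep -nE '>ERROR:|Permission denied' "$@"
}
```

It could be run as: css_errors ocssd.log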
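To connect the log message to a file on disk, the IPC KEY from each "Permission denied" line can be turned into a candidate socket path. The sketch below assumes that the socket file is named "s" followed by the KEY. That naming is inferred from the directory listing in this post (for example sOracle_CSS_LclLstnr_crs_1), not taken from Oracle documentation. The function name denied_sockets is hypothetical.

```shell
# Print the socket path that would correspond to each IPC KEY that ocssd
# was denied permission to listen on.
# Assumption: the socket file is named "s" + KEY, as the ls listing suggests.
denied_sockets() {
    sock_dir=$1   # directory holding the sockets, e.g. /var/tmp/.oracle
    shift         # remaining arguments are log files (none means stdin)
    grep -h 'Permission denied' "$@" |
        sed -n 's/.*(KEY=\([^)]*\)).*/\1/p' |
        sort -u |
        while IFS= read -r key; do
            printf '%s/s%s\n' "$sock_dir" "$key"
        done
}
```

It could be run as: denied_sockets /var/tmp/.oracle ocssd.log. The owner of each printed path can then be checked with ls -l.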
[root@rac101 crsd]# cd /var/tmp/.oracle/
[root@rac101 .oracle]# ls -ld
drwxrwxrwt 2 root root 4096 Nov 29 01:39 .
[root@rac101 .oracle]# ls -al
total 12
drwxrwxrwt 2 root root 4096 Nov 29 03:28 .
drwxrwxrwt 3 root root 4096 Nov 29 03:29 ..
srwxrwxrwx 1 root root 0 Nov 29 03:28 s#5058.1
srwxrwxrwx 1 root root 0 Nov 29 03:28 s#5058.2
srwxrwxrwx 1 root root 0 Nov 29 03:27 sArac101_crs_evm
srwxrwxrwx 1 root root 0 Nov 29 03:27 sCrac101_crs_evm
srwxrwxrwx 1 root root 0 Nov 29 03:27 sCRSD_UI_SOCKET
srwxrwxrwx 1 root root 0 Nov 29 03:27 sOCSSD_LL_rac101_
srwxrwxrwx 1 root root 0 Nov 29 03:27 sOCSSD_LL_rac101_crs
srwxrwxrwx 1 root root 0 Nov 29 03:27 sOracle_CSS_LclLstnr_crs_1
srwxrwxrwx 1 root root 0 Nov 29 03:27 sora_crsqs
srwxrwxrwx 1 root root 0 Nov 29 03:28 sora_racg_xiaoyu_rac101
srwxrwxrwx 1 root root 0 Nov 29 03:27 sprocr_local_conn_0_PROC
srwxrwxrwx 1 root root 0 Nov 29 03:27 srac101DBG_CRSD
srwxrwxrwx 1 root root 0 Nov 29 03:27 srac101DBG_CSSD
srwxrwxrwx 1 root root 0 Nov 29 03:27 srac101DBG_EVMD
srwxrwxrwx 1 root root 0 Nov 29 03:27 sSYSTEM.evm.acceptor.auth
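The problem is now obvious: every socket in the directory is owned by root:root. After I deleted the /var/tmp/.oracle directory and rebooted the machine, CRS started normally. When troubleshooting RAC, rely on the CRS logs and the system logs to get detailed information. The listing below shows the directory with the CSS and EVM sockets owned by oracle:oinstall: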
[root@rac101 .oracle]# ls -al
total 12
drwxrwxrwt 2 root root 4096 Nov 29 03:28 .
drwxrwxrwt 3 root root 4096 Nov 29 03:29 ..
srwxrwxrwx 1 oracle oinstall 0 Nov 29 03:28 s#5058.1
srwxrwxrwx 1 oracle oinstall 0 Nov 29 03:28 s#5058.2
srwxrwxrwx 1 oracle oinstall 0 Nov 29 03:27 sArac101_crs_evm
srwxrwxrwx 1 oracle oinstall 0 Nov 29 03:27 sCrac101_crs_evm
srwxrwxrwx 1 root root 0 Nov 29 03:27 sCRSD_UI_SOCKET
srwxrwxrwx 1 oracle oinstall 0 Nov 29 03:27 sOCSSD_LL_rac101_
srwxrwxrwx 1 oracle oinstall 0 Nov 29 03:27 sOCSSD_LL_rac101_crs
srwxrwxrwx 1 oracle oinstall 0 Nov 29 03:27 sOracle_CSS_LclLstnr_crs_1
srwxrwxrwx 1 root root 0 Nov 29 03:27 sora_crsqs
srwxrwxrwx 1 oracle oinstall 0 Nov 29 03:28 sora_racg_xiaoyu_rac101
srwxrwxrwx 1 root root 0 Nov 29 03:27 sprocr_local_conn_0_PROC
srwxrwxrwx 1 root root 0 Nov 29 03:27 srac101DBG_CRSD
srwxrwxrwx 1 oracle oinstall 0 Nov 29 03:27 srac101DBG_CSSD
srwxrwxrwx 1 oracle oinstall 0 Nov 29 03:27 srac101DBG_EVMD
srwxrwxrwx 1 oracle oinstall 0 Nov 29 03:27 sSYSTEM.evm.acceptor.auth
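To see exactly which sockets differ between two such listings, each ls -al output could be saved to a file and compared by owner. This is a minimal sketch. The function name owner_changes and the file names in the usage line are hypothetical, and it assumes the usual ls -al column layout (owner in column 3, group in column 4, name last) and socket names without spaces.

```shell
# Print "name before -> after" for every socket whose owner:group differs
# between two saved `ls -al` listings.
owner_changes() {
    awk '
        # Only socket entries (mode string starts with "s").
        /^s/ {
            owner = $3 ":" $4
            name = $NF
            # First file: remember the owner of each socket.
            if (NR == FNR) { before[name] = owner; next }
            # Second file: report sockets whose owner changed.
            if (name in before && before[name] != owner)
                print name, before[name], "->", owner
        }
    ' "$1" "$2"
}
```

It could be run as: owner_changes listing_root.txt listing_oracle.txt. Sockets that are root-owned in both listings, such as sCRSD_UI_SOCKET, are not reported.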