email:79996286@qq.com

hthf

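Background:

This was my first Oracle 11g RAC installation, on test machines. After installation it ran normally. Some time later I rebooted node 1 to check whether the cluster stack would start automatically. It did not, and starting it manually failed as well.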

Procedure:

On node 1, run:

# pwd

/u01/grid/bin

# ./crsctl start crs

CRS-4124: Oracle High Availability Services startup failed.

CRS-4000: Command Start failed, or completed with errors.

Check the log on node 1:

# pwd

/u01/grid/log/nodea/client

# cat crsctl_grid.log

Oracle Database 11g Clusterware Release 11.2.0.3.0 - Production Copyright 1996, 2011 Oracle. All rights reserved.

[ CLWAL][1]clsw_Initialize: OCR initlevel [3]

[ CLWAL][1]clsw_Initialize: OCR initlevel [3]

[ CLWAL][1]clsw_Initialize: OCR initlevel [3]

[ CLWAL][1]clsw_Initialize: OCR initlevel [3]

[ CLWAL][1]clsw_Initialize: OCR initlevel [3]

[ CLWAL][1]clsw_Initialize: OCR initlevel [3]

[ CLWAL][1]clsw_Initialize: OCR initlevel [3]

[ CLWAL][1]clsw_Initialize: OCR initlevel [3]

2014-04-09 22:45:20.882: [ CRSCTL][1]File /u01/grid/oc4j/j2ee/home/OC4J_DBWLM_config/system-jazn-data.xml was not modified, OCR key was empty

[ CLWAL][1]clsw_Initialize: OLR initlevel [30000]

2014-04-17 07:27:27.517: [ CRSCTL][1]File /u01/grid/oc4j/j2ee/home/OC4J_DBWLM_config/system-jazn-data.xml was not modified, OCR key was empty

2014-04-19 02:24:13.609: [ CRSCTL][1]File /u01/grid/oc4j/j2ee/home/OC4J_DBWLM_config/system-jazn-data.xml was not modified, OCR key was empty

2014-04-30 02:19:51.492: [GIPCXCPT][1] gipcmodClsaAuthStart: failuring during clsaauthmsg ret clsaretOSD (8), endp 1110bdd70 [0000000000000018] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=32b4238c-0bc8efcf-12779694))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_nodea_)(GIPCID=0bc8efcf-32b4238c-7078108))', numPend 5, numReady 0, numDone 2, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 7078108, flags 0x2ca712, usrFlags 0x34000 }

2014-04-30 02:19:51.492: [GIPCXCPT][1] gipcmodClsaAuthStart: slos op : write

2014-04-30 02:19:51.492: [GIPCXCPT][1] gipcmodClsaAuthStart: slos dep : No space left on device (28)

2014-04-30 02:19:51.492: [GIPCXCPT][1] gipcmodClsaAuthStart: slos loc : authrespset5

2014-04-30 02:19:51.492: [GIPCXCPT][1] gipcmodClsaAuthStart: slos info: len -1 != expected 4

2014-04-30 02:19:51.493: [ CSSCLNT][1]clssscConnect: gipc request failed with 22 (12)

2014-04-30 02:19:51.493: [ CSSCLNT][1]clsssInitNative: connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_nodea_)) failed, rc 22

The key line is:

2014-04-30 02:19:51.492: [GIPCXCPT][1] gipcmodClsaAuthStart: slos dep : No space left on device (28)
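In a long client log this line is easy to miss. The following is a minimal sketch for pulling the operating-system error text out of the GIPC messages. The function name `os_errors_in_log` is made up for illustration, and the `slos dep` pattern is taken only from the log excerpt above, so other failures may be reported differently.

```shell
# Hypothetical helper: print the distinct OS-level error texts that appear
# on "slos dep" lines of a Clusterware client log.
os_errors_in_log() {
    logfile=$1
    # Keep only the "slos dep" lines, strip everything up to and including
    # "slos dep : ", then collapse repeated messages.
    grep 'slos dep' "$logfile" | sed 's/.*slos dep *: *//' | sort -u
}

# Usage (log path taken from the session above):
# os_errors_in_log /u01/grid/log/nodea/client/crsctl_grid.log
```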

Checking disk space on node 1 confirms that there really is no space left:

# df -g

Filesystem      GB blocks      Free  %Used    Iused  %Iused  Mounted on
/dev/hd4             0.25      0.05    79%    10134     44%  /
/dev/hd2             2.06      0.13    94%    44051     57%  /usr
/dev/hd9var          0.44      0.15    67%     6196     15%  /var
/dev/hd3            10.00      2.08    80%     4367      1%  /tmp
/dev/hd1             0.06      0.00   100%       73     46%  /home
/dev/hd11admin       0.12      0.12     1%        5      1%  /admin
/proc                   -         -      -        -       -  /proc
/dev/hd10opt         0.38      0.18    51%     7044     14%  /opt
/dev/livedump        0.25      0.25     1%        4      1%  /var/adm/ras/livedump
/dev/fslv00         30.00      0.00   100%    54756     90%  /u01
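To spot the full filesystems without reading the whole table, the output can be filtered. This is only a sketch: the function name `full_filesystems` and the default threshold of 90 are assumptions, and it relies on the AIX `df -g` column order shown above (Filesystem, GB blocks, Free, %Used, Iused, %Iused, Mounted on).

```shell
# Hypothetical helper: read AIX-style `df -g` output on stdin and print the
# mount point and %Used of every filesystem at or above a threshold.
# The threshold is the first argument and defaults to 90.
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR == 1 { next }              # skip the header line
        $4 ~ /^[0-9]+%$/ {            # ignore rows such as /proc that show "-"
            used = $4
            sub(/%/, "", used)
            if (used + 0 >= limit + 0) print $7, $4
        }'
}

# Usage on AIX:
# df -g | full_filesystems 95
```

Run against the table above with a threshold of 95, this would report `/home` and `/u01`, the two filesystems at 100%.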

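I suspected that the database had been raising alerts continuously, so the logs kept growing until they filled the filesystem. Go to the Oracle database alert log directory: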

$ pwd

/u01/base/diag/rdbms/test/test1/trace

$ du -sg /u01/base/diag/rdbms/test/test1/trace

    1. /u01/base/diag/rdbms/test/test1/trace

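Delete all of the alert logs. Because this is a test database, I did not investigate what was causing the database to raise alerts continuously. The disk on node 2 was not full.

Start CRS again as root. The command reports that CRS is already running, but crs_stat shows no processes. I will look into the cause later.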
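Deleting everything in the trace directory is acceptable on a test database. Where recent traces are worth keeping, a more selective cleanup could look like the sketch below. The function name `old_trace_files` and the 7-day default are assumptions for illustration, and the function only lists files unless `delete` is passed explicitly.

```shell
# Hypothetical helper: list trace files (*.trc, *.trm) last modified more
# than a given number of days ago under a trace directory. With "delete"
# as the third argument the same files are removed instead of listed.
old_trace_files() {
    dir=$1
    days=${2:-7}
    action=${3:-list}
    if [ "$action" = delete ]; then
        find "$dir" -type f \( -name '*.trc' -o -name '*.trm' \) \
            -mtime +"$days" -exec rm -f {} \;
    else
        find "$dir" -type f \( -name '*.trc' -o -name '*.trm' \) \
            -mtime +"$days" -print
    fi
}

# Usage (directory taken from the session above):
# old_trace_files /u01/base/diag/rdbms/test/test1/trace 7          # dry run
# old_trace_files /u01/base/diag/rdbms/test/test1/trace 7 delete   # remove
```

Listing first and deleting second makes it possible to check what would be removed before any space is reclaimed.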

# id

uid=0(root) gid=0(system) groups=2(bin),3(sys),7(security),8(cron),10(audit),11(lp)

# pwd

/u01/grid/bin

# ./crsctl start crs

CRS-4640: Oracle High Availability Services is already active

CRS-4000: Command Start failed, or completed with errors.

 

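Update, 2014-05-28

Node 2 had no CRS processes. I never found the cause and simply rebooted both servers; they are test machines, so they can be rebooted at will. After the reboot, all CRS processes started on both servers.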

$ su  -  grid
grid's Password:
$ crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora.DATA.dg    ora....up.type ONLINE    ONLINE    nodea      
ora.DG01.dg    ora....up.type ONLINE    ONLINE    nodea      
ora....ER.lsnr ora....er.type ONLINE    ONLINE    nodea      
ora....N1.lsnr ora....er.type ONLINE    ONLINE    nodeb      
ora.asm        ora.asm.type   ONLINE    ONLINE    nodea      
ora.cvu        ora.cvu.type   ONLINE    ONLINE    nodeb      
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE              
ora....network ora....rk.type ONLINE    ONLINE    nodea      
ora....SM1.asm application    ONLINE    ONLINE    nodea      
ora....EA.lsnr application    ONLINE    ONLINE    nodea      
ora.nodea.gsd  application    OFFLINE   OFFLINE              
ora.nodea.ons  application    ONLINE    ONLINE    nodea      
ora.nodea.vip  ora....t1.type ONLINE    ONLINE    nodea      
ora....SM2.asm application    ONLINE    ONLINE    nodeb      
ora....EB.lsnr application    ONLINE    ONLINE    nodeb      
ora.nodeb.gsd  application    OFFLINE   OFFLINE              
ora.nodeb.ons  application    ONLINE    ONLINE    nodeb      
ora.nodeb.vip  ora....t1.type ONLINE    ONLINE    nodeb      
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    nodeb      
ora.ons        ora.ons.type   ONLINE    ONLINE    nodea      
ora....ry.acfs ora....fs.type ONLINE    ONLINE    nodea      
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    nodeb      
ora.test.db    ora....se.type ONLINE    ONLINE    nodea      
$
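With this many resources it helps to print only the ones that are not ONLINE. A minimal sketch follows, assuming the five-column `crs_stat -t` layout shown above (Name, Type, Target, State, Host); the function name `offline_resources` is hypothetical. Note that `crs_stat` is deprecated in 11.2 in favor of `crsctl stat res -t`, whose output is laid out differently and would need a different filter.

```shell
# Hypothetical helper: read `crs_stat -t` output on stdin and print the
# name and state of every resource whose State column is not ONLINE.
offline_resources() {
    awk '
        NR <= 2 { next }                           # header and dashed rule
        NF >= 4 && $4 != "ONLINE" { print $1, $4 }'
}

# Usage, as the grid user:
# crs_stat -t | offline_resources
```

In 11.2 the gsd resources are expected to be OFFLINE by default, so they will normally appear in this list, as they do in the output above.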

posted on 2014-05-03 13:07  hthf