Repost: two examples of the CRS daemon failing to start

## sample 1

"node 1 (10.198.127.5):

ps -ef|grep ora.crsd

root 45613166 47185944 0 10:24:35 pts/2 0:00 grep ora.crsd

 


node 2 (10.198.127.6):

[admin@pdbdb02:/home/admin]# ps -ef|grep crsd

root 14811216 1 0 Nov 08 - 1111:05 /db/db/oracleapp/11.2.0/grid/bin/crsd.bin reboot

root 39059458 23462112 0 10:25:08 pts/3 0:00 grep crsd

 

node 1 (10.198.127.5)
Manually start the CRS daemon as root:

# /db/db/oracleapp/11.2.0/grid/bin/crsctl start res ora.crsd -init

 

Check whether the CRS daemon has started:

ps -ef|grep crsd

 

If CRS starts successfully on node 1, run the following commands on node 1 (10.198.127.5) and node 2 (10.198.127.6); if they succeed on both nodes, the repair is complete.
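
A reasonable verification on each node (an assumption, based on the checks used later in this post):

# /db/db/oracleapp/11.2.0/grid/bin/crsctl check crs
# /db/db/oracleapp/11.2.0/grid/bin/crsctl stat res -t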


## sample 2


ASM 11gR2 Instance Can Not Mount OCR/Voting Diskgroup On RAC (Doc ID 1095214.1)
In this Document
Symptoms
Cause
Solution

APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.1.0 and later
Information in this document applies to any platform.
Relevance checked on 17-Jan-2015
SYMPTOMS
1) ASM instance could not mount the OCR/Voting (+OCRVDG) diskgroup on a RAC environment due to the next errors as seen in the CRSD log ( <GridHome>/log/<nodename>/crsd/crsd.log ):

 

2010-04-30 12:58:53.259: [ OCRASM][2566444800]proprasmo: Error in open/create file in dg [OCRVDG]
[ OCRASM][2566444800]SLOS : SLOS: cat=8, opn=kgfoOpenFile01, dep=15056, loc=kgfokge
ORA-17503: ksfdopn:DGOpenFile05 Failed to open file +OCRVDG.255.4294967295
ORA-17503: ksfdopn:2 Failed to open file +OCRVDG.255.4294967295
ORA-15001: diskgroup

2010-04-30 12:58:53.287: [ OCRASM][2566444800]proprasmo: kgfoCheckMount returned [6]
2010-04-30 12:58:53.287: [ OCRASM][2566444800]proprasmo: The ASM disk group OCRVDG is not found or not mounted
2010-04-30 12:58:53.288: [ OCRRAW][2566444800]proprioo: Failed to open [+OCRVDG]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2010-04-30 12:58:53.288: [ OCRRAW][2566444800]proprioo: No OCR/OLR devices are usable
2010-04-30 12:58:53.288: [ OCRASM][2566444800]proprasmcl: asmhandle is NULL
2010-04-30 12:58:53.288: [ OCRRAW][2566444800]proprinit: Could not open raw device
2010-04-30 12:58:53.288: [ OCRASM][2566444800]proprasmcl: asmhandle is NULL
2010-04-30 12:58:53.288: [ OCRAPI][2566444800]a_init:16!: Backend init unsuccessful : [26]
2010-04-30 12:58:53.288: [ CRSOCR][2566444800] OCR context init failure. Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=8, opn=kgfoOpenFile01, dep=15056, loc=kgfokge
ORA-17503: ksfdopn:DGOpenFile05 Failed to open file +OCRVDG.255.4294967295
ORA-17503: ksfdopn:2 Failed to open file +OCRVDG.255.4294967295
ORA-15001: diskgroup
] [8]
2010-04-30 12:58:53.288: [ CRSD][2566444800][PANIC] CRSD exiting: Could not init OCR, code: 26
2010-04-30 12:58:53.288: [ CRSD][2566444800] Done.

2) OCRVDG diskgroup was created using only one disk ('/asmocrvote1') as seen in the ASM alert.log:


SUCCESS: diskgroup OCRVDG was mounted
SUCCESS: CREATE DISKGROUP OCRVDG EXTERNAL REDUNDANCY DISK '/asmocrvote1' ATTRIBUTE 'compatible.asm'='11.2.0.0.0' /* ASMCA */

3) So you validated that the '/asmocrvote1' device has a valid ASM header and is accessible through kfed from all the nodes:

 

# kfed read /asmocrvote1
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check: 2396607692 ; 0x00c: 0x8ed954cc
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: OCRVDG_0000 ; 0x028: length=11
kfdhdb.grpname: OCRVDG ; 0x048: length=6
kfdhdb.fgname: OCRVDG_0000 ; 0x068: length=11


4) But the problem persists.


CAUSE
ASM spfile was corrupt.

Using a transient ASM pfile, the OCR/Voting (+OCRVDG) diskgroup was mounted.

SOLUTION

1) Create a temporary pfile on node #1 (+ASM1) with the following values in <Grid Infrastructure Oracle Home>/dbs/init+ASM1.ora:

asm_diskgroups='OCRVDG'
asm_diskstring='ORCL:*'
asm_power_limit=1
instance_type='asm'
large_pool_size=12M
remote_login_passwordfile='EXCLUSIVE'

 

2) Then start the +ASM1 instance:

$> cd <Grid Infrastructure Oracle Home>/dbs/

$> export ORACLE_SID=+ASM1

$> sqlplus "/ as sysasm"

SQL> STARTUP pfile=init+ASM1.ora


3) Recreate the spfile


SQL> create spfile='+OCRVDG' from pfile='/opt/grid/product/11.2.0/grid_1/dbs/init+ASM1.ora';

File created.
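
To confirm the registration, the spfile location recorded in the GPnP profile can be read back with asmcmd (a sketch; the file name shown is a placeholder):

$> export ORACLE_SID=+ASM1
$> asmcmd spget
+OCRVDG/<cluster-name>/asmparameterfile/registry.253.<incarnation>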

 


4) Restart the OHAS/CRS stack on both nodes:

4.1) Shut down all the services on both nodes as follows:

Node #1:

Connect as root user:

# /opt/grid/product/11.2.0/grid_1/bin/crsctl stop crs


Node #2:

Connect as root user:

# /opt/grid/product/11.2.0/grid_1/bin/crsctl stop crs


4.2) Verify that the Grid Infrastructure stack has shut down successfully on both nodes. The following command should show no output if the GI stack has shut down (the bracketed pattern "diskmo[n]" keeps grep from matching its own process):

# ps -ef | grep diskmo[n]


4.3) Start the Grid Infrastructure stack on the first node:

# /opt/grid/product/11.2.0/grid_1/bin/crsctl start crs


4.4) Wait until the Grid Infrastructure stack has started successfully on the first node. To check the status of the Grid Infrastructure stack, run the following command and verify that the "ora.asm" instance is started. Note that the command below will continue to report that it is unable to communicate with the Grid Infrastructure software for several minutes after issuing the "crsctl start crs" command above:

# /opt/grid/product/11.2.0/grid_1/bin/crsctl status resource -t


4.5) Start the Grid Infrastructure stack on the remaining node:

# /opt/grid/product/11.2.0/grid_1/bin/crsctl start crs


4.6) Monitor the startup progress (this could take several minutes):

# /opt/grid/product/11.2.0/grid_1/bin/crsctl status resource -t


4.7) Verify OHAS, CRS & CSS are running on each node:

$> crsctl check has

$> crsctl check crs

$> crsctl check css
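
On a healthy 11.2 stack these checks typically report, for example:

$> crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online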

 

 

/db/db/oracleapp/11.2.0/grid/bin/crsctl stat res -t -init

/db/db/oracleapp/11.2.0/grid/bin/ocrcheck
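
For reference, a healthy ocrcheck run ends with both integrity checks succeeding; an abbreviated sketch of the expected output (sizes and ID omitted):

Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Device/File Name         : +OCRVD_DG
                                    Device/File integrity check succeeded
         Cluster registry integrity check succeeded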

 

/db/db/oracleapp/grid/diag/asm/+asm/+ASM1/trace/alert*.log
[/db/db/oracleapp/11.2.0/grid/bin/oraagent.bin(6947032)]CRS-5019:All OCR locations are on ASM disk groups [OCRVD_DG], and
none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/db/db/oracleapp/11.2.0/grid/log/sdbdb01/agent/ohasd/oraagent_grid/oraagent_grid.log".


/db/db/oracleapp/11.2.0/grid/log/sdbdb01/agent/ohasd/oraagent_grid/oraagent_grid.log
2019-04-18 09:14:25.603: [ COMMCRS][5729]clsc_connect: (111438950) no listener at (ADDRESS=(PROTOCOL=IPC)(KEY=CRSD_UI_SOCKET))
clsnUtils::error Exception type=2 string=
Oracle Clusterware (CRS or Grid Infrastructure) network socket files are located in /tmp/.oracle, /usr/tmp/.oracle, or /var/tmp/.oracle; to keep the clusterware healthy, do not touch them manually unless instructed by Oracle Support.
checkCrsStat 2 CLSCRS_STAT ret: 184

 


alert_crsd.log
2018-01-19 20:16:28.336:
[client(9961752)]CRS-10051:CVU found following errors with Clusterware setup : PRVF-4354 : Proper hard limit for resource "maximum user processes" not found on node "sdbdb01" [Expected = "16384" ; Found = "8192"]
PRVF-7543 : OS Kernel parameter "maxuproc" does not have proper value on node "sdbdb01" [Expected = "16384" ; Found = "8192"].
PRVF-7543 : OS Kernel parameter "tcp_ephemeral_low" does not have proper value on node "sdbdb01" [Expected = "9000" ; Found = "32768"].
PRVF-4132 : Multiple users "root,admin" with UID "0" exist on "sdbdb01".
PRVG-1101 : SCAN name "sdbdb-scan" failed to resolve
PRVF-4657 : Name resolution setup check for "sdbdb-scan" (IP address: 58.2.101.5) failed
PRVF-4664 : Found inconsistent name resolution entries for SCAN name "sdbdb-scan"

 

[client(5242908)]CRS-10051:CVU found following errors with Clusterware setup : PRVF-7543 : OS Kernel parameter "tcp_ephemeral_low" does not have proper value on node "sdbdb01" [Expected = "9000" ; Found = "32768"].
PRVF-7543 : OS Kernel parameter "tcp_ephemeral_high" does not have proper value on node "sdbdb01" [Expected = "65500" ; Found = "65535"].
PRVF-7543 : OS Kernel parameter "udp_ephemeral_low" does not have proper value on node "sdbdb01" [Expected = "9000" ; Found = "32768"].
PRVF-7543 : OS Kernel parameter "udp_ephemeral_high" does not have proper value on node "sdbdb01" [Expected = "65500" ; Found = "65535"].
PRVF-4132 : Multiple users "root,admin" with UID "0" exist on "sdbdb01".


PRVG-1101 : SCAN name "sdbdb-scan" failed to resolve

2018-01-11 19:30:11.672:
[client(3604514)]CRS-10051:CVU found following errors with Clusterware setup : PRVF-4657 : Name resolution setup check for "sdbdb-scan" (IP address: 58.2.101.5) failed
PRVF-4664 : Found inconsistent name resolution entries for SCAN name "sdbdb-scan"

 

/db/db/oracleapp/11.2.0/grid/log/sdbdb01/crsd/crsd.log

2018-01-24 02:31:27.566: [ CRSMAIN][515] Policy Engine is not initialized yet!
[ CLWAL][1]clsw_Initialize: OLR initlevel [70000]
2018-01-24 02:31:27.886: [ OCRASM][1]proprasmo: Error in open/create file in dg [OCRVD_DG]
[ OCRASM][1]SLOS : SLOS: cat=8, opn=kgfoOpen01, dep=15056, loc=kgfokge

2018-01-24 02:31:27.886: [ OCRASM][1]ASM Error Stack :
2018-01-24 02:31:27.932: [ OCRASM][1]proprasmo: kgfoCheckMount returned [6]
2018-01-24 02:31:27.932: [ OCRASM][1]proprasmo: The ASM disk group OCRVD_DG is not found or not mounted
2018-01-24 02:31:27.933: [ OCRRAW][1]proprioo: Failed to open [+OCRVD_DG]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2018-01-24 02:31:27.933: [ OCRRAW][1]proprioo: No OCR/OLR devices are usable
2018-01-24 02:31:27.933: [ OCRASM][1]proprasmcl: asmhandle is NULL
2018-01-24 02:31:27.933: [ GIPC][1] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 690], original from [clsss.c : 5343]
2018-01-24 02:31:27.936: [ default][1]clsvactversion:4: Retrieving Active Version from local storage.
2018-01-24 02:31:27.939: [ OCRRAW][1]proprrepauto: The local OCR configuration matches with the configuration published by OCR Cache Writer. No repair required.
2018-01-24 02:31:27.940: [ OCRRAW][1]proprinit: Could not open raw device
2018-01-24 02:31:27.940: [ OCRASM][1]proprasmcl: asmhandle is NULL
2018-01-24 02:31:27.941: [ OCRAPI][1]a_init:16!: Backend init unsuccessful : [26]
2018-01-24 02:31:27.941: [ CRSOCR][1] OCR context init failure. Error: PROC-26: Error while accessing the physical storage

2018-01-24 02:31:27.942: [ CRSD][1] Created alert : (:CRSD00111:) : Could not init OCR, error: PROC-26: Error while accessing the physical storage

 

Fix: mount the OCR diskgroup manually from the ASM instance:

alter diskgroup OCRVD_DG mount;
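
A minimal end-to-end sketch of the mount, run as the Grid Infrastructure owner (ASM SID assumed to be +ASM1):

$> export ORACLE_SID=+ASM1
$> sqlplus / as sysasm
SQL> alter diskgroup OCRVD_DG mount;
SQL> select name, state from v$asm_diskgroup;   -- OCRVD_DG should now show MOUNTED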

 

 

## sample 2.2

1. Failed attempt: manually starting the CRS daemon as root on node 1 (10.198.209.52):
# /db/user/oracleapp/11.2.0/grid/bin/crsctl start res ora.crsd -init


2. Check the GI alert.log:

2020-11-09 19:11:06.514: [ OCRASM][1]proprasmo: The ASM disk group ocrvd_dg is not found or not mounted


kfed can read all of the underlying disks, so the devices themselves are fine:
kfed read /dev/rhdiskpower6
kfed read /dev/rhdiskpower7
kfed read /dev/rhdiskpower8
kfed read /dev/rhdiskpower9
kfed read /dev/rhdiskpower10
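
A compact way to run the same header check across all five devices (a sketch; the grep fields match the kfed output shown in sample 2):

for d in /dev/rhdiskpower6 /dev/rhdiskpower7 /dev/rhdiskpower8 \
         /dev/rhdiskpower9 /dev/rhdiskpower10; do
  echo "== $d =="                      # label each device
  kfed read "$d" | grep -E 'kfbh.type|kfdhdb.grpname|kfdhdb.hdrsts'
done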

 

 

Fix:

alter diskgroup OCRVD_DG mount;

Then start the CRS daemon:

/db/user/oracleapp/11.2.0/grid/bin/crsctl start res ora.crsd -init
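
Once the diskgroup mounts, the daemon state can be confirmed through the same -init view used earlier:

/db/user/oracleapp/11.2.0/grid/bin/crsctl stat res ora.crsd -init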

Appendix:
3. alert.log reported this error:

vi alert*.log

2020-11-09 19:10:58.179:
[crsd(10486210)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /db/user/oracleapp/11.2.0/grid/log/puserdb04/crsd/crsd.log.
2020-11-09 19:10:58.710:
[ohasd(4587922)]CRS-2765:Resource 'ora.crsd' has failed on server 'puserdb04'.
2020-11-09 19:11:00.217:
[crsd(11141358)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /db/user/oracleapp/11.2.0/grid/log/puserdb04/crsd/crsd.log.
2020-11-09 19:11:00.235:
[crsd(11141358)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /db/user/oracleapp/11.2.0/grid/log/puserdb04/crsd/crsd.log.
2020-11-09 19:11:00.813:
[ohasd(4587922)]CRS-2765:Resource 'ora.crsd' has failed on server 'puserdb04'.
2020-11-09 19:11:02.308:
[crsd(18022528)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /db/user/oracleapp/11.2.0/grid/log/puserdb04/crsd/crsd.log.
2020-11-09 19:11:02.324:
[crsd(18022528)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /db/o

 

 

 

## sample 3

 

 


Red Hat 7 RAC: after the node 2 host rebooted, GI was unusable.

cat /var/log/messages|grep avahi-daemon

The avahi-daemon service discovers other devices on the network (printers, file shares, and so on); it is not needed here and can be disabled.

FIX:
[root@sodsdb02 sysconfig]# chkconfig avahi-daemon off
Note: Forwarding request to 'systemctl disable avahi-daemon.service'.
Removed symlink /etc/systemd/system/multi-user.target.wants/avahi-daemon.service.
Removed symlink /etc/systemd/system/sockets.target.wants/avahi-daemon.socket.
Removed symlink /etc/systemd/system/dbus-org.freedesktop.Avahi.service.
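
On RHEL 7 the same result can be had with systemd directly; a sketch (stopping the socket unit as well prevents socket-activated restarts):

systemctl stop avahi-daemon.service avahi-daemon.socket
systemctl disable avahi-daemon.service avahi-daemon.socket
systemctl status avahi-daemon.service    # should report inactive (dead) and disabled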


2. Reboot the host:
shutdown -r now

 
