Notes on resolving a CRS-0184: Cannot communicate with the CRS daemon incident
1. Description:
Running crs_stat -t to check the RAC services immediately returns CRS-0184: Cannot communicate with the CRS daemon.
Oddly, the database itself is fine: sqlplus / as sysdba still connects and the instance remains usable.
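Before digging into the logs, it helps to confirm which layer of the stack is actually down. A minimal sketch (crs_stat is deprecated in 11.2; crsctl stat res -t gives the same resource view and fails the same way when CRSD is down):
grid@phars1: /home/grid> crsctl check crs -- reports the state of the ohasd, crsd, cssd and evmd layers
grid@phars1: /home/grid> crsctl stat res -t -- 11.2 replacement for crs_stat -t
grid@phars1: /home/grid> ps -ef | grep -E 'crsd.bin|ocssd.bin|evmd.bin|asm_pmon' | grep -v grep -- see which daemons and the ASM instance are still running
In this case the database and ASM were still up, which already hints that only crsd (and whatever it depends on) is in trouble.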
2. Error analysis:
First check the alert log. The errors start on 2016/07/13:
/grid/11.2.0/log/phars1/alertphars1.log
2016-07-13 16:04:49.616:
[crsd(21419)]CRS-2765:Resource 'ora.VOTDG.dg' has failed on server 'phars1'.
2016-07-13 16:04:49.702:
[crsd(21419)]CRS-2878:Failed to restart resource 'ora.VOTDG.dg'
2016-07-13 16:04:49.703:
[crsd(21419)]CRS-2769:Unable to failover resource 'ora.VOTDG.dg'.
2016-07-13 19:39:38.436:
[crsd(21419)]CRS-1006:The OCR location +VOTDG is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:38.437:
[crsd(21419)]CRS-1006:The OCR location +VOTDG is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:53.742:
[/grid/11.2.0/bin/oraagent.bin(30612)]CRS-5822:Agent '/grid/11.2.0/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:11:9490} in /grid/11.2.0/log/phars1/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2016-07-13 19:39:53.742:
[/grid/11.2.0/bin/orarootagent.bin(21814)]CRS-5822:Agent '/grid/11.2.0/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:3:36} in /grid/11.2.0/log/phars1/agent/crsd/orarootagent_root/orarootagent_root.log.
2016-07-13 19:39:53.743:
[/grid/11.2.0/bin/oraagent.bin(21774)]CRS-5822:Agent '/grid/11.2.0/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:5:10} in /grid/11.2.0/log/phars1/agent/crsd/oraagent_grid/oraagent_grid.log.
2016-07-13 19:39:53.743:
[/grid/11.2.0/bin/scriptagent.bin(1919)]CRS-5822:Agent '/grid/11.2.0/bin/scriptagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:13:12} in /grid/11.2.0/log/phars1/agent/crsd/scriptagent_grid/scriptagent_grid.log.
2016-07-13 19:39:53.745:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:39:55.153:
[crsd(16165)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:55.162:
[crsd(16165)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:55.774:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:39:57.201:
[crsd(16185)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:57.210:
[crsd(16185)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:57.814:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:39:59.206:
[crsd(16210)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:59.214:
[crsd(16210)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:59.843:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:01.237:
[crsd(16223)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:01.245:
[crsd(16223)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:01.872:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:03.263:
[crsd(16238)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:03.273:
[crsd(16238)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:03.900:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:05.293:
[crsd(16254)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:05.302:
[crsd(16254)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:05.929:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:07.325:
[crsd(16271)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:07.335:
[crsd(16271)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:07.956:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:09.346:
[crsd(16290)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:09.355:
[crsd(16290)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:09.985:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:11.376:
[crsd(16327)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:11.386:
[crsd(16327)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:12.013:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:13.401:
[crsd(16340)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:13.411:
[crsd(16340)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:14.053:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:14.053:
[ohasd(20149)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2016-07-13 19:40:14.053:
[ohasd(20149)]CRS-2769:Unable to failover resource 'ora.crsd'.
Reading this log, the sequence of events is: resource 'ora.VOTDG.dg' fails => CRS tries to restart the resource => the restart fails => the OCR location +VOTDG becomes inaccessible => CRSD aborts because it cannot access the physical storage => after the maximum number of restart attempts is reached, OHASD gives up restarting ora.crsd => CRSD stays down.
All of this points to the same cause: the VOTDG disk group is inaccessible, and that is what broke the CRS stack.
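Since the alert log points straight at the OCR location +VOTDG, the OCR and voting files can also be checked directly. A hedged sketch (run as root; +VOTDG and the paths are this environment's):
[root@phars1 ~]# ocrcheck -- shows the configured OCR location and runs an integrity check; it fails with PROC-26-style errors while +VOTDG is dismounted
[root@phars1 ~]# crsctl query css votedisk -- lists the voting files
CSS reads the voting files directly from the disk headers rather than through a mounted disk group, which is why CSS, ASM and the database can keep running even though crsd cannot reach the OCR.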
Next, look at the /grid/11.2.0/log/phars1/crsd/crsd.log log:
2016-07-13 16:04:49.615: [ AGFW][4118722304]{0:5:6} Agfw Proxy Server received the message: RESOURCE_STATUS[Proxy] ID 20481:162956
2016-07-13 16:04:49.615: [ AGFW][4118722304]{0:5:6} Verifying msg rid = ora.VOTDG.dg phars1 1
2016-07-13 16:04:49.615: [ AGFW][4118722304]{0:5:6} Received state change for ora.VOTDG.dg phars1 1 [old state = ONLINE, new state = OFFLINE] -- here the state of ora.VOTDG.dg changes to OFFLINE
2016-07-13 16:04:49.615: [ AGFW][4118722304]{0:5:6} Agfw Proxy Server sending message to PE, Contents = [MIDTo:2|OpID:3|FromA:{Invalid|Node:0|Process:0|Type:0}|ToA:{Invalid|Node:-1|Process:-1|Type:-1}|MIDFrom:0|Type:4|Pri2|Id:287142:Ver:2]
2016-07-13 16:04:49.615: [ AGFW][4118722304]{0:5:6} Agfw Proxy Server replying to the message: RESOURCE_STATUS[Proxy] ID 20481:162956
2016-07-13 16:04:49.616: [ CRSPE][4108216064]{0:5:6} State change received from phars1 for ora.VOTDG.dg phars1 1
2016-07-13 16:04:49.616: [ CRSPE][4108216064]{0:5:6} Processing PE command id=13336. Description: [Resource State Change (ora.VOTDG.dg phars1 1) : 0x7fb470104850]
2016-07-13 16:04:49.616: [ CRSPE][4108216064]{0:5:6} RI [ora.VOTDG.dg phars1 1] new external state [OFFLINE] old value: [ONLINE] on phars1 label = []
2016-07-13 16:04:49.616: [ CRSD][4108216064]{0:5:6} {0:5:6} Resource Resource Instance ID[ora.VOTDG.dg phars1 1]. Values:
STATE=OFFLINE
TARGET=ONLINE
LAST_SERVER=phars1
CURRENT_RCOUNT=0
LAST_RESTART=0
FAILURE_COUNT=0
FAILURE_HISTORY=
STATE_DETAILS=
INCARNATION=0
STATE_CHANGE_VERS=0
LAST_FAULT=0
LAST_STATE_CHANGE=1468397089
INTERNAL_STATE=0
DEGREE_ID=1
ID=ora.VOTDG.dg phars1 1
Lock Info:
Write Locks:none
ReadLocks:|STATE INITED||ONLINE STATERECOVERED| has failed!
2016-07-13 16:04:49.616: [ CRSPE][4108216064]{0:5:6} Processing unplanned state change for [ora.VOTDG.dg phars1 1]
2016-07-13 16:04:49.617: [ CRSPE][4108216064]{0:5:6} Scheduled local recovery for [ora.VOTDG.dg phars1 1]
2016-07-13 16:04:49.617: [ CRSRPT][4106114816]{0:5:6} Published to EVM CRS_RESOURCE_STATE_CHANGE for ora.VOTDG.dg
2016-07-13 16:04:49.617: [ CRSPE][4108216064]{0:5:6} Op 0x7fb4700c89d0 has 5 WOs
2016-07-13 16:04:49.618: [ CRSPE][4108216064]{0:5:6} RI [ora.VOTDG.dg phars1 1] new internal state: [STARTING] old value: [STABLE]
2016-07-13 16:04:49.618: [ CRSPE][4108216064]{0:5:6} Sending message to agfw: id = 287144
2016-07-13 16:04:49.618: [ CRSPE][4108216064]{0:5:6} CRS-2672: Attempting to start 'ora.VOTDG.dg' on 'phars1'
2016-07-13 16:04:49.618: [ AGFW][4118722304]{0:5:6} Agfw Proxy Server received the message: RESOURCE_START[ora.VOTDG.dg phars1 1] ID 4098:287144
2016-07-13 16:04:49.619: [ AGFW][4118722304]{0:5:6} Agfw Proxy Server forwarding the message: RESOURCE_START[ora.VOTDG.dg phars1 1] ID 4098:287144 to the agent /grid/11.2.0/bin/oraagent_grid
2016-07-13 16:04:49.673: [ AGFW][4118722304]{0:5:6} Received the reply to the message: RESOURCE_START[ora.VOTDG.dg phars1 1] ID 4098:287145 from the agent /grid/11.2.0/bin/oraagent_grid
2016-07-13 16:04:49.673: [ AGFW][4118722304]{0:5:6} Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.VOTDG.dg phars1 1] ID 4098:287144
2016-07-13 16:04:49.673: [ CRSPE][4108216064]{0:5:6} Received reply to action [Start] message ID: 287144
2016-07-13 16:04:49.701: [ AGFW][4118722304]{0:5:6} Received the reply to the message: RESOURCE_START[ora.VOTDG.dg phars1 1] ID 4098:287145 from the agent /grid/11.2.0/bin/oraagent_grid
2016-07-13 16:04:49.701: [ AGFW][4118722304]{0:5:6} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.VOTDG.dg phars1 1] ID 4098:287144
2016-07-13 16:04:49.701: [ CRSPE][4108216064]{0:5:6} Received reply to action [Start] message ID: 287144
2016-07-13 16:04:49.701: [ CRSPE][4108216064]{0:5:6} RI [ora.VOTDG.dg phars1 1] new internal state: [STABLE] old value: [STARTING]
2016-07-13 16:04:49.701: [ CRSPE][4108216064]{0:5:6} CRS-2674: Start of 'ora.VOTDG.dg' on 'phars1' failed
This log tells the same story: the start of 'ora.VOTDG.dg' fails, and that failure brings down CRS.
3. Error resolution:
① The error says the CRS daemon cannot be contacted, so the first step is to check the alert log and the CRS logs, as shown above.
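If the logs are long, the relevant errors can be pulled out directly; a small sketch using this node's log paths:
grid@phars1: /home/grid> tail -100 /grid/11.2.0/log/phars1/alertphars1.log
grid@phars1: /home/grid> grep -E 'CRS-1013|CRS-0804|PROC-26' /grid/11.2.0/log/phars1/alertphars1.log
grid@phars1: /home/grid> grep -i 'OCRASM' /grid/11.2.0/log/phars1/crsd/crsd.log | tail -20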
② Looking through crsd.log, the following line also stands out:
2016-07-15 10:17:24.000: [ OCRASM][992749344]proprasmo: The ASM disk group VOTDG is not found or not mounted
This says the ASM disk group VOTDG, which holds the voting disk and OCR, was not found or not mounted.
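"Not found or not mounted" covers two different problems, so before assuming a storage failure it is worth checking that ASM can still see the member disks. A hedged check from the ASM instance (connected as SYSASM, as in the mount step below):
SQL> select group_number, name, path, header_status, mount_status from v$asm_disk order by group_number;
If the VOTDG disks appear with header_status = MEMBER, mount_status = CLOSED and group_number = 0, the disks are visible and intact and the disk group is simply not mounted; if they are missing from the list entirely, the problem is at the storage or permissions level instead.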
③ Since my database was still fine, I checked the state of the voting-disk disk group:
SQL> select name,state from v$asm_diskgroup;
NAME STATE
------------------------------ -----------
BACKUPDG CONNECTED
DATADG CONNECTED
SYSDG CONNECTED
VOTDG DISMOUNTED
The VOTDG disk group is indeed DISMOUNTED; in normal operation it must be MOUNTED.
Mount the disk group manually:
grid@phars1: /home/grid> sqlplus / as sysasm -- note: connect as the grid user with SYSASM
SQL*Plus: Release 11.2.0.4.0 Production on Fri Jul 15 11:38:40 2016
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> alter diskgroup VOTDG mount; -- mount the voting-disk disk group manually
Diskgroup altered.
This has to be done on both nodes.
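If SQL*Plus is not handy, the same mount can be done with asmcmd as the grid user; a hedged alternative (asmcmd talks to the ASM instance directly, so it works even while crsd is down):
grid@phars1: /home/grid> asmcmd mount VOTDG -- run on each node, equivalent to the ALTER DISKGROUP above
grid@phars1: /home/grid> asmcmd lsdg -- confirm VOTDG now shows up as MOUNTED
Either way, the disk group holding the OCR must be mounted on every node before crsd can come back.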
Then restart the cluster services and everything comes back. Note that restarting while the disk group is still dismounted does nothing; only after it is mounted can the stack be stopped and started normally, because CRSD has to read the OCR from +VOTDG at startup.
[root@phaws1 ~]# crsctl stop cluster -all
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-4704: Shutdown of Clusterware failed on node phaws1.
CRS-4704: Shutdown of Clusterware failed on node phaws2.
CRS-4000: Command Stop failed, or completed with errors.
[root@phaws1 ~]# crsctl start cluster -all
CRS-2672: Attempting to start 'ora.crsd' on 'phaws1'
CRS-2672: Attempting to start 'ora.crsd' on 'phaws2'
CRS-2676: Start of 'ora.crsd' on 'phaws1' succeeded
CRS-2676: Start of 'ora.crsd' on 'phaws2' succeeded
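Once the stack is back, it is worth confirming that everything CRS-related is healthy again; a short checklist:
[root@phaws1 ~]# crsctl check cluster -all -- CSS/CRS/EVM status on every node
[root@phaws1 ~]# crsctl stat res -t -- equivalent of crs_stat -t, which originally failed with CRS-0184; it should now list all resources
[root@phaws1 ~]# ocrcheck -- the OCR integrity check should now succeed against +VOTDG
[root@phaws1 ~]# crsctl query css votedisk -- the voting files in +VOTDG should be listed as ONLINE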
Summary: this CRS outage was caused by the voting-disk/OCR disk group becoming inaccessible. The key is to work through the logs and let them drive the troubleshooting.