crsctl start crs fails to start CRS

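Dear customer,

1. After exactly how many failed start attempts does the gpnpd process stop trying to restart and remain in the OFFLINE state?

This is determined by the RESTART_ATTEMPTS attribute of the ora.gpnpd resource. The default is 10.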

crsctl stat res ora.gpnpd -p -init | grep RESTART_ATTEMPTS

RESTART_ATTEMPTS=10
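
To read the limit from a script instead of grep, the profile output can be parsed as plain NAME=VALUE pairs. The following is a minimal sketch, assuming the output of `crsctl stat res ora.gpnpd -p -init` has been captured as text; the helper names `parse_resource_profile` and `restart_attempts` are hypothetical, and the `NAME=ora.gpnpd` sample line is an assumed example of another attribute, not output taken from this thread.

```python
def parse_resource_profile(text):
    """Parse NAME=VALUE lines such as those printed by `crsctl stat res <name> -p -init`."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not NAME=VALUE
        key, _, value = line.partition("=")
        attrs[key.strip()] = value.strip()
    return attrs


def restart_attempts(text):
    """Return RESTART_ATTEMPTS as an int, or None if the attribute is absent."""
    value = parse_resource_profile(text).get("RESTART_ATTEMPTS")
    return int(value) if value is not None else None


# RESTART_ATTEMPTS=10 is the value shown in the thread; the NAME line is assumed.
sample = "NAME=ora.gpnpd\nRESTART_ATTEMPTS=10\n"
print(restart_attempts(sample))
```

Returning None for a missing attribute keeps "not reported" distinct from a real value of 0.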

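2. While gpnpd remains OFFLINE, where can this OFFLINE state be seen? At the time, crsctl start crs on rac1 could not start the stack at all, and crsctl status res -t -init on rac1 also failed with an error saying it could not communicate with the cluster. The command only ran normally on the healthy node, rac2.

Normally, crsctl status res -t -init shows it.
If crsctl status res -t -init cannot be run, check the ohasd process trace file (ohasd.trc) for the period when the problem occurred to confirm the state.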
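
Narrowing a large trace file to the problem period can be scripted. The sketch below keeps only the lines whose timestamp falls within a window around a chosen time. It assumes ohasd.trc uses the same `YYYY-MM-DD HH:MM:SS.mmm` line prefix as the gpnpd.trc and crsd.trc excerpts in this thread; the sample lines come from those excerpts (the third is shortened), and the function name `lines_in_window` is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix as seen in the trace excerpts in this thread.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def lines_in_window(lines, center, minutes=10):
    """Return lines stamped within +/- `minutes` of `center`.

    Lines without a timestamp (wrapped continuation lines) follow the
    decision made for the most recent timestamped line.
    """
    lo = center - timedelta(minutes=minutes)
    hi = center + timedelta(minutes=minutes)
    selected = []
    keep = False
    for line in lines:
        m = TS_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            keep = lo <= ts <= hi
        if keep:
            selected.append(line)
    return selected


sample = [
    "2021-12-28 13:13:17.574 : CRSD:1870640896: [ INFO] {1:39877:29678} Done.",
    "2021-12-28 15:05:01.247 : CLSINET:4160576128: no interfaces to filter in net data",
    "[at clsgpnpd.c:1719] Result: (1) CLSGPNP_ERR.",
]
for line in lines_in_window(sample, datetime(2021, 12, 28, 15, 5)):
    print(line)
```

For a real file, the `lines` argument could be an open file handle iterated line by line, so the whole trace never has to be loaded into memory.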

Thanks

Oracle Support	- 21 days ago		[Notes]

Dear customer,

We have received your update and will review it as soon as possible. Thank you for your patience.

Thanks

ZUPENG_LI@YMTC.COM - 21 days ago [Update from Customer]

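Hello,

"After multiple failed start attempts, the gpnpd process stopped trying to restart after 12/28 15:05 and remained OFFLINE.
After that, when you ran the crsctl command on 12/30, it also failed because gpnpd remaining OFFLINE prevented ocssd and ASM from starting."
<<<<<<
1. After exactly how many failed start attempts does the gpnpd process stop trying to restart and remain in the OFFLINE state?
2. While gpnpd remains OFFLINE, where can this OFFLINE state be seen? At the time, crsctl start crs on rac1 could not start the stack at all, and crsctl status res -t -init on rac1 also failed with an error saying it could not communicate with the cluster. The command only ran normally on the healthy node, rac2.

Thanks!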

Oracle Support	- 25 days ago		[ODM Answer]

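Dear customer,

On December 28, the network on host 1 was offline.

Understood.

Once the network is back to normal, how should this situation be handled in order to start CRS?

In this case, the resource has to be started manually: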

crsctl start res ora.gpnpd -init

Then check the resource status:
crsctl stat res -t -init
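
The two commands above can be wrapped in a small script so that the status check only runs if the start command succeeds. This is a sketch under assumptions: `crsctl` is taken to be on the PATH of a suitably privileged user, the function name `run_recovery` is hypothetical, and the default dry-run mode only lists the commands without executing anything.

```python
import subprocess

# The two commands given in the support answer, in order.
RECOVERY_COMMANDS = [
    ["crsctl", "start", "res", "ora.gpnpd", "-init"],
    ["crsctl", "stat", "res", "-t", "-init"],
]


def run_recovery(dry_run=True):
    """Run the recovery commands in order and return them as strings.

    With dry_run=True nothing is executed. With dry_run=False,
    check=True raises CalledProcessError on the first failing command,
    so the status check is skipped if the start fails.
    """
    executed = []
    for cmd in RECOVERY_COMMANDS:
        executed.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return executed


for line in run_recovery(dry_run=True):
    print(line)
```

Running it with `dry_run=False` only makes sense on the affected node after the network problem has been fixed, since that is the condition the support answer assumes.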

Thanks

Oracle Support	- 25 days ago		[ODM Question]

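On December 28, the network on host 1 was offline.

"After that, when you ran the crsctl command on 12/30, it also failed because gpnpd remaining OFFLINE
prevented ocssd and ASM from starting."
<<<<<<<<
Once the network is back to normal, how should this situation be handled in order to start CRS?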

ZUPENG_LI@YMTC.COM - 25 days ago [Update from Customer]

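Hello,

On December 28, the network on host 1 was offline.

"After that, when you ran the crsctl command on 12/30, it also failed because gpnpd remaining OFFLINE
prevented ocssd and ASM from starting."
<<<<<<<<
Once the network is back to normal, how should this situation be handled in order to start CRS?

Thank you!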

Best Regards

Oracle Support	- 28 days ago		[ODM Action Plan]

-------------------- ACTION PLAN DETAILS BELOW---------------------

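Dear customer,

Thank you for your patience. Here is an update on the investigation.

The gpnpd trace file shows that on 12/28 the gpnpd process failed repeatedly,
reporting the error "no interfaces to filter in net data":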

<gpnpd.trc>

2021-12-28 15:05:01.247 : CLSINET:4160576128: no interfaces to filter in net data <<<<<<<

2021-12-28 15:05:01.247 : GPNP:4160576128: clsgpnpd_lCheckIpTypes:
[at clsgpnpd.c:1719] Result: (1) CLSGPNP_ERR. (:GPNPD00120:) clsinet_ProfileGetNetData() failed, crv=1. <<<<<
2021-12-28 15:05:01.248 : GPNP:4160576128: clsgpnpd_term: [at clsgpnpd.c:1180] STOP GPnPD terminating. Closing connections...
2021-12-28 15:05:01.250 : default:4160576128: clsgpnpd_term STOP terminating.
2021-12-28 15:05:01.250 : GPNP:4160576128: clsgpnp_Term: [at clsgpnp0.c:1512] GPnP cli=gpnpd
2021-12-28 15:05:01.250 : GPNP:4160576128: clsgpnp_Term: [at clsgpnp0.c:1512] GPnP cli=clsinet
2021-12-28 15:05:01.251 : GPNP:4160576128: [at clsgpnp0.c:1443] Glob "gpnpd" ref dec (1) from "clsinet"
2021-12-28 15:05:01.252 : GPNP:4160576128: [at clsgpnp0.c:1430] Glob "gpnpd" terminated from "gpnpd" <<<<<<

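After multiple failed start attempts, the gpnpd process stopped trying to restart after 12/28 15:05 and
remained OFFLINE.

After that, when you ran the crsctl command on 12/30, it also failed because gpnpd remaining OFFLINE
prevented ocssd and ASM from starting.

In summary, we suspect that the private network interface failed around 12/28 15:05. To understand in
detail what happened, please provide the OSWatcher data from around 12/28 15:05. If no data from that time
is available, please work with your OS and network administrators to check whether the NICs or network
communication had problems at that time.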
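
The failure signature described in this action plan can be searched for mechanically. The sketch below scans trace lines for three marker strings taken from the gpnpd.trc excerpt in this thread and reports the timestamp of each hit. The function name `find_gpnpd_failures` and the choice of markers are assumptions for illustration, not an Oracle-provided diagnostic.

```python
import re

# Timestamp prefix as it appears in the gpnpd.trc excerpt.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) :")

# Marker strings copied from the excerpt; the selection is an assumption.
MARKERS = (
    "no interfaces to filter in net data",
    "GPNPD00120",
    "STOP GPnPD terminating",
)


def find_gpnpd_failures(lines):
    """Return (timestamp, marker) pairs for every marker found.

    Wrapped lines without a timestamp inherit the last timestamp seen.
    """
    hits = []
    last_ts = None
    for line in lines:
        m = TS_RE.match(line)
        if m:
            last_ts = m.group(1)
        for marker in MARKERS:
            if marker in line:
                hits.append((last_ts, marker))
    return hits


sample = [
    "2021-12-28 15:05:01.247 : CLSINET:4160576128: no interfaces to filter in net data",
    "2021-12-28 15:05:01.247 : GPNP:4160576128: clsgpnpd_lCheckIpTypes:",
    "[at clsgpnpd.c:1719] Result: (1) CLSGPNP_ERR. (:GPNPD00120:) clsinet_ProfileGetNetData() failed, crv=1.",
    "2021-12-28 15:05:01.248 : GPNP:4160576128: clsgpnpd_term: [at clsgpnpd.c:1180] STOP GPnPD terminating. Closing connections...",
]
for ts, marker in find_gpnpd_failures(sample):
    print(ts, marker)
```

Counting how many times the termination marker appears before the last timestamp gives a rough view of how many start attempts failed, which can be compared against RESTART_ATTEMPTS.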

Best Regards, Gao Jian (高 健), Oracle Customer Support, China Database Group

Oracle Support	- 28 days ago		[ODM Data Collection]

=== Data Collection ===

Filename = gpnpd.trc

2021-12-28 15:05:01.247 : CLSINET:4160576128: no interfaces to filter in net data

2021-12-28 15:05:01.247 : GPNP:4160576128: clsgpnpd_lCheckIpTypes:
[at clsgpnpd.c:1719] Result: (1) CLSGPNP_ERR. (:GPNPD00120:) clsinet_ProfileGetNetData() failed, crv=1. <<<<<
2021-12-28 15:05:01.248 : GPNP:4160576128: clsgpnpd_term: [at clsgpnpd.c:1180] STOP GPnPD terminating. Closing connections...
2021-12-28 15:05:01.250 : default:4160576128: clsgpnpd_term STOP terminating.
2021-12-28 15:05:01.250 : GPNP:4160576128: clsgpnp_Term: [at clsgpnp0.c:1512] GPnP cli=gpnpd
2021-12-28 15:05:01.250 : GPNP:4160576128: clsgpnp_Term: [at clsgpnp0.c:1512] GPnP cli=clsinet
2021-12-28 15:05:01.251 : GPNP:4160576128: [at clsgpnp0.c:1443] Glob "gpnpd" ref dec (1) from "clsinet"
2021-12-28 15:05:01.252 : GPNP:4160576128: [at clsgpnp0.c:1430] Glob "gpnpd" terminated from "gpnpd"

Filename = gpnpd.trc

Oracle Support	- 29 days ago		[ODM Data Collection]

=== Data Collection ===

Filename = crsd.trc

2021-12-28 13:13:17.563 : AGFW:2628663040: [ INFO] {1:39877:29678} Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:8770684
2021-12-28 13:13:17.563 : AGFW:2628663040: [ INFO] {1:39877:29678} Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:8770684
2021-12-28 13:13:17.573 :UiServer:1870640896: [ INFO] {1:39877:29678} Done for ctx=0x7fff1c062d40
2021-12-28 13:13:17.573 :UiServer:1870640896: [ INFO] {1:39877:29678} Informing CSS of successful CRS shutdown...
2021-12-28 13:13:17.574 :UiServer:1870640896: [ INFO] {1:39877:29678} Flushing repository write requests...
2021-12-28 13:13:17.574 : CRSD:1870640896: [ INFO] {1:39877:29678} Exiting on request of the Policy Engine...
2021-12-28 13:13:17.574 : CRSD:1870640896: [ INFO] {1:39877:29678} Done. <<<< last line

Filename = crsd.trc

Oracle Support	- 29 days ago		[ODM Data Collection]

=== Data Collection ===

Filename = alert_+ASM1.log

2021-12-28T13:13:28.533348+08:00
freeing rdom 4
freeing the fusion rht of pdb 4
freeing rdom 3
freeing the fusion rht of pdb 3
freeing rdom 2
freeing the fusion rht of pdb 2
freeing rdom 1
freeing the fusion rht of pdb 1
freeing rdom 0
freeing the fusion rht of pdb 0
2021-12-28T13:13:33.788148+08:00
Instance shutdown complete (OS id: 71392) <<<<<< last line

Filename = alert_+ASM1.log

Oracle Support	- 29 days ago		[ODM Issue Verification]

Verified the issue in the log file as noted below:

LOG FILE

Filename = node#1\alert.log
See the following error:

2021-12-30 03:10:08.902 [CRSCTL(120577)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120577.trc.
2021-12-30 03:10:15.737 [CRSCTL(120720)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120720.trc.

Oracle Support	- 29 days ago		[ODM Data Collection]

=== Data Collection ===

Filename = crsctl_120577.trc

Trace file /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120577.trc
Oracle Database 19c Clusterware Release 19.0.0.0.0 - Production
Version 19.10.1.0.0 Copyright 1996, 2021 Oracle. All rights reserved.
default:4160564992: u_set_comp_error: comptype '103' : error '29' <<<<<<<<<<<
2021-12-30 03:10:02.479 : OCRRAW:4160564992: kgfnInitEnv env=0x7ffffffefef8 flags=0x0

2021-12-30 03:10:02.479 : OCRRAW:4160564992: kgfoCreateCtxExt2 trcflg: 0 [trclvl_in:3] ctx:0x5555562d16b0

2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgxgncin: clsssinit: CLSS init failed with status 3

2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgxgncin: clsssinit: return status 3 (0 SKGXN not av) from CLSS

2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgfnFindLocalNode01: ORA-29701

2021-12-30 03:10:02.725*:kgfn.c@1381: kgfnFindLocalNode: ORA-29701 nmret=2
2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgfnFindLocalNode: not ok

2021-12-30 03:10:02.725*:kgfn.c@1485: kgfnFindLocalNode: not ok
2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgfnTgtInit: local node not found, free kgfnpds

2021-12-30 03:10:02.725*:kgfn.c@2271: kgfnTgtInit: not found
2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgfnGetBeqData failed init target; inst=(null) flags=0x6000

2021-12-30 03:10:02.725*:kgfn.c@5993: kgfnGetBeqData: kgfnTgtInit failed, inst=NULL flags=0x6000
2021-12-30 03:10:02.729 : CLSNS:4160564992: clsns_SetTraceLevel:trace level set to 1.
2021-12-30 03:10:02.847 : OCRRAW:4160564992: 9607 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS <<<

2021-12-30 03:10:02.851 : OCRRAW:4160564992: 9607 Error 4 querying length of attr ASM_STATIC_DISCOVERY_ADDRESS

2021-12-30 03:10:02.885 : OCRRAW:4160564992: 9325 Error 4 opening dom root in 0x555556559a50
......

2021-12-30 03:10:08.902*:kgfn.c@5513: kgfnConnect2: failed to connect
2021-12-30 03:10:08.902 : OCRRAW:4160564992: kgfnConnect2Retry: failed to connect connect after 2 attempts, 151s elapsed

2021-12-30 03:10:08.902 : OCRRAW:4160564992: kgfo_kge2slos error stack at kgfoAl06: ORA-15077: could not locate ASM instance serving a required diskgroup <<<<<<<<<<<

2021-12-30 03:10:08.902*:kgfo.c@1014: kgfo_kge2slos error stack at kgfoAl06: ORA-15077: could not locate ASM instance serving a required diskgroup

2021-12-30 03:10:08.902 : OCRRAW:4160564992: -- trace dump on error exit --

2021-12-30 03:10:08.902 : OCRRAW:4160564992: Error [kgfoAl06] in [kgfokge] at kgfo.c:3180

2021-12-30 03:10:08.902 : OCRRAW:4160564992: ORA-15077: could not locate ASM instance serving a required diskgroup

2021-12-30 03:10:08.902 : OCRRAW:4160564992: Category: 7

2021-12-30 03:10:08.902 : OCRRAW:4160564992: DepInfo: 15077

2021-12-30 03:10:08.902 : OCRRAW:4160564992: -- trace dump end --

OCRASM:4160564992: SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge

2021-12-30 03:10:08.902 : OCRASM:4160564992: ASM Error Stack : ORA-15077: could not locate ASM instance serving a required diskgroup

2021-12-30 03:10:08.902 : OCRASM:4160564992: proprasmo: kgfoCheckMount returned [7]
2021-12-30 03:10:08.902 : OCRASM:4160564992: proprasmo: The ASM instance is down
2021-12-30 03:10:08.980 : OCRRAW:4160564992: proprioo: Failed to open [+DG_CRS_FEFL/p-rac/OCRFILE/registry.255.1078051179]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2021-12-30 03:10:08.980 : OCRRAW:4160564992: proprioo: No OCR/OLR devices are usable
OCRUTL:4160564992: u_fill_errorbuf: Error Info : [Insufficient quorum to open OCR devices]
default:4160564992: u_set_gbl_comp_error: comptype '107' : error '0'
2021-12-30 03:10:08.980 : OCRRAW:4160564992: proprinit: Could not open raw device
2021-12-30 03:10:08.980 : default:4160564992: a_init:7!: Backend init unsuccessful : [26]
2021-12-30 03:10:08.982 : default:4160564992: clsvactversion:4: Retrieving Active Version from local storage.

Filename = crsctl_120577.trc

Oracle Support	- 29 days ago		[ODM Data Collection]

=== Data Collection ===

Filename = node#1\alert.log

2021-12-29 15:17:26.195 [GIPCD(22619)]CRS-42216: No interfaces are configured on the local node for interface definition bond1(:.)?:20.20.88.0: available interface definitions are [eno1(:.)?:10.131.12.0][bond0(:.)?:10.20.28.0].
2021-12-29 15:17:26.221 [GIPCD(22619)]CRS-42216: No interfaces are configured on the local node for interface definition bond1(:.)?:20.20.88.0: available interface definitions are [eno1(:.)?:10.131.12.0][bond0(:.)?:10.20.28.0].
2021-12-30 03:10:08.902 [CRSCTL(120577)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120577.trc.
2021-12-30 03:10:15.737 [CRSCTL(120720)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120720.trc.
2021-12-30 03:10:22.969 [CRSCTL(120872)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120872.trc.
2021-12-31 14:02:53.207 [CRSCTL(119476)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_119476.trc.
2021-12-31 14:03:00.371 [CRSCTL(119668)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_119668.trc.
2021-12-31 14:03:06.842 [OCRCONFIG(119799)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/ocrconfig_119799.trc.
2021-12-31 14:03:18.172 [OCRDUMP(121314)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/ocrdump_121314.trc.
2021-12-31 14:04:23.398 [CRSCTL(129904)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_129904.trc.
2021-12-31 14:04:30.579 [CRSCTL(130934)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_130934.trc.
2021-12-31 14:04:37.047 [OCRCONFIG(131208)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/ocrconfig_131208.trc.
2021-12-31 14:04:48.418 [OCRDUMP(132888)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/ocrdump_132888.trc.
2021-12-31 15:10:09.557 [CRSCTL(47630)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_47630.trc.

Filename = node#1\alert.log
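
The CRS-42216 entries in the alert log excerpt above name the interface definition that has no matching local interface, followed by the definitions that are available. A minimal sketch for pulling both out of the message text follows. The regular expressions are written against the two CRS-42216 lines shown in this thread only, so other message variants may not match, and the function name `parse_crs_42216` is hypothetical.

```python
import re

# Patterns derived from the CRS-42216 lines in this thread (an assumption:
# other releases may format the message differently).
MISSING_RE = re.compile(r"interface definition ([^\s(]+)\(:\.\)\?:([\d.]+): available")
AVAILABLE_RE = re.compile(r"\[([^\]\[(]+)\(:\.\)\?:([\d.]+)\]")


def parse_crs_42216(message):
    """Return the unmatched and available (interface, subnet) pairs, or None."""
    m = MISSING_RE.search(message)
    if m is None:
        return None
    return {
        "missing": (m.group(1), m.group(2)),
        "available": AVAILABLE_RE.findall(message),
    }


line = (
    "2021-12-29 15:17:26.195 [GIPCD(22619)]CRS-42216: No interfaces are configured "
    "on the local node for interface definition bond1(:.)?:20.20.88.0: available "
    "interface definitions are [eno1(:.)?:10.131.12.0][bond0(:.)?:10.20.28.0]."
)
print(parse_crs_42216(line))
```

For the line above, the unmatched definition is bond1 on 20.20.88.0, while eno1 and bond0 are reported as available, which fits the suspicion of a network interface problem raised elsewhere in the thread.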

Oracle Support	- 29 days ago		[Notes]

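Dear customer,

Regarding the error messages found in the host's messages log, we cannot yet determine whether they are related to CRS failing to start.

The messages resemble what is described in the following document: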

Error 'Multipathd: Asm!.Asm_ctl_spec: Failed To Store Path Info' found In /var/log/messages ( Doc ID 1268895.1 )

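You can try the method in that document to see whether it makes the messages disappear.

I will continue investigating why CRS fails to start and will report back when there is progress.

Best Regards, Gao Jian (高 健), Oracle Customer Support, China Database Group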

posted @ 2022-02-04 16:46  武汉OracleDBA