11gR2 RAC vip和network资源的依赖性与频繁failover
在11gR2中出现了Oracle Network resource类型的CRS resource用以监控Public Network:
[root@vrh1 ~]# crsctl status resource ora.net1.network -f NAME=ora.net1.network TYPE=ora.network.type STATE=ONLINE TARGET=ONLINE ACL=owner:root:rwx,pgrp:root:r-x,other::r--,group:oinstall:r-x,user:grid:r-x ACTION_FAILURE_TEMPLATE= ACTION_SCRIPT= AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX% ALIAS_NAME= AUTO_START=restore CHECK_INTERVAL=1 CREATION_SEED=1 DEFAULT_TEMPLATE= DEGREE=1 DESCRIPTION=Oracle Network resource ENABLED=1 ID=ora.net1.network LOAD=1 LOGGING_LEVEL=1 NLS_LANG= NOT_RESTARTING_TEMPLATE= OFFLINE_CHECK_INTERVAL=60 PROFILE_CHANGE_TEMPLATE= RESTART_ATTEMPTS=5 SCRIPT_TIMEOUT=60 START_DEPENDENCIES= START_TIMEOUT=0 STATE_CHANGE_TEMPLATE= STOP_DEPENDENCIES= STOP_TIMEOUT=0 TYPE_VERSION=2.2 UPTIME_THRESHOLD=1d USR_ORA_AUTO=static USR_ORA_ENV= USR_ORA_IF=eth0 USR_ORA_NETMASK=255.255.255.0 USR_ORA_SUBNET=192.168.1.0 VERSION=11.2.0.3.0 [root@vrh1 ~]# crsctl status resource ora.vrh1.vip -f NAME=ora.vrh1.vip TYPE=ora.cluster_vip_net1.type STATE=ONLINE TARGET=ONLINE ACL=owner:root:rwx,pgrp:root:r-x,other::r--,group:oinstall:r-x,user:grid:r-x ACTION_FAILURE_TEMPLATE= ACTION_SCRIPT= ACTIVE_PLACEMENT=1 AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX% AUTO_START=restore CARDINALITY=1 CARDINALITY_ID=0 CHECK_INTERVAL=1 CHECK_TIMEOUT=30 CREATION_SEED=34700 DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=vip) DEGREE=1 DESCRIPTION=Oracle VIP resource ENABLED=1 FAILOVER_DELAY=0 FAILURE_INTERVAL=0 FAILURE_THRESHOLD=0 GEN_USR_ORA_STATIC_VIP= GEN_USR_ORA_VIP= HOSTING_MEMBERS=vrh1 ID=ora.vrh1.vip LOAD=1 LOGGING_LEVEL=2 NLS_LANG= NOT_RESTARTING_TEMPLATE= OFFLINE_CHECK_INTERVAL=0 PLACEMENT=favored PROFILE_CHANGE_TEMPLATE= RESTART_ATTEMPTS=0 SCRIPT_TIMEOUT=60 SERVER_POOLS=* START_DEPENDENCIES=hard(ora.net1.network) pullup(ora.net1.network) START_TIMEOUT=0 STATE_CHANGE_TEMPLATE= STOP_DEPENDENCIES=hard(ora.net1.network) STOP_TIMEOUT=0 TYPE_VERSION=2.1 UPTIME_THRESHOLD=1h USR_ORA_ENV= USR_ORA_VIP=vrh1-vip VERSION=11.2.0.3.0 [root@vrh1 ~]# crsctl status res ora.LISTENER.lsnr -f NAME=ora.LISTENER.lsnr TYPE=ora.listener.type STATE=ONLINE TARGET=ONLINE ACL=owner:grid:rwx,pgrp:oinstall:rwx,other::r-- ACTION_FAILURE_TEMPLATE= ACTION_SCRIPT=%CRS_HOME%/bin/racgwrap%CRS_SCRIPT_SUFFIX% AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX% ALIAS_NAME=ora.%CRS_CSS_NODENAME_LOWER_CASE%.LISTENER_%CRS_CSS_NODENAME_UPPER_CASE%.lsnr AUTO_START=restore CHECK_INTERVAL=60 CHECK_TIMEOUT=30 CREATION_SEED=35089 DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=listener) PROPERTY(LISTENER_NAME=PARSE(%NAME%, ., 2)) DEGREE=1 DESCRIPTION=Oracle Listener resource ENABLED=1 ENDPOINTS=TCP:1521 ID=ora.LISTENER.lsnr LOAD=1 LOGGING_LEVEL=5 NLS_LANG= NOT_RESTARTING_TEMPLATE= OFFLINE_CHECK_INTERVAL=0 ORACLE_HOME=%CRS_HOME% PORT=1521 PROFILE_CHANGE_TEMPLATE= RESTART_ATTEMPTS=5 SCRIPT_TIMEOUT=60 START_DEPENDENCIES=hard(type:ora.cluster_vip_net1.type) pullup(type:ora.cluster_vip_net1.type) START_TIMEOUT=180 STATE_CHANGE_TEMPLATE= STOP_DEPENDENCIES=hard(intermediate:type:ora.cluster_vip_net1.type) STOP_TIMEOUT=0 TYPE_VERSION=1.2 UPTIME_THRESHOLD=1d USR_ORA_ENV=ORACLE_BASE=/g01/orabase USR_ORA_OPI=false VERSION=11.2.0.3.0可以看到在11.2 RAC中Listener资源依赖于VIP, 而VIP资源依赖于ora.net1.network;这就造成了当public network短时不可用(或曰network hiccup)时会造成ora.net1.network资源OFFLINE,这就将造成该节点上VIP资源的FAILOVER和LISTENER的OFFLINE。 且由于在11.2上ora.net1.network资源的默认CHECK_INTERVAL=1,即每秒都会对该NETWORK资源进行监控,所以NETWORK资源变得十分敏感,不管是由于硬件网络亦或者较高的主机负载造成短时的Public Network不可用,都可能导致VIP和LISTENER由于NETWORK依赖资源OFFLINE而受到影响。 我们来实际体会一下:
[root@vrh1 orarootagent_root]# pwd /g01/11.2.0/grid/log/vrh1/agent/crsd/orarootagent_root 2012-12-21 02:04:13.600: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:04:13.600: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:04:14.596: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:04:14.596: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:04:14.596: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:04:14.597: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:04:15.602: [ora.vrh1.vip][1266022720][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:04:15.603: [ora.vrh1.vip][1266022720][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:04:15.603: [ora.vrh1.vip][1266022720][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:04:15.603: [ora.vrh1.vip][1266022720][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:04:16.598: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:04:16.598: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:04:16.598: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:04:16.598: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:04:17.594: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network [root@vrh1 orarootagent_root]# lsof |grep orarootagent_root.log orarootag 3835 root 4w REG 253,0 9109896 4717997 /g01/11.2.0/grid/log/vrh1/agent/ohasd/orarootagent_root/orarootagent_root.log orarootag 23559 root 4w REG 253,0 3166410 4723518 /g01/11.2.0/grid/log/vrh1/agent/crsd/orarootagent_root/orarootagent_root.log [root@vrh1 orarootagent_root]# ps -ef|egrep "3835|23559"|grep -v grep root 3835 1 0 Oct23 ? 06:57:18 /g01/11.2.0/grid/bin/orarootagent.bin root 23559 1 0 Nov20 ? 04:06:24 /g01/11.2.0/grid/bin/orarootagent.bin可以看到如上orarootagent.bin进程每秒都会对VIP的依赖资源ora.net1.network做心跳检测,我们手动在3s内关闭和重启NETWORK资源,并观测结果:
2012-12-21 02:09:05.781: [ AGFW][1400256832] {1:62611:2} ora.net1.network vrh1 1 state changed from: ONLINE to: OFFLINE 2012-12-21 02:09:05.781: [ AGFW][1400256832] {1:62611:2} Switching online monitor to offline one 2012-12-21 02:09:05.781: [ AGFW][1400256832] {1:62611:2} Started implicit monitor for [ora.net1.network vrh1 1] interval=60000 delay=60000 2012-12-21 02:09:05.781: [ AGFW][1400256832] {0:1:53734} Generating new Tint for unplanned state change. Original Tint: {1:62611:2} 2012-12-21 02:09:05.781: [ AGFW][1400256832] {0:1:53734} Agent sending message to PE: RESOURCE_STATUS[Proxy] ID 20481:24047944 2012-12-21 02:09:05.781: [ora.vrh1.vip][1333139776][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:09:05.782: [ora.vrh1.vip][1333139776][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:09:05.782: [ora.vrh1.vip][1333139776] {1:62611:2} [check] VipAgent::checkIp returned false 2012-12-21 02:09:05.782: [ora.vrh1.vip][1333139776][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:09:05.782: [ora.vrh1.vip][1333139776][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:09:05.790: [ AGFW][1400256832] {0:1:53734} Agent received the message: RESOURCE_PROBE[ora.vrh1.vip 1 1] ID 4097:417234 2012-12-21 02:09:05.791: [ AGFW][1400256832] {0:1:53734} Preparing CHECK command for: ora.vrh1.vip 1 1 2012-12-21 02:09:05.876: [ AGFW][1400256832] {0:1:53734} Agent received the message: RESOURCE_START[ora.net1.network vrh1 1] ID 4098:417237 2012-12-21 02:09:05.876: [ AGFW][1400256832] {0:1:53734} Preparing START command for: ora.net1.network vrh1 1 2012-12-21 02:09:05.876: [ AGFW][1400256832] {0:1:53734} ora.net1.network vrh1 1 state changed from: OFFLINE to: STARTING 2012-12-21 02:09:05.878: [ AGFW][1400256832] {0:1:53734} Agent received the message: RESOURCE_PROBE[ora.scan1.vip 1 1] ID 4097:417241 2012-12-21 02:09:05.878: [ AGFW][1400256832] {0:1:53734} Preparing CHECK command for: ora.scan1.vip 1 1 2012-12-21 02:09:05.878: [ora.net1.network][1266022720] {0:1:53734} [start] (:CLSN00107:) clsn_agent::start { 2012-12-21 02:09:05.878: [ora.net1.network][1266022720] {0:1:53734} [start] NetworkAgent::init enter { 2012-12-21 02:09:05.878: [ora.scan1.vip][2281658688] {0:1:53734} [check] VipAgent::checkIp returned false 2012-12-21 02:09:05.878: [ AGFW][1400256832] {0:1:53734} ora.scan1.vip 1 1 state changed from: ONLINE to: OFFLINE 2012-12-21 02:09:05.878: [ AGFW][1400256832] {0:1:53735} Generating new Tint for unplanned state change. Original Tint: {0:1:53734} 2012-12-21 02:09:05.878: [ AGFW][1400256832] {0:1:53735} Agent sending message to PE: RESOURCE_STATUS[Proxy] ID 20481:24047956 2012-12-21 02:09:05.878: [ AGFW][1400256832] {0:1:53735} Agent sending last reply for: RESOURCE_PROBE[ora.scan1.vip 1 1] ID 4097:417241 2012-12-21 02:09:06.056: [ora.net1.network][1266022720] {0:1:53734} [start] Checking if eth0 Interface is fine 2012-12-21 02:09:06.176: [ora.net1.network][1266022720] {0:1:53734} [start] synchStatic { 2012-12-21 02:09:06.176: [ora.net1.network][1266022720] {0:1:53734} [start] synchStatic } 2012-12-21 02:09:06.177: [ora.net1.network][1266022720] {0:1:53734} [start] unsynchStatic { 2012-12-21 02:09:06.177: [ora.net1.network][1266022720] {0:1:53734} [start] unsynchStatic } 2012-12-21 02:09:06.347: [ USRTHRD][1333139776] {1:62611:2} Thread:[SendFail2SrvThread:]start { 2012-12-21 02:09:06.348: [ USRTHRD][1333139776] {1:62611:2} Thread:[SendFail2SrvThread:]start } 2012-12-21 02:09:06.348: [ AGFW][1400256832] {1:62611:2} ora.vrh1.vip 1 1 state changed from: ONLINE to: OFFLINE 2012-12-21 02:09:06.348: [ora.vrh1.vip][1333139776][ENTER] {0:1:53734} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:09:06.348: [ AGFW][1400256832] {0:1:53736} Generating new Tint for unplanned state change. Original Tint: {1:62611:2} 2012-12-21 02:09:06.348: [ AGFW][1400256832] {0:1:53736} Agent sending message to PE: RESOURCE_STATUS[Proxy] ID 20481:24047960 2012-12-21 02:09:06.349: [ora.vrh1.vip][1333139776][ENTER] {0:1:53734} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:09:06.349: [ora.vrh1.vip][1333139776] {0:1:53734} [check] VipAgent::checkIp returned false 2012-12-21 02:09:06.349: [ora.vrh1.vip][1333139776][ENTER] {0:1:53734} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:09:06.349: [ora.vrh1.vip][1333139776][ENTER] {0:1:53734} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:09:06.350: [ AGFW][1400256832] {0:1:53734} Agent sending last reply for: RESOURCE_PROBE[ora.vrh1.vip 1 1] ID 4097:417234 2012-12-21 02:09:06.395: [ AGENT][1266022720] {0:1:53734} UserErrorException: Locale is 2012-12-21 02:09:06.445: [ USRTHRD][1467373888] {1:62611:2} VipAgent::sendFail2Srv { 2012-12-21 02:09:06.535: [ USRTHRD][1467373888] {1:62611:2} VipAgent::sendFail2Srv } 2012-12-21 02:09:06.535: [ USRTHRD][1467373888] {1:62611:2} Thread:[SendFail2SrvThread:]isRunning is reset to false here 2012-12-21 02:09:06.687: [ora.net1.network][1266022720] {0:1:53734} [start] CRS-5008: Invalid attribute value: eth0 for the network interface 2012-12-21 02:09:06.687: [ora.net1.network][1266022720] {0:1:53734} [start] NetworkAgent::init exit } 2012-12-21 02:09:06.687: [ USRTHRD][1266022720] {0:1:53734} Thread:[VipRelocate:]stop { 2012-12-21 02:09:06.687: [ USRTHRD][1266022720] {0:1:53734} Thread:[VipRelocate:]stop } 2012-12-21 02:09:06.725: [ora.net1.network][1266022720] {0:1:53734} [start] Ocr Context init default level 45001104 2012-12-21 02:09:06.726: [ora.net1.network][1266022720] {0:1:53734} [start] Ocr Context init default level 45001104 2012-12-21 02:09:06.836: [ora.net1.network][1266022720] {0:1:53734} [start] (:CLSN00107:) clsn_agent::start } 2012-12-21 02:09:06.836: [ AGFW][1266022720] {0:1:53734} Command: start for resource: ora.net1.network vrh1 1 completed with status: SUCCESS 2012-12-21 02:09:06.836: [ora.net1.network][1266022720] {0:1:53734} [check] NetworkAgent::init enter { 2012-12-21 02:09:06.836: [ AGFW][1400256832] {0:1:53734} Agent sending reply for: RESOURCE_START[ora.net1.network vrh1 1] ID 4098:417237 2012-12-21 02:09:06.836: [ora.net1.network][1266022720] {0:1:53734} [check] Checking if eth0 Interface is fine 2012-12-21 02:09:06.837: [ AGENT][1266022720] {0:1:53734} UserErrorException: Locale is 2012-12-21 02:09:06.837: [ora.net1.network][1266022720] {0:1:53734} [check] CRS-5008: Invalid attribute value: eth0 for the network interface 2012-12-21 02:09:06.837: [ora.net1.network][1266022720] {0:1:53734} [check] NetworkAgent::init exit } 2012-12-21 02:09:07.184: [ora.net1.network][1266022720] {0:1:53734} [check] NetInterface::scheckNetInterface returned 0 2012-12-21 02:09:07.184: [ora.net1.network][1266022720] {0:1:53734} [check] NetworkAgent::checkInterface returned false 2012-12-21 02:09:07.185: [ AGFW][1400256832] {0:1:53734} ora.net1.network vrh1 1 state changed from: STARTING to: OFFLINE 2012-12-21 02:09:07.185: [ AGFW][1400256832] {0:1:53734} Switching online monitor to offline one 2012-12-21 02:09:07.185: [ AGFW][1400256832] {0:1:53734} Started implicit monitor for [ora.net1.network vrh1 1] interval=60000 delay=60000 2012-12-21 02:09:07.185: [ AGFW][1400256832] {0:1:53734} Agent sending last reply for: RESOURCE_START[ora.net1.network vrh1 1] ID 4098:417237可以看到如上在Public Network DOWN之后1s内orarootagent进程即检测到并试图重启ora.net1.network,重启失败后将VIP置为OFFLINE状态,随后LISTENER也被OFFLINE,之后当Public Network恢复后,network、vip、listener相继自动恢复:
2012-12-21 02:10:07.196: [ora.net1.network][1333139776] {0:1:53734} [check] NetworkAgent::init enter { 2012-12-21 02:10:07.196: [ora.net1.network][1333139776] {0:1:53734} [check] Checking if eth0 Interface is fine 2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] ifname=eth0 2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] subnetmask=255.255.255.0 2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] subnetnumber=192.168.1.0 2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] Interface Name = eth0 2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] Primary IP = 192.168.1.161 2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] NetworkAgent::init exit } 2012-12-21 02:10:07.218: [ USRTHRD][1333139776] {0:1:53734} Thread:[VipRelocate:]start { 2012-12-21 02:10:07.218: [ USRTHRD][1333139776] {0:1:53734} Thread:[VipRelocate:]start } 2012-12-21 02:10:07.220: [ AGFW][1400256832] {0:1:53734} ora.net1.network vrh1 1 state changed from: OFFLINE to: ONLINE 2012-12-21 02:10:07.220: [ AGFW][1400256832] {0:1:53734} Switching offline monitor to online one 2012-12-21 02:10:07.220: [ AGFW][1400256832] {0:1:53734} Started implicit monitor for [ora.net1.network vrh1 1] interval=1000 delay=1000 2012-12-21 02:10:07.220: [ AGFW][1400256832] {0:1:53739} Generating new Tint for unplanned state change. Original Tint: {0:1:53734} 2012-12-21 02:10:07.220: [ AGFW][1400256832] {0:1:53739} Agent sending message to PE: RESOURCE_STATUS[Proxy] ID 20481:24047975 2012-12-21 02:10:07.244: [ USRTHRD][1266022720] {0:1:53734} Relocating Node Vip Resource 2012-12-21 02:10:07.245: [ USRTHRD][1266022720] {0:1:53734} synchStatic { 2012-12-21 02:10:07.245: [ USRTHRD][1266022720] {0:1:53734} synchStatic } 2012-12-21 02:10:07.270: [ GIPC][1266022720] gipcCheckInitialization: possible incompatible non-threaded init from [clsss.c : 5326], original from [clsnOraRootAgent. cpp : 208] 2012-12-21 02:10:07.278: [ USRTHRD][1266022720] {0:1:53734} unsynchStatic { 2012-12-21 02:10:07.279: [ USRTHRD][1266022720] {0:1:53734} unsynchStatic } 2012-12-21 02:10:07.310: [ USRTHRD][1266022720] {0:1:53734} Thread:[VipRelocate:]isRunning is reset to false here 2012-12-21 02:10:07.343: [ AGFW][1400256832] {1:62611:962} Agent received the message: RESOURCE_START[ora.vrh1.vip 1 1] ID 4098:417329 2012-12-21 02:10:07.343: [ AGFW][1400256832] {1:62611:962} Preparing START command for: ora.vrh1.vip 1 1 2012-12-21 02:10:07.343: [ AGFW][1400256832] {1:62611:962} ora.vrh1.vip 1 1 state changed from: OFFLINE to: STARTING 2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] (:CLSN00107:) clsn_agent::start { 2012-12-21 02:10:07.344: [ USRTHRD][1333139776] {1:62611:962} Thread:[SendFail2SrvThread:]stop { 2012-12-21 02:10:07.344: [ USRTHRD][1333139776] {1:62611:962} Thread:[SendFail2SrvThread:]stop } 2012-12-21 02:10:07.344: [ USRTHRD][1333139776] {1:62611:962} Thread:[StartorphanSrvThread:]stop { 2012-12-21 02:10:07.344: [ USRTHRD][1333139776] {1:62611:962} Thread:[StartorphanSrvThread:]stop } 2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [start] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] InterfaceName = eth0 2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] NetworkMask = 255.255.255.0 2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] NetworkAuto = static 2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Checking if eth0 Interface is fine 2012-12-21 02:10:07.349: [ora.vrh1.vip][1333139776] {1:62611:962} [start] ifname=eth0 2012-12-21 02:10:07.349: [ora.vrh1.vip][1333139776] {1:62611:962} [start] subnetmask=255.255.255.0 2012-12-21 02:10:07.349: [ora.vrh1.vip][1333139776] {1:62611:962} [start] subnetnumber=192.168.1.0 2012-12-21 02:10:07.349: [ora.vrh1.vip][1333139776] {1:62611:962} [start] InterfaceName = eth0 2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776] {1:62611:962} [start] HostName vrh1-vip translated to 192.168.1.162 2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [start] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Interface Name = eth0 2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Ip Address = 192.168.1.162 2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Utils::getCrsHome crsHome /g01/11.2.0/grid 2012-12-21 02:10:07.351: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Ocr Context init default level -1677304432 2012-12-21 02:10:07.526: [ default][1333139776]ICMP Ping from 192.168.1.161 to 192.168.1.162 2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Trying to plumb 192.168.1.162 on eth0 2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] VipActions::startIp { 2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Adding 192.168.1.162 on eth0:1 2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] VipActions::startIp } 2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Flushing neighbours ARP Cache 2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Arp::sCreateSocket { 2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Arp::sCreateSocket } 2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Arp::sFlushArpCache { 2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Arp::sSend: sending type 1 2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Arp::sFlushArpCache } 2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Succeed to bind udp://192.168.1.162:0 2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Succeed to bind udp://192.168.1.162:0 2012-12-21 02:10:08.533: [ora.vrh1.vip][1333139776] {1:62611:962} [start] (:CLSN00107:) clsn_agent::start } 2012-12-21 02:10:08.533: [ AGFW][1333139776] {1:62611:962} Command: start for resource: ora.vrh1.vip 1 1 completed with status: SUCCESS 2012-12-21 02:10:08.534: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:10:08.534: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:10:08.534: [ AGFW][1400256832] {1:62611:962} Agent sending reply for: RESOURCE_START[ora.vrh1.vip 1 1] ID 4098:417329 2012-12-21 02:10:08.534: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:10:08.534: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:10:08.535: [ AGFW][1400256832] {1:62611:962} ora.vrh1.vip 1 1 state changed from: STARTING to: ONLINE 2012-12-21 02:10:08.535: [ AGFW][1400256832] {1:62611:962} Started implicit monitor for [ora.vrh1.vip 1 1] interval=1000 delay=1000 2012-12-21 02:10:08.535: [ AGFW][1400256832] {1:62611:962} Agent sending last reply for: RESOURCE_START[ora.vrh1.vip 1 1] ID 4098:417329 012-12-21 02:10:08.535: [ AGFW][1400256832] {1:62611:962} Agent sending last reply for: RESOURCE_START[ora.vrh1.vip 1 1] ID 4098:417329 2012-12-21 02:10:09.529: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:10:09.529: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:10:09.529: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:10:09.529: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network 2012-12-21 02:10:09.529: [ USRTHRD][1333139776] {1:62611:962} Thread:[StartorphanSrvThread:]start { 2012-12-21 02:10:09.529: [ USRTHRD][1333139776] {1:62611:962} Thread:[StartorphanSrvThread:]start } 2012-12-21 02:10:09.530: [ USRTHRD][1500932416] {1:62611:962} VipAgent::startOrphanSrv { 2012-12-21 02:10:09.555: [ USRTHRD][1500932416] {1:62611:962} VipAgent::startOrphanSrv } 2012-12-21 02:10:09.555: [ USRTHRD][1500932416] {1:62611:962} Thread:[StartorphanSrvThread:]isRunning is reset to false here 2012-12-21 02:10:10.532: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network解决由于Network资源过于敏感导致的不必要的vip和listener的方法: 打补丁12378938 该补丁被包含在" 11.2.0.2 GI PSU4, 11.2.0.3 GI PSU3, 11.2.0.3 Windows Patch 7, 11.2.0.4 and above";并修改vip资源的依赖属性,例如:
# $GRID_HOME/bin/crsctl modify res ora.scan.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net.network)" For example: # /ocw/grid/bin/crsctl modify res ora.scan1.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)" # /ocw/grid/bin/crsctl modify res ora.scan2.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)" # /ocw/grid/bin/crsctl modify res ora.scan3.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)" # /ocw/grid/bin/srvctl stop scan -f $ /ocw/grid/bin/srvctl start scan_listenerMORE INFO:
After check timed out, 11gR2 Grid Infrastructure network resource (usually ora.net1.network) goes to INTERMEDIATE state, then goes back to ONLINE very shortly. This note will not discuss cause of check time out, but most common cause is public network hiccup.
Once network resource goes into INTERMEDIATE state, it may trigger VIP/service to be failed over and listener to go offline due to resource dependence, which could result in unnecessary connectivity issue for that period of time. After network resource is back online, affected resources (listener, service or instance) may not come back online.
$GRID_HOME/log//crsd/crsd.log
2011-06-12 07:12:31.261: [ AGFW][10796] {0:1:2881} Received state change for ora.net1.network racnode1 1 [old state = ONLINE, new state = UNKNOWN]
2011-06-12 07:12:31.261: [ AGFW][10796] {0:1:2881} Received state LABEL change for ora.net1.network racnode1 1 [old label = , new label = CHECK TIMED OUT]
..
2011-06-12 07:12:31.297: [ CRSPE][12081] {0:1:2881} RI [ora.net1.network racnode1 1] new external state [INTERMEDIATE] old value: [ONLINE] on racnode1 label = [CHECK TIMED OUT]
..
2011-06-12 07:12:31.981: [ AGFW][10796] {0:1:2882} Received state change for ora.net1.network racnode1 1 [old state = UNKNOWN, new state = ONLINE]
..
2011-06-12 07:12:32.307: [ CRSPE][12081] {0:1:2881} RI [ora.LISTENER.lsnr racnode1 1] new internal state: [STOPPING] old value: [STABLE]
2011-06-12 07:12:32.308: [ CRSPE][12081] {0:1:2881} CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'racnode1'
$GRID_HOME/log//agent/crsd/orarootagent_root/orarootagent_root.log
2011-06-12 07:12:08.965: [ AGFW][2070] {1:27767:2} Created alert : (:CRSAGF00113:) : Aborting the command: check for resource: ora.net1.network racnode1 1
2011-06-12 07:12:08.966: [ora.net1.network][2070] {1:27767:2} [check] clsn_agent::abort {
..
2011-06-12 07:12:31.257: [ AGFW][2070] {1:27767:2} Command: check for resource: ora.net1.network racnode1 1 completed with status: TIMEDOUT
2011-06-12 07:12:31.258: [ AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 state changed from: ONLINE to: UNKNOWN
2011-06-12 07:12:31.258: [ AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 would be continued to monitored!
2011-06-12 07:12:31.258: [ AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 state details has changed from: to: CHECK TIMED OUT
..
2011-06-12 07:12:31.923: [ora.net1.network][2314][F-ALGO] {1:27767:2} CHECK initiated by timer for: ora.net1.network racnode1 1
..
2011-06-12 07:12:31.973: [ora.net1.network][8502][F-ALGO] {1:27767:2} [check] Command check for resource: ora.net1.network racnode1 1 completed with status ONLINE
2011-06-12 07:12:31.978: [ AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 state changed from: UNKNOWN to: ONLINE
$GRID_HOME/log//agent/crsd/oraagent_/oraagent_.log
2011-06-12 07:12:32.335: [ AGFW][2314] {0:1:2881} Agent received the message: RESOURCE_STOP[ora.LISTENER.lsnr racnode1 1] ID 4099:14792
2011-06-12 07:12:32.335: [ AGFW][2314] {0:1:2881} Preparing STOP command for: ora.LISTENER.lsnr racnode1 1
2011-06-12 07:12:32.335: [ AGFW][2314] {0:1:2881} ora.LISTENER.lsnr racnode1 1 state changed from: ONLINE to: STOPPING
Cause
This is caused by bug 12680491
@ Related issues bug 12378938 bug 11857122 bug 12428737
Solution
The fix of bug 12680491 will add intermediate modifyer to stop dependency between network resource and VIP to avoid unnecessary resource state change, it's included in 11.2.0.2 GI PSU4, 11.2.0.3 GI PSU3, 11.2.0.3 Windows Patch 7, 11.2.0.4 and above. This fix is recommended instead of fix for bug 12378938 to avoid the issue in first place.
Once patch for this bug is applied, the following needs to be executed to change the dependence for all VIPs:
# $GRID_HOME/bin/crsctl modify res ora..vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora..network)"
For example:
# /ocw/grid/bin/crsctl modify res ora.racnode1.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)"
Once the attribute is changed, a restart of nodeapps/VIP is needed to be in effect
posted on 2013-03-19 00:48 Oracle和MySQL 阅读(1998) 评论(0) 编辑 收藏 举报