Repost: Relinking an 11gR2 Cluster
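Reference (MOS):
"Do I need to relink the Oracle Clusterware / Grid Infrastructure home after an OS upgrade?" in RAC: Frequently Asked Questions [ID 220970.1]
After a kernel upgrade or an OS upgrade on the hosts of an 11gR2 RAC, we generally recommend relinking both the clusterware and the RDBMS once.
The reason is that for Oracle Grid Infrastructure (GI) 11.2 and later, some binaries in the GRID HOME need to be relinked after an OS upgrade or OS patch.
If a 10g CRS can be relinked, it is best to do that as well, but 10gR2 CRS does not appear to have a relink command; the client shared libraries in the CRS home, however, can be relinked.
Reference: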
Will an Operating System Upgrade Affect Oracle Clusterware? [ID 743649.1]
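The following shows how to relink an 11gR2 RAC after an OS upgrade or kernel upgrade:
1. Stop all database instances on the current node.
$ su - oracle
$ srvctl stop instance -d orcl -i orcl1 -o immediate
2. Relocate the services from the current node to the remaining healthy nodes so the application stays highly available. Check where the services are running: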
$ srvctl status service -d orcl
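The status command only reports where each service runs. If a service is still placed on this node, it could be moved with `srvctl relocate service`, ideally before the instance from step 1 is stopped. The sketch below is a dry-run helper that only prints the command; the service name `app_svc` and the target instance `orcl2` are assumptions for illustration and do not come from the original environment.

```shell
# Dry-run helper: prints the relocate command instead of running it.
# Arguments: database name, service name, source instance, target instance.
# All names passed in the example call are placeholders.
relocate_cmd() {
    db=$1; svc=$2; from=$3; to=$4
    printf 'srvctl relocate service -d %s -s %s -i %s -t %s\n' \
        "$db" "$svc" "$from" "$to"
}

relocate_cmd orcl app_svc orcl1 orcl2
```

Review the printed command, then run it as the oracle user once the names match your own cluster.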
3. As root, run <GRID_HOME>/crs/install/rootcrs.pl -unlock to change the permissions of the relevant directories and stop GI:
[root@s1-11g ~]# cd /oracle/app/11.2.0/grid/crs/install
[root@s1-11g install]# perl rootcrs.pl -unlock
Using configuration parameter file: ./crsconfig_params
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 's1-11g'
CRS-2673: Attempting to stop 'ora.crsd' on 's1-11g'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 's1-11g'
CRS-2673: Attempting to stop 'ora.s2-11g.vip' on 's1-11g'
CRS-2673: Attempting to stop 'ora.oc4j' on 's1-11g'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 's1-11g'
CRS-2673: Attempting to stop 'ora.cvu' on 's1-11g'
CRS-2677: Stop of 'ora.s2-11g.vip' on 's1-11g' succeeded
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 's1-11g' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 's1-11g'
CRS-2677: Stop of 'ora.scan1.vip' on 's1-11g' succeeded
CRS-2677: Stop of 'ora.oc4j' on 's1-11g' succeeded
CRS-2677: Stop of 'ora.cvu' on 's1-11g' succeeded
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 's1-11g'
CRS-2673: Attempting to stop 'ora.OCRVOTE.dg' on 's1-11g'
CRS-2673: Attempting to stop 'ora.orcl.db' on 's1-11g'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 's1-11g' succeeded
CRS-2673: Attempting to stop 'ora.s1-11g.vip' on 's1-11g'
CRS-2677: Stop of 'ora.s1-11g.vip' on 's1-11g' succeeded
CRS-2677: Stop of 'ora.OCRVOTE.dg' on 's1-11g' succeeded
CRS-2677: Stop of 'ora.orcl.db' on 's1-11g' succeeded
CRS-2673: Attempting to stop 'ora.DATA.dg' on 's1-11g'
CRS-2673: Attempting to stop 'ora.ARCH.dg' on 's1-11g'
CRS-2677: Stop of 'ora.DATA.dg' on 's1-11g' succeeded
CRS-2677: Stop of 'ora.ARCH.dg' on 's1-11g' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 's1-11g'
CRS-2677: Stop of 'ora.asm' on 's1-11g' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 's1-11g'
CRS-2677: Stop of 'ora.ons' on 's1-11g' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 's1-11g'
CRS-2677: Stop of 'ora.net1.network' on 's1-11g' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 's1-11g' has completed
CRS-2677: Stop of 'ora.crsd' on 's1-11g' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 's1-11g'
CRS-2673: Attempting to stop 'ora.crf' on 's1-11g'
CRS-2673: Attempting to stop 'ora.ctssd' on 's1-11g'
CRS-2673: Attempting to stop 'ora.evmd' on 's1-11g'
CRS-2673: Attempting to stop 'ora.asm' on 's1-11g'
CRS-2677: Stop of 'ora.mdnsd' on 's1-11g' succeeded
CRS-2677: Stop of 'ora.crf' on 's1-11g' succeeded
CRS-2677: Stop of 'ora.evmd' on 's1-11g' succeeded
CRS-2677: Stop of 'ora.ctssd' on 's1-11g' succeeded
CRS-2677: Stop of 'ora.asm' on 's1-11g' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 's1-11g'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 's1-11g' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 's1-11g'
CRS-2677: Stop of 'ora.cssd' on 's1-11g' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 's1-11g'
CRS-2677: Stop of 'ora.gipcd' on 's1-11g' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 's1-11g'
CRS-2677: Stop of 'ora.gpnpd' on 's1-11g' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 's1-11g' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully unlock /oracle/app/11.2.0/grid
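Note: if $GRID_HOME/rdbms/audit contains a large number of audit files, rootcrs.pl can take a long time to run. In that case, back up the $GRID_HOME/rdbms/audit/*.aud files to a location outside GRID_HOME and then delete them.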
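One way to clear the audit directory is sketched below. It moves the files rather than copying and deleting them, so the backup and the cleanup happen in one pass. The function name and the destination path in the usage line are hypothetical; run it before rootcrs.pl -unlock.

```shell
# Move *.aud files out of the Grid home.
# archive_audit_files SRC_DIR DEST_DIR ; both paths are supplied by the caller.
archive_audit_files() {
    src=$1; dest=$2
    mkdir -p "$dest" || return 1
    # find avoids the "argument list too long" failure that a plain
    # mv *.aud can hit when the directory holds very many files.
    find "$src" -type f -name '*.aud' -exec mv {} "$dest"/ \;
}
```

For example: archive_audit_files "$GRID_HOME/rdbms/audit" /backup/grid_audit, where /backup/grid_audit is a placeholder for any directory outside GRID_HOME.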
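4. Disable automatic GI startup after an OS reboot. Upgrading or patching the OS may require a host reboot, so GI must be prevented from starting until the relink is done.
As root: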
[root@s1-11g install]# crsctl disable crs
CRS-4621: Oracle High Availability Services autostart is disabled.
5. Back up the GI home and the RDBMS ORACLE_HOME. For a DBA, backups matter most: after an irreversible mistake, restoring from backup is the last-resort fix.
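The original does not say how to take the backup. A minimal sketch using a compressed tar archive is shown here as one option; the function name and the /backup destination in the usage note are assumptions. Running it as root while GI and the database are stopped lets tar read every file in the home.

```shell
# Archive one Oracle home into DEST_DIR and print the archive path.
# backup_home HOME_DIR DEST_DIR
backup_home() {
    home=$1; dest_dir=$2
    stamp=$(date +%Y%m%d_%H%M%S)
    archive="$dest_dir/$(basename "$home")_$stamp.tar.gz"
    mkdir -p "$dest_dir" || return 1
    # -C stores paths relative to the parent directory, so the archive
    # can be restored under the same parent later.
    tar -czf "$archive" -C "$(dirname "$home")" "$(basename "$home")" \
        && echo "$archive"
}
```

For example: backup_home /oracle/app/11.2.0/grid /backup, then the same call for the database home.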
6. Upgrade the OS kernel or apply the OS patches, rebooting the host if required.
7. As the GI owner, relink the GI binaries:
[root@s1-11g audit]# su - grid
[grid@s1-11g ~]$ export ORACLE_HOME=/oracle/app/11.2.0/grid
Make sure GI is stopped before running relink:
[grid@s1-11g ~]$ ps -ef|grep d.bin
grid 3408 3360 0 17:09 pts/0 00:00:00 grep d.bin
[grid@s1-11g ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
[grid@s1-11g ~]$ $ORACLE_HOME/bin/relink
writing relink log to: /oracle/app/11.2.0/grid/install/relink.log
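[grid@s1-11g ~]$ <=== relink prints no message when it finishes; only the command prompt returns.
Check /oracle/app/11.2.0/grid/install/relink.log for errors.
8. As the RDBMS owner, relink the database binaries:
su - oracle
Make sure $ORACLE_HOME is set to the database ORACLE_HOME, then run: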
[oracle@s1-11g ~]$ $ORACLE_HOME/bin/relink all
writing relink log to: /oracle/app/oracle/product/11.2.0/dbhome_1/install/relink.log
<=== relink prints no message when it finishes; only the command prompt returns.
Check /oracle/app/oracle/product/11.2.0/dbhome_1/install/relink.log for errors.
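Because relink finishes silently, the log is the only evidence of success or failure. The sketch below scans a relink.log for a few suspicious patterns. The pattern list is a heuristic chosen for illustration, not an official list of relink failure messages, so a clean result still deserves a quick read of the log.

```shell
# Scan a relink log for likely failure messages.
# Returns 0 if nothing matched, 1 if a pattern matched, 2 if unreadable.
check_relink_log() {
    log=$1
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
    # Heuristic patterns only; extend them for your platform's linker.
    if grep -n -i -E 'error|fatal|undefined reference' "$log"; then
        return 1
    fi
    echo "no error patterns found in $log"
}
```

It can be run against both logs, for example check_relink_log /oracle/app/11.2.0/grid/install/relink.log and again with the relink.log of the database home.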
9. As root, run <GRID_HOME>/crs/install/rootcrs.pl -patch to restore the directory permissions and start GI:
[root@s1-11g ~]# cd /oracle/app/11.2.0/grid/crs/install
[root@s1-11g install]# perl rootcrs.pl -patch
Using configuration parameter file: ./crsconfig_params
CRS-4123: Oracle High Availability Services has been started.
10. Enable CRS so that GI starts automatically after a host reboot:
[root@s1-11g install]# crsctl enable crs
CRS-4622: Oracle High Availability Services autostart is enabled.
11. Confirm that every resource that should be running has started:
[root@s1-11g install]# crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.OCRVOTE.dg
               ONLINE  ONLINE       s1-11g
               ONLINE  ONLINE       s2-11g
ora.DATA.dg
               ONLINE  ONLINE       s1-11g
               ONLINE  ONLINE       s2-11g
ora.LISTENER.lsnr
               ONLINE  ONLINE       s1-11g
               ONLINE  ONLINE       s2-11g
ora.ARCH.dg
               ONLINE  ONLINE       s1-11g
               ONLINE  ONLINE       s2-11g
ora.asm
               ONLINE  ONLINE       s1-11g                   Started
               ONLINE  ONLINE       s2-11g                   Started
ora.gsd
               OFFLINE OFFLINE      s1-11g
               OFFLINE OFFLINE      s2-11g
ora.net1.network
               ONLINE  ONLINE       s1-11g
               ONLINE  ONLINE       s2-11g
ora.ons
               ONLINE  ONLINE       s1-11g
               ONLINE  ONLINE       s2-11g
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       s2-11g
ora.cvu
      1        ONLINE  ONLINE       s2-11g
ora.oc4j
      1        ONLINE  ONLINE       s2-11g
ora.s1-11g.vip
      1        ONLINE  ONLINE       s1-11g
ora.s2-11g.vip
      1        ONLINE  ONLINE       s2-11g
ora.orcl.db
      1        ONLINE  ONLINE       s2-11g                   Open
      2        OFFLINE OFFLINE                               Instance Shutdown
ora.scan1.vip
      1        ONLINE  ONLINE       s2-11g
If an instance has not started, start it manually:
$ srvctl start instance -d orcl -i orcl1
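On a cluster with many resources, an OFFLINE entry is easy to miss in the `crsctl stat res -t` listing. The sketch below is a small filter that reads that listing on standard input and prints the name of each resource with at least one OFFLINE line. The function name is hypothetical, and the parsing assumes the layout shown in this article: a resource name starting with `ora.` at the beginning of a line, followed by its state lines.

```shell
# Print each resource that has at least one OFFLINE line.
# Reads the output of "crsctl stat res -t" on stdin.
offline_resources() {
    awk '
        /^ora\./ { res = $1; next }            # remember the current resource
        /OFFLINE/ && res != "" && !seen[res]++ { print res }
    '
}
```

Usage: crsctl stat res -t | offline_resources. In 11gR2, ora.gsd is normally OFFLINE, so it can be ignored in the result; in the listing above, the entry that needs attention is ora.orcl.db.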
12. The methods in the following MOS note can be used to confirm that the Oracle binary is RAC-enabled:
How to Check Whether Oracle Binary/Instance is RAC Enabled and Relink Oracle Binary in RAC [ID 284785.1]
Method 1: if the following command finds kcsm.o, the binary is RAC-enabled:
su - oracle
$ ar -t $ORACLE_HOME/rdbms/lib/libknlopt.a|grep kcsm.o
kcsm.o
On AIX the command is different:
ar -X32_64 -t $ORACLE_HOME/rdbms/lib/libknlopt.a|grep kcsm.o
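The two variants of Method 1 can be folded into one helper that picks the `ar` flags from the platform name. This is only a sketch: the function name is hypothetical, and it covers just the two cases named above (AIX and everything else).

```shell
# Succeeds (exit status 0) if the archive contains kcsm.o.
# is_rac_enabled PATH_TO_libknlopt.a
is_rac_enabled() {
    lib=$1
    case "$(uname -s)" in
        AIX) flags='-X32_64 -t' ;;
        *)   flags='-t' ;;
    esac
    # $flags is left unquoted on purpose so it splits into separate arguments.
    ar $flags "$lib" | grep -q '^kcsm\.o$'
}
```

Usage, as the oracle user: is_rac_enabled "$ORACLE_HOME/rdbms/lib/libknlopt.a" && echo "RAC enabled".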
Method 2: check whether RAC-specific background processes exist, for example:
[grid@s1-11g ~]$ ps -ef|grep lmon
grid 7732 1 0 17:59 ? 00:00:17 asm_lmon_+ASM1
oracle 18605 1 0 20:49 ? 00:00:00 ora_lmon_ORCL1 <===========
grid 20992 10160 0 21:10 pts/2 00:00:00 grep lmon
All of the steps above must be performed on each node of the cluster in turn.
------------------------------------------------------------------------------------
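<All rights reserved. Reposting is permitted, but the source address must be credited with a link; otherwise legal action will be pursued!>
Original blog: http://blog.itpub.net/23732248/
Original author: 应以峰 (frank-ying)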
-------------------------------------------------------------------------------------