【RAC】RAC中的负载均衡和故障切换--TAF配置
涉及到的内容包括:
Oracle RAC 客户端连接负载均衡(Load Balance)
实现负载均衡(Load Balance)是Oracle RAC最重要的特性之一,主要是把负载平均分配到集群中的各个节点,以提高系统的整体吞吐能力。通常情况下有两种方式来实现负载均衡,一个是基于客户端连接的负载均衡,一个是基于服务器端监听器(Listener)收集到的信息来将新的连接请求分配到连接数较少实例上的实现方式。本文主要讨论的是基于客户端连接的负载均衡,并给出演示。
与负载均衡配置之前的监听配置请参考
ORACLE RAC 监听配置 (listener.ora tnsnames.ora)
ORACLE RAC 下非缺省端口监听配置(listener.ora tnsnames.ora)
一、客户端的负载均衡
客户端的负载均衡主要是通过为tnsnames.ora增加load_balance=yes条目来实现,下面看看oracle(Note:226880.1)的解释
The client load balancing feature enables clients to randomize connection requests among the listeners. Oracle Net progresses through the list of
protocol addresses in a random sequence, balancing the load on the various listeners. Without client load balancing, Oracle Net progresses through the
list of protocol addresses sequentially until one succeeds. This normally is referred to connect-time load balance.
从上面的描述中可以得知,如果未开启load_balance=yes时,Oracle Net会根据地址列表按顺序来选择一个进行连接,直到连接成功为止。
如果第一个host主机连接失败,在有多个地址的情形下,接下来选择第二个地址连接,依此类推,直到连接成功为止。
当开启了load_balance=yes时,则Oracle Net会从多个地址中随机地选择一个地址进行连接,直到连接成功为止。
注意,此连接方式仅根据地址列表随机选择,并不考虑到各个实例上当前真正连接数量的多少,也即是没有考虑各个节点真实的连接负载情况。
二、服务器与客户端的配置情况
[sql] view plain copy print? 1、服务器端监听器配置 oracle@bo2dbp:~> more $ORACLE_HOME/network/admin/listener.ora --#节点bo2dbp上的listener # listener.ora.bo2dbp Network Configuration File: /u01/oracle/db/network/admin/listener.ora.bo2dbp # Generated by Oracle configuration tools. LISTENER_NEW_BO2DBP = (DESCRIPTION_LIST = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = bo2dbp-vip.2gotrade.com)(PORT = 1314)(IP = FIRST)) (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.51)(PORT = 1314)(IP = FIRST)) ) ) ...........................--#其余部分省略,注意此处的配置是使用了1314的非缺省监听器端口号 oracle@bo2dbs:/u01/oracle/db/network/admin> more listener.ora --#节点bo2dbs上的listener # listener.ora.bo2dbs Network Configuration File: /u01/oracle/db/network/admin/listener.ora.bo2dbs # Generated by Oracle configuration tools. LISTENER_NEW_BO2DBS = (DESCRIPTION_LIST = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = bo2dbs-vip.2gotrade.com)(PORT = 1314)(IP = FIRST)) (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.52)(PORT = 1314)(IP = FIRST)) ) ) ............... 2、参数配置 -->instrance ora10g1上的参数配置 SQL> show parameter instance_na NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ instance_name string ora10g1 SQL> show parameter listener NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ local_listener string local_lsnr_ora10g1 remote_listener string remote_lsnr_ora10g -->instrance ora10g2上的参数配置 SQL> show parameter instance_name NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ instance_name string ora10g2 SQL> show parameter listener NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ local_listener string local_lsnr_ora10g2 remote_listener string remote_lsnr_ora10g -->instrance ora10g2上的监听信息 SQL> ho ps -ef | grep lsnr oracle 17372 1 0 11:00 ? 00:00:00 /u01/oracle/db/bin/tnslsnr LISTENER_NEW_BO2DBS -inherit oracle 17502 24301 0 12:10 pts/0 00:00:00 /bin/bash -c ps -ef | grep lsnr oracle 17504 17502 0 12:10 pts/0 00:00:00 grep lsnr SQL> ho lsnrctl status LISTENER_NEW_BO2DBS Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.7.62)(PORT=1314))) (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.7.52)(PORT=1314))) Services Summary... Service "ora10g" has 2 instance(s). Instance "ora10g1", status READY, has 1 handler(s) for this service... Instance "ora10g2", status READY, has 2 handler(s) for this service... .......... -->由于只是测试基于客户端的load balance,因此移出remote_listener参数 SQL> alter system reset remote_listener scope=both sid='*'; alter system reset remote_listener scope=both sid='*' * ERROR at line 1: ORA-32009: cannot reset the memory value for instance * from instance ora10g2 SQL> alter system reset remote_listener scope=spfile sid='*'; System altered. SQL> ho srvctl stop database -d ora10g -->关闭数据库ora10g SQL> ho srvctl start database -d ora10g -->启动数据库ora10g使得刚刚修改的remote_listener生效 SQL> ho lsnrctl status LISTENER_NEW_BO2DBS -->此时可以看到只有ora10g2注册到监听器 Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.7.62)(PORT=1314))) (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.7.52)(PORT=1314))) Services Summary... Service "PLSExtProc" has 1 instance(s). Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service... Service "ora10g" has 1 instance(s). Instance "ora10g2", status READY, has 1 handler(s) for this service... .......... oracle@bo2dbp:~> lsnrctl status LISTENER_NEW_BO2DBP #同样在节点bo2dbp也只有ora10g1注册到监听器 Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.7.61)(PORT=1314))) (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.7.51)(PORT=1314))) Services Summary... Service "ora10g" has 1 instance(s). Instance "ora10g1", status READY, has 1 handler(s) for this service... ......... 3、客户端配置 SZDB:~ # ifconfig eth1 | grep "inet addr"|cut -d " " -f12|cut -d : -f2 #客户端主机的ip 192.168.7.2 SZDB:~ # su - oracle oracle@SZDB:~> cat /etc/hosts --#客户端主机添加了RAC上的两个虚拟节点的ip信息 192.168.7.61 bo2dbp-vip.2gotrade.com bo2dbp-vip 192.168.7.62 bo2dbs-vip.2gotrade.com bo2dbs-vip oracle@SZDB:~> tail -12 $ORACLE_HOME/network/admin/tnsnames.ora ORA10G = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = bo2dbp-vip.2gotrade.com)(PORT = 1314)) (ADDRESS = (PROTOCOL = TCP)(HOST = bo2dbs-vip.2gotrade.com)(PORT = 1314)) (LOAD_BALANCE = yes) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = ora10g) ) )
三、测试负载均衡(load balance)
[sql] view plain copy print? 1、启用load balance的测试 oracle@SZDB:~> more load_balance.sh #!/bin/bash for i in {1..100} do echo $i sqlplus -S system/oracle@ORA10G < select instance_name from v\$instance; EOF sleep 1 done exit 0 # Author: Robinson Cheng # Blog : http://blog.csdn.net/robinson_0612 oracle@SZDB:~> ./load_balance.sh >load_bal.log oracle@SZDB:~> head -20 load_bal.log 1 INSTANCE_NAME ---------------- ora10g2 2 INSTANCE_NAME ---------------- ora10g1 3 INSTANCE_NAME ---------------- ora10g2 4 oracle@SZDB:~> grep ora10g1 load_bal.log |wc -l 47 oracle@SZDB:~> grep ora10g2 load_bal.log |wc -l 53 从上面的log日志中可以看出启用客户端的负载均衡基本上使得从客户端发起连接的能够保持均衡。 2、未启用load balance的测试 从客户端的tnsnames.ora中移出(LOAD_BALANCE = yes)选项,然后继续使用上面的脚本来测试 oracle@SZDB:~> grep ora10g1 no_load_bal.log |wc -l 100 oracle@SZDB:~> grep ora10g2 no_load_bal.log |wc -l 0 从上面的日志中可以看出当移出LOAD_BALANCE = yes项后,所有的用户连接请求都被定为到ora10g1,这是因为连接请求从tnsnames.ora中选择 列在ADDRESS项中排在第一行的位置。 下面我们关闭实例ora10g1,再来测试连接情形 oracle@bo2dbp:~> srvctl stop instance -d ora10g -i ora10g1 oracle@SZDB:~> ./load_balance.sh >no_load_bal_new.log oracle@SZDB:~> grep ora10g1 no_load_bal_new.log |wc -l 0 oracle@SZDB:~> grep ora10g2 no_load_bal_new.log |wc -l 100 由于实例ora10g1已经关闭,因此所有的连接请求都被分配到ora10g2。
四、总结
1、客户端的负载均衡配置较为简单,仅仅是在客户端的tnsnames.ora添加 LOAD_BALANCE = yes |on
2、其连接分配原则是根据tnsnames.ora中连接标识符下的ADDRESS列表随机选择来进行与服务器之间的连接
3、如果选择列表中的某个节点listener或instance不可用,则再从剩余的ADDRESS列表随机选择,直到成功为止
Oracle RAC 服务器端连接负载均衡(Load Balance)
Oracle RAC服务器端的负载均衡是根据RAC中各节点的连接负荷数情况,将新的连接请求分配到负荷最小的节点上去。当数据库处于运行时,RAC中各节点的PMON进程每3秒会将各自节点的连接负荷数更新到service_register。而对于节点中任意监听器故障或监听器意外失败时,PMON进程会每1秒钟检查当前节点上的监听是否重启,以获得最新的负载信息来及时调整负载均衡。本文主要演示suse 10 + oracle 10g rac下的服务器端的负载均衡。
有关客户端的负载均衡可参考
Oracle RAC 客户端连接负载均衡(Load Balance)
配置RAC负载均衡与故障转移
有关Oracle 网络配置相关基础以及概念性的问题请参考:
配置ORACLE 客户端连接到数据库
配置非默认端口的动态服务注册
配置sqlnet.ora限制IP访问Oracle
Oracle 监听器日志配置与管理
设置 Oracle 监听器密码(LISTENER)
Oracle RAC 监听配置
ORACLE RAC 下非缺省端口监听配置(listener.ora tnsnames.ora)
一、服务器端负载均衡配置
[sql] view plain copy print? 1、为tnsnames.ora 添加相应的网络服务名(每个节点配置) oracle@bo2dbp:~> more $ORACLE_HOME/network/admin/tnsnames.ora # tnsnames.ora Network Configuration File: /u01/oracle/db/network/admin/tnsnames.ora # Generated by Oracle configuration tools. remote_lsnr_gobo4 = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.61)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.62)(PORT = 1521)) ) local_lsnr_gobo4a = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.61)(PORT = 1521)) ) local_lsnr_gobo4b = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.62)(PORT = 1521)) ) 2、设置remote_listener参数 alter system set remote_listener='' scope=both sid='*'; SQL> show parameter instance_name NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ instance_name string GOBO4A SQL> show parameter listener NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ local_listener string remote_listener string SQL> alter system set remote_listener='remote_lsnr_gobo4' scope=both sid='*'; System altered. 3、配置客户端tnsnames.ora -->客户端为suse 10 SZDB:~ # ifconfig eth1 | grep "inet addr"|cut -d " " -f12|cut -d : -f2 #--客户端主机的ip 192.168.7.2 SZDB:~ # su - oracle oracle@SZDB:~> tail -10 $ORACLE_HOME/network/admin/tnsnames.ora GOBO4 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.61)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.62)(PORT = 1521)) (LOAD_BALANCE = off) #--由于仅仅测试寄予服务器端的负载均衡,因此关闭客户端负载均衡选项 (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = GOBO4) ) ) #Author : Robinson Cheng #Blog : http://blog.csdn.net/robinson_0612 4、检查监听情况 oracle@bo2dbp:~> lsnrctl status Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.7.61)(PORT=1521))) (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.7.51)(PORT=1521))) Services Summary... Service "+ASM" has 1 instance(s). Instance "+ASM1", status BLOCKED, has 1 handler(s) for this service... Service "GOBO4" has 2 instance(s). Instance "GOBO4A", status READY, has 2 handler(s) for this service... Instance "GOBO4B", status READY, has 1 handler(s) for this service.. ....... oracle@bo2dbs:~> lsnrctl status Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.7.62)(PORT=1521))) (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.7.52)(PORT=1521))) Services Summary... Service "+ASM" has 1 instance(s). Instance "+ASM2", status BLOCKED, has 1 handler(s) for this service... Service "GOBO4" has 2 instance(s). Instance "GOBO4A", status READY, has 1 handler(s) for this service... Instance "GOBO4B", status READY, has 2 handler(s) for this service... .......... #--如果监听或数据库需要重启异常请考虑重新启动监听器或数据库 #--下面清空监听日志以便于后续统计连接信息 oracle@bo2dbp:/u01/oracle/db/network/log> cat /dev/null>listener_bo2dbp.log oracle@bo2dbs:/u01/oracle/db/network/log> cat /dev/null>listener_bo2dbs.log
二、测试服务器端的负载均衡
- 1、从客户端建立连接
- oracle@SZDB:~> more load_balance.sh
- #!/bin/bash
- for i in {1..1000}
- do
- echo $i
- sqlplus -S system/oracle@GOBO4 <<eof < span="">
- select instance_name from v\$instance;
- EOF
- sleep 1
- done
- exit 0
- oracle@SZDB:~> ./load_balance.sh >srv_load_bal.log
- 2、分析监听日志
- oracle@bo2dbp:/u01/oracle/db/network/log> more listener_bo2dbp.log
- 12-OCT-2012 12:00:10 * service_update * GOBO4B * 0 #节点bo2dbs上的实例GOBO4B的更新到bo2dbp上监听器的更新信息
- 12-OCT-2012 12:00:35 * service_update * GOBO4B * 0
- 12-OCT-2012 12:01:04 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=GOBO4)(CID=(PROGRAM=sqlplus@SZDB)(HOST=SZDB)
- (USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.7.2)(PORT=50322)) * establish * GOBO4 * 0
- 12-OCT-2012 12:01:05 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=GOBO4)(CID=(PROGRAM=sqlplus@SZDB)(HOST=SZDB)
- (USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.7.2)(PORT=50325)) * establish * GOBO4 * 0
- 12-OCT-2012 12:01:07 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=GOBO4)(CID=(PROGRAM=sqlplus@SZDB)(HOST=SZDB)
- (USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.7.2)(PORT=50328)) * establish * GOBO4 * 0
- 12-OCT-2012 12:01:08 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=GOBO4)(CID=(PROGRAM=sqlplus@SZDB)(HOST=SZDB)
- (USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.7.2)(PORT=50330)) * establish * GOBO4 * 0
- #上面的日志片断中可以看出全部是客户端发起的到bo2dbp节点上的建立连接的信息
- #下面来查看bo2dbs上的监听日志
- oracle@bo2dbs:/u01/oracle/db/network/log> more listener_bo2dbs.log
- 12-OCT-2012 12:00:10 * service_update * GOBO4B * 0
- 12-OCT-2012 12:00:10 * service_update * GOBO4B * 0
- 12-OCT-2012 12:00:14 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=bo2dbs)(USER=oracle))(COMMAND=status)(ARGUMENTS=64)
- (SERVICE=LISTENER_BO2DBS)(VERSION=169870080)) * status * 0
- 12-OCT-2012 12:00:35 * service_update * GOBO4B * 0
- 12-OCT-2012 12:00:35 * service_update * GOBO4B * 0
- 12-OCT-2012 12:01:04 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=GOBO4)(CID=(PROGRAM=sqlplus@SZDB)(HOST=SZDB)
- (USER=oracle))(INSTANCE_NAME=GOBO4B)) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.7.2)(PORT=61862)) * establish * GOBO4 * 0
- 12-OCT-2012 12:01:07 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=GOBO4)(CID=(PROGRAM=sqlplus@SZDB)(HOST=SZDB)
- (USER=oracle))(INSTANCE_NAME=GOBO4B)) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.7.2)(PORT=61868)) * establish * GOBO4 * 0
- 12-OCT-2012 12:01:09 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=GOBO4)(CID=(PROGRAM=sqlplus@SZDB)(HOST=SZDB)
- (USER=oracle))(INSTANCE_NAME=GOBO4B)) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.7.2)(PORT=61872)) * establish * GOBO4 * 0
- #在12-OCT-2012 12:01:04时刻,连接信息中有INSTANCE_NAME=GOBO4B的连接信息,而节点bo2dbp上也有一条类似的信息,因此该条连接
- #日志是由节点bo2dbp转发过来而建立的连接请求。
- #同样在12-OCT-2012 12:01:07时刻,节点bo2dbp转发过来而建立的连接请求。
- #小结一下,
- #对于直接连接,监听器日志中将出现establish,且不含有INSTANCE_NAME=GOBO4B 字样
- #而对于转发的连接,则转发节点与接收的节点同时存在连接信息,转发节点上存在连接信息的与普通的连接请求一样,
- #而接收的节点上存在INSTANCE_NAME=<instance_name> 信息
- 3、检查负载均衡结果
- oracle@SZDB:~> grep GOBO4A srv_load_bal.log |wc -l
- 755
- oracle@SZDB:~> grep GOBO4B srv_load_bal.log |wc -l
- 245
- #从上面的日志文件中可知总共有755个客户端连接到了gobo4a,有245各客户端连接到了gobo4b
- #下面查看监听器日志来获得连接信息
- #下面的查询中在节点bo2dbp上总共有接受了1000个用户连接
- oracle@bo2dbp:/u01/oracle/db/network/log> grep establish listener_bo2dbp.log |wc -l
- 1000
- #下面的查询查看是否有从节点bo2dbs转发过来的连接,结果为0,说明没有任何连接请求从bo2dbs转发过来
- oracle@bo2dbp:/u01/oracle/db/network/log> grep INSTANCE_NAME=GOBO4 listener_bo2dbp.log |wc -l
- 0
- #接下来查看节点bo2dbs的监听日志,可以看出总共接受了245个连接请求
- oracle@bo2dbs:/u01/oracle/db/network/log> grep establish listener_bo2dbs.log |wc -l
- 245
- #下面的过滤情况也表明在节点bo2dbs上的连接是从bo2dbp上转发的连接,而非客户端直接到bo2dbs的请求连接
- oracle@bo2dbs:/u01/oracle/db/network/log> grep INSTANCE_NAME=GOBO4 listener_bo2dbs.log |wc -l
- 245
- #从监听器的日志检查可以,测试中的连接全部请求到节点bo2dbp,是由于tnsnames.ora中ADDRESS的第一个IP地址就是bo2dbp的IP
- #因此所有的连接都是请求到bo2dbp,而没有客户端发出到bo2dbs的连接请求
- #其次是尽管在bo2dbp有1000个连接请求,而真正建立连接的只有755个,有245转发到了节点bo2dbs
三、总结
1、服务器端的负载均衡需要配置remote_listener参数,而该参数的值依赖于tnsnames.ora的连接字符串
2、对于基于服务器端的连接负载均衡,监听器会根据当前节点、实例上的连接负载情况进行转发到空闲的实例
3、转发的依据仅仅是当前节点监听的连接数量的多少,而非当前实例的过度负载
4、从上面的测试可以得出,各个节点的连接并不算均衡,是相对的均衡,因此应结合客户端连接负载协同工作
5、对于当前实例的过度负载的情形,应结合配置service方法来实现负载均衡
配置 RAC 负载均衡与故障转移
Oracle负载均衡主要是指新会话连接到RAC数据库时,如何判定这个新的连接要连到哪个节点进行工作?通常情况下,负载均衡分为客户端负载
均衡与服务器端负载均衡。客户端负载均衡通常是在客户端的tnsnames.ora中多添加一个链接地址以及LOAD_BALANCE与failover参数。而服务器
端的负载均衡则相对复杂,下面具体描述服务器端负载均衡。
一、负载均衡
注意这里的负载均衡指的是连接的负载均衡,即客户可以随机从不同的实例中连接到数据库
1.配置tnsnames.ora使得该文件中包含如下全部内容:
- # LISTENERS_DEVDB DEVDB是数据库名,可以使用netmgr,netca编辑或直接使用Vim创建
- LISTENERS_DEVDB =
- (ADDRESS_LIST =
- (ADDRESS = (PROTOCOL = TCP)(HOST = rac1-vip.robinson.com)(PORT = 1521))
- (ADDRESS = (PROTOCOL = TCP)(HOST = rac2-vip.robinson.com)(PORT = 1521))
- )
2.配置参数文件remote_listener
- SQL> alter system set remote_listener='LISTENERS_DEVDB' scope=both sid='*';
3.需要配置连接描述信息的两个IP地址、端口号、以及load_balance子项为yes (主要是load_balance子项)
- DEVDB =
- (DESCRIPTION =
- (ADDRESS = (PROTOCOL = TCP)(HOST = rac1-vip.robinson.com)(PORT = 1521))
- (ADDRESS = (PROTOCOL = TCP)(HOST = rac2-vip.robinson.com)(PORT = 1521))
- (LOAD_BALANCE = yes)
- (CONNECT_DATA =
- (SERVER = DEDICATED)
- (SERVICE_NAME = devdb.robinson.com)
- )
- )
4.查看侦听器的状态,从下面可以看到devdb.robinson.com服务中有两个实例为其提供服务
- [oracle@rac2 ~]$ lsnrctl status
- Service "devdb.robinson.com" has 2 instance(s).
- Instance "devdb1", status READY, has 1 handler(s) for this service...
- Instance "devdb2", status READY, has 2 handler(s) for this service...
5.测试负载均衡
使用shell脚本来进行测试负载均衡
- --编辑TestLoadBalance.sh
- #!/bin/bash
- #Usage: TestLoadBalance devdb 1000
- count=0
- while [ $count -lt $2 ] # Set up a loop control
- do # Begin the loop
- count='expr $count + 1' # Increment the counter
- sqlplus -s usr1/usr1pwd@$1 @TestLoadBalance.sql # Connect instance and execute sql statement
- sleep 1
- done
- --TestLoadBalance.sql 脚本
- col instance_name format a30
- select instance_name from v$instance;
- --实施测试
- ./TestLoadBalance.sh devdb 1000
- --查看结果
- SQL> select inst_id,count(1) from gv$instance group by inst_d;
- INST_ID COUNT(1)
- ---------- ----------
- devdb1 446
- devdb2 554
二、配置故障转移
负载均衡是用于实现基于连接的负载均衡,但不能解决节点是否可用,一旦一个节点损坏,已成功连接的客户端并不能转移到其他正常服务的
实例中。而故障转移功能则使得该功能得以实现。可以使用srvctl 和dbca来创建服务。下面使用dbca来创建一个新的服务,客户端连接到实
例后,对故障实现透明切换。
1.配置故障转移服务
在节点rac1使用oracle帐户启动dbca工具,
a.选择 Oracle Real Application Clusters database
b.选择 Services Management
c.集群数据库列表:单击 Next。
d.数据库服务:单击 Add。
添加服务:输入sales。
选择 devdb1 作为首选实例。
选择 devdb2 作为可用实例。
TAF 策略:选择 Basic。
单击 Finish。
e.数据库配置助手:单击 No 退出。
配置完毕后,在tnsnames.ora中多出了以下内容,注意是各个节点都会多出以下内容
- SALES =
- (DESCRIPTION =
- (ADDRESS = (PROTOCOL = TCP)(HOST = rac1-vip.robinson.com)(PORT = 1521))
- (ADDRESS = (PROTOCOL = TCP)(HOST = rac2-vip.robinson.com)(PORT = 1521))
- (LOAD_BALANCE = yes)
- (CONNECT_DATA =
- (SERVER = DEDICATED)
- (SERVICE_NAME = sales.robinson.com)
- (FAILOVER = --failover_mode是实现故障转移的关键选项
- (TYPE = SELECT)
- (METHOD = BASIC) --TAF 策略:此处当前为 Basic
- (RETRIES = 180)
- (DELAY = 5)
- )
- )
- )
同时参数service_names会多出一个服务名,位于在配置数据库服务时选择的首选实例中
- SQL> select instance_name from v$instance;
- INSTANCE_NAME
- ----------------
- devdb1
- SQL> show parameter service_names
- NAME TYPE VALUE
- ------------------------------------ ----------- ------------------------------
- service_names string devdb.robinson.com, sales
- SQL> select instance_name from v$instance;
- INSTANCE_NAME
- ----------------
- devdb2
- SQL> show parameter service
- NAME TYPE VALUE
- ------------------------------------ ----------- ------------------------------
- service_names string devdb.robinson.com
使用srvctl工具也可以看到该服务已经正常开始提供服务
- SQL> ho srvctl status service -d devdb -s sales
- Service sales is running on instance(s) devdb1
- SQL> ho lsnrctl status
- Service "sales.robinson.com" has 1 instance(s). --sales正常提供服务
- Instance "devdb1", status READY, has 2 handler(s) for this service...
2.实现故障转移
下面使用帐户usr1,服务名sales从Windows客户端来登陆,注意要配置好客户端tnsnames,可以将服务器sales项内容全部复制到客户端tnsnames.ora中
- C:\>sqlplus usr1/usr1@sales
- SQL> col host_name format a20
- SQL> select instance_number ins_no,instance_name,host_name,status from v$instance;
- INS_NO INSTANCE_NAME HOST_NAME STATUS
- ---------- ---------------- -------------------- ------------
- 1 devdb1 rac1.robinson.com OPEN
- SQL> select failover_type,failover_method,failed_over from v$session
- 2 where username='USR1';
- FAILOVER_TYPE FAILOVER_M FAI
- ------------- ---------- ---
- SELECT BASIC NO
- --从其它会话使用sys帐户登陆到crm 并关闭该实例
- SQL> show user;
- USER is "SYS"
- SQL> select instance_name from v$instance;
- INSTANCE_NAME
- ----------------
- devdb1
- SQL> shutdown abort
- ORACLE instance shut down.
- --从先前登陆到sales的会话中验证会话故障切换功能
- SQL> select instance_number ins_no,instance_name,host_name,status from v$instance;
- INS_NO INSTANCE_NAME HOST_NAME STATUS
- ---------- ---------------- -------------------- ------------
- 2 devdb2 rac2.robinson.com OPEN
- SQL> select failover_type,failover_method,failed_over from v$session
- 2 where username='USR1'; --第3列显示的为yes,也表明经过故障切换后提供的服务
- FAILOVER_TYPE FAILOVER_M FAI
- ------------- ---------- ---
- SELECT BASIC YES
- --由下面的查询中可以看到服务名sales被添加到可用实例devdb2,节点rac2 的service_names参数中
- SQL> select instance_name from v$instance;
- INSTANCE_NAME
- ----------------
- devdb2
- SQL> show parameter service
- NAME TYPE VALUE
- ------------------------------------ ----------- ------------------------------
- service_names string devdb.robinson.com, sales
3.重新定位故障转移服务到首选实例
对于首选实例从故障中恢复后,需要手动来重新定位到首选实例
- SQL> startup --启动devdb1
- SQL> show parameter service_names
- NAME TYPE VALUE
- ------------------------------------ ----------- ------------------------------
- service_names string devdb.robinson.com
- SQL> ho srvctl relocate service -d devdb -s sales -i devdb2 -t devdb1
- SQL> show parameter service_names
- NAME TYPE VALUE
- ------------------------------------ ----------- ------------------------------
- service_names string devdb.robinson.com, sales
4.DML故障转移(不同于DQL,因此单独列出)
使用Windows客户端通过sales服务名登陆
- C:\>sqlplus usr1/usr1@sales
- SQL> show user;
- USER is "USR1"
- SQL> create table tb_temp (id int,ename varchar2(20)) tablespace tbs1;
- Table created.
- SQL> insert into tb_temp
- 2 select 1,'Robinson' from dual
- 3 union all
- 4 select 2,'Jackson' from dual;
- 2 rows created.
- SQL> commit;
- Commit complete.
- SQL> select * from tb_temp;
- ID ENAME
- ---------- --------------------
- 1 Robinson
- 2 Jackson
- SQL> delete from tb_temp;
- 2 rows deleted.
- SQL> select * from tb_temp;
- no rows selected
从另一个会话中使用sysdba关闭devdb1(shutdown abort)
再在刚刚执行表创建的会话中查询记录,收到提示,事务必须被回滚
- SQL> select * from tb_temp;
- select * from tb_temp
- *
- ERROR at line 1:
- ORA-25402: transaction must roll back
- SQL> rollback;
- Rollback complete.
- SQL> select * from tb_temp;
- ID ENAME
- ---------- --------------------
- 1 Robinson
- 2 Jackson
- SQL> select failover_type,failover_method,failed_over from v$session
- 2 where username='USR1';
- FAILOVER_TYPE FAILOVER_M FAI
- ------------- ---------- ---
- SELECT BASIC YES
总结:对于DML 操作在实现故障转移时,将严格按照ACID原则来执行,大部分情况需要回滚事务。
TAF配置
11g RAC环境下客户端配置TAF
TAF是Transparent Application Failover的英文缩写,顾名思义就是对应用透明的故障转移,举个例子,当应用连接某个oracle数据库的执行查询操作的时候,数据库服务器网络中断或者实例崩溃,在经过delay参数设置的值之后,将自动连接到其他可用的实例,继续进行查询。(前提是执行计划,输出结果集和输出的顺序不能发生变化)
TAF的特性:
1:TAF是ORACLE客户端提供的一项特性,使用TAF,对客户端的环境有一定的要求,比如JAVA的JDBC驱动、Oracle客户端的版本等(8i开始支持TAF);
2:大致上TAF可以分为2种,连接时的TAF和会话建立后TAF;
3:TAF本身与是否RAC环境无关,但一般都用在RAC环境,最小程度的减少最应用的影响,单实例环境下也可以使用TAF,这样使用PL/SQL developer连接数据库,即使数据库实例重启,也不需要重新连接;
4:RAC环境下,还可以把TAF配置在服务器端;
5:配置listener.ora文件的GLOBAL_DBNAME参数后将会禁用TAF。
配置如下:
1、查看服务端的service_name 和 RAC SCAN地址:
[oracle@rac122 tnslsnr]$ cat /etc/hosts # Do not remove the following line, or various programs # that require network functionality will fail. 127.0.0.1 localhost.localdomain localhost ::1 localhost6.localdomain6 localhost6 #Public 10.1.252.121 rac121 10.1.252.122 rac122 #Private 192.168.123.1 rac121-priv 192.168.123.2 rac122-priv #Virtual IP 10.1.252.75 rac121-vip 10.1.252.76 rac122-vip 10.1.252.68 rac-scan [oracle@rac122 tnslsnr]$ sqlplus / as sysdba SQL*Plus: Release 11.2.0.3.0 Production on Tue Jun 26 15:33:29 2012 Copyright (c) 1982, 2011, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options SQL> show parameter service_name NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ service_names string racdb SQL>
2、在客户端tnsnames.ora文件中添加如下内容:
racdb = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.252.68)(PORT = 1521)) ) (CONNECT_DATA = (SERVICE_NAME = racdb) (FAILOVER_MODE = (TYPE = SELECT) (METHOD = BASIC) (RETRIES = 180) (DELAY = 5)) ) )
属性说明 :
TYPE type属性描述了故障转移的类型,可能的值如下:
SESSION 指定发生故障转换时,TAF应创建会话,不执行其他任何操作。
SELECT 指定执行故障转换时,除了创建会话,TAF 也应该重新启动故障转移期间运行的任何SELECT语句,
当SELECT 语句执行时,ORACLE Net 将返回故障转移前没有返回给用户的行。
NONE 不执行TAF故障转移
METHOD method特征确定何时创建会话,可能的值如下 :
BASIC 在故障转移期间创建会话
PRECONNECT 与数据库的初始连接建立后,采用由BACKUP 特性指定的ORACLE Net别名创建故障转移会话。
RETRIES retries特性指Oracle Net 返回ADDRESS_LIST 并尝试连接到幸存实例的次数。
DELAY delay特性指定每次重试之间等待的次数
3、用客户端登录作如下操作:
C:\Documents and Settings\```>sqlplus /nolog SQL*Plus: Release 10.2.0.1.0 - Production on 星期二 6月 26 15:00:56 2012 Copyright (c) 1982, 2005, Oracle. All rights reserved. SQL> conn sys/system@racdb as sysdba 已连接。 SQL> show parameter instance_name NAME TYPE VALUE ------------------------- ---------------------- --------------- instance_name string racdb2 SQL> select sid from v$mystat where rownum=1; SID ---------- 163 SQL> select machine,failover_type,failover_method,failed_over from v$session where sid=163; MACHINE FAILOVER_TYPE FAILOVER_METHOD FAILED ---------------- ---------- -------------------- ------ AILK\TANWEI SELECT BASIC NO SQL> create table taf_client_test(id int,vname varchar2(10)); 表已创建。 SQL> insert into taf_client_test values(1,'taf'); 已创建 1 行。 SQL> update taf_client_test set id=5; 已更新 1 行。 SQL> select * from taf_client_test; ID VNAME ---------- -------------------- 5 taf
4、更新完毕不commit,停掉racdb2实例:
[oracle@rac122 tnslsnr]$ srvctl status instance -d racdb -i racdb2 Instance racdb2 is running on node rac122 [oracle@rac122 tnslsnr]$ srvctl stop instance -d racdb -i racdb2 -o abort [oracle@rac122 tnslsnr]$ srvctl status instance -d racdb -i racdb2 Instance racdb2 is not running on node rac122 [oracle@rac122 tnslsnr]$ srvctl status instance -d racdb -i racdb1 Instance racdb1 is running on node rac121 --还有一个节点在工作
5、回到客户端作如下操作:
SQL> select * from taf_client_test; select * from taf_client_test * 第 1 行出现错误: ORA-25402: 事务处理必须重新运行 SQL> rollback; 回退已完成。 SQL> select instance_name from v$instance; INSTANCE_NAME -------------------------------- racdb1 SQL> show parameter instance_name NAME TYPE VALUE ---------------- ------------- ----------- instance_name string racdb1 --发现自动切换到节点1, SQL> select * from taf_client_test; 未选定行 SQL> select * from v$mystat where rownum=1; SID STATISTIC# VALUE ---------- ---------- ---------- 166 0 0 SQL> select failover_type,failover_method,failed_over from v$session where sid=166; FAILOVER_TYPE FAILOVER_METHOD FAILED -------------------------- -------------------- ------ SELECT BASIC YES
到此,完成了TAF的配置与验证。
如何在RAC服务器端配置TAF
服务端TAF配置会覆盖客户端TNS连接串中配置的TAF。如果客户端没有配置TAF,在最小配置模式,failover类型必须设置为启用TAF。如果failover type是在服务端设置,那么faliover method默认为BASIC。DELAY和RETRIES参数是可选的,可分别指定。
1在RAC上添加一个service
注意:service name是唯一的,并且不能与默认创建服务名一致。默认情况下,Oracle RAC数据库会创建一个特殊的数据库服务。这个默认的服务在RAC环境中对所有实例总是可用,除非实例在限制模式。并且,DBA不能修改这个默认的服务及其属性。
添加service的语法:
srvctl add service -d orcl -s taf_orcl.oracle.com -r "orcl1,orcl2" -P BASIC
注:参数-d:数据库名;-s:服务名;-r:首选实例名;-P:TAF策略
2启动service
srvctl start service -d orcl -s taf_orcl.oracle.com
3检查service运行状态
srvctl config service -d orcl
4获取创建的service_id
select name,service_id from dba_services where name = 'taf_orcl.oracle.com';
5检查service的标准配置
col failover_method format a11 heading 'METHOD'
col failover_type format a10 heading 'TYPE'
col failover_retries format 9999999 heading 'RETRIES'
col goal format a10
col clb_goal format a8
col AQ_HA_NOTIFICATIONS format a5 heading 'AQNOT'
SQL>select name, failover_method, failover_type, failover_retries,goal, clb_goal,aq_ha_notifications
from dba_services where service_id = X
注意:默认情况下,method,type,retries是没有值的,这些值对于服务端的TAF是必须的。这是由于srvctl命令添加服务时,没有修改DBMS_SERVICE参数,在11.2之后,该bug解决。
6添加failover参数
--11.2之前:
SQL> execute dbms_service.modify_service (service_name => 'taf_orcl.oracle.com' -
, aq_ha_notifications => true -
, failover_method => dbms_service.failover_method_basic -
, failover_type => dbms_service.failover_type_select -
, failover_retries => 180 -
, failover_delay => 5 -
, clb_goal => dbms_service.clb_goal_long);
PL/SQL procedure successfully completed.
注:由于TAF的method为BASIC,BASIC是当前支持的唯一值,不再支持PRECONNECT。
--11.2开始:
srvctl add service -d db_unique_name -s service_name -t edition_name{-r preferred_list [-a available_list]} | {-g server_pool[-c {UNIFORM | SINGLETON}] [-k net_number]} [-P {BASIC | NONE}] [-l {[PRIMARY] | [PHYSICAL_STANDBY] | [LOGICAL_STANDBY] | [SNAPSHOT_STANDBY]}] [-y {AUTOMATIC | MANUAL}] [-q {TRUE | FALSE}] [-x {TRUE | FALSE}] [-j {SHORT | LONG}] [-B {NONE | SERVICE_TIME | THROUGHPUT}] [-e {NONE |SESSION | SELECT}] [-m {NONE | BASIC}] [-z failover_retries] [-w failover_delay]
7检查service及method、retries
SQL>select name, failover_method, failover_type, failover_retries,goal, clb_goal,aq_ha_notifications from dba_services where service_id = X
8检查监听是否已注册新创建的service
lsnrctl services
当然,还可以通过DBCA或者Oracle OEM来配置RAC Server。本文不再详述。
配置Server Side TAF
实验环境:Oracle 11.2.0.4 RAC
参考MOS文档:
How To Configure Server Side Transparent Application Failover (文档 ID 460982.1)
- 1.为设置TAF在RAC集群上新建服务
- 2.启动server_taf服务
- 3.检查确认服务正在运行
- 4.找到刚创建服务的service_id
- 5.根据service_id审查服务的信息
- 6.给服务添加server side failover参数
- 7.再次审查服务可以看到Method, Type和Retries值
- 8.检查已注册的服务的监听信息
- 9.创建网络服务名
- 10.测试TAF功能
1.为设置TAF在RAC集群上新建服务
eg: srvctl add service -d rac -s server_taf -r "rac1,rac2" -P BASIC
使用oracle用户在RAC集群上新建服务server_taf:
[oracle@jyrac1 ~]$ srvctl add service -d jyzhao -s server_taf -r "jyzhao1,jyzhao2" -P BASIC
[oracle@jyrac1 ~]$
注意不能使用grid用户操作,如果使用grid 用户执行的话,会报错:
[grid@jyrac1 ~]$ srvctl add service -d jyzhao -s server_taf -r "jyzhao1,jyzhao2" -P BASIC
PRCD-1288 : User is not authorized to create service server_taf for database jyzhao
PRKH-1014 : Current user "grid" is not the oracle owner user "oracle" of oracle home "/opt/app/oracle/product/11.2.0/dbhome_1"
2.启动server_taf服务
eg: srvctl start service -d rac -s server_taf
启动server_taf服务
[oracle@jyrac1 ~]$ srvctl start service -d jyzhao -s server_taf
3.检查确认服务正在运行
eg: srvctl config service -d rac
检查确认服务正在运行:
[oracle@jyrac1 ~]$ srvctl config service -d jyzhao
Service name: server_taf
Service is enabled
Server pool: jyzhao_server_taf Cardinality: 2 Disconnect: false Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false AQ HA notifications: false Failover type: NONE
Failover method: NONE
TAF failover retries: 0 TAF failover delay: 0 Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: BASIC Edition: Preferred instances: jyzhao1,jyzhao2
Available instances:
4.找到刚创建服务的service_id
eg: select name,service_id from dba_services where name = 'server_taf';
找到刚创建服务的service_id
SQL> select name,service_id from dba_services where name = 'server_taf'; NAME SERVICE_ID
---------------------------------------------------------------- ----------
server_taf 7
5.根据service_id审查服务的信息
col name format a15
col failover_method format a11 heading 'METHOD'
col failover_type format a10 heading 'TYPE'
col failover_retries format 9999999 heading 'RETRIES'
col goal format a10
col clb_goal format a8
col AQ_HA_NOTIFICATIONS format a5 heading 'AQNOT'
select name, failover_method, failover_type, failover_retries,goal, clb_goal,aq_ha_notifications
from dba_services where service_id = 7;
根据service_id审查服务的信息:
SQL> col name format a15
SQL> col failover_method format a11 heading 'METHOD' SQL> col failover_type format a10 heading 'TYPE' SQL> col failover_retries format 9999999 heading 'RETRIES' SQL> col goal format a10
SQL> col clb_goal format a8
SQL> col AQ_HA_NOTIFICATIONS format a5 heading 'AQNOT' SQL> select name, failover_method, failover_type, failover_retries,goal, clb_goal,aq_ha_notifications 2 from dba_services where service_id = 7 3 ; NAME METHOD TYPE RETRIES GOAL CLB_GOAL AQNOT
--------------- ----------- ---------- -------- ---------- -------- -----
server_taf NONE NONE 0 NONE LONG NO
SQL>
6.给服务添加server side failover参数
execute dbms_service.modify_service (service_name => 'server_taf' -
, aq_ha_notifications => true -
, failover_method => dbms_service.failover_method_basic -
, failover_type => dbms_service.failover_type_select -
, failover_retries => 180 -
, failover_delay => 5 -
, clb_goal => dbms_service.clb_goal_long);
11.2版本可以使用srvctl 修改服务的信息:
srvctl modify service -d RAC -s server_taf -m BASIC -e SELECT -q TRUE -j LONG
给服务添加server side failover参数:
SQL> execute dbms_service.modify_service (service_name => 'server_taf' -
> , aq_ha_notifications => true -
> , failover_method => dbms_service.failover_method_basic -
> , failover_type => dbms_service.failover_type_select -
> , failover_retries => 180 -
> , failover_delay => 5 -
> , clb_goal => dbms_service.clb_goal_long);
PL/SQL procedure successfully completed.
7.再次审查服务可以看到Method, Type和Retries值
select name, failover_method, failover_type, failover_retries,goal, clb_goal,aq_ha_notifications
from dba_services where service_id = 7;
再次审查服务可以看到Method, Type和Retries值:
SQL> select name, failover_method, failover_type, failover_retries,goal, clb_goal,aq_ha_notifications 2 from dba_services where service_id = 7; NAME METHOD TYPE RETRIES GOAL CLB_GOAL AQNOT
--------------- ----------- ---------- -------- ---------- -------- -----
server_taf BASIC SELECT 180 NONE LONG YES
8.检查已注册的服务的监听信息
lsnrctl services
Service "server_taf.za.oracle.com" has 2 instance(s).
Instance "rac1", status READY, has 2 handler(s) for this service...
Handler(s):
"DEDICATED" established:0 refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=dell01)(PORT=1521))
"DEDICATED" established:0 refused:0 state:ready
LOCAL SERVER
Instance "rac2", status READY, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:0 refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=dell02)(PORT=1521))
我这里版本差异,显示有区别,分别在不同节点显示自己的实例:
--node1: Service "server_taf" has 1 instance(s).
Instance "jyzhao1", status READY, has 1 handler(s) for this service... Handler(s): "DEDICATED" established:0 refused:0 state:ready LOCAL SERVER The command completed successfully --node2: Service "server_taf" has 1 instance(s). Instance "jyzhao2", status READY, has 1 handler(s) for this service... Handler(s): "DEDICATED" established:0 refused:0 state:ready LOCAL SERVER The command completed successfully
9.创建网络服务名
SERVERTAF =
(DESCRIPTION =
(LOAD_BALANCE = yes)
(ADDRESS = (PROTOCOL = TCP)(HOST = dell01)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = dell02)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = server_taf.za.oracle.com)
)
)
服务端RAC所有节点配置tnsnames.ora,添加内容:
SERVERTAF =
(DESCRIPTION =
(LOAD_BALANCE = yes)
(ADDRESS = (PROTOCOL = TCP)(HOST = jyrac1)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = jyrac2)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = server_taf)
)
)
sqlplus system/oracle@192.168.56.160/server_taf
10.测试TAF功能
select host_name,instance_name from v$instance;
SQL> select instance_name from V$instance;
INSTANCE_NAME
----------------
rac2
SQL> shutdown abort;
ORACLE instance shut down.
select host_name,instance_name from v$instance;
10.1 模拟客户端使用scanVIP测试能否实现TAF
sqlplus system/oracle@192.168.56.160/server_taf
[grid@jyrac1 ~]$ sqlplus system/oracle@192.168.56.160/server_taf
SQL*Plus: Release 11.2.0.4.0 Production on Fri Mar 10 02:59:53 2017 Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> select host_name,instance_name from v$instance;
HOST_NAME ---------------------------------------------------------------- INSTANCE_NAME ---------------- jyrac1
jyzhao1 --这里强制关掉jyzhao1实例。 SQL> /
HOST_NAME ---------------------------------------------------------------- INSTANCE_NAME ---------------- jyrac2
jyzhao2
10.1 结论: 可以实现TAF功能,相当于客户端不再需要配置,直接通过SCAN VIP连接。
10.2 模拟客户端使用Public IP测试能否实现TAF
sqlplus system/oracle@192.168.56.150/server_taf
SQL> select host_name,instance_name from v$instance;
HOST_NAME ---------------------------------------------------------------- INSTANCE_NAME ---------------- jyrac1
jyzhao1 --这里强制关掉jyzhao1实例。 SQL> / select host_name,instance_name from v$instance
*
ERROR at line 1:
ORA-12153: TNS:not connected
Process ID: 20116 Session ID: 24 Serial number: 7
如果客户端配置tnsnames.ora,将publicIP配置
TAF =
(DESCRIPTION =
(LOAD_BALANCE = yes)
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.150)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.152)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = server_taf)
)
)
再次测试:
[oracle@jyrac2 admin]$ sqlplus system/oracle@taf
SQL*Plus: Release 11.2.0.4.0 Production on Fri Mar 10 05:11:30 2017 Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> select host_name,instance_name from v$instance;
HOST_NAME ---------------------------------------------------------------- INSTANCE_NAME ---------------- jyrac2
jyzhao2 --这里强制关掉jyzhao2实例。 SQL> /
HOST_NAME ---------------------------------------------------------------- INSTANCE_NAME ---------------- jyrac1
jyzhao1
10.2 结论: 直接连接Public IP无法实现TAF功能。但客户端配置Public IP列表,可以实现。
10.3 模拟客户端使用VIP测试能否实现TAF
sqlplus system/oracle@192.168.56.151/server_taf
[grid@jyrac1 ~]$ sqlplus system/oracle@192.168.56.151/server_taf
SQL*Plus: Release 11.2.0.4.0 Production on Fri Mar 10 04:32:20 2017 Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> select host_name,instance_name from v$instance;
HOST_NAME
----------------------------------------------------------------
INSTANCE_NAME
----------------
jyrac1
jyzhao1
SQL> / select host_name,instance_name from v$instance
* ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 32459 Session ID: 159 Serial number: 3
如果客户端配置tnsnames.ora,可以通过sqlplus system/oracle@tafvip连接。
TAFVIP =
(DESCRIPTION =
(LOAD_BALANCE = yes)
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.151)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.153)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = server_taf)
)
)
再次测试:
[oracle@jyrac2 admin]$ sqlplus system/oracle@tafvip
SQL*Plus: Release 11.2.0.4.0 Production on Fri Mar 10 05:15:32 2017 Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> select host_name,instance_name from v$instance;
HOST_NAME ---------------------------------------------------------------- INSTANCE_NAME ---------------- jyrac2
jyzhao2 --这里强制关掉jyzhao2实例。 SQL> /
HOST_NAME ---------------------------------------------------------------- INSTANCE_NAME ---------------- jyrac1
jyzhao1
10.3 结论: 直接连接VIP无法实现TAF功能。但客户端配置VIP列表,可以实现。
官方文档:http://docs.oracle.com/cd/E11882_01/network.112/e41945/advcfg.htm#NETAG455
About Transparent Application Failover
TAF is a client-side feature that allows clients to reconnect to surviving databases in the event of a failure of a database instance. Notifications are used by the server to trigger TAF callbacks on the client-side.
TAF is configured using either client-side specified Transparent Network Substrate (TNS) connect string or using server-side service attributes. If both methods are used to configure TAF, then the server-side service attributes supersede the client-side settings. Server-side service attributes are the preferred way to set up TAF.
TAF can operate in one of two modes, Session Failover and Select Failover. Session Failover re-creates lost connections and sessions. Select Failover replays queries that were in progress.
When there is a failure, callback functions are initiated on the client-side using Oracle Call Interface (OCI) callbacks. This works with standard OCI connections as well as Connection Pool and Session Pool connections.
TAF operates with Oracle Data Guard to provide automatic failover. TAF works with the following database configurations to effectively mask a database failure:
-
Oracle Real Application Clusters
-
Replicated systems
-
Standby databases
-
Single instance Oracle database
See Also:
-
Oracle Real Application Clusters Installation and Configuration Guide
-
Oracle Real Application Clusters Administration and Deployment Guide
-
Oracle Call Interface Programmer's Guide for more details on callbacks, connection pools, and session pools
What Transparent Application Failover Restores
TAF automatically restores some or all of the following elements associated with active database connections. Other elements may need to be embedded in the application code to enable TAF to recover the connection.
-
Client-server database connections: TAF automatically reestablishes the connection using the same connect string or an alternate connect string that you specify when configuring failover.
-
Users' database sessions: TAF automatically logs a user in with the same user ID as was used before the failure. If multiple users were using the connection, then TAF automatically logs them in as they attempt to process database commands. Unfortunately, TAF cannot automatically restore other session properties. These properties can be restored by invoking a callback function.
-
Completed commands: If a command was completed at the time of connection failure, and it changed the state of the database, then TAF does not resend the command. If TAF reconnects in response to a command that may have changed the database, then TAF issues an error message to the application.
-
Open cursors used for fetching: TAF allows applications that began fetching rows from a cursor before failover to continue fetching rows after failover. This is called select failover. It is accomplished by re-running a SELECT statement using the same snapshot, discarding those rows already fetched and retrieving those rows that were not fetched initially. TAF verifies that the discarded rows are those that were returned initially, or it returns an error message.
-
Active transactions: Any active transactions are rolled back at the time of failure because TAF cannot preserve active transactions after failover. The application instead receives an error message until a ROLLBACK is submitted.
-
Server-side program variables: Server-side program variables, such as PL/SQL package states, are lost during failures, and TAF cannot recover them. They can be initialized by making a call from the failover callback.
See Also:
Oracle Call Interface Programmer's GuideAbout FAILOVER_MODE Parameters
The FAILOVER_MODE parameter must be included in the CONNECT_DATA section of a connect descriptor. FAILOVER_MODE can contain the parameters described in Table 13-4.
Table 13-4 Additional Parameters of the FAILOVER_MODE Parameter
FAILOVER_MODE Parameters | Description |
---|---|
A different net service name for backup connections. A backup should be specified when usingpreconnect to pre-establish connections. |
|
DELAY |
The amount of time in seconds to wait between connect attempts. If RETRIES is specified, thenDELAY defaults to one second. If a callback function is registered, then this parameter is ignored. |
Setting for fast failover from the primary node to the backup node:
|
|
RETRIES |
The number of times to attempt to connect after a failover. If DELAY is specified, then RETRIESdefaults to five retry attempts. If a callback function is registered, then this parameter is ignored. |
The type of failover. Three types of Oracle Net failover functionality are available by default to Oracle Call Interface (OCI) applications:
|
Note:
Oracle Net Manager does not provide support for TAF parameters. These parameters must be set manually.Implementing Transparent Application Failover
Important:
Do not set the GLOBAL_DBNAME parameter in the SID_LIST_listener_name section of the listener.ora file. A statically configured global database name disables TAF.Depending on the FAILOVER_MODE parameters, you can implement TAF in several ways. Oracle recommends the following methods:
TAF with Connect-Time Failover and Client Load Balancing
Implement TAF with connect-time failover and client load balancing for multiple addresses. In the following example, Oracle Net connects randomly to one of the protocol addresses on sales1-serveror sales2-server. If the instance fails after the connection, then the TAF application fails over to the other node's listener, reserving any SELECT statements in progress.
sales.us.example.com=
(DESCRIPTION= (LOAD_BALANCE=on) (FAILOVER=on) (ADDRESS=
(PROTOCOL=tcp)
(HOST=sales1-server)
(PORT=1521))
(ADDRESS=
(PROTOCOL=tcp)
(HOST=sales2-server)
(PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=sales.us.example.com) (FAILOVER_MODE= (TYPE=select) (METHOD=basic))))
TAF Retrying a Connection
TAF also provides the ability to automatically retry connecting if the first connection attempt fails with the RETRIES and DELAY parameters. In the following example, Oracle Net tries to reconnect to the listener on sales1-server. If the failover connection fails, then Oracle Net waits 15 seconds before trying to reconnect again. Oracle Net attempts to reconnect up to 20 times.
sales.us.example.com=
(DESCRIPTION=
(ADDRESS=
(PROTOCOL=tcp)
(HOST=sales1-server)
(PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=sales.us.example.com) (FAILOVER_MODE= (TYPE=select) (METHOD=basic) (RETRIES=20) (DELAY=15))))
TAF Pre-establishing a Connection
A backup connection can be pre-established. The initial and backup connections must be explicitly specified. In the following example, clients that use net service name sales1.us.example.com to connect to the listener on sales1-server are also preconnected to sales2-server. If sales1-server fails after the connection, then Oracle Net fails over to sales2-server, preserving anySELECT statements in progress. Similarly, Oracle Net preconnects to sales1-server for those clients that use sales2.us.example.com to connect to the listener on sales2-server.
sales1.us.example.com=
(DESCRIPTION=
(ADDRESS=
(PROTOCOL=tcp)
(HOST=sales1-server)
(PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=sales.us.example.com)
(INSTANCE_NAME=sales1) (FAILOVER_MODE= (BACKUP=sales2.us.example.com) (TYPE=select) (METHOD=preconnect))))
sales2.us.example.com=
(DESCRIPTION=
(ADDRESS=
(PROTOCOL=tcp)
(HOST=sales2-server)
(PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=sales.us.example.com)
(INSTANCE_NAME=sales2) (FAILOVER_MODE= (BACKUP=sales1.us.example.com) (TYPE=select) (METHOD=preconnect))))
Verifying Transparent Application Failover
You can query the FAILOVER_TYPE, FAILOVER_METHOD, and FAILED_OVER columns in the V$SESSION view to verify that TAF is correctly configured. To view the columns, use a query similar to the following:
SELECT MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER, COUNT(*) FROM V$SESSION GROUP BY MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER;
The output before failover looks similar to the following:
MACHINE FAILOVER_TYPE FAILOVER_M FAI COUNT(*) -------------------- ------------- ---------- --- ---------- sales1 NONE NONE NO 11 sales2 SELECT PRECONNECT NO 1
The output after failover looks similar to the following:
MACHINE FAILOVER_TYPE FAILOVER_M FAI COUNT(*) -------------------- ------------- ---------- --- ---------- sales2 NONE NONE NO 10 sales2 SELECT PRECONNECT YES 1
Note:
You can monitor each step of TAF using an appropriately configured OCI TAF CALLBACK function.See Also:
-
Oracle Database Reference for additional information about the V$SESSION view
Oracle RAC failover 测试(Server TAF方式)
Oracle RAC中,除了基于客户端的TAF方式之外,还有基于服务器端的TAF方式,可以把服务端的TAF方式看作是客户端TAF方式的一个升级版吧。服务器端的TAF,当然是需要在服务器端进行配置了,这个是通过Service来完成的。本文主要描述Oracle 10g rac 下通过service方式配置服务器端的TAF。
下面是一些关于这方面的基础参考链接:
有关负监听配置,载均衡(load balance)请参考
ORACLE RAC 监听配置 (listener.ora tnsnames.ora)
ORACLE RAC 下非缺省端口监听配置(listener.ora tnsnames.ora)
Oracle RAC 客户端连接负载均衡(Load Balance)
Oracle RAC 服务器端连接负载均衡(Load Balance)
Oracle RAC 负载均衡测试(结合服务器端与客户端)
有关Oracle RAC failover 连接时故障转移请参考
Oracle RAC failover 测试(连接时故障转移)
Oracle RAC failover 测试(TAF方式)
有关Services的创建请参考
Services in Oracle Database 10g
再说 Oracle RAC services
- 1、服务器端、客户端的环境
- #服务器端环境,host信息
- oracle@bo2dbp:~> cat /etc/hosts |grep vip
- 192.168.7.61 bo2dbp-vip.2gotrade.com bo2dbp-vip
- 192.168.7.62 bo2dbs-vip.2gotrade.com bo2dbs-vip
- #服务器端环境,集群信息
- oracle@bo2dbp:~> ./crs_stat.sh
- Resource name Target State
- -------------- ------ -----
- ora.GOBO4.GOBO4A.inst ONLINE ONLINE on bo2dbp
- ora.GOBO4.GOBO4B.inst ONLINE ONLINE on bo2dbs
- ora.GOBO4.db ONLINE ONLINE on bo2dbp
- ora.bo2dbp.ASM1.asm ONLINE ONLINE on bo2dbp
- ora.bo2dbp.LISTENER_BO2DBP.lsnr ONLINE ONLINE on bo2dbp
- ora.bo2dbp.LISTENER_ORA10G_BO2DBP.lsnr ONLINE ONLINE on bo2dbp
- ora.bo2dbp.gsd ONLINE ONLINE on bo2dbp
- ora.bo2dbp.ons ONLINE ONLINE on bo2dbp
- ora.bo2dbp.vip ONLINE ONLINE on bo2dbp
- ora.bo2dbs.ASM2.asm ONLINE ONLINE on bo2dbs
- ora.bo2dbs.LISTENER_BO2DBS.lsnr ONLINE ONLINE on bo2dbs
- ora.bo2dbs.LISTENER_ORA10G_BO2DBS.lsnr ONLINE ONLINE on bo2dbs
- ora.bo2dbs.gsd ONLINE ONLINE on bo2dbs
- ora.bo2dbs.ons ONLINE ONLINE on bo2dbs
- ora.bo2dbs.vip ONLINE ONLINE on bo2dbs
- ora.ora10g.db ONLINE ONLINE on bo2dbp
- #客户端环境
- robin@SZDB:~> cat /etc/issue
- Welcome to SUSE Linux Enterprise Server 10 SP3 (x86_64) - Kernel \r (\l).
- robin@SZDB:~> sqlplus -v
- SQL*Plus: Release 10.2.0.3.0 - Production
- #客户端tnsnames配置
- GOBO4 =
- (DESCRIPTION =
- (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.61)(PORT = 1521))
- (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.62)(PORT = 1521))
- (LOAD_BALANCE = yes)
- (CONNECT_DATA =
- (SERVER = DEDICATED)
- (SERVICE_NAME = TAF) #注意我们客户端的SERVICE_NAME,我们设置为TAF
- )
- )
- 2、在服务器端配置service
- 配置service有多种方式,如dbca,oem,srvctl命令行。下面直接以命令行方式配置
- 关于什么是service以及如何使用srvctl命令行创建service,请参考: http://blog.csdn.net/robinson_0612/article/details/8124232
- oracle@bo2dbp:~> srvctl add service -d GOBO4 -s TAF -r GOBO4A -a GOBO4B -P basic
- oracle@bo2dbp:~> srvctl start service -d GOBO4 -s TAF
- oracle@bo2dbp:~> ./crs_stat.sh | grep TAF
- ora.GOBO4.TAF.GOBO4A.srv ONLINE ONLINE on bo2dbp
- ora.GOBO4.TAF.cs ONLINE ONLINE on bo2dbp
- oracle@bo2dbp:~> srvctl config service -d GOBO4 -a
- TAF PREF: GOBO4A AVAIL: GOBO4B TAF: basic
- oracle@bo2dbp:~> export ORACLE_SID=GOBO4A
- oracle@bo2dbp:~> sqlplus / as sysdba
- SQL*Plus: Release 10.2.0.3.0 - Production on Mon Dec 17 14:55:02 2012
- Copyright (c) 1982, 2006, Oracle. All Rights Reserved.
- Connected to:
- Oracle Database 10g Release 10.2.0.3.0 - 64bit Production
- With the Real Application Clusters option
- SQL> show parameter service
- NAME TYPE VALUE
- ------------------------------------ ----------- ------------------------------
- service_names string SYS$SYS.KUPC$S_1_2012102317304
- 4.GOBO4, SYS$SYS.KUPC$C_1_2012
- 1023173044.GOBO4, GOBO4, TAF
- SQL> begin
- 2 dbms_service.modify_service(
- 3 service_name=>'TAF',
- 4 failover_method =>dbms_service.failover_method_basic,
- 5 failover_type =>dbms_service.failover_type_select,
- 6 failover_retries =>180,
- 7 failover_delay=>5);
- 8 end;
- 9 /
- PL/SQL procedure successfully completed.
- SQL> select name,failover_method,failover_type,goal,clb_goal from dba_services
- 2 where name='TAF';
- NAME FAILOVER_METHOD FAILOVER_TYPE GOAL CLB_G
- -------------------- -------------------- --------------- ------------ -----
- TAF BASIC SELECT LONG
- SQL> ho lsnrctl status
- ..........
- Service "TAF" has 1 instance(s).
- Instance "GOBO4A", status READY, has 2 handler(s) for this service...
- The command completed successfully
- 3、测试服务器端TAF
- robin@SZDB:~> sqlplus fail_over/fail@gobo4
- fail_over@GOBO4> get verify.sql
- 1 REM the following query is for TAF connection verification
- 2 col sid format 99999
- 3 col serial# format 9999999
- 4 col failover_type format a13
- 5 col failover_method format a15
- 6 col failed_over format a11
- 7 Prompt
- 8 Prompt Failover status for current user
- 9 Prompt ============================================
- 10 SELECT sid,
- 11 serial#,
- 12 failover_type,
- 13 failover_method,
- 14 failed_over
- 15 FROM v$session
- 16 WHERE username = 'FAIL_OVER';
- 17 REM the following query is for load balancing verification
- 18 col host_name format a20
- 19 Prompt
- 20 Prompt Current instance name and host name
- 21 Prompt ========================================
- 22* SELECT instance_name,host_name FROM v$instance;
- 23
- #下面的连接查询中表明客户端当前连接到了节点bo2dbp,其实例名为GOBO4A
- fail_over@GOBO4> @verify
- Failover status for current user
- ============================================
- SID SERIAL# FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER
- ------ -------- ------------- --------------- -----------
- 1073 48 SELECT BASIC NO
- Current instance name and host name
- ========================================
- INSTANCE_NAME HOST_NAME
- ---------------- --------------------
- GOBO4A bo2dbp
- #此时停止节点bo2dbp
- oracle@bo2dbp:~> srvctl stop instance -d GOBO4 -i GOBO4A
- #查看停止节点bo2dbp即实例GOBO4A后的结果
- oracle@bo2dbp:~> ./crs_stat.sh |grep inst
- ora.GOBO4.GOBO4A.inst OFFLINE OFFLINE
- ora.GOBO4.GOBO4B.inst ONLINE ONLINE on bo2dbs
- #此时回到客户端再次执行查询,FAILED_OVER的值已经变成YES,即表明当前的session为failover过来的
- #同时实例名和节点名也发生了变化
- fail_over@GOBO4> set timing on;
- fail_over@GOBO4> @verify
- Failover status for current user
- ============================================
- SID SERIAL# FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER
- ------ -------- ------------- --------------- -----------
- 1082 396 SELECT BASIC YES
- Elapsed: 00:00:04.19
- Current instance name and host name
- ========================================
- INSTANCE_NAME HOST_NAME
- ---------------- --------------------
- GOBO4B bo2dbs
- Elapsed: 00:00:00.01
- 4、小结
- a、服务器端的TAF方式的failover通过在服务器端配置service来完成
- b、服务器端的TAF方式与客户端的TAF方式产生同样的效果
- b、一旦在服务端配置了基于服务器端的TAF,客户端再无需通过在客户端添加FAILOVER_MODE项
- c、该方式简化客户端配置,通过集中统一管理service实现failover
Oracle RAC 负载均衡测试(结合服务器端与客户端)
Oracle RAC 负载均衡使得从客户端发起的连接能够有效地分配到监听器负载较小的实例上。有两种方式实现客户端负载均衡,一是通过配置客户端的load_balance,一是通过配置服务器端的remote_listener参数。两种方式各有优劣,而且两者并不相互排斥,因此可以结合两种方式来更加有效的实现负载均衡。本文将描述两者结合的使用情况(oralce 10g rac)。
有关客户端与服务端负载均衡的单独测试请参考:
Oracle RAC 客户端连接负载均衡(Load Balance)
Oracle RAC 服务器端连接负载均衡(Load Balance)
本文的测试将结合前篇文章使用的脚本与样例,是前两篇测试的一个总结。
一、配置需求
- 1、服务器端各节点监听器正常提供服务,如果使用非缺省的1521端口,请参考 ORACLE RAC 下非缺省端口监听配置(listener.ora tnsnames.ora)
- oracle@bo2dbp:~> more $ORACLE_HOME/network/admin/listener.ora
- # listener.ora.bo2dbp Network Configuration File: /u01/oracle/db/network/admin/listener.ora.bo2dbp
- # Generated by Oracle configuration tools.
- LISTENER_BO2DBP =
- (DESCRIPTION_LIST =
- (DESCRIPTION =
- (ADDRESS = (PROTOCOL = TCP)(HOST = bo2dbp-vip.2gotrade.com)(PORT = 1521)(IP = FIRST))
- (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.51)(PORT = 1521)(IP = FIRST))
- )
- )
- SID_LIST_LISTENER_BO2DBP =
- (SID_LIST =
- (SID_DESC =
- (SID_NAME = PLSExtProc)
- (ORACLE_HOME = /u01/oracle/db)
- (PROGRAM = extproc)
- )
- )
- oracle@bo2dbp:~> lsnrctl status
- Service "GOBO4" has 2 instance(s).
- Instance "GOBO4A", status READY, has 2 handler(s) for this service...
- Instance "GOBO4B", status READY, has 1 handler(s) for this service...
- 2、服务器端的remote_listener参数设置
- 要求remote_listener参数的连接标识符在服务器端的tnsnames.ora中有对应的条目
- SQL> show parameter listener
- NAME TYPE VALUE
- ------------------------------------ ----------- ------------------------------
- local_listener string
- remote_listener string remote_lsnr_gobo4
- oracle@bo2dbp:~> more $ORACLE_HOME/network/admin/tnsnames.ora
- # tnsnames.ora Network Configuration File: /u01/oracle/db/network/admin/tnsnames.ora
- # Generated by Oracle configuration tools.
- remote_lsnr_gobo4 =
- (ADDRESS_LIST =
- (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.61)(PORT = 1521))
- (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.62)(PORT = 1521))
- )
- 3、客户端tnsnames.ora中启用load_balance
- oracle@SZDB:~> tail -11 $ORACLE_HOME/network/admin/tnsnames.ora
- GOBO4 =
- (DESCRIPTION =
- (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.61)(PORT = 1521))
- (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.62)(PORT = 1521))
- (LOAD_BALANCE = yes)
- (CONNECT_DATA =
- (SERVER = DEDICATED)
- (SERVICE_NAME = GOBO4)
- )
- )
二、测试Load Balance
- #还是使用之前的脚步来进行测试
- #Author : Robinson
- #Blog : http://blog.csdn.net/robinson_0612
- oracle@SZDB:~> more load_balance.sh
- #!/bin/bash
- for i in {1..1000}
- do
- echo $i
- sqlplus -S system/oracle@GOBO4 <<eof < span="">
- select instance_name from v\$instance;
- EOF
- sleep 1
- done
- exit 0
- oracle@SZDB:~> ./load_balance.sh >load_bal.log
- #查看日志
- oracle@SZDB:~> grep GOBO4A load_bal.log |wc -l
- 750
- oracle@SZDB:~> grep GOBO4B load_bal.log |wc -l
- 250
- #查看监听器的日志
- oracle@bo2dbp:/u01/oracle/db/network/log> grep establish listener_bo2dbp.log |wc -l
- 894
- oracle@bo2dbp:/u01/oracle/db/network/log> grep INSTANCE_NAME=GOBO4 listener_bo2dbp.log |wc -l
- 415
- #从上面的查询中可以得知,节点bo2dbp总共接受了894个连接请求,而有415连接请求是由bo2dbs转发过来的
- #因此,实际上从客户端发起到bo2dbp的实际连接请求数为894-415=479
- #下面来看在节点bo2dbs上的监听日志
- oracle@bo2dbs:/u01/oracle/db/network/log> grep establish listener_bo2dbs.log |wc -l
- 665
- oracle@bo2dbs:/u01/oracle/db/network/log> grep INSTANCE_NAME=GOBO4 listener_bo2dbs.log |wc -l
- 144
- #从上面的查询中可知,节点bo2dbs总共接受了665个连接请求,而有144个连接请求是由bo2dbp转发过来的
- #因此,实际上从客户端发起到bo2dbs的实际连接请求数为655-144=511
- #从上面的结果可知,
- #基于客户端的连接请求数为节点bo2dbp为479,节点bo2dbs为511
- #监听器路由到本地实例数目为,节点bo2dbp,479-144=335,节点bo2dbs,511-415=96
- #远程监听器路由道本地实例的数据为,节点bo2dpb为415,节点bo2dbs为144
- #监听器路由的概念是指基于服务器端的负载均衡
- #即服务器端的监听器根据自身以及远程监听器的负载情况来确定将当前的连接请求转发到本地或远程,此即为路由。
Oracle RAC failover 测试(连接时故障转移)
Oracle RAC 集群最突出的表现就是高可用性,这些内容主要包括load balance以及failover,通过这些技术使得单点故障不影响客户端端应用程序对数据库的正常访问,以及通过创建service实现节点间负载均衡。本文主要描述Oracle 10g rac环境下的Oracle failover测试。
下面是一些关于这方面的基础参考或相关链接:
有关负监听配置,载均衡(load balance)以及Oracle service请参考
ORACLE RAC 监听配置 (listener.ora tnsnames.ora)
ORACLE RAC 下非缺省端口监听配置(listener.ora tnsnames.ora)
Oracle RAC 客户端连接负载均衡(Load Balance)
Oracle RAC 服务器端连接负载均衡(Load Balance)
Oracle RAC 负载均衡测试(结合服务器端与客户端)
Oracle RAC failover 测试(TAF方式)
Oracle RAC failover 测试(Server TAF方式)
- 1、Oracle failover的几种方式
- Oracle failover也叫故障转移,从Oracle 10g开始,分为3种方式:
- a. Client-Side Connect time Failover
- 客户端连接failover模式,此方式较为简单,只要安装了rac集群,缺省情况下即被启用。
- b. TAF
- 透明故障转移,此方式同样基于客户端完成,需要配置客户端tnsnames.ora,连接故障发生时,无须重新连接
- c. Service-Side TAF
- 服务器端透明故障转移,通过配置service来实现,客户端无须任何配置。
- 本文主要演示第一种情形,即客户端在发起连接请求时如何实现故障转移
- 注意事项: 不能在listener.ora 文件中设置GLOBAL_NAME
- 该参数会禁用Connect-time Failover 和 Transparent Application Failover
- 2、Client-Side Connect time Failover
- 下面关于Client-Side Connect time Failover来自Oracle 的官方描述 ID 453293.1
- The connect-time failover enables clients to connect to another listener if the initial connection to the first
- listener fails. The number of listener protocol addresses determines how many listeners are tried. Without
- connect-time failover, Oracle Net attempts a connection with only one listener. The default is on.
- Tnsnames Parameter: FAILOVER
- (failover=on) is default for ADDRESS_LISTs, DESCRIPTION_LISTs, and a set of
- DESCRIPTIONs., therefore, you do not have to specify it explicitly.
- 基于客户端的failover比较好理解。对于在客户端tnsnames.ora有多个VIP的情形,客户端会首先请求定位到第一个VIP,如果第一个VIP不
- 可达,则继续尝试使用下一个VIP,直到成功建立连接,如果所有的VIP无法连接将收到错误消息。
- 通常情况下,我们使用vip作为tnsnames.ora中的连接地址
- 3、服务器端、客户端的环境
- #服务器端环境,host信息
- oracle@bo2dbp:~> cat /etc/hosts |grep vip
- 192.168.7.61 bo2dbp-vip.2gotrade.com bo2dbp-vip
- 192.168.7.62 bo2dbs-vip.2gotrade.com bo2dbs-vip
- #服务器端环境,集群信息
- oracle@bo2dbp:~> ./crs_stat.sh
- Resource name Target State
- -------------- ------ -----
- ora.GOBO4.GOBO4A.inst ONLINE ONLINE on bo2dbp
- ora.GOBO4.GOBO4B.inst ONLINE ONLINE on bo2dbs
- ora.GOBO4.db ONLINE ONLINE on bo2dbp
- ora.bo2dbp.ASM1.asm ONLINE ONLINE on bo2dbp
- ora.bo2dbp.LISTENER_BO2DBP.lsnr ONLINE ONLINE on bo2dbp
- ora.bo2dbp.LISTENER_ORA10G_BO2DBP.lsnr ONLINE ONLINE on bo2dbp
- ora.bo2dbp.gsd ONLINE ONLINE on bo2dbp
- ora.bo2dbp.ons ONLINE ONLINE on bo2dbp
- ora.bo2dbp.vip ONLINE ONLINE on bo2dbp
- ora.bo2dbs.ASM2.asm ONLINE ONLINE on bo2dbs
- ora.bo2dbs.LISTENER_BO2DBS.lsnr ONLINE ONLINE on bo2dbs
- ora.bo2dbs.LISTENER_ORA10G_BO2DBS.lsnr ONLINE ONLINE on bo2dbs
- ora.bo2dbs.gsd ONLINE ONLINE on bo2dbs
- ora.bo2dbs.ons ONLINE ONLINE on bo2dbs
- ora.bo2dbs.vip ONLINE ONLINE on bo2dbs
- ora.ora10g.db ONLINE ONLINE on bo2dbp
- #客户端环境
- robin@SZDB:~> cat /etc/issue
- Welcome to SUSE Linux Enterprise Server 10 SP3 (x86_64) - Kernel \r (\l).
- robin@SZDB:~> sqlplus -v
- SQL*Plus: Release 10.2.0.3.0 - Production
- #客户端tnsnames配置
- GOBO4 =
- (DESCRIPTION =
- (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.61)(PORT = 1521))
- (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.62)(PORT = 1521))
- (LOAD_BALANCE = yes)
- (CONNECT_DATA =
- (SERVER = DEDICATED)
- (SERVICE_NAME = GOBO4)
- )
- )
- 4、连接测试
- #首次建立连接,此时客户端从tnsnames配置的第一个IP建立连接,即192.168.7.61
- #VIP 192.168.7.61对应的hostname以及instance_name分别为bo2dbp,GOBO4A,所以我们获得如下返回结果
- robin@SZDB:~> sqlplus fail_over/fail@gobo4
- fail_over@GOBO4> get verify.sql
- 1 REM the following query is for TAF connection verification
- 2 col sid format 99999
- 3 col serial# format 9999999
- 4 col failover_type format a13
- 5 col failover_method format a15
- 6 col failed_over format a11
- 7 Prompt
- 8 Prompt Failover status for current user
- 9 Prompt ============================================
- 10 SELECT sid,
- 11 serial#,
- 12 failover_type,
- 13 failover_method,
- 14 failed_over
- 15 FROM v$session
- 16 WHERE username = 'FAIL_OVER';
- 17 REM the following query is for load balancing verification
- 18 col host_name format a20
- 19 Prompt
- 20 Prompt Current instance name and host name
- 21 Prompt ========================================
- 22* SELECT instance_name,host_name FROM v$instance;
- 23
- fail_over@GOBO4> @verify.sql
- Failover status for current user
- ============================================
- SID SERIAL# FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER
- ------ -------- ------------- --------------- -----------
- 1071 249 NONE NONE NO
- Current instance name and host name
- ========================================
- INSTANCE_NAME HOST_NAME
- ---------------- --------------------
- GOBO4A bo2dbp
- #停止集群数据库的第一个instance,即GOBO4A
- oracle@bo2dbp:~> srvctl stop instance -d GOBO4 -i GOBO4A
- #校验结果
- oracle@bo2dbp:~> ./crs_stat.sh | grep inst
- ora.GOBO4.GOBO4A.inst OFFLINE OFFLINE
- ora.GOBO4.GOBO4B.inst ONLINE ONLINE on bo2dbs
- #回到客户端原来的session,此时出现ORA-03114
- fail_over@GOBO4> /
- SELECT instance_name,host_name FROM v$instance
- *
- ERROR at line 1:
- ORA-03113: end-of-file on communication channel
- fail_over@GOBO4> /
- ERROR:
- ORA-03114: not connected to ORACLE
- ERROR:
- ORA-03114: not connected to ORACLE
- #下面尝试重新建立连接
- fail_over@GOBO4> conn fail_over/fail@gobo4
- Connected.
- fail_over@GOBO4> @verify
- Failover status for current user
- ============================================
- SID SERIAL# FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER
- ------ -------- ------------- --------------- -----------
- 1062 94 NONE NONE NO
- Current instance name and host name
- ========================================
- INSTANCE_NAME HOST_NAME
- ---------------- --------------------
- GOBO4B bo2dbs
- #Author : Robinson
- #Blog : http://blog.csdn.net/robinson_0612
- #从上面的查询可知,当前的session已经连接到第二个实例。且FAILOVER_TYPE,FAILOVER_METHOD,FAILED_OVER几个至没有发生任何变化
- #因为这几个参数主要是针对TAF。
- 5、小结
- a、客户端连接时的故障转移,服务器端和客户端无需任何配置,缺省情况下即被开启,即failover=on
- b、只要集群环境存在(非单节点RAC),客户端的连接请求会逐个尝试列出的VIP,直到连接成功为止,如果所有不可连接,返回错误
- c、客户端已经建立后,服务器端实例或节点故障,都将导致客户端必须重新发起新的连接请求
Oracle RAC failover 测试(TAF方式)
Oracle RAC 客户端故障转移(failover),当采用TAF方式时,对于已经建立连接的客户端,在连接的实例或节点出现故障时,客户端无需再次发出连接请求,仍然可以继续之前的数据库操作,此称之为透明故障转移。本文描述基于Oracle 10g rac,客户端TAF方式的故障转移并给出示例。
下面是一些关于这方面的基础参考链接:
有关负监听配置,载均衡(load balance)以及Oracle service请参考
ORACLE RAC 监听配置 (listener.ora tnsnames.ora)
ORACLE RAC 下非缺省端口监听配置(listener.ora tnsnames.ora)
Oracle RAC 客户端连接负载均衡(Load Balance)
Oracle RAC 服务器端连接负载均衡(Load Balance)
Oracle RAC 负载均衡测试(结合服务器端与客户端)
有关Oracle RAC failover 连接时故障转移请参考
Oracle RAC failover 测试(连接时故障转移)
Oracle RAC failover 测试(Server TAF方式)
- 1、TAF描述
- #下面关于TAF来自Oracle 的官方描述 ID 453293.1
- Transparent Application Failover (TAF) is a feature of the Oracle Call Interface (OCI) driver at client side.
- It enables the application to automatically reconnect to a database, if the database instance to which the connection is
- made fails. In this case, the active transactions roll back.
- Tnsnames Parameter: FAILOVER_MODE
- When an instance to which a connection is established fails or is shutdown, the connection on the client side becomes
- stale and would throw exceptions to the caller trying to use it. TAF enables the application to transparently reconnect
- to a preconfigured secondary instance creating a fresh connection, but identical to the connection that was established
- on the first original instance.
- #简单一点来说,就是说对于那些已经成功连接到特定实例的客户端,如果该实例或节点异常宕机,客户端会自动重新发出到剩余实例的连
- #接请求。使得客户端感觉不到它所连接的实例或节点已经出现故障,这个就称之为透明转移。但其间的活动事务将被回滚。
- #通过在客户端的tnsnames.ora中配置FAILOVER_MODE项实现TAF
- 2、服务器端、客户端的环境
- #服务器端环境,host信息
- oracle@bo2dbp:~> cat /etc/hosts |grep vip
- 192.168.7.61 bo2dbp-vip.2gotrade.com bo2dbp-vip
- 192.168.7.62 bo2dbs-vip.2gotrade.com bo2dbs-vip
- #服务器端环境,集群信息
- oracle@bo2dbp:~> ./crs_stat.sh
- Resource name Target State
- -------------- ------ -----
- ora.GOBO4.GOBO4A.inst OFFLINE OFFLINE on bo2dbp #此时节点1上的实例被关闭
- ora.GOBO4.GOBO4B.inst ONLINE ONLINE on bo2dbs
- ora.GOBO4.db ONLINE ONLINE on bo2dbp
- ora.bo2dbp.ASM1.asm ONLINE ONLINE on bo2dbp
- ora.bo2dbp.LISTENER_BO2DBP.lsnr ONLINE ONLINE on bo2dbp
- ora.bo2dbp.LISTENER_ORA10G_BO2DBP.lsnr ONLINE ONLINE on bo2dbp
- ora.bo2dbp.gsd ONLINE ONLINE on bo2dbp
- ora.bo2dbp.ons ONLINE ONLINE on bo2dbp
- ora.bo2dbp.vip ONLINE ONLINE on bo2dbp
- ora.bo2dbs.ASM2.asm ONLINE ONLINE on bo2dbs
- ora.bo2dbs.LISTENER_BO2DBS.lsnr ONLINE ONLINE on bo2dbs
- ora.bo2dbs.LISTENER_ORA10G_BO2DBS.lsnr ONLINE ONLINE on bo2dbs
- ora.bo2dbs.gsd ONLINE ONLINE on bo2dbs
- ora.bo2dbs.ons ONLINE ONLINE on bo2dbs
- ora.bo2dbs.vip ONLINE ONLINE on bo2dbs
- ora.ora10g.db ONLINE ONLINE on bo2dbp
- #客户端环境
- robin@SZDB:~> cat /etc/issue
- Welcome to SUSE Linux Enterprise Server 10 SP3 (x86_64) - Kernel \r (\l).
- robin@SZDB:~> sqlplus -v
- SQL*Plus: Release 10.2.0.3.0 - Production
- #客户端tnsnames配置
- GOBO4_TAF =
- (DESCRIPTION =
- (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.61)(PORT = 1521))
- (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.62)(PORT = 1521))
- (LOAD_BALANCE = yes)
- (CONNECT_DATA =
- (SERVER = DEDICATED)
- (SERVICE_NAME = GOBO4)
- (FAILOVER_MODE = #FAILOVER_MODE项参数
- (TYPE = session)
- (METHOD = basic)
- (RETRIES = 180
- (DELAY = 5)
- )
- )
- )
- 3、FAILOVER_MODE项分析
- FAILOVER_MODE项是实现TAF的主要配置内容,下面对其进行描述.
- METHOD: 用户定义何时创建到其实例的连接,有BASIC 和 PRECONNECT 两种可选值
- BASIC: 客户端通过地址列表成功建立连接后,即仅当客户端感知到节点故障时才创建到其他实例的连接
- PRECONNECT: 预连接模式,是在最初建立连接时就同时建立到所有实例的连接,当发生故障时,立刻就可以切换到其他链路上
- 上述两种方式各有优劣,前者建立连接的开销相对较小,但failover时会产生延迟,而后者正好与前者相反
- TYPE: 用于定义发生故障时对完成的SQL 语句如何处理,其中有2种类型:session 和select
- select:使用select方式,Oracle net会跟踪事务期间的所有select语句,并跟踪每一个与当前select相关的游标已返回多少行给客户
- 端。此时,假定select查询已返回500行,客户端当前连接的节点出现故障,Oracle Net自动建立连接到幸存的实例上并继续返回
- 剩余的行数给客户端。假定总行数为1500,行,则1000行从剩余节点返回。
- session: 使用session方式,所有select查询相关的结果在重新建立新的连接后将全部丢失,需要重新发布select命令。
- 上述两种方式适用于不同的情形,对于select方式,通常使用与OLAP数据库,而对于session方式则使用与OLTP数据库。因为select
- 方式,Oracle 必须为每个session保存更多的内容,包括游标,用户上下文等,需要更多的资源。
- 其次,两种方式期间所有未提交的DML事务将自动回滚且必须重启启动。alter session语句不会failover。
- 临时对象不会failover也不能被重新启动。
- RETRIES: 表示重试的次数
- DELAY:表示重试的间隔时间
- 4、测试TAF
- #首次建立连接,此时客户端从tnsnames配置的第一个IP建立连接,由于第一个VIP所在的实例已经关闭,故连接到192.168.7.62
- #VIP 192.168.7.62对应的hostname以及instance_name分别为bo2dbs,GOBO4B,所以我们获得如下返回结果
- #其次我们可以看到当前session failover的相关参数
- robin@SZDB:~> sqlplus <a href="mailto:fail_over/fail@gobo4_taf"
- fail_over@GOBO4> get verify.sql
- 1 REM the following query is for TAF connection verification
- 2 col sid format 99999
- 3 col serial# format 9999999
- 4 col failover_type format a13
- 5 col failover_method format a15
- 6 col failed_over format a11
- 7 Prompt
- 8 Prompt Failover status for current user
- 9 Prompt ============================================
- 10 SELECT sid,
- 11 serial#,
- 12 failover_type,
- 13 failover_method,
- 14 failed_over
- 15 FROM v$session
- 16 WHERE username = 'FAIL_OVER';
- 17 REM the following query is for load balancing verification
- 18 col host_name format a20
- 19 Prompt
- 20 Prompt Current instance name and host name
- 21 Prompt ========================================
- 22* SELECT instance_name,host_name FROM v$instance;
- 23
- fail_over@GOBO4> @verify
- Failover status for current user
- ============================================
- SID SERIAL# FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER
- ------ -------- ------------- --------------- -----------
- 1063 1175 SESSION BASIC NO
- Current instance name and host name
- ========================================
- INSTANCE_NAME HOST_NAME
- ---------------- --------------------
- GOBO4B bo2dbs
- #此时启动第一个实例GOBO4A,并停止第二个实例
- oracle@bo2dbp:~> srvctl start instance -d GOBO4 -i GOBO4A
- oracle@bo2dbp:~> srvctl stop instance -d GOBO4 -i GOBO4B
- #查看两个实例的状态
- oracle@bo2dbp:~> ./crs_stat.sh | grep inst
- ora.GOBO4.GOBO4A.inst ONLINE ONLINE on bo2dbp
- ora.GOBO4.GOBO4B.inst OFFLINE OFFLINE
- #在客户端的session再次检查连接状态,即执行查询,结果如下,我们收到了ORA-25408
- fail_over@GOBO4> @verify
- Failover status for current user
- ============================================
- SELECT sid,
- *
- ERROR at line 1:
- ORA-25408: can not safely replay call
- Current instance name and host name
- ========================================
- INSTANCE_NAME HOST_NAME
- ---------------- --------------------
- GOBO4A bo2dbp
- #再次执行查询,此时客户端已经自动实现了重新连接,从查询返回得到的INSTANCE_NAME与HOST_NAME可知。
- #最重要的一个FAILED_OVER值为YES,表明当前的session是一个failover来的session。
- #关于METHOD使用PRECONNECT与TYPE使用SELECT的方式在此不作演示
- fail_over@GOBO4> @verify
- Failover status for current user
- ============================================
- SID SERIAL# FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER
- ------ -------- ------------- --------------- -----------
- 1073 29 SESSION BASIC YES
- Current instance name and host name
- ========================================
- INSTANCE_NAME HOST_NAME
- ---------------- --------------------
- GOBO4A bo2dbp
- #Author : Robinson
- #Blog : http://blog.csdn.net/robinson_0612
- 5、小结:
- a、客户端TAF方式实现了Oracle客户端到服务器透明故障转移
- b、主要在客户端tnsnames.ora配置FAILOVER_MODE来实现基于客户端的TAF
- c、FAILOVER_MODE中基于连接方式(METHOD)可以分为BASIC与PRECONNECT两种方式,后者开销更大,延迟小,与前者相反
- d、FAILOVER_MODE中TYPE可以分为select与session两种方式,两者所有未提交的事务全部回滚,select方式会failover查询,
- session方式不会。select方式多用在OLAP类型数据库,而session多用在OLTP类型数据库
- e、一旦所在的实例发生故障,会自动failover,无需手动重新连接,这就是与连接时故障转移所不同的。
出处:http://www.cnblogs.com/lhrbest