Oracle rac环境系统需要调整的参数

概述

一套oracle 11.2.0.4 2Nodes RAC on RHEL 7的环境,数据库第二个节点被驱逐出集群,并且多次自动重启以失败告终,驱逐原因在GI Alert log显示是私网通信丢失,ASM db alert显示IPC Send timeout.  当时ping和tracert并没发现什么异常,从OSW中的netstat -s查看IP packet reassembles failed时间段值大量增长。

Node1 GI ALERT LOG

2017-03-02 11:46:26.607: 
[/u01/11.2.0/grid/bin/oraagent.bin(121496)]CRS-5011:Check of resource "testdb" failed: details at "(:CLSN00007:)" in "/u01/11.2.0/grid/log/anbob/agent/crsd/oraagent_oracle//oraagent_oracle.log"
2017-03-02 11:46:26.612: 
[crsd(175117)]CRS-2765:Resource 'ora.testdb.db' has failed on server 'anbob'.
2017-03-02 11:46:42.866: 
[cssd(172139)]CRS-1612:Network communication with node db2 (2) missing for 50% of timeout interval.  Removal of this node from cluster in 14.620 seconds
2017-03-02 11:46:50.869: 
[cssd(172139)]CRS-1611:Network communication with node db2 (2) missing for 75% of timeout interval.  Removal of this node from cluster in 6.620 seconds
2017-03-02 11:46:54.870: 
[cssd(172139)]CRS-1610:Network communication with node db2 (2) missing for 90% of timeout interval.  Removal of this node from cluster in 2.620 seconds
2017-03-02 11:46:57.493: 
[cssd(172139)]CRS-1607:Node db2 is being evicted in cluster incarnation 351512591; details at (:CSSNM00007:) in /u01/11.2.0/grid/log/anbob/cssd/ocssd.log.
2017-03-02 11:46:58.626: 
[cssd(172139)]CRS-1662:Member kill requested by node db2 for member number 0, group DBHBCRM

Node2 GI ALERT LOG

2017-03-02 11:46:45.378: 
[cssd(177450)]CRS-1663:Member kill issued by PID 84816 for 1 members, group DBCRM. Details at (:CSSGM00044:) in /u01/11.2.0/grid/log/db2/cssd/ocssd.log.
2017-03-02 11:46:58.982: 
[cssd(177450)]CRS-1608:This node was evicted by node 1, anbob; details at (:CSSNM00005:) in /u01/11.2.0/grid/log/db2/cssd/ocssd.log.
2017-03-02 11:46:58.983: 
[cssd(177450)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/11.2.0/grid/log/db2/cssd/ocssd.log 

 

解决方案

针对oracle的RAC环境,linux内核需要特别优化的参数如下:

#set new
net.ipv4.conf.eno3.rp_filter = 2
net.ipv4.conf.bond0.rp_filter = 2

#即rp_filter参数有三个值,0、1、2,具体含义:

0:不开启源地址校验。
1:开启严格的反向路径校验。对每个进来的数据包,校验其反向路径是否是最佳路径。如果反向路径不是最佳路径,则直接丢弃该数据包。
2:开启松散的反向路径校验。对每个进来的数据包,校验其源地址是否可达,即反向路径是否能通(通过任意网口),如果反向路径不同,则直接丢弃该数据包。


net.ipv4.ipfrag_high_thresh = 67108864   #分片占用内存的高阈值,默认值4194304
net.ipv4.ipfrag_low_thresh = 66060288   #分片占用内存的低阈值,默认值3145728
net.ipv4.ipfrag_time = 120                        #分片超时时间,默认值30
net.ipv4.ipfrag_max_dist = 1024              #分片有效的最长间隔距离,默认值64
net.core.netdev_budget = 600    # 表示一次软中断所能接收的最大报文数,默认值为300
net.core.netdev_max_backlog = 2000  #每个网络接口接收数据包的速率比内核处理这些包的速率快时,允许送到队列的数据包的最大数目

posted @ 2023-02-05 08:31  雪竹子  阅读(208)  评论(0编辑  收藏  举报