Oracle 12C CRS-5013

2019-02-13 17:04  WWJD_DBA

1. Background

OS: SUSE 12 SP3
DB: 12.2.0.1.190115, 2-node RAC
Q: The CRS alert log keeps printing the following errors
2019-02-12 12:46:18.163 [ORAAGENT(91622)]CRS-5013: Agent "ORAAGENT" failed to start process "/oracle/app/12.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"
2019-02-12 12:46:21.161 [ORAAGENT(91622)]CRS-5013: Agent "ORAAGENT" failed to start process "/oracle/app/12.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"
2019-02-12 12:46:24.168 [ORAAGENT(91622)]CRS-5013: Agent "ORAAGENT" failed to start process "/oracle/app/12.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"
2019-02-12 12:46:27.167 [ORAAGENT(91622)]CRS-5013: Agent "ORAAGENT" failed to start process "/oracle/app/12.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"
2019-02-12 12:46:39.163 [ORAAGENT(91622)]CRS-5013: Agent "ORAAGENT" failed to start process "/oracle/app/12.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"
2019-02-12 12:46:42.161 [ORAAGENT(91622)]CRS-5013: Agent "ORAAGENT" failed to start process "/oracle/app/12.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"
2019-02-12 12:46:48.167 [ORAAGENT(91622)]CRS-5013: Agent "ORAAGENT" failed to start process "/oracle/app/12.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"
2019-02-12 12:46:51.165 [ORAAGENT(91622)]CRS-5013: Agent "ORAAGENT" failed to start process "/oracle/app/12.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"
2019-02-12 12:46:51.724 [ORAAGENT(91622)]CRS-5013: Agent "ORAAGENT" failed to start process "/oracle/app/12.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"
2019-02-12 12:46:51.789 [ORAAGENT(91622)]CRS-5013: Agent "ORAAGENT" failed to start process "/oracle/app/12.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"

2019-02-12 12:46:54.166 [ORAAGENT(91622)]CRS-5013: Agent "ORAAGENT" failed to start process "/oracle/app/12.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"
2019-02-12 12:46:57.167 [ORAAGENT(91622)]CRS-5013: Agent "ORAAGENT" failed to start process "/oracle/app/12.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"
2019-02-12 12:47:00.158 [ORAAGENT(91622)]CRS-5013: Agent "ORAAGENT" failed to start process "/oracle/app/12.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"
2019-02-12 12:47:06.160 [ORAAGENT(91622)]CRS-5013: Agent "ORAAGENT" failed to start process "/oracle/app/12.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"
2019-02-12 12:47:09.167 [ORAAGENT(91622)]CRS-5013: Agent "ORAAGENT" failed to start process "/oracle/app/12.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"
2019-02-12 12:47:12.166 [ORAAGENT(91622)]CRS-5013: Agent "ORAAGENT" failed to start process "/oracle/app/12.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"
2019-02-12 12:47:13.588 [ORAAGENT(91622)]CRS-5016: Process "/oracle/app/12.2.0/grid/opmn/bin/onsctli" spawned by agent "ORAAGENT" for action "check" failed: details at "(:CLSN00010:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"
2019-02-12 12:47:18.743 [ORAAGENT(91622)]CRS-5016: Process "/oracle/app/12.2.0/grid/opmn/bin/onsctli" spawned by agent "ORAAGENT" for action "start" failed: details at "(:CLSN00010:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"
2019-02-12 12:47:23.852 [ORAAGENT(91622)]CRS-5016: Process "/oracle/app/12.2.0/grid/opmn/bin/onsctli" spawned by agent "ORAAGENT" for action "check" failed: details at "(:CLSN00010:)" in "/oracle/app/grid/diag/crs/ssng3mcs-db2/crs/trace/crsd_oraagent_grid.trc"
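
When the log floods like this, a quick tally per message code shows which agent actions are failing and how often. The following is a minimal sketch, assuming the alert log lines have the format shown above; the function name `count_crs_codes` is hypothetical, and the log path you pass to it depends on your installation.

```shell
# count_crs_codes FILE
# Print how many times each CRS-NNNN message code appears in FILE,
# most frequent first. (Hypothetical helper, not an Oracle tool.)
count_crs_codes() {
    grep -o 'CRS-[0-9][0-9]*' "$1" | sort | uniq -c | sort -rn |
        awk '{ print $2, $1 }'
}
```

Run against the excerpt above, it would report 16 occurrences of CRS-5013 and 3 of CRS-5016.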

2. Check the DB and ASM alert logs

DB alert

Errors in file /oracle/app/oracle/diag/rdbms/mcsdb/MCSDB2/trace/MCSDB2_psp0_93951.trc:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn5

ASM alert

2019-02-12T16:06:08.891616+08:00
Process startup failed, error stack:
2019-02-12T16:06:08.891871+08:00
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_psp0_92477.trc:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn3
2019-02-12T16:06:09.889920+08:00
Process m000 died, see its trace file

3. Search MOS

  • MOS
1. Database And ASM Instance Ora-27300 OS System Dependent Operation Fork Failed With Status 11 (Doc ID 2331884.1)
2. SLES 12: Database Startup Error with ORA-27300 ORA-27301 ORA-27303 While Starting using Srvctl (Doc ID 2340986.1)
  • Reference link
http://feed.askmaclean.com/archives/suse-12-redhat-7%E4%B8%AD%E7%9A%84ora-27300-os-system-dependent-operationfork-failed-with-status-11%E7%9A%84%E6%95%85%E9%9A%9C%E5%A4%84%E7%90%86.html
  • Based on the references above, check this environment
cat /etc/security/limits.conf
ps h -Led -o user | sort | uniq -c | sort -n
cat /etc/systemd/system.conf|grep DefaultTasksMax
#DefaultTasksMax=512

# systemctl status ohasd
● ohasd.service - LSB: Start and Stop Oracle High Availability Service
   Loaded: loaded (/etc/init.d/ohasd; bad; vendor preset: disabled)
   Active: active (exited) since Tue 2019-02-12 12:42:37 CST; 4h 3min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 78777 ExecStart=/etc/init.d/ohasd start (code=exited, status=0/SUCCESS)
    Tasks: 512 (limit: 512)    ========> the limit is 512

Feb 12 12:42:32 SSNG3MCS-DB2 systemd[1]: Starting LSB: Start and Stop Oracle High Availability Service...
Feb 12 12:42:32 SSNG3MCS-DB2 ohasd[78777]: Starting ohasd:
Feb 12 12:42:37 SSNG3MCS-DB2 ohasd[78777]: CRS-4123: Oracle High Availability Services has been started.
Feb 12 12:42:37 SSNG3MCS-DB2 systemd[1]: Started LSB: Start and Stop Oracle High Availability Service.
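
The `Tasks:` line is the key evidence in this output: the unit is sitting exactly at its limit. The following is a small sketch for pulling the two numbers out of `systemctl status` output and showing the remaining headroom. The function name `tasks_headroom` is hypothetical, and it assumes the `Tasks: N (limit: M)` format shown above.

```shell
# tasks_headroom: read `systemctl status` output on stdin and report how many
# more tasks the unit may create. Assumes a line of the form
# "Tasks: 512 (limit: 512)"; reports "unlimited" when no limit is shown.
tasks_headroom() {
    awk '$1 == "Tasks:" {
        current = $2
        if ($3 == "(limit:") {
            limit = $4
            sub(/\)$/, "", limit)      # strip the trailing ")"
            print "current=" current, "limit=" limit, "free=" (limit - current)
        } else {
            print "current=" current, "limit=unlimited"
        }
        exit
    }'
}
```

Typical use would be `systemctl status ohasd | tasks_headroom`; a `free=0` result corresponds to the situation in this environment.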

4. Solution

1. vi /etc/systemd/system.conf
   Set DefaultTasksMax to 'infinity'
2. Restart the OS
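
The edit in step 1 can also be scripted. The following is a minimal sketch, assuming the file contains the commented-out line `#DefaultTasksMax=512` or some other `DefaultTasksMax=` line. The function name `set_default_tasks_max` is hypothetical; it keeps a `.bak` copy of the file and expects a simple value such as `infinity` or a number.

```shell
# set_default_tasks_max FILE VALUE
# Rewrite (or append) the DefaultTasksMax= line in a systemd system.conf-style
# FILE. A backup is saved as FILE.bak. Hypothetical helper; run it as root
# against /etc/systemd/system.conf, then restart the OS as in step 2.
set_default_tasks_max() {
    conf=$1 value=$2
    cp -p "$conf" "$conf.bak" || return 1
    if grep -q '^#*[[:space:]]*DefaultTasksMax=' "$conf"; then
        # Replace the existing (possibly commented-out) line.
        sed 's/^#*[[:space:]]*DefaultTasksMax=.*/DefaultTasksMax='"$value"'/' \
            "$conf.bak" > "$conf"
    else
        # No such line yet: append one to the end of the file.
        printf 'DefaultTasksMax=%s\n' "$value" >> "$conf"
    fi
}
```

For example: `set_default_tasks_max /etc/systemd/system.conf infinity`.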

5. Root cause

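  Linux 7 (e.g. Red Hat 7) and SUSE 12 start services with systemd, which Linux 6 and SUSE 11 did not use. A service started by systemd does not go through a PAM login session, so the settings in /etc/security/limits.conf are not applied to it.
  Instead, /etc/systemd/system.conf has a parameter, DefaultTasksMax, which was left at its default value (512) here. It caps the number of tasks (processes and threads) that each systemd unit can create. The Grid Infrastructure and database processes are all started under ohasd.service, so they share that one 512-task budget; once it is used up, fork fails with status 11 (Resource temporarily unavailable). This setting reportedly also affects the maxpid value on the OS.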
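
One way to see that a service did not pick up limits.conf is to look at the limits the kernel actually applied to one of its processes, which Linux exposes in `/proc/<pid>/limits`. The following is a minimal sketch; the function name `proc_limit` is hypothetical, and the PID to inspect (for example a database background process) is something you look up yourself.

```shell
# proc_limit FILE NAME
# Print the soft and hard values of limit NAME (e.g. "Max processes")
# from a /proc/<pid>/limits-style FILE. Hypothetical helper.
proc_limit() {
    awk -v name="$2" 'index($0, name) == 1 {
        rest = substr($0, length(name) + 1)   # text after the limit name
        split(rest, f, " ")                   # f[1] = soft, f[2] = hard
        print "soft=" f[1], "hard=" f[2]
    }' "$1"
}
```

For example, `proc_limit /proc/$pid/limits "Max processes"` can be compared with the nproc values in limits.conf. Note that the systemd task limit itself is a cgroup setting and does not appear in this file; it is what the `Tasks:` line of `systemctl status` reports.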

6. Changes and parameter checks on SUSE 12 SP3

#################Add limits parameters###################
Add the following parameters to /etc/security/limits.conf:
###################For HW  soft #################
*               soft     nofile         1200000
*               hard     nofile         1220000
*               soft     memlock        32
*               soft     core           10485760
*               soft     data           -1
*               soft     nproc          148270
*               soft     stack          -1
*               soft     as             -1
*               soft     rss            -1
*               hard     nproc          1200000
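
A quick sanity check on entries like these is that no soft limit exceeds the hard limit for the same domain and item. The following is a minimal sketch, assuming the four-column `domain type item value` layout shown above. The function name `check_soft_le_hard` is hypothetical, and `-1`, `unlimited` and `infinity` are all treated as unlimited.

```shell
# check_soft_le_hard FILE
# Report every domain/item pair in a limits.conf-style FILE whose soft value
# is larger than its hard value. Exit status is 1 if any is found.
check_soft_le_hard() {
    awk '
    NF == 0 || $1 ~ /^#/ { next }            # skip blank lines and comments
    NF >= 4 {
        v = $4
        if (v == "-1" || v == "unlimited" || v == "infinity") v = "inf"
        key = $1 " " $3
        if ($2 == "soft" || $2 == "-") soft[key] = v
        if ($2 == "hard" || $2 == "-") hard[key] = v
    }
    END {
        bad = 0
        for (key in soft) {
            if (!(key in hard) || hard[key] == "inf") continue
            if (soft[key] == "inf" || soft[key] + 0 > hard[key] + 0) {
                print "soft > hard for " key
                bad = 1
            }
        }
        exit bad
    }' "$1"
}
```

For the entries listed above it prints nothing, since the nofile and nproc soft values are both below their hard values and the other items have no hard entry.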

############Add kernel parameters##################
Add the following parameters to /etc/sysctl.conf:
fs.aio-max-nr = 1048576
fs.file-max = 6815744
fs.nr_open = 2000000
fs.inotify.max_user_watches = 2000000
kernel.acct = 100 100 30
kernel.msgmax = 1048576
kernel.msgmnb = 8388608
kernel.msgmni = 256
kernel.sem = 1250 320000 100 256
##############512G#####################
kernel.shmmax = 433517314048
kernel.shmmni = 4096
##############512G#####################
kernel.shmall = 549755813888
kernel.suid_dumpable = 1
kernel.sysrq = 8
kernel.core_pattern = /corefiles/core.%p.%e
vm.min_free_kbytes = 3000000
vm.swappiness = 10
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_default = 8388608
net.core.wmem_max = 41943040
net.ipv4.ip_local_port_range = 50000 65000
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_rmem = 8388608 8388608 33554432
net.ipv4.tcp_sack = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_wmem = 524288 524288 33554432
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_retries2 = 5
net.ipv4.tcp_syn_retries = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.promote_secondaries = 1
net.ipv4.conf.all.promote_secondaries = 1
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.default.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 1
net.ipv4.conf.default.arp_announce = 1
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_synack_retries = 5
vm.overcommit_memory = 1
vm.drop_caches = 1
vm.zone_reclaim_mode = 0
vm.max_map_count = 655360
vm.dirty_background_ratio = 60
vm.dirty_ratio = 60
vm.page-cluster = 3
vm.dirty_writeback_centisecs = 360000
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1
kernel.sched_child_runs_first = 1
kernel.sched_latency_ns = 40000000
kernel.sched_nr_migrate = 64
net.ipv4.tcp_moderate_rcvbuf = 1
kernel.sched_compat_yield = 1
net.ipv4.tcp_max_tw_buckets = 5000
kernel.sched_migration_cost = 250000
kernel.sched_min_granularity_ns = 8000000
kernel.sched_wakeup_granularity_ns = 2500000
kernel.sched_rt_period_us = 2000000
kernel.pid_max = 1310720

/sbin/sysctl -p
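
With a list this long, it can be worth confirming afterwards that the running kernel values match the file. The following is a minimal sketch that compares a sysctl.conf-style file with the values under /proc/sys. The function name `sysctl_diff` and its optional second argument (an alternative root directory, handy for dry runs) are hypothetical, and it assumes key names whose dots map directly to path separators, as in the list above.

```shell
# sysctl_diff FILE [ROOT]
# For each "key = value" line in FILE, compare the value with the one found
# under ROOT (default /proc/sys) and print every key that differs or that
# cannot be read.
sysctl_diff() {
    root=${2:-/proc/sys}
    grep -v '^[[:space:]]*#' "$1" | grep '=' |
    while IFS='=' read -r key want; do
        key=$(printf '%s' "$key" | tr -d ' \t')
        # Normalise whitespace so "1250 320000" matches a tab-separated value.
        want=$(printf '%s' "$want" | tr '\t' ' ' | tr -s ' ' | sed 's/^ //; s/ $//')
        path=$root/$(printf '%s' "$key" | tr . /)
        if [ ! -r "$path" ]; then
            echo "$key: not readable"
            continue
        fi
        have=$(tr '\t' ' ' < "$path" | tr -s ' ' | sed 's/^ //; s/ $//')
        [ "$have" = "$want" ] || echo "$key: running=$have file=$want"
    done
}
```

For example, `sysctl_diff /etc/sysctl.conf` prints nothing when every value in the file is in effect.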

#####################Modify the system.conf parameter#####################
In /etc/systemd/system.conf, change #DefaultTasksMax=512 to DefaultTasksMax=infinity

...restart the OS...

# systemctl status ohasd
● ohasd.service - LSB: Start and Stop Oracle High Availability Service
   Loaded: loaded (/etc/init.d/ohasd; bad; vendor preset: disabled)
   Active: active (exited) since Thu 2019-02-21 14:27:32 CST; 2h 9min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 86737 ExecStart=/etc/init.d/ohasd start (code=exited, status=0/SUCCESS)
    Tasks: 959

Feb 21 14:27:26 SSNG3MCS-DB1 systemd[1]: Starting LSB: Start and Stop Oracle High Availability Service...
Feb 21 14:27:26 SSNG3MCS-DB1 ohasd[86737]: Starting ohasd:
Feb 21 14:27:31 SSNG3MCS-DB1 ohasd[86737]: CRS-4123: Oracle High Availability Services has been started.
Feb 21 14:27:32 SSNG3MCS-DB1 systemd[1]: Started LSB: Start and Stop Oracle High Availability Service.

# cat /etc/systemd/system.conf|grep DefaultTasksMax
#DefaultTasksMax=512
DefaultTasksMax=infinity