A database instance in a 12c RAC crashes on its own with ORA-27157, ORA-27300, and related errors
2016-09-10 11:03 abce
After installing a 12c RAC database on RHEL 7.2, one of the database instances kept crashing on its own. The alert log showed the following errors:
    Errors in file /d12/app/oracle/diag/rdbms/rac12c/rac12c2/trace/rac12c2_j000_21047.trc:
    ORA-27157: OS post/wait facility removed
    ORA-27300: OS system dependent operation:semop failed with status: 43
    ORA-27301: OS failure message: Identifier removed
    ORA-27302: failure occurred at: sskgpwwait1
    Fri Sep 09 16:50:53 2016
    Errors in file /d12/app/oracle/diag/rdbms/rac12c/rac12c2/trace/rac12c2_rmv0_20798.trc:
    ORA-27157: OS post/wait facility removed
    Fri Sep 09 16:50:53 2016
    Errors in file /d12/app/oracle/diag/rdbms/rac12c/rac12c2/trace/rac12c2_q005_21328.trc:
    ORA-27157: OS post/wait facility removed
    ORA-27300: OS system dependent operation:semop failed with status: 43
    ORA-27301: OS failure message: Identifier removed
    ORA-27302: failure occurred at: sskgpwwait1
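Root cause:
In RHEL 7.2, the systemd-logind service introduced a new feature: once a user has fully logged out of the OS, all IPC objects owned by that user are removed.
The feature is controlled by the RemoveIPC option in /etc/systemd/logind.conf. See man logind.conf(5) for details.
In RHEL 7.2, RemoveIPC defaults to yes.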
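To see which value a given host would use, a minimal sketch along these lines can help. The function name effective_remove_ipc is hypothetical (it is not a systemd tool), it reads only the single file passed to it, and it assumes the RHEL 7.2 default of yes described above whenever the option is absent or commented out.

```shell
# Hypothetical helper: print the RemoveIPC value that a logind.conf-style
# file sets explicitly, or "yes" (the RHEL 7.2 default described above)
# when the option is missing or commented out.
effective_remove_ipc() {
    local conf="${1:-/etc/systemd/logind.conf}"
    local value=""
    if [ -r "$conf" ]; then
        # Keep the last uncommented RemoveIPC= assignment; lines starting
        # with '#' never match because the pattern is anchored at line start.
        value=$(sed -n 's/^[[:space:]]*RemoveIPC[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$conf" | tail -n 1)
    fi
    printf '%s\n' "${value:-yes}"
}

# Usage (on the affected host):
#   effective_remove_ipc /etc/systemd/logind.conf
```

If this prints yes on a host running Oracle under regular login accounts, the host is likely exposed to the behavior described here.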
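As a result, when the last session of the oracle or grid user logs out, the operating system removes that user's shared memory segments and semaphores.
Oracle ASM and the database SGA live in shared memory segments, so removing those segments crashes the Oracle ASM and database instances. The removed semaphores explain the alert log entries: semop fails with status 43, which on Linux is EIDRM ("Identifier removed"), because the semaphore set the process was waiting on no longer exists.
See Redhat bug 1264533 - https://bugzilla.redhat.com/show_bug.cgi?id=1264533
The problem affects every application that uses shared memory segments or semaphores, so Oracle ASM instances and Oracle Database instances are both affected.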
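One way to watch the objects at risk is to count the shared memory segments and semaphore arrays a user currently owns. The sketch below is an illustration only: count_ipc_owned_by is a hypothetical helper, and it assumes the usual Linux ipcs layout in which data rows start with a 0x key and the owner is the third column.

```shell
# Hypothetical helper: read `ipcs -m` or `ipcs -s` output on stdin and
# print how many IPC objects belong to the given user.
# Assumes data rows begin with a hex key (0x...) and the owner is column 3.
count_ipc_owned_by() {
    local user="$1"
    awk -v u="$user" '$1 ~ /^0x/ && $3 == u { n++ } END { print n + 0 }'
}

# Usage (on the database host):
#   ipcs -m | count_ipc_owned_by oracle   # shared memory segments
#   ipcs -s | count_ipc_owned_by grid     # semaphore arrays
```

A count that drops to zero while the instance is supposed to be running, right after the last oracle or grid session logs out, would be consistent with the removal described above.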
Oracle Linux 7.2 avoids the problem by explicitly setting RemoveIPC to no in /etc/systemd/logind.conf.
Symptoms this problem can cause:
1) Installing 11.2 and 12c GI/CRS fails, because ASM crashes towards the end of the installation.
2) Upgrading to 11.2 and 12c GI/CRS fails.
3) After Redhat Linux is upgraded to 7.2, 11.2 and 12c ASM and database instances crash.
systemd-logind can remove the IPC objects at any time, so the errors that end up in the logs vary with when the removal happens. For example:
Most common error that occurs is that the following is found in the asm or database alert.log:
    ORA-27157: OS post/wait facility removed
    ORA-27300: OS system dependent operation:semop failed with status: 43
    ORA-27301: OS failure message: Identifier removed
    ORA-27302: failure occurred at: sskgpwwait1

The second observed error occurs during installation and upgrade when asmca fails with the following error:
    KFOD-00313: No ASM instances available. CSS group services were successfully initilized by kgxgncin
    KFOD-00105: Could not open pfile 'init@.ora'

The third observed error occurred during installation and upgrade:
    Creation of ASM password file failed. Following error occurred: Error in Process: /d12/app/12.1.0/grid/bin/orapwd
    Enter password for SYS:
    OPW-00009: Could not establish connection to Automatic Storage Management instance
    2015/11/20 21:38:45 CLSRSC-184: Configuration of ASM failed
    2015/11/20 21:38:46 CLSRSC-258: Failed to configure and start ASM

The fourth observed error is the following message is found in the /var/log/messages file around the time that asm or database instance crashed:
    Nov 20 21:38:43 testc201 kernel: traps: oracle[24861] trap divide error ip:3896db8 sp:7ffef1de3c40 error:0 in oracle[400000+ef57000]
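The most common signature above can be checked for mechanically. This is a rough sketch, not an Oracle utility: has_ipc_removed_signature is a hypothetical name, and it only looks for the ORA-27157 code together with the semop status 43 line, so it will miss the installation-time and /var/log/messages variants.

```shell
# Hypothetical helper: succeed (exit 0) when a log file contains both
# ORA-27157 and the "semop failed with status: 43" line shown above.
has_ipc_removed_signature() {
    local log="$1"
    grep -q 'ORA-27157' "$log" &&
        grep -q 'semop failed with status: 43' "$log"
}

# Usage (the path is whatever alert log you want to inspect):
#   if has_ipc_removed_signature "$alert_log"; then
#       echo "IPC objects were removed underneath the instance"
#   fi
```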
Fix:
1) Set RemoveIPC=no in /etc/systemd/logind.conf.
2) Reboot the server, or restart systemd-logind.
To restart systemd-logind:
    # systemctl daemon-reload
    # systemctl restart systemd-logind
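The configuration change in step 1 can also be scripted so that it is safe to run more than once. The following is a sketch under stated assumptions: set_remove_ipc_no is a hypothetical helper, it relies on GNU sed (-i and -E), and when it has to append the line it assumes [Login] is the only section in the file.

```shell
# Hypothetical helper: make sure a logind.conf-style file says RemoveIPC=no.
# Rewrites an existing RemoveIPC line (commented out or not); otherwise
# appends one. Running it repeatedly leaves the file unchanged.
set_remove_ipc_no() {
    local conf="${1:-/etc/systemd/logind.conf}"
    if grep -Eq '^[[:space:]]*#?[[:space:]]*RemoveIPC[[:space:]]*=' "$conf"; then
        # Replace the existing line in place (GNU sed).
        sed -i -E 's/^[[:space:]]*#?[[:space:]]*RemoveIPC[[:space:]]*=.*/RemoveIPC=no/' "$conf"
    else
        # No such line yet: append it, assuming [Login] is the only section.
        printf 'RemoveIPC=no\n' >> "$conf"
    fi
}

# Usage (as root), followed by the restart commands above:
#   set_remove_ipc_no /etc/systemd/logind.conf
```

Backing up the file before editing it is a sensible precaution, since the helper modifies it in place.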
MOS Doc:
ALERT: Setting RemoveIPC=yes on Redhat 7.2 Crashes ASM and Database Instances as Well as Any Application That Uses a Shared Memory Segment (SHM) or Semaphores (SEM) (Doc ID 2081410.1)