Oracle数据库导致系统CPU使用率100%的案例

Oracle数据库导致系统CPU使用率100%的案例

▲点击上方名片“蜘蛛杂谈”关注

 

前言这个案例只是模拟案例。虽然作为一名系统工程师,稍有些数据库基础,但是碰到和系统相关的问题时,我们还是需要从系统层去做分析处理,希望这个案例能给大家带来些许处理思路。

 

故障现象

Oracle 19c RAC 2节点CPU使用率100%。(使用top、sar等命令)

故障分析过程1、查询负载很高前35个进程
# ps aux|head -1;ps aux|grep -v PID|sort -rn -k +3|head -35USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMANDoracle   21974 13.7  0.0 40453412 33848 ?      Rs   10:56   1:27 oracleorcl1 (LOCAL=NO)root      3204 13.6  0.0 114296  6480 ?        Sl   11:06   0:00 /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show configoracle   22790 13.6  0.0 40715556 33888 ?      Rs   10:56   1:25 oracleorcl1 (LOCAL=NO)oracle   22317 13.6  0.0 40715556 32004 ?      Rs   10:56   1:26 oracleorcl1 (LOCAL=NO)oracle   22260 13.6  0.0 40453408 33912 ?      Rs   10:56   1:26 oracleorcl1 (LOCAL=NO)oracle   22215 13.6  0.0 40453408 33892 ?      Rs   10:56   1:26 oracleorcl1 (LOCAL=NO)......oracle   22219 13.4  0.0 40453412 33904 ?      Rs   10:56   1:25 oracleorcl1 (LOCAL=NO)oracle   22156 13.4  0.0 40453872 34464 ?      Rs   10:56   1:26 oracleorcl1 (LOCAL=NO)oracle   22145 13.4  0.0 40453412 33848 ?      Rs   10:56   1:25 oracleorcl1 (LOCAL=NO)oracle   22124 13.4  0.0 40453412 31724 ?      Rs   10:56   1:26 oracleorcl1 (LOCAL=NO)oracle   22109 13.4  0.0 40453408 33836 ?      Rs   10:56   1:26 oracleorcl1 (LOCAL=NO)
2、查询前35负载高的sPID
# ps aux| grep -v PID | sort -rn -k +3 | head -35 | awk '{print$2}'230332294823015229412309123061......235012344023429234042326623161
3、把得到的spid替换到下面sql语句中
select sql_id from v$sessionwhere paddr in(select addr from v$processwhere spid in('23033','22948','23015','22941',......'23429','23404','23266','23161')    );

SQL_ID--------------------------f5kskn9df2h2pf5kskn9df2h2pf5kskn9df2h2pf5kskn9df2h2pf5kskn9df2h2pf5kskn9df2h2p......f5kskn9df2h2pf5kskn9df2h2pf5kskn9df2h2pf5kskn9df2h2pf5kskn9df2h2pf5kskn9df2h2p
35 rows selected.
4、查询“f5kskn9df2h2p”相关的sql_id和数量
> SELECT sid,serial#,sql_id from v$session where sql_id in ( 'f5kskn9df2h2p');SID    SERIAL# SQL_ID---------- ---------- --------------------------4609      49390 f5kskn9df2h2p4722       1377 f5kskn9df2h2p4723       7168 f5kskn9df2h2p4724      51376 f5kskn9df2h2p4725      62475 f5kskn9df2h2p4726      64323 f5kskn9df2h2p4727      19318 f5kskn9df2h2p4728      23453 f5kskn9df2h2p4729      34066 f5kskn9df2h2p
300 rows selected.

> SELECT count(*) from v$session where sql_id in ( 'f5kskn9df2h2p' );      COUNT(*)----------       300
5、随机抽一条,查看一下具体是执行的什么内容
> select sql_text from v$sqlarea a,v$session b where a.SQL_ID=b.PREV_SQL_ID and b.SID='4729';    SQL_TEXT
6、思路1:系统层查看相关进程启动时间,发现在同一时间有sh脚本在执行,脚本内容如下:(感觉不准确,可能会误判)
/home/oracle/test$ cat /home/oracle/test/connOracleDB.sh*******
杀掉所有connOracleDB.sh进程:
# pkill -9 connOracleDB.sh
7、思路2:通过SQL语句查询sql_id对应的系统进程,再查其对应的父进程
> col MACHINE for a30> select process,MACHINE,PORT,PROGRAM,PADDR from v$session where sql_id = 'f5kskn9df2h2p';PROCESS                                          MACHINE                              PORT PROGRAM                             PADDR------------------------------------------------ ------------------------------ ---------- ----------------------------------- ----------------3480                                             vone1                               26564 sqlplus@vone1 (TNS V1-V3)           000000013870A3183155                                             vone1                               26280 sqlplus@vone1 (TNS V1-V3)           000000016076BF183118                                             vone1                               25964 sqlplus@vone1 (TNS V1-V3)           00000001507198883393                                             vone1                               26572 sqlplus@vone1 (TNS V1-V3)           00000001686EB5103244                                             vone1                               26268 sqlplus@vone1 (TNS V1-V3)           00000001587024A83170                                             vone1                               25970 sqlplus@vone1 (TNS V1-V3)           00000001487592A03457                                             vone1                               26580 sqlplus@vone1 (TNS V1-V3)           0000000160773F583338                                             vone1                               26286 sqlplus@vone1 (TNS V1-V3)           00000001507218C83218                                             vone1                               25976 sqlplus@vone1 (TNS V1-V3)           000000014081F7883410                                             vone1                               26592 sqlplus@vone1 (TNS V1-V3)           000000015870A4E83139                                             vone1                               26292 sqlplus@vone1 (TNS V1-V3)           00000001487612E03148                                             vone1                               25978 sqlplus@vone1 (TNS V1-V3)           00000001386FB7F8......
/home/oracle/test$ ps -ef| grep 3480oracle    3480  3399  0 14:10 pts/3    00:00:00 /opt/oracle/products/19.3.0/bin/sqlplusoracle   23509 21062  0 14:38 pts/4    00:00:00 grep --color=auto 3480/home/oracle/test$ ps -ef|grep 3399oracle    3399     1  0 14:10 pts/3    00:00:00 /bin/bash /home/oracle/test/connOracleDB.shoracle    3480  3399  0 14:10 pts/3    00:00:00 /opt/oracle/products/19.3.0/bin/sqlplusoracle   23686 21062  0 14:38 pts/4    00:00:00 grep --color=auto 3399
再查看connOracleDB.sh脚本,分析脚本内容是否会引起异常,杀掉该脚本。
/home/oracle/test$ cat /home/oracle/test/connOracleDB.sh******# pkill -9 connOracleDB.sh
8、在sql下执行生成kill的语句 
SELECT 'alter system kill session''' || SID || ',' || serial# || ''';' FROM v$session WHERE sql_id IN ('f5kskn9df2h2p');'ALTERSYSTEMKILLSESSION'''||SID||','||SERIAL#||''';'------------------------------------------------------alter system kill session'3,19584' immediate;alter system kill session'4,26016';alter system kill session'5,543';alter system kill session'6,22344';......alter system kill session'4727,35784';alter system kill session'4728,57499';alter system kill session'4729,45135';
300 rows selected.

后语
问题解决,激起了些许深入学习数据库的兴趣,但是想到那些又长又复杂的sql语句又开始头疼了。。。

★ 欢迎大家阅读、关注、点赞、收藏、转发 ★★ 关注我,关注东方蜘蛛,交流更多硬件系统运维干货 ★★“蜘蛛杂谈”诚邀广大IT爱好者投稿,投稿邮箱:45778530@qq.com,或者添加作者微信:eastspider0214

 

                                                                                       点击“在看”支持一下吧图片

CU东方蜘蛛
收录于合集 #Oracle
 2
上一篇Oracle数据库常用运维SQL语句
 
阅读 1020
蜘蛛杂谈
 
 
关注后可发消息
 
 
 
posted @ 2023-09-08 14:05  往事已成昨天  阅读(1116)  评论(0编辑  收藏  举报