Oracle数据库导致系统CPU使用率100%的案例

Oracle数据库导致系统CPU使用率100%的案例

前言

这个案例只是模拟案例。虽然作为一名系统工程师,稍有些数据库基础,但是碰到和系统相关的问题时,我们还是需要从系统层去做分析处理,希望这个案例能给大家带来些许处理思路。

故障现象

Oracle 19c RAC 2节点CPU使用率100%。(使用top、sar等命令)

故障分析过程

1、查询负载很高前35个进程

# ps aux|head -1;ps aux|grep -v PID|sort -rn -k +3|head -35
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
oracle   21974 13.7  0.0 40453412 33848 ?      Rs   10:56   1:27 oracleorcl1 (LOCAL=NO)
root      3204 13.6  0.0 114296  6480 ?        Sl   11:06   0:00 /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show config
oracle   22790 13.6  0.0 40715556 33888 ?      Rs   10:56   1:25 oracleorcl1 (LOCAL=NO)
oracle   22317 13.6  0.0 40715556 32004 ?      Rs   10:56   1:26 oracleorcl1 (LOCAL=NO)
oracle   22260 13.6  0.0 40453408 33912 ?      Rs   10:56   1:26 oracleorcl1 (LOCAL=NO)
oracle   22215 13.6  0.0 40453408 33892 ?      Rs   10:56   1:26 oracleorcl1 (LOCAL=NO)
......
oracle   22219 13.4  0.0 40453412 33904 ?      Rs   10:56   1:25 oracleorcl1 (LOCAL=NO)
oracle   22156 13.4  0.0 40453872 34464 ?      Rs   10:56   1:26 oracleorcl1 (LOCAL=NO)
oracle   22145 13.4  0.0 40453412 33848 ?      Rs   10:56   1:25 oracleorcl1 (LOCAL=NO)
oracle   22124 13.4  0.0 40453412 31724 ?      Rs   10:56   1:26 oracleorcl1 (LOCAL=NO)
oracle   22109 13.4  0.0 40453408 33836 ?      Rs   10:56   1:26 oracleorcl1 (LOCAL=NO)

2、查询前35负载高的sPID

# ps aux| grep -v PID | sort -rn -k +3 | head -35 | awk '{print$2}'
23033
22948
23015
22941
23091
23061
......
23501
23440
23429
23404
23266
23161

3、把得到的spid替换到下面sql语句中

select sql_id from v$session
where paddr in(
select addr from v$process
where spid in(
'23033',
'22948',
'23015',
'22941',
......
'23429',
'23404',
'23266',
'23161'
)    
);


SQL_ID
--------------------------
f5kskn9df2h2p
f5kskn9df2h2p
f5kskn9df2h2p
f5kskn9df2h2p
f5kskn9df2h2p
f5kskn9df2h2p
......
f5kskn9df2h2p
f5kskn9df2h2p
f5kskn9df2h2p
f5kskn9df2h2p
f5kskn9df2h2p
f5kskn9df2h2p

35 rows selected.

4、查询“f5kskn9df2h2p”相关的sql_id和数量

> SELECT sid,serial#,sql_id from v$session where sql_id in ( 'f5kskn9df2h2p');
SID    SERIAL# SQL_ID
---------- ---------- --------------------------
4609      49390 f5kskn9df2h2p
4722       1377 f5kskn9df2h2p
4723       7168 f5kskn9df2h2p
4724      51376 f5kskn9df2h2p
4725      62475 f5kskn9df2h2p
4726      64323 f5kskn9df2h2p
4727      19318 f5kskn9df2h2p
4728      23453 f5kskn9df2h2p
4729      34066 f5kskn9df2h2p

300 rows selected.
> SELECT count(*) from v$session where sql_id in ( 'f5kskn9df2h2p' );    
  COUNT(*)
----------
       300

5、随机抽一条,查看一下具体是执行的什么内容

> select sql_text from v$sqlarea a,v$session b where a.SQL_ID=b.PREV_SQL_ID and b.SID='4729';    
SQL_TEXT

6、思路1:系统层查看相关进程启动时间,发现在同一时间有sh脚本在执行,脚本内容如下:(感觉不准确,可能会误判)

/home/oracle/test$ cat /home/oracle/test/connOracleDB.sh
*******

杀掉所有connOracleDB.sh进程:

# pkill -9 connOracleDB.sh

7、思路2:通过SQL语句查询sql_id对应的系统进程,再查其对应的父进程

> col MACHINE for a30
> select process,MACHINE,PORT,PROGRAM,PADDR from v$session where sql_id = 'f5kskn9df2h2p';
PROCESS                                          MACHINE                              PORT PROGRAM                             PADDR
------------------------------------------------ ------------------------------ ---------- ----------------------------------- ----------------
3480                                             vone1                               26564 sqlplus@vone1 (TNS V1-V3)           000000013870A318
3155                                             vone1                               26280 sqlplus@vone1 (TNS V1-V3)           000000016076BF18
3118                                             vone1                               25964 sqlplus@vone1 (TNS V1-V3)           0000000150719888
3393                                             vone1                               26572 sqlplus@vone1 (TNS V1-V3)           00000001686EB510
3244                                             vone1                               26268 sqlplus@vone1 (TNS V1-V3)           00000001587024A8
3170                                             vone1                               25970 sqlplus@vone1 (TNS V1-V3)           00000001487592A0
3457                                             vone1                               26580 sqlplus@vone1 (TNS V1-V3)           0000000160773F58
3338                                             vone1                               26286 sqlplus@vone1 (TNS V1-V3)           00000001507218C8
3218                                             vone1                               25976 sqlplus@vone1 (TNS V1-V3)           000000014081F788
3410                                             vone1                               26592 sqlplus@vone1 (TNS V1-V3)           000000015870A4E8
3139                                             vone1                               26292 sqlplus@vone1 (TNS V1-V3)           00000001487612E0
3148                                             vone1  
/home/oracle/test$ ps -ef| grep 3480
oracle    3480  3399  0 14:10 pts/3    00:00:00 /opt/oracle/products/19.3.0/bin/sqlplus
oracle   23509 21062  0 14:38 pts/4    00:00:00 grep --color=auto 3480
/home/oracle/test$ ps -ef|grep 3399
oracle    3399     1  0 14:10 pts/3    00:00:00 /bin/bash /home/oracle/test/connOracleDB.sh
oracle    3480  3399  0 14:10 pts/3    00:00:00 /opt/oracle/products/19.3.0/bin/sqlplus
oracle   23686 21062  0 14:38 pts/4    00:00:00 grep --color=auto 3399

再查看connOracleDB.sh脚本,分析脚本内容是否会引起异常,杀掉该脚本。

/home/oracle/test$ cat /home/oracle/test/connOracleDB.sh
******
# pkill -9 connOracleDB.sh

8、在sql下执行生成kill的语句

SELECT 'alter system kill session''' || SID || ',' || serial# || ''';' FROM v$session WHERE sql_id IN ('f5kskn9df2h2p');
'ALTERSYSTEMKILLSESSION'''||SID||','||SERIAL#||''';'
------------------------------------------------------
alter system kill session'3,19584' immediate;
alter system kill session'4,26016';
alter system kill session'5,543';
alter system kill session'6,22344';
......
alter system kill session'4727,35784';
alter system kill session'4728,57499';
alter system kill session'4729,45135';

300 rows selected.

后语

问题解决,激起了些许深入学习数据库的兴趣,但是想到那些又长又复杂的sql语句又开始头疼了。。。

posted @ 2023-09-05 17:14  寻梦99  阅读(690)  评论(0)    收藏  举报