Oracle数据库导致系统CPU使用率100%的案例
Oracle数据库导致系统CPU使用率100%的案例
▲点击上方名片“蜘蛛杂谈”关注
后语
问题解决,激起了些许深入学习数据库的兴趣,但是想到那些又长又复杂的sql语句又开始头疼了。。。
★ 欢迎大家阅读、关注、点赞、收藏、转发 ★★ 关注我,关注东方蜘蛛,交流更多硬件系统运维干货 ★★“蜘蛛杂谈”诚邀广大IT爱好者投稿,投稿邮箱:45778530@qq.com,或者添加作者微信:eastspider0214
前言这个案例只是模拟案例。虽然作为一名系统工程师,稍有些数据库基础,但是碰到和系统相关的问题时,我们还是需要从系统层去做分析处理,希望这个案例能给大家带来些许处理思路。
故障现象
Oracle 19c RAC 2节点CPU使用率100%。(使用top、sar等命令)
故障分析过程1、查询负载很高前35个进程2、查询前35负载高的sPID# ps aux|head -1;ps aux|grep -v PID|sort -rn -k +3|head -35
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
oracle 21974 13.7 0.0 40453412 33848 ? Rs 10:56 1:27 oracleorcl1 (LOCAL=NO)
root 3204 13.6 0.0 114296 6480 ? Sl 11:06 0:00 /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show config
oracle 22790 13.6 0.0 40715556 33888 ? Rs 10:56 1:25 oracleorcl1 (LOCAL=NO)
oracle 22317 13.6 0.0 40715556 32004 ? Rs 10:56 1:26 oracleorcl1 (LOCAL=NO)
oracle 22260 13.6 0.0 40453408 33912 ? Rs 10:56 1:26 oracleorcl1 (LOCAL=NO)
oracle 22215 13.6 0.0 40453408 33892 ? Rs 10:56 1:26 oracleorcl1 (LOCAL=NO)
......
oracle 22219 13.4 0.0 40453412 33904 ? Rs 10:56 1:25 oracleorcl1 (LOCAL=NO)
oracle 22156 13.4 0.0 40453872 34464 ? Rs 10:56 1:26 oracleorcl1 (LOCAL=NO)
oracle 22145 13.4 0.0 40453412 33848 ? Rs 10:56 1:25 oracleorcl1 (LOCAL=NO)
oracle 22124 13.4 0.0 40453412 31724 ? Rs 10:56 1:26 oracleorcl1 (LOCAL=NO)
oracle 22109 13.4 0.0 40453408 33836 ? Rs 10:56 1:26 oracleorcl1 (LOCAL=NO)
3、把得到的spid替换到下面sql语句中# ps aux| grep -v PID | sort -rn -k +3 | head -35 | awk '{print$2}'
23033
22948
23015
22941
23091
23061
......
23501
23440
23429
23404
23266
23161
4、查询“f5kskn9df2h2p”相关的sql_id和数量select sql_id from v$session
where paddr in(
select addr from v$process
where spid in(
'23033',
'22948',
'23015',
'22941',
......
'23429',
'23404',
'23266',
'23161'
)
);
SQL_ID
--------------------------
f5kskn9df2h2p
f5kskn9df2h2p
f5kskn9df2h2p
f5kskn9df2h2p
f5kskn9df2h2p
f5kskn9df2h2p
......
f5kskn9df2h2p
f5kskn9df2h2p
f5kskn9df2h2p
f5kskn9df2h2p
f5kskn9df2h2p
f5kskn9df2h2p
35 rows selected.
> SELECT sid,serial#,sql_id from v$session where sql_id in ( 'f5kskn9df2h2p');
SID SERIAL# SQL_ID
---------- ---------- --------------------------
4609 49390 f5kskn9df2h2p
4722 1377 f5kskn9df2h2p
4723 7168 f5kskn9df2h2p
4724 51376 f5kskn9df2h2p
4725 62475 f5kskn9df2h2p
4726 64323 f5kskn9df2h2p
4727 19318 f5kskn9df2h2p
4728 23453 f5kskn9df2h2p
4729 34066 f5kskn9df2h2p
300 rows selected.
5、随机抽一条,查看一下具体是执行的什么内容> SELECT count(*) from v$session where sql_id in ( 'f5kskn9df2h2p' );
COUNT(*)
----------
300
6、思路1:系统层查看相关进程启动时间,发现在同一时间有sh脚本在执行,脚本内容如下:(感觉不准确,可能会误判)> select sql_text from v$sqlarea a,v$session b where a.SQL_ID=b.PREV_SQL_ID and b.SID='4729';
SQL_TEXT
杀掉所有connOracleDB.sh进程:/home/oracle/test$ cat /home/oracle/test/connOracleDB.sh
*******
# pkill -9 connOracleDB.sh
7、思路2:通过SQL语句查询sql_id对应的系统进程,再查其对应的父进程> col MACHINE for a30
> select process,MACHINE,PORT,PROGRAM,PADDR from v$session where sql_id = 'f5kskn9df2h2p';
PROCESS MACHINE PORT PROGRAM PADDR
------------------------------------------------ ------------------------------ ---------- ----------------------------------- ----------------
3480 vone1 26564 sqlplus@vone1 (TNS V1-V3) 000000013870A318
3155 vone1 26280 sqlplus@vone1 (TNS V1-V3) 000000016076BF18
3118 vone1 25964 sqlplus@vone1 (TNS V1-V3) 0000000150719888
3393 vone1 26572 sqlplus@vone1 (TNS V1-V3) 00000001686EB510
3244 vone1 26268 sqlplus@vone1 (TNS V1-V3) 00000001587024A8
3170 vone1 25970 sqlplus@vone1 (TNS V1-V3) 00000001487592A0
3457 vone1 26580 sqlplus@vone1 (TNS V1-V3) 0000000160773F58
3338 vone1 26286 sqlplus@vone1 (TNS V1-V3) 00000001507218C8
3218 vone1 25976 sqlplus@vone1 (TNS V1-V3) 000000014081F788
3410 vone1 26592 sqlplus@vone1 (TNS V1-V3) 000000015870A4E8
3139 vone1 26292 sqlplus@vone1 (TNS V1-V3) 00000001487612E0
3148 vone1 25978 sqlplus@vone1 (TNS V1-V3) 00000001386FB7F8
......
再查看connOracleDB.sh脚本,分析脚本内容是否会引起异常,杀掉该脚本。/home/oracle/test$ ps -ef| grep 3480
oracle 3480 3399 0 14:10 pts/3 00:00:00 /opt/oracle/products/19.3.0/bin/sqlplus
oracle 23509 21062 0 14:38 pts/4 00:00:00 grep --color=auto 3480
/home/oracle/test$ ps -ef|grep 3399
oracle 3399 1 0 14:10 pts/3 00:00:00 /bin/bash /home/oracle/test/connOracleDB.sh
oracle 3480 3399 0 14:10 pts/3 00:00:00 /opt/oracle/products/19.3.0/bin/sqlplus
oracle 23686 21062 0 14:38 pts/4 00:00:00 grep --color=auto 3399
8、在sql下执行生成kill的语句/home/oracle/test$ cat /home/oracle/test/connOracleDB.sh
******
# pkill -9 connOracleDB.sh
SELECT 'alter system kill session''' || SID || ',' || serial# || ''';' FROM v$session WHERE sql_id IN ('f5kskn9df2h2p');
'ALTERSYSTEMKILLSESSION'''||SID||','||SERIAL#||''';'
------------------------------------------------------
alter system kill session'3,19584' immediate;
alter system kill session'4,26016';
alter system kill session'5,543';
alter system kill session'6,22344';
......
alter system kill session'4727,35784';
alter system kill session'4728,57499';
alter system kill session'4729,45135';
300 rows selected.
后语
问题解决,激起了些许深入学习数据库的兴趣,但是想到那些又长又复杂的sql语句又开始头疼了。。。
★ 欢迎大家阅读、关注、点赞、收藏、转发 ★★ 关注我,关注东方蜘蛛,交流更多硬件系统运维干货 ★★“蜘蛛杂谈”诚邀广大IT爱好者投稿,投稿邮箱:45778530@qq.com,或者添加作者微信:eastspider0214
点击“在看”支持一下吧
收录于合集 #Oracle
2个上一篇Oracle数据库常用运维SQL语句
蜘蛛杂谈
关注后可发消息