Oracle使用errorstack跟踪客户端的ORA报错
Oracle使用errorstack跟踪客户端的ORA报错
前言
最近某个业务模块报错ORA-00942: table or view does not exist,想让我帮忙找出是哪个sql引起的。
一般客户端的ORA报错若单纯是客户端原因引起的,这样的ORA报错在数据库服务器是不会有记录的。
比如这个表或视图不存在的报错,如果客户端故意查一张不存在的表并且大量执行,在考虑并发的情况下,如果服务器都记录日志那么日志将会大量产生。
又或者客户端执行一条sql,但是带有语法错误类似ORA-00936: missing expression的时候,同理。
不过可以使用errorstack开关打开某个ORA错误号的监控,获取到足够的信息后再关闭。
实验
就以ORA-00942报错为例子,使用sys用户打开错误号为942的监控开关。
15:23:27 SYS@test(1218)> alter system set events '942 trace name errorstack forever, level 3'; System altered. Elapsed: 00:00:00.00
此时的警告日志:
Thu Sep 30 15:25:47 2021
OS Pid: 21259 executed alter system set events '942 trace name errorstack forever, level 3'
然后普通用户登录执行一个查询,查询一张不存在的表。
15:23:17 ZKM@test(432)> select * from no; select * from no * ERROR at line 1: ORA-00942: table or view does not exist Elapsed: 00:00:04.16
此时的警告日志:
Thu Sep 30 15:27:47 2021
Errors in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_21173.trc:
Errors in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_21173.trc:
ORA-00942: table or view does not exist
Thu Sep 30 15:27:49 2021
Dumping diagnostic data in directory=[cdmp_20210930152749], requested by (instance=1, osid=21173), summary=[abnormal process termination].
Dumping diagnostic data in directory=[cdmp_20210930152752], requested by (instance=1, osid=21173), summary=[abnormal process termination].
及时关闭开关:
15:30:45 SYS@test(1218)> alter system set events '942 trace name errorstack off'; System altered. Elapsed: 00:00:00.00
查看trace文件:
Trace file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_21173.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
System name: Linux
Node name: testdb
Release: 5.4.17-2011.6.2.el7uek.x86_64
Version: #2 SMP Thu Sep 3 14:09:14 PDT 2020
Machine: x86_64
VM name: VMWare Version: 6
Instance name: test
Redo thread mounted by this instance: 1
Oracle process number: 129
Unix process pid: 21173, image: oracle@testdb (TNS V1-V3)
*** 2021-09-30 15:27:47.079
*** SESSION ID:(432.4017) 2021-09-30 15:27:47.079
*** CLIENT ID:() 2021-09-30 15:27:47.079
*** SERVICE NAME:(SYS$USERS) 2021-09-30 15:27:47.079
*** MODULE NAME:(SQL*Plus) 2021-09-30 15:27:47.079
*** ACTION NAME:() 2021-09-30 15:27:47.079
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
----- Error Stack Dump -----
----- Current SQL Statement for this session (sql_id=2611xga5f8w95) -----
select * from no
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
*** 2021-09-30 15:27:47.306
skdstdst()+41 call kgdsdst() 000000000 ? 000000000 ?
7FFDE0D2CA80 ? 7FFDE0D2CB58 ?
7FFDE0D31600 ? 000000002 ?
ksedst1()+103 call skdstdst() 000000000 ? 000000000 ?
7FFDE0D2CA80 ? 7FFDE0D2CB58 ?
7FFDE0D31600 ? 000000002 ?
ksedst()+39 call ksedst1() 000000000 ? 000000001 ?
7FFDE0D2CA80 ? 7FFDE0D2CB58 ?
7FFDE0D31600 ? 000000002 ?
dbkedDefDump()+2746 call ksedst() 000000000 ? 000000001 ?
7FFDE0D2CA80 ? 7FFDE0D2CB58 ?
7FFDE0D31600 ? 000000002 ?
ksedmp()+41 call dbkedDefDump() 000000003 ? 000000000 ?
7FFDE0D2CA80 ? 7FFDE0D2CB58 ?
7FFDE0D31600 ? 000000002 ?
dbkdaKsdActDriver() call ksedmp() 000000003 ? 000000000 ?
+1960 7FFDE0D2CA80 ? 7FFDE0D2CB58 ?
7FFDE0D31600 ? 000000002 ?
dbgdaExecuteAction( call dbkdaKsdActDriver() 7F0DE7A1F730 ? 7FFDE0D33D00 ?
)+1065 7FFDE0D2CA80 ? 7FFDE0D2CB58 ?
7FFDE0D31600 ? 000000002 ?
...省略部分内容...
Starting SQL statement dump
SQL Information
user_id=703 user_name=ZKM module=SQL*Plus action=
sql_id=2611xga5f8w95 plan_hash_value=0 problem_type=0
----- Current SQL Statement for this session (sql_id=2611xga5f8w95) -----
select * from no
sql_text_length=17
sql=select * from no
Optimizer state dump:
Compilation Environment Dump
optimizer_mode_hinted = false
optimizer_features_hinted = 0.0.0
parallel_execution_enabled = true
parallel_query_forced_dop = 0
...省略部分内容...
可以看出已经有足够多的信息了。