【问题记录】Druid ORA-01013 问题排查记录
项目场景:
公司某现场经常在停服务之前报SQLException:ORA-01013,正常业务也偶发SQLException:ORA-01013但是频率较低。问题描述:
软件版本如下:- 数据源:alibaba druid-1.1.9版本
- 数据库:Oracle11g 11.2.0.3.0
- JDBC驱动:ojdbc7-11
报错异常堆栈如下:
2020-11-04 09:41:22.228 ERROR [Thread-170] druid.sql.Statement[Slf4jLogFilter.java:149]
{conn-10738, pstmt-20197} execute error. select *
from xxxxx
where xxxxxxxxxxxxxxx
java.sql.SQLException: ORA-01013: 用户请求取消当前的操作
at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:70)
at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:133)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:206)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:455)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:413)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:1035)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:194)
at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:953)
at oracle.jdbc.driver.T4CPreparedStatement.executeMaybeDescribe(T4CPreparedStatement.java:897)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1188)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3386)
at oracle.jdbc.driver.OraclePreparedStatement.execute(OraclePreparedStatement.java:3487)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.execute(OraclePreparedStatementWrapper.java:1374)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3051)
at com.alibaba.druid.filter.FilterEventAdapter.preparedStatement_execute(FilterEventAdapter.java:440)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3049)
at com.alibaba.druid.filter.FilterEventAdapter.preparedStatement_execute(FilterEventAdapter.java:440)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3049)
at com.alibaba.druid.wall.WallFilter.preparedStatement_execute(WallFilter.java:619)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3049)
at com.alibaba.druid.filter.FilterEventAdapter.preparedStatement_execute(FilterEventAdapter.java:440)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3049)
at com.alibaba.druid.filter.FilterAdapter.preparedStatement_execute(FilterAdapter.java:1080)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3049)
at com.alibaba.druid.proxy.jdbc.PreparedStatementProxyImpl.execute(PreparedStatementProxyImpl.java:167)
at com.alibaba.druid.pool.DruidPooledPreparedStatement.execute(DruidPooledPreparedStatement.java:498)
at sun.reflect.GeneratedMethodAccessor514.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.ibatis.logging.jdbc.PreparedStatementLogger.invoke(PreparedStatementLogger.java:59)
at com.sun.proxy.$Proxy36.execute(Unknown Source)
at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:63)
at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:79)
at sun.reflect.GeneratedMethodAccessor503.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:63)
at com.sun.proxy.$Proxy34.query(Unknown Source)
at sun.reflect.GeneratedMethodAccessor503.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.ibatis.plugin.Invocation.proceed(Invocation.java:49)
...
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:61)
at com.sun.proxy.$Proxy34.query(Unknown Source)
at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:63)
at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:324)
at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156)
at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:109)
at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:83)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:148)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:141)
...
at sun.reflect.GeneratedMethodAccessor666.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:720)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:653)
...
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:649)
...
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
原因分析:
-
观察Oracle数据库服务端是否存在异常
结论: 查看Oracle Alert日志,发现每次报错时,数据库都在进行redo日志切换,怀疑可能是redo日志文件大小设置过小导致,查阅资料发现设置为20分钟切换一次比较合理,调大redo日志大小之后问题仍然存在。但是异常从SQLException:ORA-01013变成了SQLTimeOutException:ORA-01013。证明问题不在Oracle服务端。 -
测试环境做压力测试,复现问题,观察问题出现是否存在规律
结论: 复现之后发现报错位置并无规律。 -
发现每次停服务之前问题发生频率较高,看stop.sh脚本中发现每次停服务之前会进行jmap dump操作。jmap dump时会发生STW,可能导致SQL执行超时导致异常。
结论: stop.sh脚本去掉dump操作之后,问题发生频率明显降低。 -
怀疑druid在调用oralce jdbc时传递的queryTimeout参数存在问题。
验证方案:
首先在本地复现此报错。
给某条记录加行级锁,编写测试代码设置queryTimeout为1之后执行更新操作,复现了ORA-01013问题
通过druid filter,打印出PreparedStatement的queryTimeout变量值,观察是否此值配置存在问题。
结论:
(1) 默认未配置时queryTimeout变量值全为0(为0代表永不超时),问题存在。
(2) 将DruidDataSource的queryTimeout设置为1800,代表SQL执行超时时间为30分钟,问题存在。
不是此参数存在问题导致,可能是数据库驱动对此变量处理存在问题,无论此值设置多少,都会存在ORA-01013. -
升级oracle驱动包为ojdbc8-12.2.0.1.0.jar
结论: 更换之后,压测之后未复现此问题。
此问题在github上有多个issue都未解决,有的公司将数据源更换为HikariCP之后问题解决。此方案我尝试之后确实可以解决,但是具体为什么更换数据源、升级jdbc驱动包都可解决此问题,仍是未解之谜,革命仍未成功,同志仍需努力呀!!!
github issue地址:
https://github.com/alibaba/druid/issues/4178
解决方案:
- 升级JDBC驱动程序
- 更换数据源(许多公司使用druid存在此问题,druid官方未给出解决方案)