【问题记录】Druid ORA-01013 问题排查记录

项目场景:

公司某现场经常在停服务之前报SQLException:ORA-01013,正常业务也偶发SQLException:ORA-01013但是频率较低。

问题描述:

软件版本如下:
  • 数据源:alibaba druid-1.1.9版本
  • 数据库:Oracle11g 11.2.0.3.0
  • JDBC驱动:ojdbc7-11

报错异常堆栈如下:

2020-11-04 09:41:22.228 ERROR [Thread-170] druid.sql.Statement[Slf4jLogFilter.java:149] 
{conn-10738, pstmt-20197} execute error. select *
from xxxxx
where xxxxxxxxxxxxxxx
java.sql.SQLException: ORA-01013: 用户请求取消当前的操作

	at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:70)
	at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:133)
	at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:206)
	at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:455)
	at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:413)
	at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:1035)
	at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:194)
	at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:953)
	at oracle.jdbc.driver.T4CPreparedStatement.executeMaybeDescribe(T4CPreparedStatement.java:897)
	at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1188)
	at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3386)
	at oracle.jdbc.driver.OraclePreparedStatement.execute(OraclePreparedStatement.java:3487)
	at oracle.jdbc.driver.OraclePreparedStatementWrapper.execute(OraclePreparedStatementWrapper.java:1374)
	at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3051)
	at com.alibaba.druid.filter.FilterEventAdapter.preparedStatement_execute(FilterEventAdapter.java:440)
	at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3049)
	at com.alibaba.druid.filter.FilterEventAdapter.preparedStatement_execute(FilterEventAdapter.java:440)
	at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3049)
	at com.alibaba.druid.wall.WallFilter.preparedStatement_execute(WallFilter.java:619)
	at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3049)
	at com.alibaba.druid.filter.FilterEventAdapter.preparedStatement_execute(FilterEventAdapter.java:440)
	at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3049)
	at com.alibaba.druid.filter.FilterAdapter.preparedStatement_execute(FilterAdapter.java:1080)
	at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3049)
	at com.alibaba.druid.proxy.jdbc.PreparedStatementProxyImpl.execute(PreparedStatementProxyImpl.java:167)
	at com.alibaba.druid.pool.DruidPooledPreparedStatement.execute(DruidPooledPreparedStatement.java:498)
	at sun.reflect.GeneratedMethodAccessor514.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.ibatis.logging.jdbc.PreparedStatementLogger.invoke(PreparedStatementLogger.java:59)
	at com.sun.proxy.$Proxy36.execute(Unknown Source)
	at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:63)
	at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:79)
	at sun.reflect.GeneratedMethodAccessor503.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:63)
	at com.sun.proxy.$Proxy34.query(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor503.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.ibatis.plugin.Invocation.proceed(Invocation.java:49)
	...
	at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:61)
	at com.sun.proxy.$Proxy34.query(Unknown Source)
	at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:63)
	at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:324)
	at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156)
	at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:109)
	at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:83)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:148)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:141)
	...
	at sun.reflect.GeneratedMethodAccessor666.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:720)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:653)
	...
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:649)
	...
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

原因分析:

  1. 观察Oracle数据库服务端是否存在异常
    结论: 查看Oracle Alert日志,发现每次报错时,数据库都在进行redo日志切换,怀疑可能是redo日志文件大小设置过小导致,查阅资料发现设置为20分钟切换一次比较合理,调大redo日志大小之后问题仍然存在。但是异常从SQLException:ORA-01013变成了SQLTimeOutException:ORA-01013。证明问题不在Oracle服务端。

  2. 测试环境做压力测试,复现问题,观察问题出现是否存在规律
    结论: 复现之后发现报错位置并无规律。

  3. 发现每次停服务之前问题发生频率较高,看stop.sh脚本中发现每次停服务之前会进行jmap dump操作。jmap dump时会发生STW,可能导致SQL执行超时导致异常。
    结论: stop.sh脚本去掉dump操作之后,问题发生频率明显降低。

  4. 怀疑druid在调用oralce jdbc时传递的queryTimeout参数存在问题。
    验证方案:
    首先在本地复现此报错。
    给某条记录加行级锁,编写测试代码设置queryTimeout为1之后执行更新操作,复现了ORA-01013问题
    通过druid filter,打印出PreparedStatement的queryTimeout变量值,观察是否此值配置存在问题。
    结论:
    (1) 默认未配置时queryTimeout变量值全为0(为0代表永不超时),问题存在。
    (2) 将DruidDataSource的queryTimeout设置为1800,代表SQL执行超时时间为30分钟,问题存在。
    不是此参数存在问题导致,可能是数据库驱动对此变量处理存在问题,无论此值设置多少,都会存在ORA-01013.

  5. 升级oracle驱动包为ojdbc8-12.2.0.1.0.jar
    结论: 更换之后,压测之后未复现此问题。

此问题在github上有多个issue都未解决,有的公司将数据源更换为HikariCP之后问题解决。此方案我尝试之后确实可以解决,但是具体为什么更换数据源、升级jdbc驱动包都可解决此问题,仍是未解之谜,革命仍未成功,同志仍需努力呀!!!

github issue地址:
https://github.com/alibaba/druid/issues/4178


解决方案:

  1. 升级JDBC驱动程序
  2. 更换数据源(许多公司使用druid存在此问题,druid官方未给出解决方案)
posted @ 2021-04-21 22:39  itaot  阅读(1752)  评论(0编辑  收藏  举报