低版本spring-cloud-sleuth的TraceableExecutorService一个坑,注意绕行
我们知道Java 线程池提交任务有两种:
1,submit 会抛出异常,会有异常日志
2,execute 没有抛出异常,需要执行返回值Future的get方法触发抛出异常
execute
在执行中没有做任何包装,在线程池源码中会执行到runWorker
方法,这个方法中可以看到会把异常抛出,我们也可以通过覆盖afterExecute
方法对异常进行自定义操作。这也就是execute
会有异常信息打印的原因。
final void runWorker(Worker w) {
Thread wt = Thread.currentThread();
Runnable task = w.firstTask;
w.firstTask = null;
w.unlock(); // allow interrupts
boolean completedAbruptly = true;
try {
while (task != null || (task = getTask()) != null) {
w.lock();
// If pool is stopping, ensure thread is interrupted;
// if not, ensure thread is not interrupted. This
// requires a recheck in second case to deal with
// shutdownNow race while clearing interrupt
if ((runStateAtLeast(ctl.get(), STOP) ||
(Thread.interrupted() &&
runStateAtLeast(ctl.get(), STOP))) &&
!wt.isInterrupted())
wt.interrupt();
try {
beforeExecute(wt, task);
Throwable thrown = null;
try {
task.run();
} catch (RuntimeException x) {
thrown = x; throw x;
} catch (Error x) {
thrown = x; throw x;
} catch (Throwable x) {
thrown = x; throw new Error(x);
} finally {
afterExecute(task, thrown);
}
} finally {
task = null;
w.completedTasks++;
w.unlock();
}
}
completedAbruptly = false;
} finally {
processWorkerExit(w, completedAbruptly);
}
}
submit
在提交过程中会把任务包装成FutureTask
,FutureTask
的run
方法:
public void run() {
if (state != NEW ||
!UNSAFE.compareAndSwapObject(this, runnerOffset,
null, Thread.currentThread()))
return;
try {
Callable<v> c = callable;
if (c != null && state == NEW) {
V result;
boolean ran;
try {
result = c.call();
ran = true;
} catch (Throwable ex) {
result = null;
ran = false;
setException(ex);
}
if (ran)
set(result);
}
} finally {
// runner must be non-null until state is settled to
// prevent concurrent calls to run()
runner = null;
// state must be re-read after nulling runner to prevent
// leaked interrupts
int s = state;
if (s >= INTERRUPTING)
handlePossibleCancellationInterrupt(s);
}
}
看代码就知道,它不会直接把在执行任务的时候把异常直接抛出来,而是执行了 setException(ex);
方法把异常存了起来,等到get
方法请求时触发异常。
话要从前几天一个问题讲起,我们应用使用Spring Cloud,我对应用使用的线程池进行统一管理,研发在使用execute方法提交任务后,没有看到执行任务的代码,也没有任何报错信息。
根据前面的分析,execute有异常是会抛出异常也是有日志信息的,我调试了一下异常情况,居然没有异常抛出,发现我们的线程池都被TraceableExecutorService
代理了,打开这个类,我就惊呆了,execute
方法被篡改成submit
提交了。
Spring Cloud 里会使用spring-cloud-sleuth
来做日志链路跟踪,一种常见的场景就是在应用中线程池使用的线程,如何把跟踪的用的traceId
传递过去,Spring的做法是把所有ExecutorService类型的Bean都用 TraceableExecutorService
进行代理,这个代理类生产代码实现在ExecutorBeanPostProcessor
里的postProcessAfterInitialization
方法中,具体代码:
if (bean instanceof ExecutorService
&& !(bean instanceof TraceableExecutorService)) {
if (isProxyNeeded(beanName)) {
return wrapExecutorService(bean);
}
else {
log.info("Not instrumenting bean " + beanName);
}
}
注意,我使用的是Spring Cloud 的版本是Greenwich.SR1
,它依赖的spring-cloud-starter-sleuth
是2.1.1.RELEASE
而这个版本的TraceableExecutorService
实现了execute
代理方法,把一个execute
篡改成submit
提交任务,带来的后果就是预期的异常抛出却没有抛出。
public void execute(Runnable command) {
this.delegate.submit(ContextUtil.isContextInCreation(this.beanFactory) ? command
: new TraceRunnable(tracing(), spanNamer(), command, this.spanName));
}
这个是bug产生时的改动:
这个是修复bug的描述:
https://github.com/spring-cloud/spring-cloud-sleuth/commit/e2e1cdb336ec89b26fcd22116510c2d5eee6924e
在我看来这是个低级错误,复制粘贴代码出的错,带来的影响虽然不大,但在一些场景没有异常信息其实是很致命的,甚至会导致一个bug隐藏很久或者一个bug要查很久。