tomcat挂掉问题定位分析
背景
上周五上线了job,运行到一半就有告警,tomcat挂了~~~什么鬼👻,线下环境测试都是ok的啊,为什么会导致tomcat挂???第一次遇到这个问题,真的不知道从哪下手,走了不少弯路,现记录下,仅供参考。
过程
看运行日志
26-Nov-2018 16:23:19.728 SEVERE [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [ROOT] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@4722a29c]) and a value of type [io.netty.util.internal.InternalThreadLocalMap] (value [io.netty.util.internal.InternalThreadLocalMap@7d10424b]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
26-Nov-2018 16:23:19.728 SEVERE [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [ROOT] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@6c42b33f]) and a value of type [com.message.internal.DefaultMessageManager.Context] (value [com.message.internal.DefaultMessageManager$Context@430918d7]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
26-Nov-2018 16:23:19.729 SEVERE [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [ROOT] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@6c42b33f]) and a value of type [com.message.internal.DefaultMessageManager.Context] (value [com.message.internal.DefaultMessageManager$Context@116d6468]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
......
内存溢出?看了下监控平台jvm参数
也没有发生gc
运行时间2018-11-23 16:12:39~2018-11-23 16:35:07
看了下代码,没有大对象或是大批量的对象产生。那是什么原因呢?还是没有头绪,复现吧。由于beta环境有问题,只能线上复现,通过dump日志来定位。还好这个服务不是核心服务,是旁路校验。又出现问题了,dump的是否tomcat已经挂了~~~试了好几次都是这样.后来一个同事说,能不能在本地复现呢,之前由于beta环境有问题,忽略了本地环境,于是在本地debug复现,终于发现了罪魁祸首
System.exit(0);直接把tomcat给关了。
目标渐渐清晰了。后又上网查了System.exit(0)表示正常退出JVM。哎😔~~~
后来上网查了下,发现我的定位过程有点问题,应该先看tomcat--catalina日志
有这样一段日志
这是非正常关闭,正常关闭会在 pause 之前有这样的一句warn日志:
org.apache.catalina.core.StandardServer await
A valid shutdown command was received via the shutdown port. Stopping the Server instance.
然后才是 pause -> stop -> destory
tomcat的shutdownhook被触发,一般有三种情况,
- 应用代码里有地方用到System.exit来退出jvm,比如说我~~~
- 系统发出的信号(kill -9除外,sigkill信号JVM不会有机会执行shutdownhook)
- 当我们startup后,会习惯性的去执行 tail -f catalina.out来查看日志,看程序是否正常启动执行,但是如果在你没有结束查看日志进程的时候强行关闭ssh回话,也会一定概率的导致tomcat意外退出。
对tomcat这块知识了解甚少,导致排查时间过长,后面找个时间专门学习下。