Diagnosing a Tomcat hang in a Spring Boot project that left HTTP (OpenFeign) requests unanswered
Project overview:
<spring-boot.version>2.3.2.RELEASE</spring-boot.version>
<spring-cloud.version>Hoxton.SR12</spring-cloud.version>
The project is deployed with Docker.
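Problem description:
The code makes heavy use of asynchronous multithreaded operations, and each asynchronous flow mixes in many database queries, Redis queries, Feign calls, and RabbitMQ sends and receives. Because the project runs on a single machine, Tomcat soon went into a hung ("fake dead") state: the service process kept running, but no network request got through, and even the basic health-check page could not be opened.
The container was running normally.
Every network request failed with feign.RetryableException: Read timed out executing POST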
2024-04-11 10:26:00,016 [INFO] [] [Thread-11] c.t.r.xd.modules.bigserver.job.search.ScEvery1MinutesJob [ScEvery1MinutesJob.java : 52] Current number of store product search tasks pushed by JOB: null
2024-04-11 10:26:00,018 [INFO] [] [Thread-16] c.t.r.xd.modules.bigserver.job.biz.ServerMonitoringBiz [ServerMonitoringBiz.java : 26] monitorThreadPoll mid-platform system monitoring, non-core thread pool: taskCount [9377], completedTaskCount [6479], activeCount [8], queueSize [2890]
2024-04-11 10:26:00,018 [INFO] [] [Thread-16] c.t.r.xd.modules.bigserver.job.biz.ServerMonitoringBiz [ServerMonitoringBiz.java : 30] monitorThreadPoll mid-platform system monitoring, core thread pool: taskCount [0], completedTaskCount [0], activeCount [0], queueSize [0]
2024-04-11 10:26:00,188 [INFO] [] [Thread-11] com.tunwu.retailcloud.xd.tools.aop.runlog.RunTimeLogAop [RunTimeLogAop.java : 56] AopMethod->ScEvery1MinutesJob.scEvery1MinutesJob[search processing task that must run once every minute],ArgsIn:,ArgsOut:{"code":200},RunTime:172ms
2024-04-11 10:26:01,609 [ERROR] [] [Thread-14] c.t.r.xd.modules.bigserver.job.biz.ExcelServerJobBiz [ExcelServerJobBiz.java : 376] exportExceptionHandler execution exception:
feign.RetryableException: Read timed out executing POST http://big-server/excelExportRecordService/queryExcelExportRecordPage
	at feign.FeignException.errorExecuting(FeignException.java:249)
	at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:129)
	at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:89)
	at feign.ReflectiveFeign$FeignInvocationHandler.invoke(ReflectiveFeign.java:100)
	at com.sun.proxy.$Proxy226.queryExcelExportRecordPage(Unknown Source)
	at com.tunwu.retailcloud.xd.modules.bigserver.job.biz.ExcelServerJobBiz.lambda$exportExceptionHandler$8(ExcelServerJobBiz.java:353)
	at com.tunwu.retailcloud.xd.tools.utils.XToolUtils.checkResponseDto(XToolUtils.java:443)
	at com.tunwu.retailcloud.xd.tools.utils.XToolUtils.checkResponseDto(XToolUtils.java:426)
	at com.tunwu.retailcloud.xd.modules.bigserver.job.biz.ExcelServerJobBiz.exportExceptionHandler(ExcelServerJobBiz.java:353)
	at com.tunwu.retailcloud.xd.modules.bigserver.job.biz.ExcelServerJobBiz$$FastClassBySpringCGLIB$$e6a9f45c.invoke(<generated>)
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:687)
	at com.tunwu.retailcloud.xd.modules.bigserver.job.biz.ExcelServerJobBiz$$EnhancerBySpringCGLIB$$7bcec4a.exportExceptionHandler(<generated>)
	at com.tunwu.retailcloud.xd.modules.bigserver.job.common.Every2MinutesJob.every2MinutesHandler(Every2MinutesJob.java:34)
	at sun.reflect.GeneratedMethodAccessor841.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.xxl.job.core.handler.impl.MethodJobHandler.execute(MethodJobHandler.java:31)
	at com.xxl.job.core.thread.JobThread.run(JobThread.java:163)
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1595)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1500)
	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
	at feign.Client$Default.convertResponse(Client.java:108)
	at feign.Client$Default.execute(Client.java:104)
	at org.springframework.cloud.openfeign.ribbon.RetryableFeignLoadBalancer.lambda$execute$0(RetryableFeignLoadBalancer.java:109)
	at org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:287)
	at org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:180)
	at org.springframework.cloud.openfeign.ribbon.RetryableFeignLoadBalancer.execute(RetryableFeignLoadBalancer.java:92)
	at org.springframework.cloud.openfeign.ribbon.RetryableFeignLoadBalancer.execute(RetryableFeignLoadBalancer.java:52)
	at com.netflix.client.AbstractLoadBalancerAwareClient$1.call(AbstractLoadBalancerAwareClient.java:104)
	at com.netflix.loadbalancer.reactive.LoadBalancerCommand$3$1.call(LoadBalancerCommand.java:303)
	at com.netflix.loadbalancer.reactive.LoadBalancerCommand$3$1.call(LoadBalancerCommand.java:287)
	at rx.internal.util.ScalarSynchronousObservable$3.call(ScalarSynchronousObservable.java:231)
	at rx.internal.util.ScalarSynchronousObservable$3.call(ScalarSynchronousObservable.java:228)
	at rx.Observable.unsafeSubscribe(Observable.java:10327)
	at rx.internal.operators.OnSubscribeConcatMap$ConcatMapSubscriber.drain(OnSubscribeConcatMap.java:286)
	at rx.internal.operators.OnSubscribeConcatMap$ConcatMapSubscriber.onNext(OnSubscribeConcatMap.java:144)
	at com.netflix.loadbalancer.reactive.LoadBalancerCommand$1.call(LoadBalancerCommand.java:185)
	at com.netflix.loadbalancer.reactive.LoadBalancerCommand$1.call(LoadBalancerCommand.java:180)
	at rx.Observable.unsafeSubscribe(Observable.java:10327)
	at rx.internal.operators.OnSubscribeConcatMap.call(OnSubscribeConcatMap.java:94)
	at rx.internal.operators.OnSubscribeConcatMap.call(OnSubscribeConcatMap.java:42)
	at rx.Observable.unsafeSubscribe(Observable.java:10327)
	at rx.internal.operators.OperatorRetryWithPredicate$SourceSubscriber$1.call(OperatorRetryWithPredicate.java:127)
	at rx.internal.schedulers.TrampolineScheduler$InnerCurrentThreadScheduler.enqueue(TrampolineScheduler.java:73)
	at rx.internal.schedulers.TrampolineScheduler$InnerCurrentThreadScheduler.schedule(TrampolineScheduler.java:52)
	at rx.internal.operators.OperatorRetryWithPredicate$SourceSubscriber.onNext(OperatorRetryWithPredicate.java:79)
	at rx.internal.operators.OperatorRetryWithPredicate$SourceSubscriber.onNext(OperatorRetryWithPredicate.java:45)
	at rx.internal.util.ScalarSynchronousObservable$WeakSingleProducer.request(ScalarSynchronousObservable.java:276)
	at rx.Subscriber.setProducer(Subscriber.java:209)
	at rx.internal.util.ScalarSynchronousObservable$JustOnSubscribe.call(ScalarSynchronousObservable.java:138)
	at rx.internal.util.ScalarSynchronousObservable$JustOnSubscribe.call(ScalarSynchronousObservable.java:129)
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
	at rx.Observable.subscribe(Observable.java:10423)
	at rx.Observable.subscribe(Observable.java:10390)
	at rx.observables.BlockingObservable.blockForSingle(BlockingObservable.java:443)
	at rx.observables.BlockingObservable.single(BlockingObservable.java:340)
	at com.netflix.client.AbstractLoadBalancerAwareClient.executeWithLoadBalancer(AbstractLoadBalancerAwareClient.java:112)
	at org.springframework.cloud.openfeign.ribbon.LoadBalancerFeignClient.execute(LoadBalancerFeignClient.java:84)
	at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:119)
	... 17 common frames omitted
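Diagnosis steps
Enter the container
docker exec -it big-server /bin/bash
Install the tools
apt-get update
apt-get install net-tools
Check the network connections
netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
Because the application had not been stopped, the connection counts kept growing, which indicates that connections were blocked.
Use the top command to inspect the Java process
Use jstack to get the thread information
# Find the service's process ID
ps -ef | grep your-java-project-name
# Print the thread information of the process
jstack $PID
# Save the thread dump to a txt file
jstack $PID > jstack.txt
Exit the container and use the cp command to copy out the file that was just generated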
Analyzing the file shows a large number of threads stuck in CompletableFuture$Signaller.block
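Counting how many threads share that frame can be done by hand, but a small helper makes the result easier to compare across dumps. The following is a minimal sketch, not the author's tooling: the class name JstackFrameCounter and its methods are hypothetical, and it assumes the usual jstack layout in which each thread's block starts with a line beginning with a double quote.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: counts threads in a jstack dump whose stack contains a given frame. */
final class JstackFrameCounter {

    /** Splits a dump into per-thread blocks; a new block starts at each line beginning with '"'. */
    static List<String> splitThreads(String dump) {
        List<String> blocks = new ArrayList<>();
        StringBuilder current = null;
        for (String line : dump.split("\\R")) {
            if (line.startsWith("\"")) {
                if (current != null) {
                    blocks.add(current.toString());
                }
                current = new StringBuilder();
            }
            // Lines before the first thread header (the dump preamble) are ignored.
            if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            blocks.add(current.toString());
        }
        return blocks;
    }

    /** Returns the number of thread blocks that mention the given frame text. */
    static long countThreadsWithFrame(String dump, String frame) {
        return splitThreads(dump).stream().filter(block -> block.contains(frame)).count();
    }

    public static void main(String[] args) throws Exception {
        // Defaults to the jstack.txt file produced in the earlier step.
        String path = args.length > 0 ? args[0] : "jstack.txt";
        String dump = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        long blocked = countThreadsWithFrame(dump, "CompletableFuture$Signaller.block");
        System.out.println("threads in CompletableFuture$Signaller.block: " + blocked);
    }
}
```

Running it against the copied jstack.txt prints a single count; a number close to the size of the worker pool suggests that almost every worker is waiting on a future.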
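NEW: not yet started. Such a thread does not appear in a dump.
RUNNABLE: executing in the JVM. The thread is running; its stack may also show the word "locked", meaning it has acquired a lock.
BLOCKED: blocked waiting for a monitor lock, that is, held up by a lock (a synchronized block or method) owned by another thread.
WAITING: waiting indefinitely for another thread to perform a particular action, such as signalling a condition or a monitor. The thread usually sits in park(), wait(), or join().
TIMED_WAITING: waiting a bounded time for another thread's action. It differs from WAITING in that the call carries a time limit, for example wait(timeout) or sleep().
TERMINATED: already exited.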
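The difference between WAITING, TIMED_WAITING, and BLOCKED is easier to see when reproduced in isolation. The sketch below is illustrative only (the class ThreadStateDemo and its helpers are hypothetical names): it starts three daemon threads, one waiting without a limit, one waiting with a limit, and one trying to enter a monitor that the caller holds, then reads each state with Thread.getState().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo of the thread states that show up in a jstack dump. */
final class ThreadStateDemo {

    /** Polls for up to about two seconds until the thread reaches the expected state. */
    private static Thread.State awaitState(Thread t, Thread.State expected) {
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(2);
        while (t.getState() != expected && System.nanoTime() < deadline) {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return t.getState();
    }

    private static Thread daemon(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setDaemon(true);
        return t;
    }

    /** Returns the observed states of the untimed waiter, the timed waiter, and the blocked thread. */
    static Thread.State[] observeStates() {
        CountDownLatch release = new CountDownLatch(1);
        Object monitor = new Object();

        Thread waiting = daemon("untimed-wait", () -> {
            try {
                release.await();                      // no time limit
            } catch (InterruptedException ignored) {
                // demo thread, nothing to clean up
            }
        });
        Thread timed = daemon("timed-wait", () -> {
            try {
                release.await(30, TimeUnit.SECONDS);  // bounded wait
            } catch (InterruptedException ignored) {
                // demo thread, nothing to clean up
            }
        });
        Thread blocked = daemon("monitor-wait", () -> {
            synchronized (monitor) {
                // entered only after the caller releases the monitor
            }
        });

        Thread.State[] states = new Thread.State[3];
        synchronized (monitor) {                      // hold the monitor so "monitor-wait" cannot enter
            waiting.start();
            timed.start();
            blocked.start();
            states[0] = awaitState(waiting, Thread.State.WAITING);
            states[1] = awaitState(timed, Thread.State.TIMED_WAITING);
            states[2] = awaitState(blocked, Thread.State.BLOCKED);
        }
        release.countDown();                          // let the waiters finish
        return states;
    }

    public static void main(String[] args) {
        for (Thread.State s : observeStates()) {
            System.out.println(s);
        }
    }
}
```

A thread stuck in CompletableFuture$Signaller.block without a timeout falls in the first category, which is why it never recovers on its own.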
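The hang was traced to the point where the asynchronous results are retrieved: the threads waiting there blocked indefinitely, in effect a deadlock. After the wait was changed to time out after 10 seconds, the problem was resolved.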
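A minimal sketch of such a bounded wait, assuming the results come from CompletableFuture (consistent with the CompletableFuture$Signaller.block frames in the dump) and Java 8. The source does not show the real code, so the class BoundedWaitDemo, the helper getWithTimeout, and the nested same-pool scenario in nestedCall are all hypothetical; the scenario is just one common way for an untimed wait to block forever.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical demo: waiting for an async result with a time limit instead of forever. */
final class BoundedWaitDemo {

    /** Waits at most the given time for the future; returns the fallback if the wait times out. */
    static <T> T getWithTimeout(CompletableFuture<T> future, long timeout, TimeUnit unit, T fallback) {
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Marks the future as cancelled; this does not interrupt a task that is already running.
            future.cancel(true);
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    /**
     * Assumed scenario: an outer task submits an inner task to the same pool and waits for it.
     * If every worker is busy with an outer task, the inner task can never start.
     */
    static String nestedCall(ExecutorService pool, long timeoutMillis) {
        CompletableFuture<String> outer = CompletableFuture.supplyAsync(() -> {
            CompletableFuture<String> inner =
                    CompletableFuture.supplyAsync(() -> "inner-result", pool);
            // An untimed inner.join() here would hang a fully occupied pool forever.
            return getWithTimeout(inner, timeoutMillis, TimeUnit.MILLISECONDS, "timed-out");
        }, pool);
        return outer.join();
    }

    public static void main(String[] args) {
        ExecutorService single = Executors.newFixedThreadPool(1);
        try {
            // The 10-second limit mirrors the fix described above; the pool size is illustrative.
            System.out.println(nestedCall(single, 10_000));
        } finally {
            single.shutdownNow();
        }
    }
}
```

With a one-thread pool the inner task cannot run while the outer one is waiting, so the call returns the fallback once the limit expires instead of hanging. A timeout only frees the waiting thread; the contention for the pool still has to be fixed at its source.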
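The root cause is most likely resource exhaustion by the multithreaded code, so the code should be optimized.
Problems of this kind are generally related to resource pools.
Check the database connection pool, the Redis connection pool, the thread pools, and so on.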
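For the thread pools, the monitoring line in the log above already reports taskCount, completedTaskCount, activeCount, and queueSize, which presumably come from the matching ThreadPoolExecutor getters. In that line the non-core pool shows 8 active threads with 2890 tasks queued, which points to a backlog. The following sketch shows one way to build a bounded pool and print the same four figures; PoolMonitorDemo, its method names, and the chosen sizes are assumptions, not the project's code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: a bounded thread pool plus a one-line health summary. */
final class PoolMonitorDemo {

    /** Fixed-size pool with a bounded queue; when the queue is full the submitting thread runs the task. */
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Formats the same four figures that appear in the monitoring log line. */
    static String describe(ThreadPoolExecutor pool) {
        return String.format(
                "taskCount [%d], completedTaskCount [%d], activeCount [%d], queueSize [%d]",
                pool.getTaskCount(), pool.getCompletedTaskCount(),
                pool.getActiveCount(), pool.getQueue().size());
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Occupies the only worker with a blocked task, queues two more, and returns the summary. */
    static String snapshotUnderLoad() {
        ThreadPoolExecutor pool = newBoundedPool(1, 10);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            pool.execute(() -> {
                started.countDown();
                awaitQuietly(release);   // stands in for a slow query or remote call
            });
            awaitQuietly(started);       // the worker is now busy
            pool.execute(() -> { });     // queued behind the blocked task
            pool.execute(() -> { });     // queued as well
            return describe(pool);
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(snapshotUnderLoad());
    }
}
```

A queueSize that keeps growing while activeCount stays pinned at the pool size is the pattern to watch for; a bounded queue with a rejection policy makes that pressure visible to callers instead of letting the backlog grow silently.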
References
https://blog.csdn.net/zcjluse/article/details/125974518
https://www.cnblogs.com/fengyege/p/16936291.html