Diagnosing a Tomcat hang in a Spring Boot project that left HTTP (OpenFeign) requests unanswered

Project overview:
<spring-boot.version>2.3.2.RELEASE</spring-boot.version>
<spring-cloud.version>Hoxton.SR12</spring-cloud.version>
The project is deployed with Docker.

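Problem description:
The code makes heavy use of asynchronous multithreaded operations, and each asynchronous flow mixes in many database queries, Redis queries, Feign calls, and RabbitMQ sends and receives. Because the project runs as a single instance, Tomcat soon entered a hung state: the service process kept running, but no network request got through, and even the basic health-check page could not be opened.
The container itself kept running normally.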

 

Every network request failed with feign.RetryableException: Read timed out executing POST

2024-04-11 10:26:00,016 [INFO] [] [Thread-11] c.t.r.xd.modules.bigserver.job.search.ScEvery1MinutesJob [ScEvery1MinutesJob.java : 52] 当前JOB推送门店商品搜索任务数量:null
2024-04-11 10:26:00,018 [INFO] [] [Thread-16] c.t.r.xd.modules.bigserver.job.biz.ServerMonitoringBiz [ServerMonitoringBiz.java : 26]  monitorThreadPoll 中台系统监控非核心线程池:taskCount [9377], completedTaskCount [6479], activeCount [8], queueSize [2890]
2024-04-11 10:26:00,018 [INFO] [] [Thread-16] c.t.r.xd.modules.bigserver.job.biz.ServerMonitoringBiz [ServerMonitoringBiz.java : 30]  monitorThreadPoll 中台系统监控核心线程池:taskCount [0], completedTaskCount [0], activeCount [0], queueSize [0]
2024-04-11 10:26:00,188 [INFO] [] [Thread-11] com.tunwu.retailcloud.xd.tools.aop.runlog.RunTimeLogAop [RunTimeLogAop.java : 56] AopMethod->ScEvery1MinutesJob.scEvery1MinutesJob[搜索每1分钟需要执行一次的处理任务],ArgsIn:,ArgsOut:{"code":200},RunTime:172ms
2024-04-11 10:26:01,609 [ERROR] [] [Thread-14] c.t.r.xd.modules.bigserver.job.biz.ExcelServerJobBiz [ExcelServerJobBiz.java : 376] exportExceptionHandler 执行异常:
feign.RetryableException: Read timed out executing POST http://big-server/excelExportRecordService/queryExcelExportRecordPage
    at feign.FeignException.errorExecuting(FeignException.java:249)
    at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:129)
    at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:89)
    at feign.ReflectiveFeign$FeignInvocationHandler.invoke(ReflectiveFeign.java:100)
    at com.sun.proxy.$Proxy226.queryExcelExportRecordPage(Unknown Source)
    at com.tunwu.retailcloud.xd.modules.bigserver.job.biz.ExcelServerJobBiz.lambda$exportExceptionHandler$8(ExcelServerJobBiz.java:353)
    at com.tunwu.retailcloud.xd.tools.utils.XToolUtils.checkResponseDto(XToolUtils.java:443)
    at com.tunwu.retailcloud.xd.tools.utils.XToolUtils.checkResponseDto(XToolUtils.java:426)
    at com.tunwu.retailcloud.xd.modules.bigserver.job.biz.ExcelServerJobBiz.exportExceptionHandler(ExcelServerJobBiz.java:353)
    at com.tunwu.retailcloud.xd.modules.bigserver.job.biz.ExcelServerJobBiz$$FastClassBySpringCGLIB$$e6a9f45c.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:687)
    at com.tunwu.retailcloud.xd.modules.bigserver.job.biz.ExcelServerJobBiz$$EnhancerBySpringCGLIB$$7bcec4a.exportExceptionHandler(<generated>)
    at com.tunwu.retailcloud.xd.modules.bigserver.job.common.Every2MinutesJob.every2MinutesHandler(Every2MinutesJob.java:34)
    at sun.reflect.GeneratedMethodAccessor841.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.xxl.job.core.handler.impl.MethodJobHandler.execute(MethodJobHandler.java:31)
    at com.xxl.job.core.thread.JobThread.run(JobThread.java:163)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1595)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1500)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    at feign.Client$Default.convertResponse(Client.java:108)
    at feign.Client$Default.execute(Client.java:104)
    at org.springframework.cloud.openfeign.ribbon.RetryableFeignLoadBalancer.lambda$execute$0(RetryableFeignLoadBalancer.java:109)
    at org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:287)
    at org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:180)
    at org.springframework.cloud.openfeign.ribbon.RetryableFeignLoadBalancer.execute(RetryableFeignLoadBalancer.java:92)
    at org.springframework.cloud.openfeign.ribbon.RetryableFeignLoadBalancer.execute(RetryableFeignLoadBalancer.java:52)
    at com.netflix.client.AbstractLoadBalancerAwareClient$1.call(AbstractLoadBalancerAwareClient.java:104)
    at com.netflix.loadbalancer.reactive.LoadBalancerCommand$3$1.call(LoadBalancerCommand.java:303)
    at com.netflix.loadbalancer.reactive.LoadBalancerCommand$3$1.call(LoadBalancerCommand.java:287)
    at rx.internal.util.ScalarSynchronousObservable$3.call(ScalarSynchronousObservable.java:231)
    at rx.internal.util.ScalarSynchronousObservable$3.call(ScalarSynchronousObservable.java:228)
    at rx.Observable.unsafeSubscribe(Observable.java:10327)
    at rx.internal.operators.OnSubscribeConcatMap$ConcatMapSubscriber.drain(OnSubscribeConcatMap.java:286)
    at rx.internal.operators.OnSubscribeConcatMap$ConcatMapSubscriber.onNext(OnSubscribeConcatMap.java:144)
    at com.netflix.loadbalancer.reactive.LoadBalancerCommand$1.call(LoadBalancerCommand.java:185)
    at com.netflix.loadbalancer.reactive.LoadBalancerCommand$1.call(LoadBalancerCommand.java:180)
    at rx.Observable.unsafeSubscribe(Observable.java:10327)
    at rx.internal.operators.OnSubscribeConcatMap.call(OnSubscribeConcatMap.java:94)
    at rx.internal.operators.OnSubscribeConcatMap.call(OnSubscribeConcatMap.java:42)
    at rx.Observable.unsafeSubscribe(Observable.java:10327)
    at rx.internal.operators.OperatorRetryWithPredicate$SourceSubscriber$1.call(OperatorRetryWithPredicate.java:127)
    at rx.internal.schedulers.TrampolineScheduler$InnerCurrentThreadScheduler.enqueue(TrampolineScheduler.java:73)
    at rx.internal.schedulers.TrampolineScheduler$InnerCurrentThreadScheduler.schedule(TrampolineScheduler.java:52)
    at rx.internal.operators.OperatorRetryWithPredicate$SourceSubscriber.onNext(OperatorRetryWithPredicate.java:79)
    at rx.internal.operators.OperatorRetryWithPredicate$SourceSubscriber.onNext(OperatorRetryWithPredicate.java:45)
    at rx.internal.util.ScalarSynchronousObservable$WeakSingleProducer.request(ScalarSynchronousObservable.java:276)
    at rx.Subscriber.setProducer(Subscriber.java:209)
    at rx.internal.util.ScalarSynchronousObservable$JustOnSubscribe.call(ScalarSynchronousObservable.java:138)
    at rx.internal.util.ScalarSynchronousObservable$JustOnSubscribe.call(ScalarSynchronousObservable.java:129)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.Observable.subscribe(Observable.java:10423)
    at rx.Observable.subscribe(Observable.java:10390)
    at rx.observables.BlockingObservable.blockForSingle(BlockingObservable.java:443)
    at rx.observables.BlockingObservable.single(BlockingObservable.java:340)
    at com.netflix.client.AbstractLoadBalancerAwareClient.executeWithLoadBalancer(AbstractLoadBalancerAwareClient.java:112)
    at org.springframework.cloud.openfeign.ribbon.LoadBalancerFeignClient.execute(LoadBalancerFeignClient.java:84)
    at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:119)
    ... 17 common frames omitted

Troubleshooting steps

Enter the container
docker exec -it big-server /bin/bash

Install the tools

apt-get update
apt-get install net-tools

Check the network connections

netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'

 

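Because the application had not been stopped, the connection counts kept increasing, which indicates that connections were getting stuck.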

Use the top command to inspect the Java process

 

Use jstack to inspect the thread information

# Find the process ID of the service
ps -ef | grep <java-project-name>
# Print the thread information of the process
jstack $PID
# Save the thread dump to a txt file
jstack $PID > jstack.txt

 

 

 

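Exit the container and copy the file generated above back to the host with the cp command (for a Docker container, docker cp)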

Analyzing the file shows a large number of threads stuck in CompletableFuture$Signaller.block
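To put a number on "a large number", the dump can be summarized instead of read by eye. The following is a minimal sketch, not part of the original investigation: the class name JstackSummary is hypothetical, and it assumes the usual HotSpot jstack layout in which every thread has a line of the form "java.lang.Thread.State: WAITING (parking)".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: summarizes a HotSpot jstack dump. */
class JstackSummary {

    private static final String STATE_PREFIX = "java.lang.Thread.State:";

    /** Counts threads per state, keyed by state name such as RUNNABLE or WAITING. */
    static Map<String, Integer> countStates(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.startsWith(STATE_PREFIX)) {
                // "java.lang.Thread.State: WAITING (parking)" -> "WAITING"
                String state = trimmed.substring(STATE_PREFIX.length()).trim().split("\\s+")[0];
                counts.merge(state, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Counts the lines (stack frames) that contain the given text. */
    static long countFrames(List<String> lines, String needle) {
        return lines.stream().filter(line -> line.contains(needle)).count();
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the jstack.txt file produced in the step above.
        List<String> lines = Files.readAllLines(Paths.get(args.length > 0 ? args[0] : "jstack.txt"));
        System.out.println("Threads by state: " + countStates(lines));
        System.out.println("Frames in CompletableFuture$Signaller.block: "
                + countFrames(lines, "CompletableFuture$Signaller.block"));
    }
}
```

Comparing the summaries of two dumps taken a minute apart shows whether the number of blocked waiters is still growing.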

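NEW: not yet started. Such threads do not appear in the dump.

RUNNABLE: executing inside the JVM. The thread is running; the dump may also show "locked", which means the thread holds a lock.

BLOCKED: blocked while waiting for a monitor lock, that is, waiting to enter a synchronized block or method whose lock another thread holds.

WAITING: waiting indefinitely for another thread to perform a particular action. The thread is waiting on a condition or monitor and is usually stopped in park(), wait(), or join() called without a timeout.

TIMED_WAITING: waiting, up to a time limit, for another thread to perform a particular action. The difference from WAITING is that the call carries a timeout, as in wait(timeout), join(timeout), or sleep(millis).

TERMINATED: exited.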
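The difference between the two waiting states is easy to check in a small experiment. The sketch below is illustrative only (the class and method names are made up for this example): one thread parks without a timeout, another sleeps with one, and Thread.getState() reports what a thread dump would show for each.

```java
import java.util.concurrent.locks.LockSupport;

/** Hypothetical demo: shows which Thread.State an untimed and a timed wait produce. */
class ThreadStateDemo {

    /** Starts a daemon thread running body and polls (up to about 5 s) until it reports the expected state. */
    static Thread.State observe(Runnable body, Thread.State expected) {
        Thread t = new Thread(body, "state-demo");
        t.setDaemon(true);
        t.start();
        long deadline = System.currentTimeMillis() + 5_000;
        while (t.getState() != expected && System.currentTimeMillis() < deadline) {
            Thread.yield();
        }
        Thread.State seen = t.getState();
        t.interrupt(); // wake the thread so that it can finish
        return seen;
    }

    static void parkUntilInterrupted() {
        // The loop guards against spurious returns from park().
        while (!Thread.currentThread().isInterrupted()) {
            LockSupport.park(); // no timeout: the thread is WAITING
        }
    }

    static void sleepOneMinute() {
        try {
            Thread.sleep(60_000); // has a timeout: the thread is TIMED_WAITING
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("park():  "
                + observe(ThreadStateDemo::parkUntilInterrupted, Thread.State.WAITING));
        System.out.println("sleep(): "
                + observe(ThreadStateDemo::sleepOneMinute, Thread.State.TIMED_WAITING));
    }
}
```

A CompletableFuture.get() or join() without a timeout parks the caller in the same way, which is why those threads show up as WAITING (parking) in the dump.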

 

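The problem was traced to a deadlock at the point where the asynchronous results are retrieved. After the wait was changed to use a 10-second timeout, the problem was resolved.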
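The original code is not shown, so the following is only a sketch of what such a change could look like, assuming the results are held in CompletableFuture objects. The class BoundedWait, the method getOrFallback, and the fallback value are hypothetical. It uses only Java 8 APIs, since the frames in the stack trace above (sun.reflect, Method.java:498) suggest a JDK 8 runtime.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: waits for an async result for a bounded time only. */
class BoundedWait {

    /** Returns the future's value, or fallback if it is not available within the timeout. */
    static <T> T getOrFallback(CompletableFuture<T> future, long timeout, TimeUnit unit, T fallback) {
        try {
            return future.get(timeout, unit); // instead of get() or join(), which can wait forever
        } catch (TimeoutException e) {
            // Marks the future as cancelled so nothing else keeps waiting on it.
            // Note: for CompletableFuture this does not interrupt the task that is running.
            future.cancel(true);
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException("async task failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> neverCompletes = new CompletableFuture<>();
        // The article uses a 10-second limit; 1 second keeps the demo short.
        System.out.println(getOrFallback(neverCompletes, 1, TimeUnit.SECONDS, "timed out"));
    }
}
```

A timeout frees the waiting thread, but it does not remove whatever prevented the result from arriving, so it is best treated as a safety net rather than the whole fix.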

 

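The underlying cause is most likely that the many threads exhausted a shared resource, so the code itself needs to be optimized.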
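One common way for this kind of exhaustion to arise is pool starvation: tasks running on a bounded pool submit sub-tasks to the same pool and then wait for them, so every worker is occupied by a waiter and the sub-tasks never get a thread. Whether this is exactly what happened here is an assumption, because the original code is not shown. The sketch below (all names hypothetical) reproduces the effect with a two-thread pool and shows that it disappears when the sub-tasks run on a separate pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical demo of thread-pool starvation caused by nested waits. */
class PoolStarvationDemo {

    /** Runs parent tasks that each wait for a child task; returns how many parents timed out. */
    static int countStarved(ExecutorService parentPool, ExecutorService childPool, int parents) {
        CountDownLatch allParentsRunning = new CountDownLatch(parents);
        Callable<String> childTask = () -> "done";
        Callable<Boolean> parentTask = () -> {
            allParentsRunning.countDown();
            // Wait (bounded) until every parent occupies a worker thread.
            allParentsRunning.await(5, TimeUnit.SECONDS);
            Future<String> child = childPool.submit(childTask);
            try {
                child.get(500, TimeUnit.MILLISECONDS); // bounded wait instead of get()
                return false;
            } catch (TimeoutException e) {
                child.cancel(true);
                return true; // the child never got a thread
            }
        };
        List<Future<Boolean>> results = new ArrayList<>();
        for (int i = 0; i < parents; i++) {
            results.add(parentPool.submit(parentTask));
        }
        int starved = 0;
        try {
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    starved++;
                }
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return starved;
    }

    public static void main(String[] args) {
        ExecutorService shared = Executors.newFixedThreadPool(2);
        ExecutorService separate = Executors.newFixedThreadPool(2);
        try {
            System.out.println("children on the same pool, starved parents: "
                    + countStarved(shared, shared, 2));
            System.out.println("children on a separate pool, starved parents: "
                    + countStarved(shared, separate, 2));
        } finally {
            shared.shutdownNow();
            separate.shutdownNow();
        }
    }
}
```

Without the 500 ms limit on the inner wait, the first scenario would never finish, which matches a dump full of threads parked in a CompletableFuture wait.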


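Problems like this are generally related to resource pools.
Check the database connection pool, the Redis connection pool, the thread pools, and so on.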
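For thread pools, the same four figures that appear in the monitoring log earlier in this post (taskCount, completedTaskCount, activeCount, queueSize, where the queue had reached 2890) can be read straight from a ThreadPoolExecutor. The sketch below is a hypothetical helper, not the project's ServerMonitoringBiz; the class name, the pool name, and the threshold are assumptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: reports the load of a ThreadPoolExecutor. */
class PoolMonitor {

    /** Formats the pool counters; taskCount and completedTaskCount are approximations by contract. */
    static String describe(String name, ThreadPoolExecutor pool) {
        return String.format("%s: taskCount [%d], completedTaskCount [%d], activeCount [%d], queueSize [%d]",
                name, pool.getTaskCount(), pool.getCompletedTaskCount(),
                pool.getActiveCount(), pool.getQueue().size());
    }

    /** True when more tasks are queued than the caller considers healthy. */
    static boolean isBackedUp(ThreadPoolExecutor pool, int maxQueued) {
        return pool.getQueue().size() > maxQueued;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        CountDownLatch release = new CountDownLatch(1);
        // The first task blocks the only worker; the next three pile up in the queue.
        pool.execute(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        for (int i = 0; i < 3; i++) {
            pool.execute(() -> { });
        }
        System.out.println(describe("demo-pool", pool));
        System.out.println("backed up: " + isBackedUp(pool, 2));
        release.countDown();
        pool.shutdown();
    }
}
```

A queue that only ever grows while activeCount stays pinned at the pool size is a sign that the workers are blocked rather than busy.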

References
https://blog.csdn.net/zcjluse/article/details/125974518
https://www.cnblogs.com/fengyege/p/16936291.html
 
posted @ 2024-04-11 15:40  疯癫大圣