记一次http连接池连接泄漏排查过程

问题现象:

一个批处理发货系统,前几天商家反馈提交批处理文件后,一直处于处理中状态。

排查的过程中,商家的问题不断的报过来,发现一台机器上所有批处理请求全部失败。我们立刻隔离了有问题的机器开始开始排查原因。

排查过程:

查看处理中的任务日志里面没有任何异常报错,服务器的指标(cpu、内存、io等等)也是平时正常状态。

首先获取服务的线程堆栈信息,指令如下:

1 ps -ef|grep java 获取服务进程
2 jstack -l xxxxx >> trade.stack 获取进程xxxxx的堆栈到trade.stack文件

堆栈文件中发现并无任何死锁,并且cpu也是正常水平。

发现堆栈有大量线程处于java.lang.Thread.State: WAITING (parking)和java.lang.Thread.State: BLOCKED (on object monitor)。

其中WAITING的线程为如下:

 1 "ConsumerImplV2-ClientBusiness-trade-Pool-Thread-2" #261 prio=10 os_prio=0 tid=0x00007f5878003800 nid=0x2d8 waiting on condition [0x00007f572e512000]
 2    java.lang.Thread.State: WAITING (parking)
 3     at sun.misc.Unsafe.park(Native Method)
 4     - parking to wait for  <0x0000000742b2fc68> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 5     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 6     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
 7     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 8     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
 9     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
10     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
11     at java.lang.Thread.run(Thread.java:748)
12 
13    Locked ownable synchronizers:
14     - None

其中BLOCKED有两种:

1 "ConsumerImplV2-ClientBusiness-trade-Pool-Thread-2" #1223 prio=10 os_prio=0 tid=0x00007f5870003000 nid=0x6f3 waiting for monitor entry [0x00007f5666f36000]
2    java.lang.Thread.State: BLOCKED (on object monitor)
3     at com.google.common.eventbus.Subscriber$SynchronizedSubscriber.invokeSubscriberMethod(Subscriber.java:153)
4     - waiting to lock <0x0000000742485c78> (a com.google.common.eventbus.Subscriber$SynchronizedSubscriber)
5     at com.google.common.eventbus.Subscriber$1.run(Subscriber.java:80)
6     at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:456)
7     at com.google.common.eventbus.Subscriber.dispatchEvent(Subscriber.java:76)
8     at com.google.common.eventbus.Dispatcher$PerThreadQueuedDispatcher.dispatch(Dispatcher.java:119)
9     at com.google.common.eventbus.EventBus.post(EventBus.java:215)
 1 "ConsumerImplV2-ClientBusiness-trade-Pool-Thread-12" #2225 prio=10 os_prio=0 tid=0x00007f5870011800 nid=0xb66 waiting on condition [0x00007f56625ee000]
 2    java.lang.Thread.State: WAITING (parking)
 3     at sun.misc.Unsafe.park(Native Method)
 4     - parking to wait for  <0x000000074305f6a0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 5     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 6     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
 7     at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:380)
 8     at org.apache.http.pool.AbstractConnPool.access$200(AbstractConnPool.java:69)
 9     at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:246)
10     - locked <0x000000075675a7e0> (a org.apache.http.pool.AbstractConnPool$2)
11     at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:193)
12     at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:303)
13     at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:279)
14     at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191)
15     at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
16     at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
17     at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
18     at org.apache.http.impl.client.InternalHttpClient.doExecute$original$W51oFvtD(InternalHttpClient.java:185)
19     at org.apache.http.impl.client.InternalHttpClient.doExecute$original$W51oFvtD$accessor$t4YoGKDC(InternalHttpClient.java)

我们首先关注到了第一个BLOCKED,因为整个系统采用事件驱动的,使用了eventBus,在机器已经隔离的情况,还是有订阅冲突,可能业务上的事件驱动出现了闭环导致某几个任务一直在执行。

然而将整个业务进行梳理过后发现并没有闭环出现。

再看第二个BLOCKED有20个是获取http连接池的时候出现了等待,看下我们代码里的配置发现没有设置ConnectionRequestTimeout这个属性导致http连接池满的时候其他任务都阻塞了。但是为什么http连接池会一直处于满载的状态却不知道。

  static {
    RequestConfig config = RequestConfig.custom()
        .setConnectTimeout(2000)
        .setSocketTimeout(3000)
        .build();
    CLIENT = HttpClientBuilder.create()
        .setMaxConnTotal(512)
        .setMaxConnPerRoute(64)
        .setDefaultRequestConfig(config)
        .build();
  }

从机器上dump了整个堆栈信息,3.6G🤯大小,使用mat分析dump文件发现我们用的http连接池里面,pending一共20个是处于等待的那几个请求获取连接的线程,available是0。

 

就很疑惑🤔,为什么这些连接会持续这么久不释放。

看服务器上tcp连接状态:

1 # netstat -n | awk '/^tcp/ {++State[$NF]} END {for(i in State) print i, State[i]}'
2 TIME_WAIT 19
3 CLOSE_WAIT 65
4 ESTABLISHED 176

65个处于CLOSE_WAIT状态,和我们配置的MaxConnPerRoute基本吻合,那么说明这几个连接异常了。

看下TCP协议断开连接四次挥手的过程:

 

 其实就是连接异常了,但是没有关闭导致的。

再看看http访问的那段代码:其中有一段

  try {
      log.info("http request url = {},param = {}", url, JSON.toJSONString(get));
      CloseableHttpResponse response = CLIENT.execute(get);
      log.info("http response url = {},param = {}", url, JSON.toJSONString(response));
      if (HttpStatus.SC_OK == response.getStatusLine().getStatusCode()) {
         。。。
        return buffer.toByteArray();
      }
    } catch (SocketTimeoutException | SocketException e) {
      throw new BizException(BizErrorCodeEnum.BIZ_COMPENSATE);
    } catch (Throwable throwable) {
      log.error("[文件http 客户端请求返回为null] url: {},  e : {}", url, throwable);
    } finally {
      try {
        //关闭流
        EntityUtils.consume(entity);
      } catch (IOException e) {
        return null;
      }
    }
    return null;
  }

里面的处理是如果服务端返回的不是200,就直接返回了,finally里面关闭的流是entity,但是如果不是返回200,response里面的响应不会放到entity里面,也就是不会被清理。那么问题终于找到了,只要请求的文件服务器返回非200,连接就会一直卡在这里不会被释放。其实最早看的线程栈里面的WAITING的线程就是🤯。但是单纯的看waiting的线程里面的堆栈信息没有任何头绪。

解决的办法就比较简单了,利用语法糖,把response放到try里面就可以了。

 1  try (CloseableHttpResponse response = CLIENT.execute(post)) {
 2       if (HttpStatus.SC_OK == response.getStatusLine().getStatusCode()) {
 3         HttpEntity entity = response.getEntity();
 4         if (entity != null) {
 5           String result = EntityUtils.toString(entity);
 6           EntityUtils.consume(entity);
 7           return JSON.parseObject(result, access);
 8         }
 9       }
10     } catch (Exception e) {
11       log.info("http 请求异常,url={}, failmsg={}", url, e.getMessage());
12     }

------------------------------------------------------------------------------------------------------------------------------------------------

模拟复现的步骤也很简单:

客户端:

 1   public static void main(String[] args) {
 2 
 3     Runnable r = ()->{
 4       System.out.println("开始执行子线程请求任务");
 5       System.out.println("子线程执行结束,结果:" + HttpClientUtilsTest.multipartGet("http://127.0.0.1:9076/test"));
 6     };
 7 
 8     ExecutorService thread = Executors.newFixedThreadPool(1,new ThreadFactory() {
 9       int i = 0;
10       @Override
11       public Thread newThread(Runnable r) {
12         return new Thread(r, "TestThreadPool-" + (i++));
13       }
14     });
15     thread.execute(r);
16 
17     try {
18       Thread.sleep(1000);
19     } catch (InterruptedException e) {
20       e.printStackTrace();
21     }
22 
23     System.out.println("主线程请求执行开始");
24     System.out.println("主线程请求执行结束,结果" + HttpClientUtilsTest.multipartGet("http://127.0.0.1:9076/test"));
25   }

服务端:

1     @RequestMapping(value = "/test", method = RequestMethod.GET)
2     public String test() throws Exception {
3         try {
4             Thread.sleep(10000);
5         } catch (InterruptedException e) {
6             e.printStackTrace();
7         }
8         throw new Exception();
9     }

客户端执行结果:

开始执行子线程请求任务
主线程请求执行开始
子线程执行结束,结果:null

可以看到主线程的请求一直没有结果。

打印客户端调用服务的线程栈:

jps -l     
    
17650 org.jetbrains.jps.cmdline.Launcher
17652 com.avalon.academy.academy.design.AcademyDesignPatternApplication
70554 sun.tools.jps.Jps
70125 com.avalon.util.HttpClientUtilsTest
9342 
70126 org.jetbrains.jps.cmdline.Launcher

jstack -l 70125

结果:

  1 2019-12-26 14:04:37
  2 Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.211-b12 mixed mode):
  3 
  4 "Attach Listener" #13 daemon prio=9 os_prio=31 tid=0x00007f85ddc1b800 nid=0xc07 waiting on condition [0x0000000000000000]
  5    java.lang.Thread.State: RUNNABLE
  6 
  7    Locked ownable synchronizers:
  8         - None
  9 
 10 "TestThreadPool-0" #12 prio=5 os_prio=31 tid=0x00007f85ddc28800 nid=0x3e03 waiting on condition [0x0000700004337000]
 11    java.lang.Thread.State: WAITING (parking)
 12         at sun.misc.Unsafe.park(Native Method)
 13         - parking to wait for  <0x0000000797da30c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 14         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 15         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
 16         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 17         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
 18         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
 19         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 20         at java.lang.Thread.run(Thread.java:748)
 21 
 22    Locked ownable synchronizers:
 23         - None
 24 
 25 "Service Thread" #11 daemon prio=9 os_prio=31 tid=0x00007f85dc16f000 nid=0x3903 runnable [0x0000000000000000]
 26    java.lang.Thread.State: RUNNABLE
 27 
 28    Locked ownable synchronizers:
 29         - None
 30 
 31 "C1 CompilerThread2" #10 daemon prio=9 os_prio=31 tid=0x00007f85dd255800 nid=0x4103 waiting on condition [0x0000000000000000]
 32    java.lang.Thread.State: RUNNABLE
 33 
 34    Locked ownable synchronizers:
 35         - None
 36 
 37 "C2 CompilerThread1" #9 daemon prio=9 os_prio=31 tid=0x00007f85dc16e000 nid=0x3703 waiting on condition [0x0000000000000000]
 38    java.lang.Thread.State: RUNNABLE
 39 
 40    Locked ownable synchronizers:
 41         - None
 42 
 43 "C2 CompilerThread0" #8 daemon prio=9 os_prio=31 tid=0x00007f85dcb1f000 nid=0x4403 waiting on condition [0x0000000000000000]
 44    java.lang.Thread.State: RUNNABLE
 45 
 46    Locked ownable synchronizers:
 47         - None
 48 
 49 "JDWP Command Reader" #7 daemon prio=10 os_prio=31 tid=0x00007f85dc027000 nid=0x4503 runnable [0x0000000000000000]
 50    java.lang.Thread.State: RUNNABLE
 51 
 52    Locked ownable synchronizers:
 53         - None
 54 
 55 "JDWP Event Helper Thread" #6 daemon prio=10 os_prio=31 tid=0x00007f85dd02a800 nid=0x4703 runnable [0x0000000000000000]
 56    java.lang.Thread.State: RUNNABLE
 57 
 58    Locked ownable synchronizers:
 59         - None
 60 
 61 "JDWP Transport Listener: dt_socket" #5 daemon prio=10 os_prio=31 tid=0x00007f85dd82c000 nid=0x4807 runnable [0x0000000000000000]
 62    java.lang.Thread.State: RUNNABLE
 63 
 64    Locked ownable synchronizers:
 65         - None
 66 
 67 "Signal Dispatcher" #4 daemon prio=9 os_prio=31 tid=0x00007f85dd845000 nid=0x3303 runnable [0x0000000000000000]
 68    java.lang.Thread.State: RUNNABLE
 69 
 70    Locked ownable synchronizers:
 71         - None
 72 
 73 "Finalizer" #3 daemon prio=8 os_prio=31 tid=0x00007f85dc80a000 nid=0x4e03 in Object.wait() [0x0000700003919000]
 74    java.lang.Thread.State: WAITING (on object monitor)
 75         at java.lang.Object.wait(Native Method)
 76         - waiting on <0x0000000797b00bc0> (a java.lang.ref.ReferenceQueue$Lock)
 77         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
 78         - locked <0x0000000797b00bc0> (a java.lang.ref.ReferenceQueue$Lock)
 79         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
 80         at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)
 81 
 82    Locked ownable synchronizers:
 83         - None
 84 
 85 "Reference Handler" #2 daemon prio=10 os_prio=31 tid=0x00007f85dc023000 nid=0x5003 in Object.wait() [0x0000700003816000]
 86    java.lang.Thread.State: WAITING (on object monitor)
 87         at java.lang.Object.wait(Native Method)
 88         - waiting on <0x0000000797b1a678> (a java.lang.ref.Reference$Lock)
 89         at java.lang.Object.wait(Object.java:502)
 90         at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
 91         - locked <0x0000000797b1a678> (a java.lang.ref.Reference$Lock)
 92         at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
 93 
 94    Locked ownable synchronizers:
 95         - None
 96 
 97 "main" #1 prio=5 os_prio=31 tid=0x00007f85dc00b800 nid=0x1903 waiting on condition [0x0000700003203000]
 98    java.lang.Thread.State: WAITING (parking)
 99         at sun.misc.Unsafe.park(Native Method)
100         - parking to wait for  <0x0000000797b1f480> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
101         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
102         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
103         at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:380)
104         at org.apache.http.pool.AbstractConnPool.access$200(AbstractConnPool.java:69)
105         at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:246)
106         - locked <0x00000007968cf630> (a org.apache.http.pool.AbstractConnPool$2)
107         at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:193)
108         at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:303)
109         at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:279)
110         at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191)
111         at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
112         at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
113         at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
114         at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
115         at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
116         at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
117         at com.avalon.util.HttpClientUtilsTest.multipartGet(HttpClientUtilsTest.java:179)
118         at com.avalon.util.HttpClientUtilsTest.main(HttpClientUtilsTest.java:293)
119 
120    Locked ownable synchronizers:
121         - None
122 
123 "VM Thread" os_prio=31 tid=0x00007f85dc01c800 nid=0x2d03 runnable 
124 
125 "GC task thread#0 (ParallelGC)" os_prio=31 tid=0x00007f85dd003800 nid=0x1f07 runnable 
126 
127 "GC task thread#1 (ParallelGC)" os_prio=31 tid=0x00007f85dd004800 nid=0x2a03 runnable 
128 
129 "GC task thread#2 (ParallelGC)" os_prio=31 tid=0x00007f85dd005000 nid=0x5303 runnable 
130 
131 "GC task thread#3 (ParallelGC)" os_prio=31 tid=0x00007f85dd005800 nid=0x5203 runnable 
132 
133 "VM Periodic Task Thread" os_prio=31 tid=0x00007f85dc16f800 nid=0x3f03 waiting on condition 
134 
135 JNI global references: 5906
View Code

可以看到主线程tid=0x00007f85dc00b800是被卡在了从连接池获取连接的地方,TestThreadPool-0  tid=0x00007f85ddc28800这个自线程处于一个WAITING的状态,在等待连接释放。完美的复现了我们的问题。

posted @ 2019-12-26 11:57  Over_Watch  阅读(3669)  评论(0编辑  收藏  举报