项目遇到的问题

重现

  1. 线下自助收银机的接口不稳定,导致下线请求线上获取收银机背景图信息的接口不稳定老是超时。
  2. 一开始业务先是进行在群里通知反馈这个问题
  3. 通过排查该接口的服务发现并没有问题,由于skywalking 的tid 收集出现了Ingore_Trace,没法通过链路进行追踪,并且该服务对外的接口由middleware服务进行转发,这个服务没有做skywalking
  4. 当时就对该接口加了打印日志记录请求信息和处理时间,发线上看情况,发完后发现相应时间正常。
  5. 经过线下系统超时日志判断出了时间点,然后对比middleware服务和接口服务进行分析,得出来了请求进入了middleware,但是接口服务就没有打印日志,就怀疑是middleware服务出现了问题,middleware服务的此时的特点是进行http请求接口服务只有请求参数的日志,没有相应结果的日志,所以就怀疑是线程被阻塞了
  6. 线上对该代码部署了四个服务器,让运维通过jstack -l pid >> thread.txt 命令 分别把四台机器的线程情况进行了导出
  7. 发给我后四个文件中有一个特别明显的文件达到了6.2M,然后就直接对该文件分析,把数据放到在线网站上分析 http://spotify.github.io/threaddump-analyzer/#thread-0x00007f130c0a3000
  8. 发现了俩特殊的阻塞地方,就是下面的代码段


"http-apr-8084-exec-257" #288 daemon prio=5 os_prio=0 tid=0x00007f130c3f0000 nid=0x7bca in Object.wait() [0x00007f1276ae8000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:502)
	at java.net.InetAddress.checkLookupTable(InetAddress.java:1393)
	- locked <0x00000000800a06c0> (a java.util.HashMap)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1310)


"http-apr-8084-exec-1" #22 daemon prio=5 os_prio=0 tid=0x00007f130c004000 nid=0x450d in Object.wait() [0x00007f137aafc000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:502)
	at java.net.InetAddress.checkLookupTable(InetAddress.java:1393)
	- locked <0x00000000800a06c0> (a java.util.HashMap)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1310)
	at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
	at java.net.InetAddress.getAllByName(InetAddress.java:1192)
	at java.net.InetAddress.getAllByName(InetAddress.java:1126)
	at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45)
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:394)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:221)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
	at com.diwayou.utils.http.impl.HttpClientTemplate.executePost(HttpClientTemplate.java:256)
	at com.diwayou.utils.http.impl.HttpClientTemplate.executePost(HttpClientTemplate.java:162)





"http-apr-8084-exec-737" #768 daemon prio=5 os_prio=0 tid=0x00007f130cb84000 nid=0x29fa waiting on condition [0x00007f12588c6000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000008c70e428> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at org.apache.http.pool.PoolEntryFuture.await(PoolEntryFuture.java:139)
	at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:307)
	at org.apache.http.pool.AbstractConnPool.access$000(AbstractConnPool.java:65)
	at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:193)
	at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:186)
	at org.apache.http.pool.PoolEntryFuture.get(PoolEntryFuture.java:108)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:303)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:279)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:221)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
	at com.diwayou.utils.http.impl.HttpClientTemplate.executePost(HttpClientTemplate.java:256)
	at com.diwayou.utils.http.impl.HttpClientTemplate.executePost(HttpClientTemplate.java:162)
  1. 查询一些网络资料发现是 dns解析超时引发的线程池故障 导致了这个问题

参考文章:
dns解析超时引发的线程池故障

由一次线上故障来理解下 TCP 三握、四挥 & Java 堆栈分析到源码的探秘

posted @ 2022-09-20 23:29  liuhuayiye  阅读(64)  评论(0编辑  收藏  举报