Notes on OkHttp request timeouts not taking effect (automatic retry)

Reference: https://www.jianshu.com/p/3ef261ab157c

Reference: https://www.jianshu.com/p/89033630ab7a

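Discovering the problem

During development we noticed that a network request could leave the Loading indicator on screen indefinitely, even though we had set the request timeout to 30s when initializing OkHttp. Why would that happen? WTF! It turned out to be a trap laid by OkHttp's retry mechanism.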

Dissecting OkHttp's retry mechanism

OkHttp can retry when a network connection fails:

OkHttp perseveres when the network is troublesome: it will silently recover from common connection problems. If your service has multiple IP addresses OkHttp will attempt alternate addresses if the first connect fails. This is necessary for IPv4+IPv6 and for services hosted in redundant data centers. OkHttp initiates new connections with modern TLS features (SNI, ALPN), and falls back to TLS 1.0 if the handshake fails.

To understand OkHttp's retry mechanism, the class we care about most is RetryAndFollowUpInterceptor. When a network exception occurs, all of OkHttp's exception-related retries are handled there. Let's start with RetryAndFollowUpInterceptor#intercept(Chain chain):

public Response intercept(Chain chain) throws IOException {
    Request request = chain.request();
    this.streamAllocation = new StreamAllocation(this.client.connectionPool(), this.createAddress(request.url()));
    int followUpCount = 0;
    Response priorResponse = null;
    // while loop
    while(!this.canceled) {
        Response response = null;
        boolean releaseConnection = true;

        try {
            response = ((RealInterceptorChain)chain).proceed(request, this.streamAllocation, (HttpStream)null, (Connection)null);
            releaseConnection = false;
        } catch (RouteException var12) {
            if(!this.recover(var12.getLastConnectException(), true, request)) {
                throw var12.getLastConnectException();
            }

            releaseConnection = false;
            continue;
        } catch (IOException var13) {
            if(!this.recover(var13, false, request)) {
                throw var13;
            }

            releaseConnection = false;
            continue;
        } finally {
            if(releaseConnection) {
                this.streamAllocation.streamFailed((IOException)null);
                this.streamAllocation.release();
            }

        }

        if(priorResponse != null) {
            response = response.newBuilder().priorResponse(priorResponse.newBuilder().body((ResponseBody)null).build()).build();
        }

        Request followUp = this.followUpRequest(response);
        if(followUp == null) {
            if(!this.forWebSocket) {
                this.streamAllocation.release();
            }

            return response;
        }

        Util.closeQuietly(response.body());
        ++followUpCount;
        if(followUpCount > 20) {
            this.streamAllocation.release();
            throw new ProtocolException("Too many follow-up requests: " + followUpCount);
        }

        if(followUp.body() instanceof UnrepeatableRequestBody) {
            throw new HttpRetryException("Cannot retry streamed HTTP body", response.code());
        }

        if(!this.sameConnection(response, followUp.url())) {
            this.streamAllocation.release();
            this.streamAllocation = new StreamAllocation(this.client.connectionPool(), this.createAddress(followUp.url()));
        } else if(this.streamAllocation.stream() != null) {
            throw new IllegalStateException("Closing the body of " + response + " didn't close its backing stream. Bad interceptor?");
        }

        request = followUp;
        priorResponse = response;
    }

    this.streamAllocation.release();
    throw new IOException("Canceled");
}

With the non-core logic stripped out of the snippet:

// StreamAllocation init...
Response priorResponse = null;
while (true) {
  if (canceled) {
    streamAllocation.release();
    throw new IOException("Canceled");
  }

  Response response;
  boolean releaseConnection = true;
  try {
    response = realChain.proceed(request, streamAllocation, null, null);
    releaseConnection = false;
  } catch (RouteException e) {
    // Failures during the socket connect phase are all wrapped in this exception and thrown.
    // RouteException: the attempt via this route failed and the request was not sent;
    // recover() is called to try to recover.
    // The attempt to connect via a route failed. The request will not have been sent.
    if (!recover(e.getLastConnectException(), false, request)) {
      throw e.getLastConnectException();
    }
    releaseConnection = false;
    continue;
  } catch (IOException e) {
    // Network exceptions thrown during the request phase, after the socket has connected.
    // An attempt to communicate with a server failed. The request may have been sent.
    boolean requestSendStarted = !(e instanceof ConnectionShutdownException);
    if (!recover(e, requestSendStarted, request)) throw e;
    releaseConnection = false;
    continue;
  } finally {
    // We're throwing an unchecked exception. Release any resources.
    if (releaseConnection) {
      streamAllocation.streamFailed(null);
      streamAllocation.release();
    }
  }
  // ... follow-up handling omitted
}

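So the while loop simply keeps running: when a network request fails, OkHttp sends the request again, and as long as recover() returns true it keeps coming back to this branch: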

catch (IOException var13) {
    if(!this.recover(var13, false, request)) {
        throw var13;
    }

    releaseConnection = false;
    continue;
}
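
To see why a 30s timeout can still leave the caller waiting much longer, here is a minimal sketch of the loop's timing. It is not OkHttp code: `RetryTimingSketch` and `totalWaitSeconds` are hypothetical names, no real I/O happens, and it assumes that every connect attempt times out and that each attempt gets its own full timeout.

```java
import java.net.SocketTimeoutException;

/** Models a retry loop in which each attempt has its own timeout budget. */
class RetryTimingSketch {

    /**
     * Walks through routeCount routes, assuming every attempt times out after
     * timeoutSeconds. Time is only counted, never actually spent.
     * Returns the total number of seconds the caller would have waited.
     */
    static int totalWaitSeconds(int routeCount, int timeoutSeconds, boolean retryOnConnectionFailure) {
        int waited = 0;
        int route = 0;
        while (true) {
            try {
                // The attempt runs until its own timeout fires.
                waited += timeoutSeconds;
                throw new SocketTimeoutException("connect timed out on route " + route);
            } catch (SocketTimeoutException e) {
                route++;
                boolean hasMoreRoutes = route < routeCount;
                // Same shape as recover(): give up if retries are disabled
                // or there is no route left to try.
                if (!retryOnConnectionFailure || !hasMoreRoutes) {
                    return waited;
                }
                // Otherwise go around the loop again with the next route.
            }
        }
    }
}
```

Under these assumptions, a host with 3 routes and a 30s timeout keeps the caller waiting 90s with retries enabled, and 30s with retries disabled.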

Next, the core recover method:

/**
 * Report and attempt to recover from a failure to communicate with a server. Returns true if
 * {@code e} is recoverable, or false if the failure is permanent. Requests with a body can only
 * be recovered if the body is buffered or if the failure occurred before the request has been
 * sent.
 */
private boolean recover(IOException e, boolean requestSendStarted, Request userRequest) {
  streamAllocation.streamFailed(e);

  // The application layer has forbidden retries.
  if (!client.retryOnConnectionFailure()) return false;

  // We can't send the request body again.
  // (The request has already gone out and its body does not support being resent.)
  if (requestSendStarted && userRequest.body() instanceof UnrepeatableRequestBody) return false;

  // This exception is fatal.
  if (!isRecoverable(e, requestSendStarted)) return false;

  // No more routes to attempt.
  if (!streamAllocation.hasMoreRoutes()) return false;

  // For failure recovery, use the same route selector with a new connection.
  return true;
}

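This method first calls streamAllocation.streamFailed(e) to record the failure, which in turn records the failed route in RouteDatabase to lower its priority, so that the next request to the same address does not pick the route that just failed. If no other route is available, the connection cannot be retried.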

public final class RouteDatabase {
  private final Set<Route> failedRoutes = new LinkedHashSet<>();

  /** Records a failure connecting to {@code failedRoute}. */
  public synchronized void failed(Route failedRoute) {
    failedRoutes.add(failedRoute);
  }

  /** Records success connecting to {@code route}. */
  public synchronized void connected(Route route) {
    failedRoutes.remove(route);
  }

  /** Returns true if {@code route} has failed recently and should be avoided. */
  public synchronized boolean shouldPostpone(Route route) {
    return failedRoutes.contains(route);
  }
}
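
As a rough illustration of what "lowering the priority" of a failed route can mean, the sketch below keeps the same failed-route set and uses it to push recently failed candidates to the back of the list. `RouteOrderingSketch` and `order` are hypothetical names, routes are plain strings rather than OkHttp `Route` objects, and the ordering rule is an assumption for illustration, not a copy of OkHttp's route selector.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a route database plus a simple ordering rule. */
class RouteOrderingSketch {
    private final Set<String> failedRoutes = new LinkedHashSet<>();

    synchronized void failed(String route) {
        failedRoutes.add(route);
    }

    synchronized void connected(String route) {
        failedRoutes.remove(route);
    }

    synchronized boolean shouldPostpone(String route) {
        return failedRoutes.contains(route);
    }

    /** Returns the candidates with recently failed routes moved to the end. */
    List<String> order(List<String> candidates) {
        List<String> fresh = new ArrayList<>();
        List<String> postponed = new ArrayList<>();
        for (String route : candidates) {
            if (shouldPostpone(route)) {
                postponed.add(route);
            } else {
                fresh.add(route);
            }
        }
        fresh.addAll(postponed); // failed routes are still tried, just last
        return fresh;
    }
}
```

Once a postponed route connects successfully, calling `connected` removes it from the set and it returns to its normal position.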

Next, let's look closely at the isRecoverable method:

private boolean isRecoverable(IOException e, boolean requestSendStarted) {
  // If there was a protocol problem, don't recover.
  if (e instanceof ProtocolException) {
    return false;
  }

  // If there was an interruption don't recover, but if there was a timeout connecting to a route
  // we should try the next route (if there is one)
  if (e instanceof InterruptedIOException) {
    return e instanceof SocketTimeoutException && !requestSendStarted;
  }

  // Look for known client-side or negotiation errors that are unlikely to be fixed by trying
  // again with a different route.
  if (e instanceof SSLHandshakeException) {
    // If the problem was a CertificateException from the X509TrustManager,
    // do not retry.
    if (e.getCause() instanceof CertificateException) {
      return false;
    }
  }
  // HostnameVerifier checks whether the host is valid; if it is not,
  // SSLPeerUnverifiedException is thrown.
  // It is thrown from HandShake#getSession, i.e. as one step of the handshake.
  if (e instanceof SSLPeerUnverifiedException) {
    // e.g. a certificate pinning error.
    return false;
  }

  // An example of one we might want to retry with a different route is a problem connecting to a
  // proxy and would manifest as a standard IOException. Unless it is one we know we should not
  // retry, we return true and try a new route.
  return true;
}
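
All the exception types checked in isRecoverable are JDK classes, so the decision table can be tried out without OkHttp. The sketch below copies the logic into a standalone class (`RecoverableSketch` is a hypothetical name) so individual cases can be checked by hand.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.ProtocolException;
import java.net.SocketTimeoutException;
import java.security.cert.CertificateException;
import javax.net.ssl.SSLHandshakeException;
import javax.net.ssl.SSLPeerUnverifiedException;

/** Standalone copy of the isRecoverable decision logic, for experimentation only. */
class RecoverableSketch {

    static boolean isRecoverable(IOException e, boolean requestSendStarted) {
        // Protocol errors are never retried.
        if (e instanceof ProtocolException) {
            return false;
        }
        // Interruptions are not retried; a timeout is retried only if it
        // happened before the request started to be sent (a connect timeout).
        if (e instanceof InterruptedIOException) {
            return e instanceof SocketTimeoutException && !requestSendStarted;
        }
        // A handshake failure caused by an untrusted certificate is permanent.
        if (e instanceof SSLHandshakeException && e.getCause() instanceof CertificateException) {
            return false;
        }
        // Hostname verification or certificate pinning failures are permanent.
        if (e instanceof SSLPeerUnverifiedException) {
            return false;
        }
        // Anything else may succeed on a different route.
        return true;
    }
}
```

For example, `isRecoverable(new SocketTimeoutException("connect"), false)` is true, while the same exception with `requestSendStarted` set to true (a read or write timeout) is false.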

Solving the problem

You can turn off OkHttp's retries by making retryOnConnectionFailure return false:

sClient = builder.retryOnConnectionFailure(false).build();

Update

This issue was fixed in version 3.4.2:
https://github.com/square/okhttp/issues/2756

 

Analysis of common network exceptions:

UnknownHostException

Causes:
  • Network outage
  • DNS server failure
  • DNS hijacking
Solutions:
  • HttpDNS
  • A sensible fallback strategy
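
One possible shape for such a fallback strategy is sketched below: try the normal lookup first and, if it throws UnknownHostException or returns nothing, ask a second source (for example an HttpDNS client or a bundled address list). `FallbackDnsSketch` and its `Resolver` interface are hypothetical names introduced for this example; `Resolver` is modelled on the `lookup(String)` method of OkHttp's `Dns` interface, so the same logic could be adapted into a custom `Dns`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;

/** Hypothetical two-stage lookup: primary resolver first, fallback second. */
class FallbackDnsSketch {

    /** Minimal lookup abstraction, shaped like OkHttp's Dns#lookup. */
    interface Resolver {
        List<InetAddress> lookup(String hostname) throws UnknownHostException;
    }

    static List<InetAddress> lookupWithFallback(String hostname, Resolver primary, Resolver fallback)
            throws UnknownHostException {
        try {
            List<InetAddress> result = primary.lookup(hostname);
            if (!result.isEmpty()) {
                return result;
            }
        } catch (UnknownHostException e) {
            // Primary lookup failed; fall through to the fallback source.
        }
        // If the fallback also fails, its UnknownHostException reaches the caller.
        return fallback.lookup(hostname);
    }
}
```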


InterruptedIOException

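Causes:
  • The request thread was interrupted during the read/write phase of the request
Solutions:
  • Check whether the interruption is expected by the business logic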

SocketTimeoutException

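Causes:
  • Low bandwidth, high latency
  • Congested path, overloaded server
  • Temporary failure of a routing node
Solutions:
  • Configure retries sensibly
  • Retry with a different IP

Note in particular: OkHttp does not internally retry a SocketTimeoutException caused by, for example, a read or write timeout during the request.

So if the app layer cares about this exception in particular, it should define custom interceptors that handle it specifically.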
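
A minimal sketch of that kind of handling is a bounded retry around a single attempt. `TimeoutRetrySketch` and its `Attempt` interface are hypothetical names for this example; inside a real OkHttp `Interceptor`, the attempt would be a call to `chain.proceed(request)`. Retrying like this is only safe for requests that can be repeated without side effects.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

/** Hypothetical helper: retry an attempt a limited number of times on timeout. */
class TimeoutRetrySketch {

    /** Stand-in for one network attempt, e.g. chain.proceed(request). */
    interface Attempt<T> {
        T run() throws IOException;
    }

    static <T> T callWithTimeoutRetry(Attempt<T> attempt, int maxRetries) throws IOException {
        int retries = 0;
        while (true) {
            try {
                return attempt.run();
            } catch (SocketTimeoutException e) {
                // Only timeouts are retried; other IOExceptions propagate at once.
                if (retries >= maxRetries) {
                    throw e; // retry budget exhausted, surface the timeout
                }
                retries++;
            }
        }
    }
}
```

Unlike the loop in RetryAndFollowUpInterceptor, the number of attempts here is capped at `maxRetries + 1`, so the worst-case wait is bounded and known in advance.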

SSLHandshakeException

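Causes:
  • TLS protocol negotiation failed / incompatible handshake format
  • The CA that issued the server certificate is unknown
  • The server certificate is self-signed rather than signed by a CA
  • The server configuration is missing an intermediate CA (incomplete certificate chain)
  • Server hostname mismatch (SNI)
  • A man-in-the-middle attack
Solutions:
  • Specify SNI
  • Certificate pinning
  • Downgrade to HTTP...
  • Contact the SA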

SSLPeerUnverifiedException

Causes:
  • Certificate hostname verification failed
Solutions:
  • Specify SNI
  • Certificate pinning
  • Downgrade to HTTP...
  • Contact the SA

 

posted @ Boblim