OkHttp request timeout not taking effect: a troubleshooting record (automatic retry)
Reference: https://www.jianshu.com/p/3ef261ab157c
Reference: https://www.jianshu.com/p/89033630ab7a
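Discovering the problem
During development we found that a network request could leave the Loading indicator on screen indefinitely, even though we had set the request timeout to 30s when initializing OkHttp. Why would that happen? WTF! It turned out to be a pitfall dug by OkHttp's retry mechanism.
A look inside OkHttp's retry mechanism
OkHttp can retry when a network connection fails: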
OkHttp perseveres when the network is troublesome: it will silently recover from common connection problems. If your service has multiple IP addresses OkHttp will attempt alternate addresses if the first connect fails. This is necessary for IPv4+IPv6 and for services hosted in redundant data centers. OkHttp initiates new connections with modern TLS features (SNI, ALPN), and falls back to TLS 1.0 if the handshake fails.
To understand OkHttp's retry mechanism, the class we care about most is RetryAndFollowUpInterceptor. When a network exception occurs, all of OkHttp's exception-related retries are handled in RetryAndFollowUpInterceptor. Let's start with its #intercept(Chain chain) method (decompiled source):

    public Response intercept(Chain chain) throws IOException {
        Request request = chain.request();
        this.streamAllocation = new StreamAllocation(this.client.connectionPool(), this.createAddress(request.url()));
        int followUpCount = 0;
        Response priorResponse = null;
        // The while loop: it only exits by returning a response, throwing, or being canceled.
        while (!this.canceled) {
            Response response = null;
            boolean releaseConnection = true;

            try {
                response = ((RealInterceptorChain) chain).proceed(request, this.streamAllocation, (HttpStream) null, (Connection) null);
                releaseConnection = false;
            } catch (RouteException var12) {
                if (!this.recover(var12.getLastConnectException(), true, request)) {
                    throw var12.getLastConnectException();
                }

                releaseConnection = false;
                continue;
            } catch (IOException var13) {
                if (!this.recover(var13, false, request)) {
                    throw var13;
                }

                releaseConnection = false;
                continue;
            } finally {
                if (releaseConnection) {
                    this.streamAllocation.streamFailed((IOException) null);
                    this.streamAllocation.release();
                }
            }

            if (priorResponse != null) {
                response = response.newBuilder().priorResponse(priorResponse.newBuilder().body((ResponseBody) null).build()).build();
            }

            Request followUp = this.followUpRequest(response);
            if (followUp == null) {
                if (!this.forWebSocket) {
                    this.streamAllocation.release();
                }

                return response;
            }

            Util.closeQuietly(response.body());
            ++followUpCount;
            if (followUpCount > 20) {
                this.streamAllocation.release();
                throw new ProtocolException("Too many follow-up requests: " + followUpCount);
            }

            if (followUp.body() instanceof UnrepeatableRequestBody) {
                throw new HttpRetryException("Cannot retry streamed HTTP body", response.code());
            }

            if (!this.sameConnection(response, followUp.url())) {
                this.streamAllocation.release();
                this.streamAllocation = new StreamAllocation(this.client.connectionPool(), this.createAddress(followUp.url()));
            } else if (this.streamAllocation.stream() != null) {
                throw new IllegalStateException("Closing the body of " + response + " didn't close its backing stream. Bad interceptor?");
            }

            request = followUp;
            priorResponse = response;
        }

        this.streamAllocation.release();
        throw new IOException("Canceled");
    }
With the non-essential logic stripped out of the snippet:

    // StreamAllocation init...
    Response priorResponse = null;
    while (true) {
        if (canceled) {
            streamAllocation.release();
            throw new IOException("Canceled");
        }

        Response response;
        boolean releaseConnection = true;
        try {
            response = realChain.proceed(request, streamAllocation, null, null);
            releaseConnection = false;
        } catch (RouteException e) {
            // Any failure during the socket connect phase is wrapped in this exception and thrown.
            // RouteException: the attempt via a route failed and the request will not have been
            // sent; recovery is attempted by calling #recover.
            // The attempt to connect via a route failed. The request will not have been sent.
            if (!recover(e.getLastConnectException(), false, request)) {
                throw e.getLastConnectException();
            }
            releaseConnection = false;
            continue;
        } catch (IOException e) {
            // Network exceptions thrown during the request phase, after the socket has connected.
            // An attempt to communicate with a server failed. The request may have been sent.
            boolean requestSendStarted = !(e instanceof ConnectionShutdownException);
            if (!recover(e, requestSendStarted, request)) throw e;
            releaseConnection = false;
            continue;
        } finally {
            // We're throwing an unchecked exception. Release any resources.
            if (releaseConnection) {
                streamAllocation.streamFailed(null);
                streamAllocation.release();
            }
        }
        // ... follow-up handling omitted
    }
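So the while loop just keeps running: when a network request fails, OkHttp sends the request again, and the following branch ends up being executed over and over:

    catch (IOException var13) {
        if (!this.recover(var13, false, request)) {
            throw var13;
        }

        releaseConnection = false;
        continue;
    }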
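To see why a 30s timeout can look as if it were ignored, note that the timeout bounds each attempt, not the loop as a whole. The following is a minimal simulation of that effect, not OkHttp code: RetryLoopSketch, attempt, worstCaseSeconds, and the route count are hypothetical names and numbers, and it assumes every attempt fails only after waiting out the full per-attempt timeout.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

/** Hypothetical model of a retry loop with a per-attempt timeout; not OkHttp code. */
final class RetryLoopSketch {

    /** Simulated attempt: "waits" the full timeout, then fails. */
    static void attempt(int timeoutSeconds, long[] elapsedSeconds) throws IOException {
        elapsedSeconds[0] += timeoutSeconds;
        throw new SocketTimeoutException("connect timed out");
    }

    /** Total simulated seconds spent before the loop runs out of routes and gives up. */
    static long worstCaseSeconds(int timeoutSeconds, int routes) {
        long[] elapsedSeconds = {0};
        int remainingRoutes = routes;
        while (true) {
            try {
                attempt(timeoutSeconds, elapsedSeconds);
                return elapsedSeconds[0]; // unreachable here: attempt always fails
            } catch (IOException e) {
                remainingRoutes--;
                if (remainingRoutes <= 0) {
                    // No route left: a real client would rethrow e at this point.
                    return elapsedSeconds[0];
                }
                // Otherwise fall through and retry on the next route.
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(worstCaseSeconds(30, 4)); // prints 120
    }
}
```

With four candidate routes, the user waits four timeouts before seeing an error; if the loop never runs out of routes, the wait never ends.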
Next, the core recover method:

    /**
     * Report and attempt to recover from a failure to communicate with a server. Returns true if
     * {@code e} is recoverable, or false if the failure is permanent. Requests with a body can only
     * be recovered if the body is buffered or if the failure occurred before the request has been
     * sent.
     */
    private boolean recover(IOException e, boolean requestSendStarted, Request userRequest) {
        streamAllocation.streamFailed(e);

        // The application layer has forbidden retries.
        if (!client.retryOnConnectionFailure()) return false;

        // We can't send the request body again: the request has already been sent
        // and its body does not support being replayed.
        if (requestSendStarted && userRequest.body() instanceof UnrepeatableRequestBody) return false;

        // This exception is fatal.
        if (!isRecoverable(e, requestSendStarted)) return false;

        // No more routes to attempt.
        if (!streamAllocation.hasMoreRoutes()) return false;

        // For failure recovery, use the same route selector with a new connection.
        return true;
    }
In this method, the first step is to call streamAllocation.streamFailed(e) to record the failure. That in turn records the failed route in RouteDatabase so that its priority is lowered, which keeps the next request to the same address from using the route that just failed. If no other route is available, the connection cannot be retried.

    public final class RouteDatabase {
        private final Set<Route> failedRoutes = new LinkedHashSet<>();

        /** Records a failure connecting to {@code failedRoute}. */
        public synchronized void failed(Route failedRoute) {
            failedRoutes.add(failedRoute);
        }

        /** Records success connecting to {@code route}. */
        public synchronized void connected(Route route) {
            failedRoutes.remove(route);
        }

        /** Returns true if {@code route} has failed recently and should be avoided. */
        public synchronized boolean shouldPostpone(Route route) {
            return failedRoutes.contains(route);
        }
    }
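As an illustration of what "lowering the priority" of a failed route can mean, here is a small standalone sketch. It mirrors the three methods quoted above but uses plain strings in place of Route; RoutePostponeSketch and its order method are hypothetical and are not part of OkHttp.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a route database, with String addresses instead of Route. */
final class RoutePostponeSketch {
    private final Set<String> failedRoutes = new LinkedHashSet<>();

    synchronized void failed(String route) {
        failedRoutes.add(route);
    }

    synchronized void connected(String route) {
        failedRoutes.remove(route);
    }

    synchronized boolean shouldPostpone(String route) {
        return failedRoutes.contains(route);
    }

    /** Returns the candidates with recently failed routes moved to the end. */
    List<String> order(List<String> candidates) {
        List<String> preferred = new ArrayList<>();
        List<String> postponed = new ArrayList<>();
        for (String route : candidates) {
            if (shouldPostpone(route)) {
                postponed.add(route);
            } else {
                preferred.add(route);
            }
        }
        preferred.addAll(postponed); // failed routes are still tried, just last
        return preferred;
    }
}
```

A failed route is not discarded in this sketch; it is only tried after the routes with no recorded failure, and a later successful connection clears the mark.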
Next, let's take a close look at the isRecoverable method:

    private boolean isRecoverable(IOException e, boolean requestSendStarted) {
        // If there was a protocol problem, don't recover.
        if (e instanceof ProtocolException) {
            return false;
        }

        // If there was an interruption don't recover, but if there was a timeout connecting to a route
        // we should try the next route (if there is one)
        if (e instanceof InterruptedIOException) {
            return e instanceof SocketTimeoutException && !requestSendStarted;
        }

        // Look for known client-side or negotiation errors that are unlikely to be fixed by trying
        // again with a different route.
        if (e instanceof SSLHandshakeException) {
            // If the problem was a CertificateException from the X509TrustManager,
            // do not retry.
            if (e.getCause() instanceof CertificateException) {
                return false;
            }
        }
        // HostnameVerifier checks whether the host is valid; if it is not,
        // SSLPeerUnverifiedException is thrown. The exception is raised while obtaining
        // the session during the handshake, so it is part of the handshake process.
        if (e instanceof SSLPeerUnverifiedException) {
            // e.g. a certificate pinning error.
            return false;
        }

        // An example of one we might want to retry with a different route is a problem connecting to a
        // proxy and would manifest as a standard IOException. Unless it is one we know we should not
        // retry, we return true and try a new route.
        return true;
    }
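The decision table in isRecoverable is easy to misread, so here is a standalone copy of the same checks that runs against plain JDK exception types. RecoverableSketch is a hypothetical class name; the logic is transcribed from the method quoted above and is meant only as a way to try individual cases.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.ConnectException;
import java.net.ProtocolException;
import java.net.SocketTimeoutException;
import java.security.cert.CertificateException;
import javax.net.ssl.SSLHandshakeException;
import javax.net.ssl.SSLPeerUnverifiedException;

/** Hypothetical standalone transcription of the isRecoverable checks. */
final class RecoverableSketch {

    static boolean isRecoverable(IOException e, boolean requestSendStarted) {
        // Protocol problems are never retried.
        if (e instanceof ProtocolException) return false;

        // Interruptions are not retried; a timeout is retried only if the
        // request had not started to be sent (i.e. a connect timeout).
        if (e instanceof InterruptedIOException) {
            return e instanceof SocketTimeoutException && !requestSendStarted;
        }

        // Certificate failures reported by the trust manager are not retried.
        if (e instanceof SSLHandshakeException
                && e.getCause() instanceof CertificateException) {
            return false;
        }

        // Hostname verification or pinning failures are not retried.
        if (e instanceof SSLPeerUnverifiedException) return false;

        // Anything else is worth trying on another route.
        return true;
    }

    public static void main(String[] args) {
        // Connect timeout, nothing sent yet.
        System.out.println(isRecoverable(new SocketTimeoutException("connect"), false)); // prints true
        // Read timeout after the request was sent.
        System.out.println(isRecoverable(new SocketTimeoutException("read"), true)); // prints false
        // Plain connection failure.
        System.out.println(isRecoverable(new ConnectException("refused"), false)); // prints true
    }
}
```

The second case is the one that matters for the timeout discussion: once the request has started to be sent, a SocketTimeoutException is treated as final.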
Solving the problem
You can turn off OkHttp's retry: just make retryOnConnectionFailure return false.

    sClient = builder.retryOnConnectionFailure(false).build();
Update
This issue was fixed in version 3.4.2:
https://github.com/square/okhttp/issues/2756
Common network exceptions:
UnknownHostException
Causes:
- Network outage
- DNS server failure
- DNS hijacking
Solutions:
- HttpDNS
- A sensible fallback strategy
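InterruptedIOException
Causes:
- The request thread was interrupted during the read/write phase of a request
Solutions:
- Check whether the interruption is consistent with the business logic
SocketTimeoutException
Causes:
- Low bandwidth, high latency
- Congested network path, server under heavy load
- Temporary failure of a routing node
Solutions:
- Configure retries sensibly
- Retry with a different IP
Take special note: OkHttp does not internally retry a SocketTimeoutException caused by a read or write timeout during a request.
So if the app layer particularly cares about this exception, it should define custom interceptors that handle it specifically.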
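The kind of special handling described above can be sketched without any OkHttp types. The helper below retries a task a bounded number of times when it throws SocketTimeoutException and lets every other exception through; TimeoutRetrySketch, callWithTimeoutRetry, and maxRetries are hypothetical names. In an OkHttp app this logic would presumably sit inside a custom interceptor around chain.proceed(request), and it is only safe for requests that can be repeated without side effects.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

/** Hypothetical app-level retry helper for timeouts; not part of OkHttp. */
final class TimeoutRetrySketch {

    /**
     * Runs the task, retrying up to maxRetries additional times when it throws
     * SocketTimeoutException. Any other exception propagates immediately.
     */
    static <T> T callWithTimeoutRetry(Callable<T> task, int maxRetries) throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        SocketTimeoutException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return task.call();
            } catch (SocketTimeoutException e) {
                last = e; // remember the timeout and try again if attempts remain
            }
        }
        throw last; // every attempt timed out
    }
}
```

The retry bound matters: an unbounded loop here would recreate the never-ending Loading state this article started from.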
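SSLHandshakeException
Causes:
- TLS protocol negotiation failed, or the handshake format is incompatible
- The CA that issued the server certificate is unknown
- The server certificate is self-signed rather than signed by a CA
- The server configuration is missing an intermediate CA (incomplete certificate chain)
- Server hostname mismatch (SNI)
- A man-in-the-middle attack
Solutions:
- Specify SNI
- Certificate pinning
- Downgrade to HTTP...
- Contact the SA
SSLPeerUnverifiedException
Causes:
- Certificate hostname verification failed
Solutions:
- Specify SNI
- Certificate pinning
- Downgrade to HTTP...
- Contact the SA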