MONGODB02 - MongoSocketWriteException: the exception may be late, but it never fails to show up

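This follows the previous post, "MONGODB01 - Prematurely reached end of stream: locating and fixing the error". After that fix went in, another error appeared. The scenario is the same: MongoDB is not accessed for a while, and the next access suddenly throws this exception: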

 org.springframework.data.mongodb.UncategorizedMongoDbException: Exception sending message; nested exception is com.mongodb.MongoSocketWriteException: Exception sending message
	at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:132)
	at org.springframework.data.mongodb.core.MongoTemplate.potentiallyConvertRuntimeException(MongoTemplate.java:2607)
	at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:2474)
	at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:2282)
	at org.springframework.data.mongodb.core.ExecutableFindOperationSupport$ExecutableFindSupport.doFind(ExecutableFindOperationSupport.java:213)
	at org.springframework.data.mongodb.core.ExecutableFindOperationSupport$ExecutableFindSupport.all(ExecutableFindOperationSupport.java:169)
	at org.springframework.data.mongodb.repository.query.AbstractMongoQuery.lambda$getExecution$1(AbstractMongoQuery.java:113)
	at org.springframework.data.mongodb.repository.query.AbstractMongoQuery.execute(AbstractMongoQuery.java:97)
	at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.doInvoke(RepositoryFactorySupport.java:602)
	at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.invoke(RepositoryFactorySupport.java:590)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185)
	at org.springframework.data.projection.DefaultMethodInvokingMethodInterceptor.invoke(DefaultMethodInvokingMethodInterceptor.java:59)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185)
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185)
	at org.springframework.data.repository.core.support.SurroundingTransactionDetectorMethodInterceptor.invoke(SurroundingTransactionDetectorMethodInterceptor.java:61)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
	at com.sun.proxy.$Proxy185.findAllByPipelineId(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)

Root cause analysis

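From what I found by searching, the likely cause is this: after a connection sits idle for a while, a firewall or load balancer in between closes it, but the client is not told. When the client then tries to read or write on that already-closed connection, the operation fails.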
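To make "the client is not told" concrete, here is a minimal sketch that uses only java.net, not the MongoDB driver. The class name StaleConnectionDemo and its method are made up for illustration. A local server accepts a connection and closes it right away; the client socket's own flags still look healthy, and only an actual read reveals that the other side is gone. Note that this is an orderly close. A firewall that silently drops an idle connection would not even deliver that signal, so the failure would surface later, typically on a write.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class StaleConnectionDemo {

    /** Returns {isConnected, isClosed, read() result} as seen by the client after the peer hung up. */
    static int[] observeAfterPeerClose() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            // The "server" side accepts the connection and closes its end immediately.
            server.accept().close();

            // These flags only describe what the client itself has done so far.
            int connected = client.isConnected() ? 1 : 0;
            int closed = client.isClosed() ? 1 : 0;

            // Only real I/O exposes the dead peer: read() returns -1 at end of stream.
            client.setSoTimeout(5000);
            int read = client.getInputStream().read();
            return new int[] {connected, closed, read};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] r = observeAfterPeerClose();
        System.out.println("isConnected=" + (r[0] == 1)
                + " isClosed=" + (r[1] == 1)
                + " read=" + r[2]);
    }
}
```

Running it prints isConnected=true isClosed=false read=-1: nothing on the client side changed until it tried to use the connection.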

Fixing the problem

Option 1: write a configuration class that sets keepalive to true

While we are at it, keep at least one connection per host.

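import com.mongodb.MongoClientOptions;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MongoConfig {

    @Bean
    public MongoClientOptions mongoOptions() {
        return MongoClientOptions.builder()
                .maxConnectionIdleTime(3600000) // close connections idle for more than one hour (ms)
                .socketKeepAlive(true)          // enable TCP keepalive on driver sockets
                .minConnectionsPerHost(1)       // keep at least one connection per host
                .build();
    }
}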
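The literal 3600000 is one hour expressed in milliseconds, which is the unit maxConnectionIdleTime expects. A small sketch of deriving the value instead of typing it; the class and constant names here are hypothetical:

```java
import java.util.concurrent.TimeUnit;

class IdleTimeConstants {

    // One hour in milliseconds; the cast is safe because the value fits in an int.
    static final int MAX_CONNECTION_IDLE_TIME_MS = (int) TimeUnit.HOURS.toMillis(1);

    public static void main(String[] args) {
        System.out.println(MAX_CONNECTION_IDLE_TIME_MS); // prints 3600000
    }
}
```

The constant can then be passed to maxConnectionIdleTime(...), so the intent stays readable if the timeout is tuned later.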

Option 2: pull in spring-boot-starter-mongodb-plus

POM file

<dependency>
    <groupId>com.spring4all</groupId>
    <artifactId>mongodb-plus-spring-boot-starter</artifactId>
    <version>1.0.0.RELEASE</version>
</dependency>

Configuration

spring:
  data:
    mongodb:
      option:
        max-connection-idle-time: 3600000 # clean up connections that have been idle for one hour
        socket-keep-alive: true
        min-connection-per-host: 1

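The problem looked solved: code committed, deployed, tested, and the error did not come back. It seemed the fix had found its perfect ending.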

BUT

But here comes the "BUT": while writing the call to MongoClientOptions' socketKeepAlive method, I noticed that the method is marked @Deprecated. The source is as follows:

/**
 * Sets whether socket keep-alive is enabled.
 *
 * @param socketKeepAlive keep-alive
 * @return {@code this}
 * @deprecated configuring keep-alive has been deprecated. It now defaults to true and disabling it is not recommended.
 * @see <a href="https://docs.mongodb.com/manual/faq/diagnostics/#does-tcp-keepalive-time-affect-mongodb-deployments">
 *     Does TCP keep-alive time affect MongoDB Deployments?</a>
 */
@Deprecated
public Builder socketKeepAlive(final boolean socketKeepAlive) {
    this.socketKeepAlive = socketKeepAlive;
    return this;
}
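Assuming this driver option ends up as the standard SO_KEEPALIVE flag on the underlying socket (an assumption, since the driver internals are not shown in this post), the effect at the plain java.net level can be sketched like this. KeepAliveSocketDemo is a made-up name, and the loopback server exists only so there is something to connect to:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class KeepAliveSocketDemo {

    /** Connects to a throwaway local server and returns SO_KEEPALIVE before and after enabling it. */
    static boolean[] keepAliveBeforeAndAfter() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = new Socket(server.getInetAddress(), server.getLocalPort())) {
            boolean before = socket.getKeepAlive(); // platform default
            socket.setKeepAlive(true);              // ask the OS to probe idle connections
            boolean after = socket.getKeepAlive();
            return new boolean[] {before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = keepAliveBeforeAndAfter();
        System.out.println("SO_KEEPALIVE before=" + r[0] + " after=" + r[1]);
    }
}
```

The Socket calls used here only switch probing on or off. How long a connection must sit idle before the first probe is sent is decided by the operating system, which is why the flag alone may not be enough.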

From the official documentation (linked in the @see tag above):

Does TCP keepalive time affect MongoDB Deployments?
If you experience network timeouts or socket errors in communication between clients and servers, or between members of a sharded cluster or replica set, check the TCP keepalive value for the affected systems.
Many operating systems set this value to 7200 seconds (two hours) by default. For MongoDB, you will generally experience better results with a shorter keepalive value, on the order of 120 seconds (two minutes).
If your MongoDB deployment experiences keepalive-related issues, you must alter the keepalive value on all affected systems. This includes all machines running mongod or mongos processes and all machines hosting client processes that connect to MongoDB.

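In other words: if you are seeing network timeouts, that is the operating system's fault (nothing to do with us), and you just need to change it the way we tell you. The operating system's default tcp_keepalive_time is 7200 seconds (two hours); MongoDB recommends roughly 120 seconds (two minutes).

The fix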

cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
sudo sysctl -w net.ipv4.tcp_keepalive_time=<value>
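# takes effect after MongoDB is restarted

The setting above is lost when the system reboots. To make it permanent, append the following to /etc/sysctl.conf: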

net.ipv4.tcp_keepalive_time = <value>

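This takes effect after the operating system is rebooted (running sudo sysctl -p also loads the file without a reboot).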
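Since the official advice applies to every machine hosting a client process as well, it can be handy to check the value from the application side. A minimal sketch, assuming a Linux host where the /proc path shown above exists; the class name, the helper methods, and the use of 120 as a hard threshold are illustrative choices (the documentation only says "on the order of 120 seconds"):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class KeepAliveTimeCheck {

    // Illustrative threshold taken from the "on the order of 120 seconds" guidance.
    static final int RECOMMENDED_SECONDS = 120;
    static final Path PROC_FILE = Paths.get("/proc/sys/net/ipv4/tcp_keepalive_time");

    /** Parses the file content, e.g. "7200\n", into seconds. */
    static int parseSeconds(String raw) {
        return Integer.parseInt(raw.trim());
    }

    /** True when the configured idle time is longer than the recommended one. */
    static boolean exceedsRecommendation(int seconds) {
        return seconds > RECOMMENDED_SECONDS;
    }

    public static void main(String[] args) throws IOException {
        if (!Files.isReadable(PROC_FILE)) {
            System.out.println(PROC_FILE + " is not readable; is this a Linux host?");
            return;
        }
        String raw = new String(Files.readAllBytes(PROC_FILE), StandardCharsets.UTF_8);
        int seconds = parseSeconds(raw);
        System.out.println("tcp_keepalive_time=" + seconds + "s"
                + (exceedsRecommendation(seconds) ? " (longer than recommended)" : " (ok)"));
    }
}
```

Something like this could be logged once at application startup, so a host that still runs with the two-hour default gets noticed before the exception comes back.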

posted @ 2020-10-21 21:42  蒲公英的狂想