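Upgrade plan: enabling Redis Cluster topology refresh in applications
1. Symptoms
When Redis runs in cluster mode (n masters, n replicas) as an HA deployment and one master fails, its replica is promoted to master and server-side HA works as expected. Applications that depend on the cluster, however, still send some of their Redis requests to the failed node. Those requests fail with timeout exceptions, and the exceptions never stop. The stack trace looks like this: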
org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 5 second(s)
    at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70)
    at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41)
    at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44)
    at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42)
    at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:273)
    at org.springframework.data.redis.connection.lettuce.LettuceStringCommands.convertLettuceAccessException(LettuceStringCommands.java:799)
    at org.springframework.data.redis.connection.lettuce.LettuceStringCommands.get(LettuceStringCommands.java:68)
    at org.springframework.data.redis.connection.DefaultedRedisConnection.get(DefaultedRedisConnection.java:266)
    at org.springframework.data.redis.core.DefaultValueOperations$1.inRedis(DefaultValueOperations.java:57)
    at org.springframework.data.redis.core.AbstractOperations$ValueDeserializingRedisCallback.doInRedis(AbstractOperations.java:60)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:228)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:188)
    at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:96)
    at org.springframework.data.redis.core.DefaultValueOperations.get(DefaultValueOperations.java:53)
    at com.xxx.price.settle.biz.utils.RedisTemplateUtil.get(RedisTemplateUtil.java:109)
    at com.xxx.price.settle.biz.service.impl.PriceGrossServiceImpl.selectBaseGross(PriceGrossServiceImpl.java:367)
    at com.xxx.price.settle.biz.service.impl.PriceGrossServiceImpl.lambda$getGrossListByCriteria$7(PriceGrossServiceImpl.java:454)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: io.lettuce.core.RedisCommandTimeoutException: Command timed out after 5 second(s)
    at io.lettuce.core.ExceptionFactory.createTimeoutException(ExceptionFactory.java:51)
    at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:119)
    at io.lettuce.core.cluster.ClusterFutureSyncInvocationHandler.handleInvocation(ClusterFutureSyncInvocationHandler.java:131)
    at io.lettuce.core.internal.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:79)
    at com.sun.proxy.$Proxy183.get(Unknown Source)
    at org.springframework.data.redis.connection.lettuce.LettuceStringCommands.get(LettuceStringCommands.java:66)
    ... 14 common frames omitted
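2. Root cause analysis
- The alerts showed that only some applications were affected, and the main difference between them was the Redis client: applications using Jedis recovered quickly, while applications using Lettuce kept timing out.
- Jedis reacts to its own exceptions by resynchronizing the client's view of the cluster with the server, which is why it recovers quickly. Our hypothesis was that Lettuce does not refresh its topology after the cluster has recovered from the failure.
- Looking further: the projects use Spring Boot with Redis auto-configuration enabled, but the LettuceConnectionFactory built by default does not enable topology refresh (Spring Boot 2.3.0 and later provide configuration properties to turn on automatic refresh).
- Testing confirmed that the cause is Lettuce running without topology refresh. Test points from the incident report:
  - Does stopping a slave cause problems? No.
  - Does a manual master/slave switch cause problems? It can, but with refresh enabled the client recovers quickly.
  - Does an unplanned master/slave failover cause problems? It can, but with refresh enabled the client recovers quickly.
  - What happens when a master and its slave both go down (1/3 of the nodes unavailable, 2/3 available)? Some requests fail.
  - What happens when a master restarts (quickly)? Does a master/slave switch occur, and how does the application behave? It can have problems, but with refresh enabled the client recovers quickly.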
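The behaviour seen in these tests can be illustrated with a toy model. The sketch below is not Lettuce or Jedis code: the names StaleTopologyDemo, Cluster, and Client are hypothetical, the cluster is reduced to three slots, and the client is reduced to a cached slot-to-node map. It also assumes that a failed command is what triggers the refresh, which simplifies how a real client decides to refresh.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: hypothetical classes, not Lettuce or Jedis code. */
final class StaleTopologyDemo {

    /** Server-side truth: which node currently serves each slot (three slots for brevity). */
    static final class Cluster {
        private final Map<Integer, String> owners = new HashMap<>();

        Cluster() {
            owners.put(0, "master-a");
            owners.put(1, "master-b");
            owners.put(2, "master-c");
        }

        /** The old owner of the slot dies and its replica takes over. */
        void failover(int slot, String promotedReplica) {
            owners.put(slot, promotedReplica);
        }

        boolean isServing(String node) {
            return owners.containsValue(node);
        }

        Map<Integer, String> currentView() {
            return new HashMap<>(owners);
        }
    }

    /** Client that routes every command by a cached copy of the slot map. */
    static final class Client {
        private final Cluster cluster;
        private final boolean refreshEnabled;
        private Map<Integer, String> cachedView;

        Client(Cluster cluster, boolean refreshEnabled) {
            this.cluster = cluster;
            this.refreshEnabled = refreshEnabled;
            this.cachedView = cluster.currentView();
        }

        String get(int slot) {
            String target = cachedView.get(slot);
            if (cluster.isServing(target)) {
                return "OK from " + target;
            }
            if (refreshEnabled) {
                // Assumption: the failure itself triggers a topology refresh.
                cachedView = cluster.currentView();
            }
            return "TIMEOUT on " + target;
        }
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster();
        Client withoutRefresh = new Client(cluster, false);
        Client withRefresh = new Client(cluster, true);

        cluster.failover(1, "replica-b");

        for (int attempt = 1; attempt <= 2; attempt++) {
            System.out.println("no refresh, slot 1: " + withoutRefresh.get(1));
            System.out.println("refresh,    slot 1: " + withRefresh.get(1));
        }
        System.out.println("no refresh, slot 0: " + withoutRefresh.get(0));
    }
}
```

In this model the client without refresh keeps failing on the slot owned by the dead master while its other slots still work, which matches the "only some requests fail, and they fail forever" symptom. The client with refresh fails once and then routes to the promoted replica.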
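3. Solution
3.1. The fundamental fix
The fix is to enable cluster topology refresh in the Redis client. How to do that depends on the client:
- Jedis: supported automatically by default, no upgrade needed (Jedis uses its own exceptions as feedback to reconnect and refresh the server's cluster information, which gives it automatic failure recovery).
- Lettuce: not enabled by default; refresh must be turned on explicitly.
3.2. Specific measures
Note: whichever approach you use, set maxRedirections to the number of cluster nodes where possible, and list all nodes in redis.cluster.nodes rather than just one or a few, so that a failed or restarting node does not cause connection errors.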
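The note above can be turned into a small startup check. The following is a minimal sketch in plain Java, assuming the node list arrives as a comma-separated string of host:port entries; the class ClusterNodesCheck and its methods are hypothetical helpers, not part of Spring or Lettuce.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: validates a node list and derives a redirect limit from it. */
final class ClusterNodesCheck {

    /** Splits "host:port,host:port,..." into distinct, validated entries. */
    static List<String> parseNodes(String csv) {
        List<String> nodes = new ArrayList<>();
        for (String part : csv.split(",")) {
            String node = part.trim();
            if (node.isEmpty()) {
                continue;
            }
            int colon = node.lastIndexOf(':');
            if (colon <= 0 || colon == node.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got: " + node);
            }
            int port = Integer.parseInt(node.substring(colon + 1));
            if (port < 1 || port > 65535) {
                throw new IllegalArgumentException("Port out of range: " + node);
            }
            if (!nodes.contains(node)) {
                nodes.add(node);
            }
        }
        return nodes;
    }

    /** Follows the recommendation above: one redirect per configured node. */
    static int suggestedMaxRedirects(List<String> nodes) {
        return nodes.size();
    }

    public static void main(String[] args) {
        // Example addresses are made up for illustration.
        String configured = "10.0.0.1:7000,10.0.0.1:7001,10.0.0.2:7000,"
                + "10.0.0.2:7001,10.0.0.3:7000,10.0.0.3:7001";
        List<String> nodes = parseNodes(configured);
        System.out.println("nodes=" + nodes.size()
                + ", suggested max redirects=" + suggestedMaxRedirects(nodes));
    }
}
```

For a three-master, three-slave cluster with all six nodes listed, this yields a redirect limit of 6.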
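- Plain Spring projects: define a custom LettuceClientConfigurationBuilderCustomizer (see 3.2.3 (1)).
- Spring Boot 1.x:
  - Using Jedis (the default): no upgrade needed.
  - Using Lettuce: upgrade needed.
    - If Redis is accessed through the matrix redis-starter component, upgrade to the corresponding patch version.
    - If the redisTemplate is built through Java config, define a custom LettuceClientConfigurationBuilderCustomizer (see 3.2.3 (1)).
- Spring Boot 2.x: the default Redis client is Lettuce, but topology refresh is not enabled by default.
  - Using Jedis: no upgrade needed.
  - Using Lettuce (the default): upgrade needed.
    - If Redis is accessed through the matrix redis-starter component, upgrade to the corresponding patch version.
    - If the redisTemplate is built through Java config, define a custom LettuceClientConfigurationBuilderCustomizer (see 3.2.3 (1)).
- Spring Boot 2.3.0 and later: cluster topology refresh is supported out of the box; just enable it through the configuration properties.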
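3.2.1. Plain Spring projects
(1) If the Redis client is constructed directly in Java or through a builder, pass ClusterTopologyRefreshOptions when constructing it:

    import io.lettuce.core.cluster.ClusterClientOptions;
    import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
    import io.lettuce.core.cluster.RedisClusterClient;
    import java.util.concurrent.TimeUnit;

    // clientResources and redisURIs are created elsewhere in the application.
    ClusterClientOptions clusterClientOptions = ClusterClientOptions.builder()
            .autoReconnect(true)
            .maxRedirects(6)
            .topologyRefreshOptions(ClusterTopologyRefreshOptions.builder()
                    // Refresh the topology periodically, every 30 seconds.
                    .enablePeriodicRefresh(30000, TimeUnit.MILLISECONDS)
                    // Also refresh when the client sees signs of a topology change.
                    .enableAllAdaptiveRefreshTriggers()
                    .build())
            .build();
    RedisClusterClient redisClusterClient = RedisClusterClient.create(clientResources, redisURIs);
    redisClusterClient.setOptions(clusterClientOptions);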
(2) If the Redis client is built through spring-data-redis, pass ClusterTopologyRefreshOptions when constructing it:

    // redisProperties is the RedisProperties instance available in the enclosing configuration class.
    @Bean
    public RedisConnectionFactory newLettuceConnectionFactory() {
        // Periodic refresh every 30 seconds plus all adaptive triggers.
        ClusterTopologyRefreshOptions clusterTopologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
                .enablePeriodicRefresh(Duration.ofMillis(30000))
                .enableAllAdaptiveRefreshTriggers()
                .build();

        ClusterClientOptions clusterClientOptions = ClusterClientOptions.builder()
                .validateClusterNodeMembership(false)
                .topologyRefreshOptions(clusterTopologyRefreshOptions)
                .build();

        LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
                .commandTimeout(redisProperties.getTimeout())
                .shutdownTimeout(Duration.ZERO)
                .clientOptions(clusterClientOptions)
                .build();

        RedisClusterConfiguration serverConfig =
                new RedisClusterConfiguration(redisProperties.getCluster().getNodes());
        return new LettuceConnectionFactory(serverConfig, clientConfig);
    }
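3.2.2. Spring Boot projects, version 1.x
The default client is Jedis, which needs no upgrade. If Lettuce is used, an upgrade is required, as follows:
- Projects using the matrix redis-starter: upgrade directly to the patch version.
- Projects not using the matrix redis-starter: define a custom LettuceClientConfigurationBuilderCustomizer (see 3.2.3 (1)).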
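3.2.3. Spring Boot projects, 2.0.x <= version < 2.3.0
(1) Without upgrading Spring Boot
Projects using Jedis need no upgrade. Projects using Lettuce (the default client in 2.x) need an upgrade, as follows:
- Projects using the matrix redis-starter: upgrade directly to the corresponding patch version.
- Projects not using the matrix redis-starter: define a custom LettuceClientConfigurationBuilderCustomizer, as shown here: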

    import io.lettuce.core.ClientOptions;
    import io.lettuce.core.TimeoutOptions;
    import io.lettuce.core.cluster.ClusterClientOptions;
    import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
    import java.time.Duration;
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.boot.autoconfigure.condition.ConditionalOnMissingBean;
    import org.springframework.boot.autoconfigure.data.redis.LettuceClientConfigurationBuilderCustomizer;
    import org.springframework.boot.autoconfigure.data.redis.RedisProperties;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;

    @Configuration
    public class MatrixLettuceClientConfigurationBuilderCustomizerConfig {

        @Value("${spring.redis.lettuce.cluster.refresh.adaptive:false}")
        private boolean refreshAdaptive;

        @Value("${spring.redis.lettuce.cluster.refresh.period:}")
        private Duration refreshPeriod;

        @Bean
        @ConditionalOnMissingBean(LettuceClientConfigurationBuilderCustomizer.class)
        public LettuceClientConfigurationBuilderCustomizer lettuceClientConfigurationBuilderCustomizer(
                RedisProperties properties) {
            return new MatrixLettuceClientConfigurationBuilderCustomizer(properties, refreshAdaptive, refreshPeriod);
        }

        static class MatrixLettuceClientConfigurationBuilderCustomizer
                implements LettuceClientConfigurationBuilderCustomizer {

            private final RedisProperties properties;
            private final boolean refreshAdaptive;
            private final Duration refreshPeriod;

            MatrixLettuceClientConfigurationBuilderCustomizer(RedisProperties properties,
                    boolean refreshAdaptive, Duration refreshPeriod) {
                this.properties = properties;
                this.refreshAdaptive = refreshAdaptive;
                this.refreshPeriod = refreshPeriod;
            }

            @Override
            public void customize(LettuceClientConfiguration.LettuceClientConfigurationBuilder clientConfigurationBuilder) {
                ClientOptions.Builder builder = ClientOptions.builder();
                // Topology refresh only applies when cluster nodes are configured.
                if (this.properties.getCluster() != null) {
                    ClusterTopologyRefreshOptions.Builder refreshBuilder = ClusterTopologyRefreshOptions.builder();
                    if (refreshPeriod != null) {
                        refreshBuilder.enablePeriodicRefresh(refreshPeriod);
                    }
                    if (refreshAdaptive) {
                        refreshBuilder.enableAllAdaptiveRefreshTriggers();
                    }
                    builder = ClusterClientOptions.builder().topologyRefreshOptions(refreshBuilder.build());
                }
                clientConfigurationBuilder.clientOptions(builder.timeoutOptions(TimeoutOptions.enabled()).build());
            }
        }
    }

Configuration properties:

    spring:
      redis:
        lettuce:
          cluster:
            refresh:
              adaptive: true
              period: 30000
(2) Upgrading Spring Boot
You can upgrade Spring Boot directly to 2.3.4.RELEASE, or upgrade it indirectly by moving to matrix-2.0.8.RELEASE. See the configuration below.
Add the dependency. Either upgrade the matrix version:

    <parent>
        <groupId>com.xxx.mall</groupId>
        <artifactId>matrix</artifactId>
        <version>2.0.8.RELEASE</version>
    </parent>

or upgrade Spring Boot directly to 2.3.4.RELEASE:

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-dependencies</artifactId>
        <version>2.3.4.RELEASE</version>
        <type>pom</type>
        <scope>import</scope>
    </dependency>

Configuration properties:

    spring:
      redis:
        lettuce:
          cluster:
            refresh:
              adaptive: true
              period: 30000
3.2.4. Spring Boot projects, version >= 2.3.0

    spring:
      redis:
        lettuce:
          cluster:
            refresh:
              adaptive: true
              period: 30000
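The matrix patch versions and their corresponding Spring Boot versions are as follows.
If you use the matrix platform components, see the table above for the matching Spring Boot version, and upgrade to the patch version or to the latest version.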
The following properties must be set:
spring.redis.lettuce.cluster.refresh.adaptive=true
spring.redis.lettuce.cluster.refresh.period=30000
or

    spring:
      redis:
        lettuce:
          cluster:
            refresh:
              adaptive: true
              period: 30000
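The period value 30000 has no unit. Spring Boot binds it to a java.time.Duration and, unless a different default unit is declared, reads a bare number as milliseconds; values such as 30s or the ISO-8601 form PT30S are also accepted. The sketch below mimics that interpretation in plain Java for a few common suffixes. RefreshPeriodParser is a hypothetical class written for illustration, not Spring Boot's own converter, and it covers only the ms, s, m, and h suffixes.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of how a duration property value could be interpreted. */
final class RefreshPeriodParser {

    // A non-negative integer with an optional unit suffix; only a subset of units is handled here.
    private static final Pattern SIMPLE = Pattern.compile("^(\\d+)(ms|s|m|h)?$");

    static Duration parse(String value) {
        String v = value.trim();
        if (v.regionMatches(true, 0, "P", 0, 1)) {
            // ISO-8601 form such as PT30S.
            return Duration.parse(v);
        }
        Matcher m = SIMPLE.matcher(v);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a recognised duration: " + value);
        }
        long amount = Long.parseLong(m.group(1));
        // A missing suffix is treated as milliseconds.
        String unit = m.group(2) == null ? "ms" : m.group(2);
        switch (unit) {
            case "s":
                return Duration.ofSeconds(amount);
            case "m":
                return Duration.ofMinutes(amount);
            case "h":
                return Duration.ofHours(amount);
            default:
                return Duration.ofMillis(amount);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("30000"));
        System.out.println(parse("30s"));
        System.out.println(parse("PT30S"));
    }
}
```

All three inputs in main describe the same 30-second refresh period, so writing the property as 30s may be easier to read than 30000.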