Upgrade plan: enabling Redis Cluster topology refresh in applications

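1. Symptoms

When Redis is deployed for HA in cluster mode (n masters, n replicas) and a master node fails, its replica is promoted to master and failover works on the Redis server side. However, applications that depend on the cluster keep sending some of their Redis requests to the failed node. Those requests fail with timeout exceptions, and the exceptions persist instead of clearing up. The symptom looks like this: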

org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 5 second(s)
    at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70)
    at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41)
    at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44)
    at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42)
    at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:273)
    at org.springframework.data.redis.connection.lettuce.LettuceStringCommands.convertLettuceAccessException(LettuceStringCommands.java:799)
    at org.springframework.data.redis.connection.lettuce.LettuceStringCommands.get(LettuceStringCommands.java:68)
    at org.springframework.data.redis.connection.DefaultedRedisConnection.get(DefaultedRedisConnection.java:266)
    at org.springframework.data.redis.core.DefaultValueOperations$1.inRedis(DefaultValueOperations.java:57)
    at org.springframework.data.redis.core.AbstractOperations$ValueDeserializingRedisCallback.doInRedis(AbstractOperations.java:60)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:228)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:188)
    at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:96)
    at org.springframework.data.redis.core.DefaultValueOperations.get(DefaultValueOperations.java:53)
    at com.xxx.price.settle.biz.utils.RedisTemplateUtil.get(RedisTemplateUtil.java:109)
    at com.xxx.price.settle.biz.service.impl.PriceGrossServiceImpl.selectBaseGross(PriceGrossServiceImpl.java:367)
    at com.xxx.price.settle.biz.service.impl.PriceGrossServiceImpl.lambda$getGrossListByCriteria$7(PriceGrossServiceImpl.java:454)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: io.lettuce.core.RedisCommandTimeoutException: Command timed out after 5 second(s)
    at io.lettuce.core.ExceptionFactory.createTimeoutException(ExceptionFactory.java:51)
    at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:119)
    at io.lettuce.core.cluster.ClusterFutureSyncInvocationHandler.handleInvocation(ClusterFutureSyncInvocationHandler.java:131)
    at io.lettuce.core.internal.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:79)
    at com.sun.proxy.$Proxy183.get(Unknown Source)
    at org.springframework.data.redis.connection.lettuce.LettuceStringCommands.get(LettuceStringCommands.java:66)
    ... 14 common frames omitted

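2. Root cause analysis

  • The alerts showed that only some applications were affected, and the main difference between them was the Redis client: applications using Jedis recovered quickly, while applications using Lettuce kept timing out.

  • Jedis reacts to its own errors and resynchronizes the client's view of the cluster with the server, which is why it recovers quickly. The hypothesis was therefore that Lettuce does not refresh its topology after the cluster fails over.
  • Further analysis: the projects use Spring Boot with Redis auto-configuration enabled, but the LettuceConnectionFactory built by default does not enable topology refresh (from Spring Boot 2.3.0 onward, configuration properties are available to turn on automatic refresh).
  • Testing confirmed that the cause is indeed that Lettuce topology refresh was not enabled. Test points from the incident report:
    • Does stopping a replica cause problems? No.
    • Does a manual master/replica switch cause problems? It may, but with refresh enabled the client recovers quickly.
    • Does an unplanned master/replica failover cause problems? It may, but with refresh enabled the client recovers quickly.
    • What happens if a master and its replica both go down (1/3 of the nodes unavailable, 2/3 available)? Some requests fail.
    • What happens when a master is restarted (quickly), and how does the application behave if a master/replica switch occurs? It may cause problems, but with refresh enabled the client recovers quickly.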
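
The root cause above can be illustrated with a small toy model. It is only a sketch under simplifying assumptions and is not Lettuce code: the names ToyCluster, ToyClient and refreshTopology are made up for illustration, and a "slot range" stands in for the real 16384 hash slots. The client routes requests by a cached copy of the topology, so after a failover it keeps hitting the failed master until the cache is refreshed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of a cluster client with a cached topology; this is not Lettuce code. */
class StaleTopologyDemo {

    /** Server side: which node currently serves each slot range, and which nodes are down. */
    static class ToyCluster {
        final Map<Integer, String> masters = new HashMap<>();
        final Set<String> downNodes = new HashSet<>();

        /** The master of the slot range fails and its replica is promoted. */
        void failover(int slotRange, String promotedReplica) {
            downNodes.add(masters.get(slotRange));
            masters.put(slotRange, promotedReplica);
        }
    }

    /** Client side: routes by a cached copy of the topology. */
    static class ToyClient {
        private final ToyCluster cluster;
        private final Map<Integer, String> cachedMasters = new HashMap<>();

        ToyClient(ToyCluster cluster) {
            this.cluster = cluster;
            refreshTopology();
        }

        /** Re-reads the topology from the cluster, as a topology refresh would. */
        void refreshTopology() {
            cachedMasters.clear();
            cachedMasters.putAll(cluster.masters);
        }

        /** Sends a GET for a key in the given slot range to the cached master. */
        String get(int slotRange) {
            String node = cachedMasters.get(slotRange);
            return cluster.downNodes.contains(node) ? "TIMEOUT" : "OK from " + node;
        }
    }

    public static void main(String[] args) {
        ToyCluster cluster = new ToyCluster();
        cluster.masters.put(0, "master-1");
        cluster.masters.put(1, "master-2");
        cluster.masters.put(2, "master-3");
        ToyClient client = new ToyClient(cluster);

        cluster.failover(1, "replica-2");
        System.out.println(client.get(0)); // unaffected slot range
        System.out.println(client.get(1)); // stale route to the failed master
        client.refreshTopology();
        System.out.println(client.get(1)); // routed to the promoted replica
    }
}
```

This matches the observed symptom: only the requests whose keys belong to the failed master time out, and they keep timing out for as long as the cached topology is not refreshed.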

 

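3. Solution

3.1. Basic approach

The fix is to enable cluster topology refresh in the Redis client. The handling differs by client:

  • Jedis client: supported automatically by default, no upgrade needed (Jedis uses feedback from its own errors to reconnect and refresh the cluster information from the server, which gives it automatic failure recovery).
  • Lettuce client: not enabled by default; refresh has to be turned on explicitly.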
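
The error-feedback idea described for Jedis can be sketched roughly as follows. This is a simplified illustration and not Jedis's actual implementation: ToyCluster, ToyClient, NodeDownException and the reply strings are hypothetical. The client retries a command a bounded number of times (the role played by maxRedirections), follows MOVED replies, and renews its whole slot cache when the target node does not answer.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of error-driven slot cache renewal; this is not Jedis code. */
class FeedbackRefreshDemo {

    /** Thrown by the toy transport when the target node does not answer. */
    static class NodeDownException extends RuntimeException {
    }

    /** Server side: the authoritative slot -> node table plus node liveness. */
    static class ToyCluster {
        final Map<Integer, String> owners = new HashMap<>();
        final Map<String, Boolean> alive = new HashMap<>();

        String execute(String node, int slot) {
            if (!alive.getOrDefault(node, false)) {
                throw new NodeDownException();
            }
            String owner = owners.get(slot);
            return owner.equals(node) ? "OK from " + node : "MOVED " + owner;
        }
    }

    /** Client side: cached slot table that is corrected by feedback from failures. */
    static class ToyClient {
        final ToyCluster cluster;
        final Map<Integer, String> slotCache = new HashMap<>();
        final int maxRedirections;
        int renewCount = 0;

        ToyClient(ToyCluster cluster, int maxRedirections) {
            this.cluster = cluster;
            this.maxRedirections = maxRedirections;
            slotCache.putAll(cluster.owners);
        }

        String get(int slot) {
            for (int attempt = 0; attempt <= maxRedirections; attempt++) {
                String node = slotCache.get(slot);
                try {
                    String reply = cluster.execute(node, slot);
                    if (reply.startsWith("MOVED ")) {
                        // Redirect: remember the new owner and retry.
                        slotCache.put(slot, reply.substring("MOVED ".length()));
                        continue;
                    }
                    return reply;
                } catch (NodeDownException e) {
                    // Connection failure: renew the whole slot cache and retry.
                    slotCache.clear();
                    slotCache.putAll(cluster.owners);
                    renewCount++;
                }
            }
            throw new IllegalStateException("Too many redirections for slot " + slot);
        }
    }

    public static void main(String[] args) {
        ToyCluster cluster = new ToyCluster();
        cluster.owners.put(0, "master-1");
        cluster.owners.put(1, "master-2");
        cluster.alive.put("master-1", true);
        cluster.alive.put("master-2", true);
        cluster.alive.put("replica-2", true);
        ToyClient client = new ToyClient(cluster, 3);

        // master-2 fails and replica-2 takes over slot 1.
        cluster.alive.put("master-2", false);
        cluster.owners.put(1, "replica-2");
        System.out.println(client.get(1));
    }
}
```

Because the cache is corrected as a side effect of the failed command itself, a client built this way recovers on the next request without any separate refresh configuration.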

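3.2. Detailed plans

Note: whichever approach you use, maxRedirections should preferably equal the number of cluster nodes, and redis.cluster.nodes should list all nodes rather than one or a few, so that connections do not fail when a node goes down or restarts.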
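
The reason for listing every node can be shown with a minimal sketch. It assumes, for illustration only, that a client bootstraps its topology from the first configured seed node it can reach; the method name firstReachableSeed and the addresses are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Toy model of bootstrapping from seed nodes; not taken from any client library. */
class SeedNodeDemo {

    /** Returns the first configured seed that is not down, if there is one. */
    static Optional<String> firstReachableSeed(List<String> seeds, Set<String> downNodes) {
        for (String seed : seeds) {
            if (!downNodes.contains(seed)) {
                return Optional.of(seed);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> down = new HashSet<>(Collections.singletonList("10.0.0.1:6379"));

        // Only one seed configured, and it happens to be the failed node.
        List<String> singleSeed = Collections.singletonList("10.0.0.1:6379");
        // All nodes configured.
        List<String> allSeeds = Arrays.asList("10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379");

        System.out.println(firstReachableSeed(singleSeed, down).isPresent());
        System.out.println(firstReachableSeed(allSeeds, down).orElse("none"));
    }
}
```

With a single seed, a restart of the application while that node is down leaves it with no node to load the topology from; with all nodes listed, any surviving node can serve as the entry point.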

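Plain Spring projects: define a custom LettuceClientConfigurationBuilderCustomizer (see 3.2.3 (1)).

Spring Boot 1.x and earlier

With Jedis (the default): no upgrade needed.

With Lettuce: upgrade needed.

If Redis is accessed through the matrix component redis-starter: upgrade to the corresponding patch version.

If redisTemplate is built through Java config: define a custom LettuceClientConfigurationBuilderCustomizer (see 3.2.3 (1)).

Spring Boot 2.x onward: the default Redis client is Lettuce, but topology refresh is not enabled by default

With Jedis: no upgrade needed.

With Lettuce (the default): upgrade needed.

If Redis is accessed through the matrix component redis-starter: upgrade to the corresponding patch version.

If redisTemplate is built through Java config: define a custom LettuceClientConfigurationBuilderCustomizer (see 3.2.3 (1)).

Spring Boot 2.3.0 onward: cluster topology refresh is supported; just enable it through configuration properties.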

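3.2.1. Plain Spring projects

(1) When the Redis client is constructed directly in Java or through a builder, pass ClusterTopologyRefreshOptions at construction time

ClusterClientOptions clusterClientOptions = ClusterClientOptions.builder()
        .autoReconnect(true)
        // per the note in 3.2, ideally the number of cluster nodes
        .maxRedirects(6)
        .topologyRefreshOptions(ClusterTopologyRefreshOptions.builder()
                // refresh the topology every 30 seconds
                .enablePeriodicRefresh(30000, TimeUnit.MILLISECONDS)
                // also refresh on events such as MOVED/ASK redirects and persistent reconnects
                .enableAllAdaptiveRefreshTriggers()
                .build())
        .build();
// clientResources and redisURIs are created elsewhere
RedisClusterClient redisClusterClient = RedisClusterClient.create(clientResources, redisURIs);
redisClusterClient.setOptions(clusterClientOptions);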

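(2) When the Redis client is built through spring-data-redis, pass ClusterTopologyRefreshOptions at construction time

@Bean
public RedisConnectionFactory newLettuceConnectionFactory() {
    // redisProperties is assumed to be a field injected elsewhere in this configuration class
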
    ClusterTopologyRefreshOptions clusterTopologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
            .enablePeriodicRefresh(Duration.ofMillis(30000))
            .enableAllAdaptiveRefreshTriggers()
            .build();
    ClusterClientOptions clusterClientOptions = ClusterClientOptions.builder()
            .validateClusterNodeMembership(false)
            .topologyRefreshOptions(clusterTopologyRefreshOptions)
            .build();
    LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
            .commandTimeout(redisProperties.getTimeout())
            .shutdownTimeout(Duration.ZERO)
            .clientOptions(clusterClientOptions)
            .build();
 
    RedisClusterConfiguration serverConfig = new RedisClusterConfiguration(redisProperties.getCluster().getNodes());
 
    return new LettuceConnectionFactory(serverConfig, clientConfig);
}

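3.2.2. Spring Boot projects, version 1.x

The default client is Jedis, which needs no upgrade. If Lettuce is used, an upgrade is needed, following these rules:

Projects using matrix redis-starter: upgrade directly to the patch version.

Projects not using matrix redis-starter: define a custom LettuceClientConfigurationBuilderCustomizer (see 3.2.3 (1)).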

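3.2.3. Spring Boot projects, 2.0.x <= version < 2.3.0

(1) Without upgrading Spring Boot

With the Jedis client, no upgrade is needed. With Lettuce (the default client since Spring Boot 2.x), an upgrade is needed, following these rules:

Projects using matrix redis-starter: upgrade directly to the corresponding patch version.

Projects not using matrix redis-starter: define a custom LettuceClientConfigurationBuilderCustomizer, as follows: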

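import java.time.Duration;

import io.lettuce.core.ClientOptions;
import io.lettuce.core.TimeoutOptions;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.autoconfigure.condition.ConditionalOnMissingBean;
import org.springframework.boot.autoconfigure.data.redis.LettuceClientConfigurationBuilderCustomizer;
import org.springframework.boot.autoconfigure.data.redis.RedisProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;

@Configuration
public class MatrixLettuceClientConfigurationBuilderCustomizerConfig {
   // Same property names that Spring Boot 2.3.0+ supports natively
   @Value("${spring.redis.lettuce.cluster.refresh.adaptive:false}")
   private boolean refreshAdaptive;
   // Empty default, intended to leave periodic refresh off when no period is configured
   @Value("${spring.redis.lettuce.cluster.refresh.period:}")
   private Duration refreshPeriod;

   @Bean
   @ConditionalOnMissingBean(LettuceClientConfigurationBuilderCustomizer.class)
   public LettuceClientConfigurationBuilderCustomizer lettuceClientConfigurationBuilderCustomizer(RedisProperties properties){
      return new MatrixLettuceClientConfigurationBuilderCustomizer(properties, refreshAdaptive, refreshPeriod);
   }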
 
   static class MatrixLettuceClientConfigurationBuilderCustomizer implements LettuceClientConfigurationBuilderCustomizer {
      private final RedisProperties properties;
      private final boolean refreshAdaptive;
      private final Duration refreshPeriod;
 
      MatrixLettuceClientConfigurationBuilderCustomizer(RedisProperties properties,boolean refreshAdaptive,Duration refreshPeriod){
         this.properties = properties;
         this.refreshAdaptive = refreshAdaptive;
         this.refreshPeriod = refreshPeriod;
      }
 
      @Override
      public void customize(LettuceClientConfiguration.LettuceClientConfigurationBuilder clientConfigurationBuilder) {
         ClientOptions.Builder builder = ClientOptions.builder();
 
         if (this.properties.getCluster() != null) {
            ClusterTopologyRefreshOptions.Builder refreshBuilder = ClusterTopologyRefreshOptions.builder();
            if (refreshPeriod != null) {
               refreshBuilder.enablePeriodicRefresh(refreshPeriod);
            }
 
            if (refreshAdaptive) {
               refreshBuilder.enableAllAdaptiveRefreshTriggers();
            }
            builder = ClusterClientOptions.builder().topologyRefreshOptions(refreshBuilder.build());
         }
 
         clientConfigurationBuilder.clientOptions(builder.timeoutOptions(TimeoutOptions.enabled()).build());
      }
   }
}
 
 
Corresponding property configuration
spring:
  redis:
    lettuce:
      cluster:
        refresh:
          adaptive: true
          period: 30000

(2) Upgrading Spring Boot

Upgrade Spring Boot directly to 2.3.4.RELEASE, or upgrade it indirectly by moving to matrix-2.0.8.RELEASE. See the configuration below.

Add the dependency
Upgrade the matrix version:
<parent>
    <groupId>com.xxx.mall</groupId>
    <artifactId>matrix</artifactId>
    <version>2.0.8.RELEASE</version>
</parent>
Or upgrade Spring Boot directly to 2.3.4.RELEASE (the BOM import below belongs inside <dependencyManagement><dependencies>):
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-dependencies</artifactId>
    <version>2.3.4.RELEASE</version>
    <type>pom</type>
    <scope>import</scope>
</dependency>
 
Configure the properties
spring:
  redis:
    lettuce:
      cluster:
        refresh:
          adaptive: true
          period: 30000

3.2.4. Spring Boot projects, version >= 2.3.0

Topology refresh is supported out of the box; only the properties need to be set:

spring:
  redis:
    lettuce:
      cluster:
        refresh:
          adaptive: true
          period: 30000
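
A note on the period value: Spring Boot binds Duration properties from a plain number (milliseconds unless the property declares another default unit), from a value with a unit suffix such as 30s, or from an ISO-8601 string such as PT30S. The sketch below mimics that behaviour with the JDK only. It is a simplified illustration, not Spring Boot's converter: the class RefreshPeriodDemo and its parse method are hypothetical, and only a few suffixes are handled.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified stand-in for duration property parsing; not Spring Boot's implementation. */
class RefreshPeriodDemo {

    // A number with an optional unit suffix; a missing unit is treated as milliseconds.
    private static final Pattern SIMPLE = Pattern.compile("^(\\d+)(ms|s|m|h|d)?$");

    static Duration parse(String value) {
        String v = value.trim();
        if (v.startsWith("P") || v.startsWith("p")) {
            return Duration.parse(v); // ISO-8601, e.g. PT30S
        }
        Matcher m = SIMPLE.matcher(v);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a duration: " + value);
        }
        long amount = Long.parseLong(m.group(1));
        String unit = m.group(2) == null ? "ms" : m.group(2);
        switch (unit) {
            case "ms": return Duration.ofMillis(amount);
            case "s": return Duration.ofSeconds(amount);
            case "m": return Duration.ofMinutes(amount);
            case "h": return Duration.ofHours(amount);
            default: return Duration.ofDays(amount);
        }
    }

    public static void main(String[] args) {
        // All three spellings describe the same 30-second refresh period.
        System.out.println(parse("30000"));
        System.out.println(parse("30s"));
        System.out.println(parse("PT30S"));
    }
}
```

In other words, period: 30000 in the configuration above means a periodic refresh every 30 seconds, and writing 30s would be an equivalent, more readable spelling.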

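The mapping between matrix patch versions and Spring Boot versions is as follows

If you use the matrix platform components, the corresponding Spring Boot versions are given by the mapping above; upgrade to the matching patch version or to the latest version.

The following properties need to be configured:

spring.redis.lettuce.cluster.refresh.adaptive=true
spring.redis.lettuce.cluster.refresh.period=30000

or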

spring:
  redis:
    lettuce:
      cluster:
        refresh:
          adaptive: true
          period: 30000


posted @ 2022-05-31 18:14  zbjice