"zookeeper not connected" when integrating Dubbo with ZooKeeper 3.6.x

1. Background

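  • Dubbo 2.7.12 and earlier use CuratorZookeeperClient by default;
  • Starting with Dubbo 2.7.13, the getExtension method of the ZookeeperTransporter interface checks whether the CuratorCache class can be loaded, and uses the result to decide whether the Curator dependency is a high or a low version: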
package org.apache.dubbo.remoting.zookeeper;
 
import org.apache.dubbo.common.URL;
import org.apache.dubbo.common.extension.ExtensionLoader;
import org.apache.dubbo.common.extension.ExtensionScope;
import org.apache.dubbo.common.extension.SPI;
import org.apache.dubbo.rpc.model.ApplicationModel;
 
@SPI(scope = ExtensionScope.APPLICATION)
public interface ZookeeperTransporter {
   
  String CURATOR_5 = "curator5";
   
  String CURATOR = "curator";
   
  ZookeeperClient connect(URL url);
   
  void destroy();
   
  static ZookeeperTransporter getExtension(ApplicationModel applicationModel) {
    ExtensionLoader<ZookeeperTransporter> extensionLoader = applicationModel.getExtensionLoader(ZookeeperTransporter.class);
    boolean isHighVersion = isHighVersionCurator();
    if (isHighVersion) {
      return extensionLoader.getExtension(CURATOR_5);
    }
    return extensionLoader.getExtension(CURATOR);
  }
   
  static boolean isHighVersionCurator() {
    try {
      Class.forName("org.apache.curator.framework.recipes.cache.CuratorCache");
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }
}
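The class-presence check above can be tried outside Dubbo. The following is a minimal sketch using only the JDK; the class name CuratorVersionProbe and the helper extensionNameFor are hypothetical names made up for illustration, and only the marker class name and the two extension names come from the interface above.

```java
final class CuratorVersionProbe {

    // Marker class used by the check above to recognise a high Curator version.
    static final String CURATOR_5_MARKER = "org.apache.curator.framework.recipes.cache.CuratorCache";

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Mirrors the selection rule: "curator5" when the marker is present, otherwise "curator". */
    static String extensionNameFor(boolean markerPresent) {
        return markerPresent ? "curator5" : "curator";
    }

    public static void main(String[] args) {
        // Run with the application's classpath to see which client would be selected.
        System.out.println("selected extension: " + extensionNameFor(isPresent(CURATOR_5_MARKER)));
    }
}
```

Run with the same classpath as the application, this shows which of the two client implementations the check would choose.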
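Curator 5.2.0 contains the CuratorCache class, so when Dubbo 3.0.x is combined with Curator 5.2.0 and ZooKeeper 3.6.3 the curator5 client is selected, and the error is raised at Curator5ZookeeperClient.<init>(Curator5ZookeeperClient.java:83):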
java.lang.IllegalStateException: java.lang.IllegalStateException: zookeeper not connected
	at org.apache.dubbo.config.deploy.DefaultApplicationDeployer.prepareEnvironment(DefaultApplicationDeployer.java:697) ~[dubbo-3.0.5.jar:3.0.5]
	at org.apache.dubbo.config.deploy.DefaultApplicationDeployer.startConfigCenter(DefaultApplicationDeployer.java:276) ~[dubbo-3.0.5.jar:3.0.5]
	at org.apache.dubbo.config.deploy.DefaultApplicationDeployer.initialize(DefaultApplicationDeployer.java:198) ~[dubbo-3.0.5.jar:3.0.5]
	at org.apache.dubbo.config.deploy.DefaultModuleDeployer.prepare(DefaultModuleDeployer.java:467) ~[dubbo-3.0.5.jar:3.0.5]
	at org.apache.dubbo.config.spring.context.DubboConfigApplicationListener.initDubboConfigBeans(DubboConfigApplicationListener.java:68) ~[dubbo-3.0.5.jar:3.0.5]
	at org.apache.dubbo.config.spring.context.DubboConfigApplicationListener.onApplicationEvent(DubboConfigApplicationListener.java:55) ~[dubbo-3.0.5.jar:3.0.5]
	at org.apache.dubbo.config.spring.context.DubboConfigApplicationListener.onApplicationEvent(DubboConfigApplicationListener.java:34) ~[dubbo-3.0.5.jar:3.0.5]
	at org.springframework.context.event.SimpleApplicationEventMulticaster.doInvokeListener(SimpleApplicationEventMulticaster.java:176) ~[spring-context-5.3.15.jar:5.3.15]
	at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:169) ~[spring-context-5.3.15.jar:5.3.15]
	at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:143) ~[spring-context-5.3.15.jar:5.3.15]
	at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:131) ~[spring-context-5.3.15.jar:5.3.15]
	at org.springframework.context.support.AbstractApplicationContext.registerListeners(AbstractApplicationContext.java:881) ~[spring-context-5.3.15.jar:5.3.15]
	at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:580) ~[spring-context-5.3.15.jar:5.3.15]
	at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:145) ~[spring-boot-2.6.3.jar:2.6.3]
	at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:732) [spring-boot-2.6.3.jar:2.6.3]
	at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:414) [spring-boot-2.6.3.jar:2.6.3]
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:302) [spring-boot-2.6.3.jar:2.6.3]
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:1303) [spring-boot-2.6.3.jar:2.6.3]
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:1292) [spring-boot-2.6.3.jar:2.6.3]
	at org.coderead.ProviderApplication.main(ProviderApplication.java:11) [classes/:na]
Caused by: java.lang.IllegalStateException: zookeeper not connected
	at org.apache.dubbo.remoting.zookeeper.curator5.Curator5ZookeeperClient.<init>(Curator5ZookeeperClient.java:86) ~[dubbo-3.0.5.jar:3.0.5]
	at org.apache.dubbo.remoting.zookeeper.curator5.Curator5ZookeeperTransporter.createZookeeperClient(Curator5ZookeeperTransporter.java:27) ~[dubbo-3.0.5.jar:3.0.5]
	at org.apache.dubbo.remoting.zookeeper.AbstractZookeeperTransporter.connect(AbstractZookeeperTransporter.java:69) ~[dubbo-3.0.5.jar:3.0.5]
	at org.apache.dubbo.configcenter.support.zookeeper.ZookeeperDynamicConfiguration.<init>(ZookeeperDynamicConfiguration.java:67) ~[dubbo-3.0.5.jar:3.0.5]
	at org.apache.dubbo.configcenter.support.zookeeper.ZookeeperDynamicConfigurationFactory.createDynamicConfiguration(ZookeeperDynamicConfigurationFactory.java:47) ~[dubbo-3.0.5.jar:3.0.5]
	at org.apache.dubbo.common.config.configcenter.AbstractDynamicConfigurationFactory.lambda$getDynamicConfiguration$0(AbstractDynamicConfigurationFactory.java:39) ~[dubbo-3.0.5.jar:3.0.5]
	at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660) ~[na:1.8.0_131]
	at org.apache.dubbo.common.config.configcenter.AbstractDynamicConfigurationFactory.getDynamicConfiguration(AbstractDynamicConfigurationFactory.java:39) ~[dubbo-3.0.5.jar:3.0.5]
	at org.apache.dubbo.config.deploy.DefaultApplicationDeployer.getDynamicConfiguration(DefaultApplicationDeployer.java:734) ~[dubbo-3.0.5.jar:3.0.5]
	at org.apache.dubbo.config.deploy.DefaultApplicationDeployer.prepareEnvironment(DefaultApplicationDeployer.java:690) ~[dubbo-3.0.5.jar:3.0.5]
	... 19 common frames omitted
Caused by: java.lang.IllegalStateException: zookeeper not connected
	at org.apache.dubbo.remoting.zookeeper.curator5.Curator5ZookeeperClient.<init>(Curator5ZookeeperClient.java:83) ~[dubbo-3.0.5.jar:3.0.5]
	... 28 common frames omitted

The code that throws the exception:

// Curator5ZookeeperClient.java

public Curator5ZookeeperClient(URL url) {
  // ... (omitted)
  client.getConnectionStateListenable().addListener(new CuratorConnectionStateListener(url));
  client.start();
  // Synchronous, blocking wait: if the ZooKeeper client has still not reached the "connected" state once timeout has elapsed, the current thread wakes up and continues
  boolean connected = client.blockUntilConnected(timeout, TimeUnit.MILLISECONDS);
  // The client is not in the "connected" state, so throw the exception explicitly
  if (!connected) {
    throw new IllegalStateException("zookeeper not connected");
  }
  // ... (omitted)
}
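The "start asynchronously, then block with a timeout" pattern in this constructor can be reproduced with plain JDK classes. The sketch below is not Curator code: ConnectWaitSketch, markConnected and connectWithin are hypothetical names, and the background thread merely simulates a connection that takes a given time to come up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class ConnectWaitSketch {

    private final CountDownLatch connected = new CountDownLatch(1);

    /** Called by the simulated background thread once the "connection" is up. */
    void markConnected() {
        connected.countDown();
    }

    /** Waits until connected or until the timeout elapses; returns whether the connection is up. */
    boolean blockUntilConnected(long timeout, TimeUnit unit) throws InterruptedException {
        return connected.await(timeout, unit);
    }

    /** Simulates an asynchronous connect that needs connectMillis, then waits at most timeoutMillis. */
    static boolean connectWithin(long connectMillis, long timeoutMillis) {
        ConnectWaitSketch client = new ConnectWaitSketch();
        Thread background = new Thread(() -> {
            try {
                Thread.sleep(connectMillis);
                client.markConnected();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        background.setDaemon(true);
        background.start();
        try {
            return client.blockUntilConnected(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("connect 50 ms, timeout 2000 ms: " + connectWithin(50, 2000));
        System.out.println("connect 2000 ms, timeout 50 ms: " + connectWithin(2000, 50));
    }
}
```

The second call is the situation in the stack trace: the connection would eventually succeed, but the waiting thread gives up first.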

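2. Increasing the timeout

The constructors of CuratorZookeeperClient and Curator5ZookeeperClient follow similar logic.

Some solutions found online avoid this IllegalStateException by increasing the timeout, for example by adding this entry to application.properties: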

dubbo.registry.timeout=30000

Other settings that can configure the timeout: [screenshot in the original post]

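3. Finding the cause of the timeout

The real key, however, is to find out why the connection times out.

client.start() is asynchronous, so the exception alone gives no clue about where the time is spent.

To get further, you need to know a few things about the ZooKeeper source code: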

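  • ClientCnxn manages the client's socket I/O. Every ZooKeeper packet that is sent or received goes through this class;
  • ClientCnxn contains SendThread and EventThread. The former does the socket I/O (it establishes the connection, sends requests and reads responses); the latter delivers watcher events and asynchronous callbacks.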

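The problem here is that the client "cannot connect", so we troubleshoot in the following order:

  1. Use ping <ip> to rule out an unreachable target IP or domain name;
  2. Use telnet <ip> <port> to rule out a blocked port;
  3. If neither is the cause, the TCP path is fine, so debug the code of the connection process.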
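Where telnet is not installed, step 2 can be approximated from Java. This is a minimal sketch using only java.net; the class name PortCheck, the 3000 ms timeout and the default port 2181 are assumptions for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortCheck {

    /** Returns true if a TCP connection to host:port can be opened within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

A true result only shows that the TCP handshake succeeds. It says nothing about how long the ZooKeeper client itself needs before it attempts that handshake, which is what the rest of this section looks at.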

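We focus on ClientCnxn.SendThread and locate its startConnect method. The following is the source in ZooKeeper 3.6.3: [screenshot in the original post]

For comparison, here is the source in ZooKeeper 3.4.10: [screenshot in the original post]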

3.1 SaslServerPrincipal.getServerPrincipal

Testing showed that addr.getHostName() and ia.getCanonicalHostName() each took 10 s, 20 s in total.
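A measurement like this can be repeated without a debugger. The following is a rough sketch, assuming the ZooKeeper server IP is passed as the first argument; LookupTimer and timeMillis are hypothetical names, and 127.0.0.1 is only a placeholder default.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.Supplier;

final class LookupTimer {

    /** Runs the call once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Supplier<String> call) {
        long start = System.nanoTime();
        call.get();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws UnknownHostException {
        String ip = args.length > 0 ? args[0] : "127.0.0.1";
        // For a literal IP, getByName does not query DNS and stores no host name.
        InetAddress first = InetAddress.getByName(ip);
        System.out.println("getHostName:          " + timeMillis(first::getHostName) + " ms");
        // A fresh instance, so the host name stored by the first call is not reused.
        InetAddress second = InetAddress.getByName(ip);
        System.out.println("getCanonicalHostName: " + timeMillis(second::getCanonicalHostName) + " ms");
    }
}
```

If both lines report several seconds on the affected machine, the delay comes from the reverse lookup in the environment rather than from ZooKeeper or Dubbo code.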

3.2 Comparing the addr state in the old and new ZooKeeper

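When debugging the ZooKeeper 3.4.10 source, this is the "state" of addr before addr.getHostName() is called: [screenshot in the original post]

Correspondingly, when debugging the ZooKeeper 3.6.3 source, this is the "state" of addr before addr.getHostName() is called: [screenshot in the original post]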

3.3 Source analysis of InetSocketAddress.getHostName

First, InetSocketAddress.getHostName is called:

// In InetSocketAddress.java
public final String getHostName() {
  // Delegates to getHostName of the inner class InetSocketAddressHolder
  return holder.getHostName();
}

Next, look at the getHostName method of InetSocketAddress.InetSocketAddressHolder:

// In the inner class InetSocketAddressHolder of InetSocketAddress.java
private String getHostName() {
  // In both the old and the new ZooKeeper this hostname is null, so this branch is skipped
  if (hostname != null)
    return hostname;
  // In both the old and the new ZooKeeper, addr here is an Inet4Address instance
  if (addr != null)
    return addr.getHostName();
  return null;
}

Inet4Address and Inet6Address are both subclasses of InetAddress, and both use the base-class getHostName method:

// In InetAddress.java
public String getHostName() {
  return getHostName(true);
}

String getHostName(boolean check) {
  // When debugging ZooKeeper 3.4.10, this condition is false and the block is skipped
  // When debugging ZooKeeper 3.6.3, this condition is true and InetAddress.getHostFromNameService is called
  if (holder().getHostName() == null) {
    holder().hostName = InetAddress.getHostFromNameService(this, check);
  }
  return holder().getHostName();
}

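If the address (InetAddress) was created with a host name, that host name is remembered and returned;

otherwise, a reverse DNS lookup is performed and the result is returned according to the name lookup service configured on the system.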
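Both cases can be observed with the public InetAddress API alone. This is a minimal sketch; HostNameDemo and its helpers are hypothetical names, and the check relies on InetAddress.toString() printing "hostName/literalIP" with an empty part before the slash when no host name is stored (the same trick the ZooKeeper 3.4.10 source comments describe).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class HostNameDemo {

    /** A leading "/" in toString() means no host name is stored, so getHostName() would look it up. */
    static boolean hasStoredHostName(InetAddress address) {
        return !address.toString().startsWith("/");
    }

    /** Builds an address from raw bytes only; no host name is stored. */
    static InetAddress withoutHostName(byte[] ip) {
        try {
            return InetAddress.getByAddress(ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Builds an address that remembers the given host name; no name service is consulted. */
    static InetAddress withHostName(String host, byte[] ip) {
        try {
            return InetAddress.getByAddress(host, ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        byte[] ip = {10, 47, (byte) 227, 15};
        InetAddress named = withHostName("10.47.227.15", ip);
        InetAddress bare = withoutHostName(ip);
        System.out.println(named + " stored host name: " + hasStoredHostName(named));
        System.out.println(bare + " stored host name: " + hasStoredHostName(bare));
    }
}
```

Calling getHostName() on the first address returns the stored string immediately; on the second it goes to the name service, which is where the 10 s were spent.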

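3.4 Why are the InetAddress objects created differently?

First, in both the old and the new ZooKeeper, addr is obtained from hostProvider.next(1000).

This method returns the next address from StaticHostProvider's member variable serverAddresses (a list of InetSocketAddress that is shuffled at initialization).

Next, find where serverAddresses is initialized.

ZooKeeper 3.4.10

Creating a ZooKeeper object requires a connectString argument

ConnectStringParser turns it into a list of InetSocketAddress, for example: [screenshot in the original post]

Then comes the initialization in the StaticHostProvider constructor: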

public StaticHostProvider(Collection<InetSocketAddress> serverAddresses)
        throws UnknownHostException {
  for (InetSocketAddress address : serverAddresses) {
        InetAddress ia = address.getAddress();
        // As parsed above, ia == null here, so address.getHostName() returns 10.47.227.15, which is passed to getAllByName
        InetAddress resolvedAddresses[] = InetAddress.getAllByName(
          (ia!=null) ? ia.getHostAddress(): address.getHostName());
        for (InetAddress resolvedAddress : resolvedAddresses) {
            // If hostName is null but the address is not, we can tell that
            // the hostName is an literal IP address. Then we can set the host string as the hostname
            // safely to avoid reverse DNS lookup.
            // As far as i know, the only way to check if the hostName is null is use toString().
            // Both the two implementations of InetAddress are final class, so we can trust the return value of
            // the toString() method.
            if (resolvedAddress.toString().startsWith("/") 
                    && resolvedAddress.getAddress() != null) {
                this.serverAddresses.add(
                        new InetSocketAddress(InetAddress.getByAddress(
                                // This is the key: the host from the user-supplied connectString is used as the host name, so it is clearly not null!
                                address.getHostName(), 
                                resolvedAddress.getAddress()), 
                                address.getPort()));
            } else {
                this.serverAddresses.add(new InetSocketAddress(resolvedAddress.getHostAddress(), address.getPort()));
            }  
        }
    }
    
    if (this.serverAddresses.isEmpty()) {
        throw new IllegalArgumentException(
                "A HostProvider may not be empty!");
    }
    Collections.shuffle(this.serverAddresses);
}

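getAllByName returns the IP addresses for a host name. Its source is as follows: [screenshot in the original post]

In this article's case the host is a literal IP address, so execution reaches the branch boxed in red in the screenshot, which returns an Inet4Address whose hostname is null and whose addr is not null.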

ZooKeeper 3.6.3

Now look at the constructor of the newer ZooKeeper:

public ZooKeeper(
  String connectString,
  int sessionTimeout,
  Watcher watcher,
  boolean canBeReadOnly) throws IOException {
  this(connectString, sessionTimeout, watcher, canBeReadOnly, createDefaultHostProvider(connectString));
}

// default hostprovider
private static HostProvider createDefaultHostProvider(String connectString) {
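  // The style differs slightly, but the way StaticHostProvider is created is almost the same as in the old version.
  // The ConnectStringParser constructor additionally supports IPv6 addresses when parsing connectString.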
  return new StaticHostProvider(new ConnectStringParser(connectString).getServerAddresses());
}

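Next is the source of the StaticHostProvider constructor: [screenshot in the original post]

The init method mainly assigns the method parameters to member variables and is fairly simple.
private void init(Collection<InetSocketAddress> serverAddresses, long randomnessSeed, Resolver resolver) {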
  this.sourceOfRandomness = new Random(randomnessSeed);
  this.resolver = resolver;
  if (serverAddresses.isEmpty()) {
    throw new IllegalArgumentException("A HostProvider may not be empty!");
  }
  this.serverAddresses = shuffle(serverAddresses);
  currentIndex = -1;
  lastIndex = -1;
}

The Resolver object's getAllByName method is invoked when hostProvider.next(1000) is called.

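3.5 Summary

Back to the question: why are the InetAddress objects created differently?
Answer:

First, each host:port string in connectString is parsed and turned into an InetSocketAddress via InetSocketAddress.createUnresolved(host, port) (an address whose IP has not been resolved yet);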

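| | ZooKeeper 3.4.10 | ZooKeeper 3.6.3 |
| --- | --- | --- |
| Where the IP address is resolved | StaticHostProvider constructor | When StaticHostProvider.next(long) is called |
| Is hostName set on the resolved InetSocketAddress? | Yes, the host from connectString is used as hostName | No |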

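So in ZooKeeper 3.6.3 a later getHostName() call on the address needs a reverse DNS lookup; this is the difference between the new and the old version.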
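The difference can be condensed into a few lines of JDK-only code. This is a simplified sketch of the two strategies, not the ZooKeeper source: ResolveStrategies and its method names are hypothetical, and the 3.4.10-like branch omits the toString() check the real constructor performs.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

final class ResolveStrategies {

    /** 3.4.10-like: resolve, then store the connectString host as the host name of the result. */
    static InetSocketAddress resolveKeepingHost(InetSocketAddress unresolved) {
        try {
            String host = unresolved.getHostString();
            InetAddress resolved = InetAddress.getAllByName(host)[0];
            InetAddress named = InetAddress.getByAddress(host, resolved.getAddress());
            return new InetSocketAddress(named, unresolved.getPort());
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** 3.6.3-like: resolve and wrap the result unchanged. */
    static InetSocketAddress resolvePlain(InetSocketAddress unresolved) {
        try {
            InetAddress resolved = InetAddress.getAllByName(unresolved.getHostString())[0];
            return new InetSocketAddress(resolved, unresolved.getPort());
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** True when no host name is stored, so getHostName() would trigger a reverse lookup. */
    static boolean needsReverseLookup(InetSocketAddress address) {
        return address.getAddress().toString().startsWith("/");
    }

    public static void main(String[] args) {
        InetSocketAddress parsed = InetSocketAddress.createUnresolved("10.47.227.15", 2181);
        System.out.println("3.4.10-like needs reverse lookup: " + needsReverseLookup(resolveKeepingHost(parsed)));
        System.out.println("3.6.3-like needs reverse lookup:  " + needsReverseLookup(resolvePlain(parsed)));
    }
}
```

The contrast only appears when connectString contains a literal IP. When it contains a real host name, getAllByName already returns addresses that carry that name, so neither strategy needs a reverse lookup.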

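3.6 Notes on some InetAddress methods

| Method | Purpose | Access level |
| --- | --- | --- |
| getAllByName | Returns the array of IP addresses for a given host name | public |
| getAddressesFromNameService | DNS resolution: looks up IP addresses by host name | private |
| getHostName | Returns the host name for this IP address | public |
| getHostFromNameService | Reverse DNS resolution: looks up the host name by IP address | private |
| getCanonicalHostName | Returns the fully qualified domain name for this IP address | public |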

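4. Recommendations

  1. If you have no IPv6 requirements, consider staying on ZooKeeper 3.4.10;
  2. If you must use ZooKeeper 3.6.3 but do not need SASL authentication, add the JVM argument -Dzookeeper.sasl.client=false to disable SASL authentication.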
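If changing the JVM arguments is inconvenient, the same system property can be set in code. The sketch below assumes the ZooKeeper client reads the property when it is constructed, so the call must run before Dubbo starts; SaslSwitch and disableClientSasl are hypothetical names.

```java
final class SaslSwitch {

    // Same key as in the JVM argument -Dzookeeper.sasl.client=false.
    static final String SASL_CLIENT_KEY = "zookeeper.sasl.client";

    /** Programmatic equivalent of the JVM argument; call before any ZooKeeper client is created. */
    static void disableClientSasl() {
        System.setProperty(SASL_CLIENT_KEY, "false");
    }

    public static void main(String[] args) {
        disableClientSasl();
        // In the real application, SpringApplication.run(...) would follow here.
        System.out.println(SASL_CLIENT_KEY + "=" + System.getProperty(SASL_CLIENT_KEY));
    }
}
```

The JVM argument remains the safer choice, because it takes effect no matter how early a ZooKeeper client is created.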

References

A slow ZooKeeper connection problem and how it was solved (记一次zookeeper连接慢的问题和解决方法)

  • This one finds the more fundamental cause

【Zookeeper】zookeeper not connected

  • This one only finds a fairly superficial cause, and the solution it gives is not good

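The pitfall of the getHostName() method in the InetAddress class (InetAddress类中的getHostName()方法的坑)

  • This one works around the blocking getHostName call by adding entries to hosts. It does not feel ideal (if the IP changes, the hosts entries have to be updated as well), but it is still an option.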
posted @ 2022-03-22 14:39  极客子羽