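"zookeeper not connected" when integrating Dubbo with ZooKeeper 3.6.x
1. Background
- Dubbo 2.7.12 and earlier versions all use CuratorZookeeperClient by default.
- Starting with Dubbo 2.7.13, the getExtension method of the ZookeeperTransporter interface checks whether the class CuratorCache can be loaded to tell whether the Curator dependency is a high version or a low version: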
package org.apache.dubbo.remoting.zookeeper;

import org.apache.dubbo.common.URL;
import org.apache.dubbo.common.extension.ExtensionLoader;
import org.apache.dubbo.common.extension.ExtensionScope;
import org.apache.dubbo.common.extension.SPI;
import org.apache.dubbo.rpc.model.ApplicationModel;

@SPI(scope = ExtensionScope.APPLICATION)
public interface ZookeeperTransporter {

    String CURATOR_5 = "curator5";
    String CURATOR = "curator";

    ZookeeperClient connect(URL url);

    void destroy();

    static ZookeeperTransporter getExtension(ApplicationModel applicationModel) {
        ExtensionLoader<ZookeeperTransporter> extensionLoader = applicationModel.getExtensionLoader(ZookeeperTransporter.class);
        boolean isHighVersion = isHighVersionCurator();
        if (isHighVersion) {
            return extensionLoader.getExtension(CURATOR_5);
        }
        return extensionLoader.getExtension(CURATOR);
    }

    static boolean isHighVersionCurator() {
        try {
            // CuratorCache only exists in newer Curator releases
            Class.forName("org.apache.curator.framework.recipes.cache.CuratorCache");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
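Curator 5.2.0 ships CuratorCache, so Dubbo 3.0.x selects Curator5ZookeeperClient. When it is integrated with Curator 5.2.0 and ZooKeeper 3.6.3, the error is raised at Curator5ZookeeperClient.<init>(Curator5ZookeeperClient.java:83):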
java.lang.IllegalStateException: java.lang.IllegalStateException: zookeeper not connected
at org.apache.dubbo.config.deploy.DefaultApplicationDeployer.prepareEnvironment(DefaultApplicationDeployer.java:697) ~[dubbo-3.0.5.jar:3.0.5]
at org.apache.dubbo.config.deploy.DefaultApplicationDeployer.startConfigCenter(DefaultApplicationDeployer.java:276) ~[dubbo-3.0.5.jar:3.0.5]
at org.apache.dubbo.config.deploy.DefaultApplicationDeployer.initialize(DefaultApplicationDeployer.java:198) ~[dubbo-3.0.5.jar:3.0.5]
at org.apache.dubbo.config.deploy.DefaultModuleDeployer.prepare(DefaultModuleDeployer.java:467) ~[dubbo-3.0.5.jar:3.0.5]
at org.apache.dubbo.config.spring.context.DubboConfigApplicationListener.initDubboConfigBeans(DubboConfigApplicationListener.java:68) ~[dubbo-3.0.5.jar:3.0.5]
at org.apache.dubbo.config.spring.context.DubboConfigApplicationListener.onApplicationEvent(DubboConfigApplicationListener.java:55) ~[dubbo-3.0.5.jar:3.0.5]
at org.apache.dubbo.config.spring.context.DubboConfigApplicationListener.onApplicationEvent(DubboConfigApplicationListener.java:34) ~[dubbo-3.0.5.jar:3.0.5]
at org.springframework.context.event.SimpleApplicationEventMulticaster.doInvokeListener(SimpleApplicationEventMulticaster.java:176) ~[spring-context-5.3.15.jar:5.3.15]
at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:169) ~[spring-context-5.3.15.jar:5.3.15]
at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:143) ~[spring-context-5.3.15.jar:5.3.15]
at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:131) ~[spring-context-5.3.15.jar:5.3.15]
at org.springframework.context.support.AbstractApplicationContext.registerListeners(AbstractApplicationContext.java:881) ~[spring-context-5.3.15.jar:5.3.15]
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:580) ~[spring-context-5.3.15.jar:5.3.15]
at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:145) ~[spring-boot-2.6.3.jar:2.6.3]
at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:732) [spring-boot-2.6.3.jar:2.6.3]
at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:414) [spring-boot-2.6.3.jar:2.6.3]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:302) [spring-boot-2.6.3.jar:2.6.3]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1303) [spring-boot-2.6.3.jar:2.6.3]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1292) [spring-boot-2.6.3.jar:2.6.3]
at org.coderead.ProviderApplication.main(ProviderApplication.java:11) [classes/:na]
Caused by: java.lang.IllegalStateException: zookeeper not connected
at org.apache.dubbo.remoting.zookeeper.curator5.Curator5ZookeeperClient.<init>(Curator5ZookeeperClient.java:86) ~[dubbo-3.0.5.jar:3.0.5]
at org.apache.dubbo.remoting.zookeeper.curator5.Curator5ZookeeperTransporter.createZookeeperClient(Curator5ZookeeperTransporter.java:27) ~[dubbo-3.0.5.jar:3.0.5]
at org.apache.dubbo.remoting.zookeeper.AbstractZookeeperTransporter.connect(AbstractZookeeperTransporter.java:69) ~[dubbo-3.0.5.jar:3.0.5]
at org.apache.dubbo.configcenter.support.zookeeper.ZookeeperDynamicConfiguration.<init>(ZookeeperDynamicConfiguration.java:67) ~[dubbo-3.0.5.jar:3.0.5]
at org.apache.dubbo.configcenter.support.zookeeper.ZookeeperDynamicConfigurationFactory.createDynamicConfiguration(ZookeeperDynamicConfigurationFactory.java:47) ~[dubbo-3.0.5.jar:3.0.5]
at org.apache.dubbo.common.config.configcenter.AbstractDynamicConfigurationFactory.lambda$getDynamicConfiguration$0(AbstractDynamicConfigurationFactory.java:39) ~[dubbo-3.0.5.jar:3.0.5]
at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660) ~[na:1.8.0_131]
at org.apache.dubbo.common.config.configcenter.AbstractDynamicConfigurationFactory.getDynamicConfiguration(AbstractDynamicConfigurationFactory.java:39) ~[dubbo-3.0.5.jar:3.0.5]
at org.apache.dubbo.config.deploy.DefaultApplicationDeployer.getDynamicConfiguration(DefaultApplicationDeployer.java:734) ~[dubbo-3.0.5.jar:3.0.5]
at org.apache.dubbo.config.deploy.DefaultApplicationDeployer.prepareEnvironment(DefaultApplicationDeployer.java:690) ~[dubbo-3.0.5.jar:3.0.5]
... 19 common frames omitted
Caused by: java.lang.IllegalStateException: zookeeper not connected
at org.apache.dubbo.remoting.zookeeper.curator5.Curator5ZookeeperClient.<init>(Curator5ZookeeperClient.java:83) ~[dubbo-3.0.5.jar:3.0.5]
... 28 common frames omitted
The code that throws the exception:
// Curator5ZookeeperClient.java
public Curator5ZookeeperClient(URL url) {
    // ... (omitted)
    client.getConnectionStateListenable().addListener(new CuratorConnectionStateListener(url));
    client.start();
    // Synchronous blocking wait: if the ZooKeeper client has still not reached the
    // "connected" state once the timeout elapses, the current thread wakes up and continues.
    boolean connected = client.blockUntilConnected(timeout, TimeUnit.MILLISECONDS);
    // The client is not in the "connected" state, so throw explicitly.
    if (!connected) {
        throw new IllegalStateException("zookeeper not connected");
    }
    // ... (omitted)
}
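The wait-then-fail pattern in this constructor can be reproduced with the JDK alone. The following is a minimal sketch, not Curator's implementation: the class ConnectGate and its methods are hypothetical, and a CountDownLatch stands in for the connection state.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a client whose connection is established asynchronously.
class ConnectGate {
    private final CountDownLatch connected = new CountDownLatch(1);

    // Called by the background connection thread once the session is up.
    void markConnected() {
        connected.countDown();
    }

    // Waits up to timeoutMs; returns true only if markConnected() ran in time.
    boolean blockUntilConnected(long timeoutMs) {
        try {
            return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    // Same shape as the Dubbo constructor: wait, then fail if still not connected.
    void awaitOrFail(long timeoutMs) {
        if (!blockUntilConnected(timeoutMs)) {
            throw new IllegalStateException("zookeeper not connected");
        }
    }

    public static void main(String[] args) {
        ConnectGate gate = new ConnectGate();
        try {
            gate.awaitOrFail(100); // nothing calls markConnected(), so this times out
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that the exception only says the deadline passed; it carries no information about what the background connection thread was doing in the meantime.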
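2. Increasing the timeout
The constructor of CuratorZookeeperClient follows logic similar to that of Curator5ZookeeperClient.
Some solutions found online simply increase the timeout to avoid this IllegalStateException, for example by adding the following entry to application.properties: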
dubbo.registry.timeout=30000
Other configuration items can also set a timeout.
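3. Finding the cause of the timeout
The real key, though, is to find out why the connection times out.
client.start() is asynchronous, so at first there is no obvious lead to follow.
At this point, two facts about the ZooKeeper source help:
- ClientCnxn manages the client's socket I/O. Every ZooKeeper packet that is sent or received passes through this class.
- ClientCnxn contains SendThread and EventThread. SendThread drives the socket (it sends requests and reads the server's responses), while EventThread delivers the resulting events and callbacks to the application.
The problem here is "cannot connect", so troubleshoot in this order:
- Use ping <ip> to rule out an unreachable target IP or domain name.
- Use telnet <ip> <port> to rule out a blocked port.
- If neither is the cause, the TCP path is open, so debug the connection code itself.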
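When telnet is not installed on the machine, the port check can be done from Java instead. This is a minimal sketch; the class name TcpProbe, the default port 2181 and the 3000 ms timeout are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class TcpProbe {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unknown hosts alike.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181; // assumed ZooKeeper port
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

A result of true only confirms that the TCP handshake works. It says nothing about how long the ZooKeeper client spends before it even opens the socket, which is where this investigation ends up.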
We focus on ClientCnxn.SendThread and locate its startConnect method, first in the ZooKeeper 3.6.3 source and then, for comparison, in the ZooKeeper 3.4.10 source.
3.1 SaslServerPrincipal.getServerPrincipal
Testing shows that addr.getHostName() and ia.getCanonicalHostName() each take 10 s, for a total of 20 s.
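A measurement like this can be repeated outside ZooKeeper. The sketch below times the two JDK calls on addresses built from raw bytes, so no host name is stored in advance. The class ReverseLookupTimer is hypothetical, and the loopback address is only a placeholder; substitute the ZooKeeper server's IP to see the delay on an affected machine.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.Supplier;

class ReverseLookupTimer {
    // Builds an InetAddress from raw bytes only, so no host name is remembered.
    static InetAddress fromBytes(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Runs the call, prints how long it took, and returns the elapsed milliseconds.
    static long time(String label, Supplier<String> call) {
        long start = System.nanoTime();
        String result = call.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + " -> " + result + " (" + elapsedMs + " ms)");
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Separate instances, so the second call does not see a host name stored by the first.
        InetAddress first = fromBytes(127, 0, 0, 1);
        InetAddress second = fromBytes(127, 0, 0, 1);
        time("getHostName", first::getHostName);
        time("getCanonicalHostName", second::getCanonicalHostName);
    }
}
```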
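3.2 Comparing the state of addr in the old and new ZooKeeper
When debugging the ZooKeeper 3.4.10 source, just before addr.getHostName() is called, the InetAddress held by addr already has a host name set.
When debugging the ZooKeeper 3.6.3 source, at the same point, the InetAddress held by addr has no host name.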
3.3 Source analysis of InetSocketAddress.getHostName
First, InetSocketAddress.getHostName is called:
// In InetSocketAddress.java
public final String getHostName() {
    // Delegates to getHostName of the inner class InetSocketAddressHolder
    return holder.getHostName();
}
Next, look at getHostName in InetSocketAddress.InetSocketAddressHolder:
// In the inner class InetSocketAddressHolder of InetSocketAddress.java
private String getHostName() {
    // In both the old and new ZooKeeper, hostname is null for addr, so this branch is skipped
    if (hostname != null)
        return hostname;
    // In both versions, addr here is an Inet4Address instance
    if (addr != null)
        return addr.getHostName();
    return null;
}
Inet4Address and Inet6Address are both subclasses of InetAddress, and both use the base class's getHostName method:
// In InetAddress.java
public String getHostName() {
    return getHostName(true);
}

String getHostName(boolean check) {
    // Debugging ZooKeeper 3.4.10: the condition is false, so the lookup is skipped
    // Debugging ZooKeeper 3.6.3: the condition is true, so InetAddress.getHostFromNameService is called
    if (holder().getHostName() == null) {
        holder().hostName = InetAddress.getHostFromNameService(this, check);
    }
    return holder().getHostName();
}
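If the InetAddress was created with a host name, that host name is remembered and returned.
Otherwise, a reverse DNS lookup is performed and the result comes from the system's configured name lookup service.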
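The difference between the two cases can be observed with plain JDK calls. In this sketch the class HostNameMemory, the host name zk-1.example.internal and the port 2181 are hypothetical; only the IP 10.47.227.15 comes from the article. It also shows InetSocketAddress.getHostString(), which the JDK documents as not attempting a reverse lookup.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

class HostNameMemory {
    static final byte[] IP = {10, 47, (byte) 227, 15};

    // Created with a host name: getHostName() returns it without any lookup.
    static InetAddress named(String host) {
        try {
            return InetAddress.getByAddress(host, IP);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Created from bytes only: getHostName() would have to do a reverse lookup.
    static InetAddress unnamed() {
        try {
            return InetAddress.getByAddress(IP);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(named("zk-1.example.internal").getHostName());
        // getHostString() falls back to the textual IP instead of doing a reverse lookup.
        System.out.println(new InetSocketAddress(unnamed(), 2181).getHostString());
    }
}
```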
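3.4 Why are the InetAddress objects created differently?
First, in both the old and new ZooKeeper code, addr is obtained from hostProvider.next(1000).
This method returns one address from serverAddresses, a shuffled list of InetSocketAddress held as a member of StaticHostProvider.
Next, trace where serverAddresses is initialized.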
ZooKeeper 3.4.10
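When a ZooKeeper object is created, a connectString argument must be passed in.
ConnectStringParser turns it into a list of InetSocketAddress objects, for example an unresolved address whose host is 10.47.227.15.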
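What "unresolved" means for such an address can be checked directly. The sketch below is not ConnectStringParser itself: the parse method is a hypothetical simplification that handles a single host:port pair, and the port 2181 is assumed.

```java
import java.net.InetSocketAddress;

class UnresolvedAddressDemo {
    // Hypothetical, simplified stand-in for ConnectStringParser: handles one "host:port".
    static InetSocketAddress parse(String hostPort) {
        int idx = hostPort.lastIndexOf(':');
        String host = hostPort.substring(0, idx);
        int port = Integer.parseInt(hostPort.substring(idx + 1));
        // No name service is consulted here; the host string is stored as given.
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        InetSocketAddress address = parse("10.47.227.15:2181");
        System.out.println("unresolved:  " + address.isUnresolved());
        System.out.println("getAddress:  " + address.getAddress());
        System.out.println("getHostName: " + address.getHostName());
    }
}
```

For an unresolved address, getAddress() is null and getHostName() simply returns the stored host string, which is why the constructor below sees ia == null.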
Next comes the initialization in the StaticHostProvider constructor:
public StaticHostProvider(Collection<InetSocketAddress> serverAddresses)
        throws UnknownHostException {
    for (InetSocketAddress address : serverAddresses) {
        InetAddress ia = address.getAddress();
        // Based on the parsing above, ia == null here, so address.getHostName() returns
        // 10.47.227.15, which is passed to getAllByName
        InetAddress resolvedAddresses[] = InetAddress.getAllByName(
                (ia != null) ? ia.getHostAddress() : address.getHostName());
        for (InetAddress resolvedAddress : resolvedAddresses) {
            // If hostName is null but the address is not, we can tell that
            // the hostName is an literal IP address. Then we can set the host string as the hostname
            // safely to avoid reverse DNS lookup.
            // As far as i know, the only way to check if the hostName is null is use toString().
            // Both the two implementations of InetAddress are final class, so we can trust the return value of
            // the toString() method.
            if (resolvedAddress.toString().startsWith("/")
                    && resolvedAddress.getAddress() != null) {
                this.serverAddresses.add(
                        new InetSocketAddress(InetAddress.getByAddress(
                                // The key point: the host from the user-supplied connectString
                                // is used as the host name, so it is clearly not empty!
                                address.getHostName(),
                                resolvedAddress.getAddress()),
                                address.getPort()));
            } else {
                this.serverAddresses.add(new InetSocketAddress(resolvedAddress.getHostAddress(), address.getPort()));
            }
        }
    }
    if (this.serverAddresses.isEmpty()) {
        throw new IllegalArgumentException(
                "A HostProvider may not be empty!");
    }
    Collections.shuffle(this.serverAddresses);
}
getAllByName resolves a host name to its IP addresses.
In this article's case the host is an IP literal, so the call returns an Inet4Address whose hostname is null and whose addr is not null.
ZooKeeper 3.6.3
Now look at the constructor in the newer ZooKeeper:
public ZooKeeper(
        String connectString,
        int sessionTimeout,
        Watcher watcher,
        boolean canBeReadOnly) throws IOException {
    this(connectString, sessionTimeout, watcher, canBeReadOnly, createDefaultHostProvider(connectString));
}

// default hostprovider
private static HostProvider createDefaultHostProvider(String connectString) {
    // The code is written a little differently, but StaticHostProvider is initialized
    // in much the same way as in the old version.
    // The ConnectStringParser constructor additionally supports IPv6 addresses in connectString.
    return new StaticHostProvider(new ConnectStringParser(connectString).getServerAddresses());
}
Next is the StaticHostProvider constructor, which delegates to init.
The init method mostly assigns its parameters to member fields and is fairly simple.
private void init(Collection<InetSocketAddress> serverAddresses, long randomnessSeed, Resolver resolver) {
    this.sourceOfRandomness = new Random(randomnessSeed);
    this.resolver = resolver;
    if (serverAddresses.isEmpty()) {
        throw new IllegalArgumentException("A HostProvider may not be empty!");
    }
    this.serverAddresses = shuffle(serverAddresses);
    currentIndex = -1;
    lastIndex = -1;
}
The Resolver object's getAllByName method is not called until hostProvider.next(1000) runs.
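3.5 Summary
Back to the question: why are the InetAddress objects created differently?
Answer:
First, each host:port string in connectString is parsed and turned into an InetSocketAddress via InetSocketAddress.createUnresolved(host, port), that is, an address whose IP has not been resolved yet. After that, the two versions diverge: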
Aspect | ZooKeeper 3.4.10 | ZooKeeper 3.6.3 |
---|---|---|
Where the IP address is resolved | StaticHostProvider constructor | When StaticHostProvider.next(long) is called |
Whether the resolved InetSocketAddress has hostName set | Yes, the host from connectString is used as hostName | No |
ZooKeeper 3.6.3 therefore needs a reverse DNS lookup when getHostName is called, and that is the difference between the new and old versions.
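The two behaviours can be imitated side by side with JDK calls only. This is a simplified sketch, not the ZooKeeper code: the class ResolutionStyles and its method names are hypothetical, only the first resolved address is used, and the port 2181 is assumed.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

class ResolutionStyles {
    // 3.4.10 style: resolve, then re-wrap the bytes together with the original host string.
    static InetSocketAddress resolveKeepingHost(InetSocketAddress unresolved) {
        try {
            String host = unresolved.getHostString();
            InetAddress resolved = InetAddress.getAllByName(host)[0];
            if (resolved.toString().startsWith("/")) {
                resolved = InetAddress.getByAddress(host, resolved.getAddress());
            }
            return new InetSocketAddress(resolved, unresolved.getPort());
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // 3.6.3 style: use the resolved InetAddress as returned, without attaching the host string.
    static InetSocketAddress resolvePlain(InetSocketAddress unresolved) {
        try {
            InetAddress resolved = InetAddress.getAllByName(unresolved.getHostString())[0];
            return new InetSocketAddress(resolved, unresolved.getPort());
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // InetAddress.toString() is "hostName/literal"; a leading "/" means no host name is stored.
    static boolean hasStoredHostName(InetSocketAddress address) {
        return !address.getAddress().toString().startsWith("/");
    }

    public static void main(String[] args) {
        InetSocketAddress unresolved = InetSocketAddress.createUnresolved("10.47.227.15", 2181);
        System.out.println("keeping host: " + hasStoredHostName(resolveKeepingHost(unresolved)));
        System.out.println("plain:        " + hasStoredHostName(resolvePlain(unresolved)));
    }
}
```

Only an address without a stored host name makes a later getHostName() call go to the name service, which matches the second row of the table.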
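3.6 Notes on selected InetAddress methods
Method | Purpose | Access level |
---|---|---|
getAllByName | Returns the array of IP addresses for a given host name | public |
getAddressesFromNameService | DNS resolution: host name to IP addresses | private |
getHostName | Returns the host name for this IP address | public |
getHostFromNameService | Reverse DNS resolution: IP address to host name | private |
getCanonicalHostName | Returns the fully qualified domain name for this IP address | public |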
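4. Recommendations
- If you have no IPv6 requirements, consider staying on ZooKeeper 3.4.10.
- If you must use ZooKeeper 3.6.3 but do not need SASL authentication, add the JVM argument -Dzookeeper.sasl.client=false to disable SASL authentication.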
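When changing the JVM arguments is inconvenient, the same system property can be set from code. This is a sketch under the assumption that it runs before the first ZooKeeper client is created, for example at the very start of main; the class DisableZkSasl is hypothetical, and the JVM argument remains the safer choice if startup order is uncertain.

```java
class DisableZkSasl {
    // Same key as in the JVM argument -Dzookeeper.sasl.client=false.
    static final String ZK_SASL_CLIENT = "zookeeper.sasl.client";

    // Assumption: this must run before any ZooKeeper client object is constructed.
    static void disableClientSasl() {
        System.setProperty(ZK_SASL_CLIENT, "false");
    }

    public static void main(String[] args) {
        disableClientSasl();
        System.out.println(ZK_SASL_CLIENT + "=" + System.getProperty(ZK_SASL_CLIENT));
        // ... start the application (for example SpringApplication.run) after this point
    }
}
```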
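References
- This article identifies the more fundamental cause:
【Zookeeper】zookeeper not connected
- This article only finds a fairly superficial cause, and its proposed fix is not good either:
InetAddress类中的getHostName()方法的坑 (Pitfalls of the getHostName() method in the InetAddress class)
- This one works around the blocking getHostName call by adding hosts entries. It is not ideal (if the IP changes, a new hosts entry is needed), but it is an option.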