Daily Study Notes (24)

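1. At first I always called HttpMethod's getResponseBody() and getResponseBodyAsString(), but that kept producing a warning.

Both methods load the entire response body into memory without streaming it, so HttpClient warns when the body is large. The data should be read through a stream and a buffered reader instead: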

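InputStream resStream = null;
String response = null;
BufferedReader resBufferReader = null;

try {
    httpClient.executeMethod(httpMethod);
    resStream = httpMethod.getResponseBodyAsStream();
    // The stream is null when the response has no body.
    if (resStream != null) {
        resBufferReader = new BufferedReader(new InputStreamReader(resStream));
        StringBuilder resBuffer = new StringBuilder();
        String resTemp;
        // Note: readLine() strips line terminators, so line breaks are not preserved.
        while ((resTemp = resBufferReader.readLine()) != null) {
            resBuffer.append(resTemp);
        }
        response = resBuffer.toString();
    }
} catch (Exception e) {
    // Report the failure instead of swallowing it silently.
    e.printStackTrace();
} finally {
    // Hand the connection back to the connection manager.
    httpMethod.releaseConnection();
}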

2. The code for connecting to ZooKeeper (through the HBase client) is as follows:

public static Configuration hBaseConfiguration = null;
public static HBaseAdmin hBaseAdmin = null;

    public static void init() {
        hBaseConfiguration = HBaseConfiguration.create();
        try {
            hBaseAdmin = new HBaseAdmin(hBaseConfiguration);
        } catch (Exception e) {
            throw new HbaseRuntimeException(e);
        }
    }

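HBaseConfiguration.create() actually loads two default configuration files; when the same key appears in both, the later file overrides the earlier one: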

conf.addResource("hbase-default.xml");
conf.addResource("hbase-site.xml");

However, it kept failing with the following error:

An error is preventing HBase from connecting to ZooKeeper

Caused by: java.io.IOException: Unable to determine ZooKeeper ensemble

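Stepping through the source with a debugger showed that the exception is thrown by the connect method of the ZKUtil class (part of HBase's ZooKeeper support code):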

  public static ZooKeeper connect(Configuration conf, String ensemble,
      Watcher watcher, final String descriptor)
  throws IOException {
    if(ensemble == null) {
      throw new IOException("Unable to determine ZooKeeper ensemble");
    }
    int timeout = conf.getInt("zookeeper.session.timeout", 180 * 1000);
    LOG.debug(descriptor + " opening connection to ZooKeeper with ensemble (" +
        ensemble + ")");
    return new ZooKeeper(ensemble, timeout, watcher);
  }

The code above shows that the address of the ZooKeeper ensemble was never read. That address is obtained in the constructor of ZooKeeperWatcher:

this.quorum = ZKConfig.getZKQuorumServersString(conf);

Following the call further, the configuration turns out to be read by the makeZKProps method:

    // First check if there is a zoo.cfg in the CLASSPATH. If so, simply read
    // it and grab its configuration properties.
    ClassLoader cl = HQuorumPeer.class.getClassLoader();
    final InputStream inputStream =
      cl.getResourceAsStream(HConstants.ZOOKEEPER_CONFIG_NAME);
    if (inputStream != null) {
      try {
        return parseZooCfg(conf, inputStream);
      } catch (IOException e) {
        LOG.warn("Cannot read " + HConstants.ZOOKEEPER_CONFIG_NAME +
                 ", loading from XML files", e);
      }
    }

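At this point it finally made sense: the method first checks whether a zoo.cfg file exists on the CLASSPATH. If one does, its entries are read and used as the ZooKeeper configuration, and hbase-default.xml and hbase-site.xml are ignored entirely!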
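
One quick way to check for this situation is to ask the class loader where, if anywhere, it finds zoo.cfg. The following is only a minimal sketch: the class ClasspathProbe and its locate method are names made up for this note (they are not part of HBase), and the only library call used is the standard ClassLoader.getResource.

```java
import java.net.URL;

/** Hypothetical helper (not part of HBase): reports where a resource is found on the classpath. */
final class ClasspathProbe {

    /** Returns the URL of the named classpath resource, or null if it is not visible. */
    static URL locate(String resourceName) {
        // Same style of lookup as in makeZKProps: go through a class's own class loader.
        return ClasspathProbe.class.getClassLoader().getResource(resourceName);
    }

    public static void main(String[] args) {
        URL url = locate("zoo.cfg");
        if (url == null) {
            System.out.println("zoo.cfg is not on the classpath");
        } else {
            // According to the makeZKProps code above, this file would be read first.
            System.out.println("zoo.cfg found at " + url);
        }
    }
}
```

If the probe prints a location you did not expect, removing or fixing that stray zoo.cfg is probably the first thing to try.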

3. Two ZooKeeper failure conditions need especially careful handling:

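1) The first is connection loss. While the connection is down, operations such as delete, setData, and makePath fail with KeeperException.ConnectionLossException and cannot be relied on to have taken effect. This is the first exception that needs dedicated handling.

The handling itself is simple: add a retry mechanism with a maximum number of retries and a delay between attempts.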

  public <T> T retryOperation(ZkOperation operation) throws KeeperException, InterruptedException {
    KeeperException exception = null;
    for (int i = 0; i < retryCount; i++) {
      try {
        return (T) operation.execute();
      } catch (KeeperException.ConnectionLossException e) {
        if (exception == null) {
          exception = e;
        }
        if (Thread.currentThread().isInterrupted()) {
          Thread.currentThread().interrupt();
          throw new InterruptedException();
        }
        retryDelay(i);
      }
    }
    throw exception;
  }
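
The snippet above relies on helpers that are not shown here (ZkOperation, retryCount, retryDelay). For reference, a self-contained sketch of the same idea follows. Everything in it is hypothetical: RetrySketch, ConnectionLostException (a stand-in for KeeperException.ConnectionLossException so that no ZooKeeper jar is needed), and the linearly growing delay are assumptions made for illustration, not Solr's or ZooKeeper's actual API.

```java
import java.util.concurrent.Callable;

/** Illustrative retry helper; all names here are hypothetical, not Solr or ZooKeeper API. */
final class RetrySketch {

    /** Stand-in for KeeperException.ConnectionLossException so the sketch has no dependencies. */
    static final class ConnectionLostException extends Exception {
        ConnectionLostException(String message) {
            super(message);
        }
    }

    /**
     * Runs the operation up to maxAttempts times, sleeping attempt * baseDelayMs
     * after each failed attempt except the last, and rethrows the first failure
     * if every attempt fails.
     */
    static <T> T retry(Callable<T> operation, int maxAttempts, long baseDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ConnectionLostException first = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (ConnectionLostException e) {
                if (first == null) {
                    first = e;
                }
                if (attempt < maxAttempts) {
                    // Thread.sleep throws InterruptedException if the thread is interrupted.
                    Thread.sleep(attempt * baseDelayMs);
                }
            }
        }
        throw first;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated operation: loses the connection twice, then succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new ConnectionLostException("simulated connection loss");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " calls"); // prints "ok after 3 calls"
    }
}
```

Only the connection-loss exception is retried; any other exception propagates immediately, which matches the intent of the original method.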

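2) The second is session expiry. When you first connect to ZooKeeper you can register a Watcher, whose job is to react both to a successful connection and to session expiry.

When the session expires, you must try to reconnect to the ZooKeeper server. Once the reconnect succeeds, you can redo any application-level initialization; here that is done through onReconnect.command(). The OnReconnect interface is a hook that is called back when the reconnect completes so that this initialization can run.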

  public synchronized void process(WatchedEvent event) {
    if (log.isInfoEnabled()) {
      log.info("Watcher " + this + " name:" + name + " got event " + event + " path:" + event.getPath() + " type:" + event.getType());
    }

    state = event.getState();
    if (state == KeeperState.SyncConnected) {
      connected = true;
      clientConnected.countDown();
    } else if (state == KeeperState.Expired) {
      connected = false;
      log.info("Attempting to reconnect to recover relationship with ZooKeeper...");
      // Try to reconnect to the ZooKeeper server
      try {
        connectionStrategy.reconnect(zkServerAddress, zkClientTimeout, this,
            new ZkClientConnectionStrategy.ZkUpdate() {
              @Override
              public void update(SolrZooKeeper keeper) throws InterruptedException, TimeoutException, IOException {
                synchronized (connectionStrategy) {
                  waitForConnected(SolrZkClient.DEFAULT_CLIENT_CONNECT_TIMEOUT);
                  client.updateKeeper(keeper);
                  if (onReconnect != null) {
                    onReconnect.command();
                  }
                  synchronized (ConnectionManager.this) {
                    ConnectionManager.this.connected = true;
                  }
                }

              }
            });
      } catch (Exception e) {
        SolrException.log(log, "", e);
      }
      log.info("Connected:" + connected);
    } else if (state == KeeperState.Disconnected) {
      connected = false;
    } else {
      connected = false;
    }
    notifyAll();
  }
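
Stripped of the Solr-specific details, the control flow of this watcher can be sketched as below. This is a simplified illustration only: SessionTracker, its State enum, and the Reconnector interface are hypothetical names that stand in for the real ZooKeeper and Solr types, and the reconnect is modeled as a plain synchronous call.

```java
/** Illustrative only: all names are hypothetical and no ZooKeeper classes are used. */
final class SessionTracker {

    /** Simplified stand-in for the connection states a watcher is notified about. */
    enum State { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    /** Hypothetical stand-in for opening a brand-new session; returns true on success. */
    interface Reconnector {
        boolean reconnect();
    }

    private final Reconnector reconnector;
    private final Runnable onReconnect;
    private boolean connected;

    SessionTracker(Reconnector reconnector, Runnable onReconnect) {
        this.reconnector = reconnector;
        this.onReconnect = onReconnect;
    }

    synchronized void handle(State state) {
        switch (state) {
            case SYNC_CONNECTED:
                connected = true;
                break;
            case EXPIRED:
                connected = false;
                // An expired session cannot be resumed, so a new one has to be opened.
                if (reconnector.reconnect()) {
                    connected = true;
                    // Application-level re-initialization runs only after a successful reconnect.
                    if (onReconnect != null) {
                        onReconnect.run();
                    }
                }
                break;
            default:
                connected = false;
        }
        // Wake up any threads waiting for the connection state to change.
        notifyAll();
    }

    synchronized boolean isConnected() {
        return connected;
    }
}
```

The point of the hook is that the re-initialization code lives with the application, while the tracker only decides when it is safe to run it.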

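4. Today, while implementing master/slave failover for Solr, I ran into a puzzling problem.

Scenario:

A cluster of three Solr nodes: one master, m1, and two slaves, s1 and s2. Each Solr node registers an EPHEMERAL_SEQUENTIAL node under the same znode in the ZooKeeper ensemble and so obtains a sequence number; the master is elected by the rule "lowest sequence number is master". If m1 goes down, the slave with the next-lowest sequence number automatically takes over as the new master. Assuming that slave is s1, three things then have to happen:

1) The replication section of solrconfig.xml for the Solr core on s1 must be changed from the slave configuration to the master configuration, and the core must be reloaded.

2) The other slaves (here s2) must change the replication section of their configuration so that masterUrl points to s1 instead of m1, and then reload their cores.

3) If the failed m1 later rejoins the cluster, it registers a new EPHEMERAL_SEQUENTIAL node under the znode mentioned above, with a sequence number necessarily higher than those of s1 and s2. m1 then sees that a new master, s1, already exists, recognizes that it is now a slave, applies the slave replication configuration to its core, and points masterUrl at s1.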
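
The "lowest sequence number is master" rule can be sketched as follows. This is an illustration under stated assumptions, not the code used in the cluster: MasterElectionSketch and the child names are hypothetical, and the parsing relies on ZooKeeper appending a 10-digit, zero-padded counter to the names of sequential nodes.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: picks the master as the child with the lowest sequence number. */
final class MasterElectionSketch {

    /** ZooKeeper appends a 10-digit, zero-padded counter to sequential node names. */
    private static final int SEQUENCE_DIGITS = 10;

    /** Extracts the sequence number; assumes the name ends with the 10-digit suffix. */
    static int sequenceOf(String nodeName) {
        return Integer.parseInt(nodeName.substring(nodeName.length() - SEQUENCE_DIGITS));
    }

    /** Returns the node with the smallest sequence number, or null if there are no nodes. */
    static String electMaster(List<String> children) {
        String master = null;
        for (String child : children) {
            if (master == null || sequenceOf(child) < sequenceOf(master)) {
                master = child;
            }
        }
        return master;
    }

    public static void main(String[] args) {
        // Hypothetical child names, in no particular order; m1 has rejoined
        // after a failure and therefore holds the highest sequence number.
        List<String> children = Arrays.asList(
                "s2_0000000002", "s1_0000000001", "m1_0000000003");
        System.out.println(electMaster(children)); // prints "s1_0000000001"
    }
}
```

Each node would run this check against the current children of the shared znode and compare the result with its own node name to decide whether to load the master or the slave replication configuration.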

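Problem:

What I am seeing is that after s1 switches its configuration from slave to master and reloads, its index directory changes from index to index.<timestamp>. s2, however, still replicates from the default index directory on s1, so it cannot find the index files, and indexversion on s1 returns 0.

I am stuck here for now; tomorrow I will dig into the real cause.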

posted on 2012-03-01 20:17 Phinecos(洞庭散人)
