A Brief Overview of Dremio's SafeExit Handling

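Dremio's SafeExit provides graceful shutdown for the service. When the service is shutting down, jobs that are still in progress are allowed to finish instead of being dropped. Below is a brief walkthrough of the flow.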

SafeExit overview

In Dremio, SafeExit is handled mainly in NodeRegistration and is triggered through the standard JVM shutdown hook. The work it drains belongs to ForemenWorkManager and FragmentWorkManager, the two core modules of Dremio query execution.

Reference implementation classes

 

DACDaemon resource cleanup

  • main entry point
public static void main(final String[] args) throws Exception {
    try (TimedBlock b = Timer.time("main")) {
      DACConfig config = DACConfig.newConfig();
      DACDaemon daemon = newDremioDaemon(config, ClassPathScanner.fromPrescan(config.getConfig().getSabotConfig()));
      daemon.init();
      // Register the close hook
      daemon.closeOnJVMShutDown();
    }
}
  • closeOnJVMShutDown
public void closeOnJVMShutDown() {
    // Set shutdown hook after services are initialized
    Runtime.getRuntime().addShutdownHook(new Thread(shutdownHook, "shutdown-thread"));
}
  • shutdownHook (a Runnable field)
private final Runnable shutdownHook = new Runnable() {
    @Override
    public void run() {
      try {
        // Close the daemon and, through it, every registered service
        close();
      } catch (InterruptedException ignored) {
      } catch (Exception e) {
        logger.error("Failed to close services during shutdown", e);
      }
    }
};
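Taken together, closeOnJVMShutDown and shutdownHook are the standard JVM shutdown-hook pattern. Here is a minimal, self-contained sketch of that pattern rather than Dremio's code: MiniDaemon and its closed flag are hypothetical names, and only Runtime.addShutdownHook is the real JDK API. As in the original, an InterruptedException is ignored and any other failure is logged.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for DACDaemon: only the shutdown-hook wiring is kept.
class MiniDaemon implements AutoCloseable {
  private static final Logger logger = Logger.getLogger(MiniDaemon.class.getName());
  private final AtomicBoolean closed = new AtomicBoolean(false);

  // Same shape as DACDaemon.shutdownHook: close everything, ignore interrupts, log other failures.
  final Runnable shutdownHook = () -> {
    try {
      close();
    } catch (InterruptedException ignored) {
      // Shutdown was interrupted; nothing more to do.
    } catch (Exception e) {
      logger.log(Level.SEVERE, "Failed to close services during shutdown", e);
    }
  };

  // Register the hook only after services are initialized, as closeOnJVMShutDown does.
  Thread closeOnJVMShutDown() {
    Thread hook = new Thread(shutdownHook, "shutdown-thread");
    Runtime.getRuntime().addShutdownHook(hook);
    return hook;
  }

  @Override
  public void close() throws Exception {
    // compareAndSet keeps close() idempotent if it is also called explicitly.
    if (closed.compareAndSet(false, true)) {
      logger.info("Closing services");
    }
  }

  boolean isClosed() {
    return closed.get();
  }

  public static void main(String[] args) {
    MiniDaemon daemon = new MiniDaemon();
    daemon.closeOnJVMShutDown();
    System.out.println("running; the hook fires when the JVM shuts down");
  }
}
```

Shutdown hooks run on normal exit and on SIGINT/SIGTERM, but not on kill -9, so a forced kill bypasses SafeExit entirely.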
  • SingletonRegistry close implementation

SingletonRegistry holds a ServiceRegistry, which contains every service implementation, including the modules that implement the SafeExit interface.

@Override
public void close() throws Exception {
    // Close all registered service implementations
    registry.close();
}
  • ServiceRegistry close
@Override
public synchronized void close() throws Exception {
    if(!closed){
      closed = true;
      // Close in reverse registration order: services registered last are closed first
      AutoCloseables.close(Lists.reverse(services));
    }
}
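AutoCloseables.close(Lists.reverse(services)) closes services in the reverse order of registration, so services registered later (which may depend on earlier ones) shut down first, and the closed flag turns a second close() into a no-op. Below is a JDK-only sketch of that logic. MiniServiceRegistry is a hypothetical name, and the error handling (keep closing after a failure, then rethrow the first exception with later ones attached as suppressed) is an assumption about AutoCloseables.close, not a copy of it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, JDK-only sketch of ServiceRegistry's close logic.
class MiniServiceRegistry implements AutoCloseable {
  private final List<AutoCloseable> services = new ArrayList<>();
  private boolean closed = false;

  synchronized void register(AutoCloseable service) {
    services.add(service);
  }

  @Override
  public synchronized void close() throws Exception {
    if (closed) {
      return; // second and later calls do nothing
    }
    closed = true;
    Exception first = null;
    // Walk the list backwards: last registered, first closed.
    for (int i = services.size() - 1; i >= 0; i--) {
      try {
        services.get(i).close();
      } catch (Exception e) {
        // Keep closing the remaining services; remember the failure.
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e);
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }
}
```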

NodeRegistration resource cleanup

As noted above, the shutdown hook closes every registered service. NodeRegistration is the service whose close() drives the SafeExit handling.

  • close method
public synchronized void close() throws Exception {
    if (!closed) {
      closed = true;
      logger.info("Waiting for work to complete before shutdown.");
      // Gracefully drain fragmentManager and foremenManager. MaestroService is not closed directly here for now; the reason is left for a later analysis.
      ThreadFactory threadFactory = new NamedThreadFactory("noderegistration-shutdown-");
      Thread t1 = waitToExit(threadFactory, fragmentManager);
      Thread t2 = waitToExit(threadFactory, foremenManager);
 
      t1.start();
      t2.start();
 
      t1.join();
      t2.join();
 
      logger.info("Unregistering node {}", endpointName);
      if (!registrationHandles.isEmpty()) {
        Thread t = threadFactory.newThread(new Runnable() {
          @Override
          public void run() {
            try {
              AutoCloseables.close(registrationHandles);
            } catch (Exception e) {
              logger.warn("Exception while closing registration handle", e);
            }
          }
        });
 
        t.start();
        try {
          t.join(dremioConfig.get().getSabotConfig().getInt(ExecConstants.ZK_REFRESH) * 2);
          if (t.isAlive()) {
            logger.warn("Timeout expired while trying to unregister node");
          }
        } catch (final InterruptedException e) {
          logger.warn("Interrupted while sleeping during coordination deregistration.");
          // Preserve evidence that the interruption occurred so that code higher up on the call stack can learn of the
          // interruption and respond to it if it wants to.
          Thread.currentThread().interrupt();
        }
      }
    }
  }
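Stripped to its ordering, close() does two things: wait for both managers in parallel, then deregister the node with a bounded wait so that a slow coordination service cannot stall shutdown indefinitely. The following sketch keeps only that ordering. MiniNodeRegistration, the plain SafeExit instances passed in instead of providers, the unregister callback (standing in for closing registrationHandles) and unregisterTimeoutMillis (standing in for ZK_REFRESH * 2) are all assumptions for illustration; the SafeExit interface is reduced to the one method the article uses.

```java
import java.util.logging.Logger;

// The one method the article relies on: block until in-flight work has finished.
interface SafeExit {
  void waitToExit();
}

// Hypothetical, simplified NodeRegistration that keeps only the shutdown ordering.
class MiniNodeRegistration implements AutoCloseable {
  private static final Logger logger = Logger.getLogger(MiniNodeRegistration.class.getName());

  private final SafeExit fragmentManager;
  private final SafeExit foremenManager;
  private final Runnable unregister;          // stands in for closing registrationHandles
  private final long unregisterTimeoutMillis; // stands in for ZK_REFRESH * 2
  private boolean closed = false;

  MiniNodeRegistration(SafeExit fragmentManager, SafeExit foremenManager,
                       Runnable unregister, long unregisterTimeoutMillis) {
    this.fragmentManager = fragmentManager;
    this.foremenManager = foremenManager;
    this.unregister = unregister;
    this.unregisterTimeoutMillis = unregisterTimeoutMillis;
  }

  @Override
  public synchronized void close() throws InterruptedException {
    if (closed) {
      return;
    }
    closed = true;
    logger.info("Waiting for work to complete before shutdown.");

    // Phase 1: drain executor-side and coordinator-side work in parallel.
    Thread t1 = new Thread(fragmentManager::waitToExit, "noderegistration-shutdown-1");
    Thread t2 = new Thread(foremenManager::waitToExit, "noderegistration-shutdown-2");
    t1.start();
    t2.start();
    t1.join();
    t2.join();

    // Phase 2: deregister the node, but never block shutdown longer than the timeout.
    Thread t = new Thread(unregister, "noderegistration-shutdown-3");
    t.start();
    t.join(unregisterTimeoutMillis);
    if (t.isAlive()) {
      logger.warning("Timeout expired while trying to unregister node");
    }
  }
}
```

The two join() calls in phase 1 have no timeout of their own; the bound comes from each waitToExit implementation, for example the grace period used by ForemenWorkManager.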

  • waitToExit method

private Thread waitToExit(ThreadFactory threadFactory, final Provider<? extends SafeExit> provider) {
    return threadFactory.newThread(new Runnable() {
      @Override
      public void run() {
        SafeExit safeExit;
        try {
          safeExit = provider.get();
        } catch (Exception ex){
          // Ignore: this means no instance was running on this node.
          return;
        }
        safeExit.waitToExit();
      }
    });
  }
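The try/catch around provider.get() lets the same shutdown code run regardless of which managers exist on the node: as the original comment says, an exception from the provider means no instance is running here, so that thread just returns. A sketch of only this guard follows, with java.util.function.Supplier substituted for the Provider type; SafeExitGuard and guardedWaiter are hypothetical names.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.function.Supplier;

// Same single-method contract as in the article.
interface SafeExit {
  void waitToExit();
}

// Hypothetical helper mirroring NodeRegistration.waitToExit.
final class SafeExitGuard {
  private SafeExitGuard() {}

  static Thread guardedWaiter(ThreadFactory threadFactory, Supplier<? extends SafeExit> provider) {
    return threadFactory.newThread(() -> {
      SafeExit safeExit;
      try {
        safeExit = provider.get();
      } catch (Exception ex) {
        // No instance is running on this node, so there is nothing to wait for.
        return;
      }
      safeExit.waitToExit();
    });
  }

  public static void main(String[] args) throws InterruptedException {
    ThreadFactory factory = Executors.defaultThreadFactory();
    SafeExit bound = () -> System.out.println("drained");
    Thread present = guardedWaiter(factory, () -> bound);
    Thread missing = guardedWaiter(factory, () -> {
      throw new IllegalStateException("not bound on this node");
    });
    present.start();
    missing.start();
    present.join();
    missing.join();
  }
}
```

Running main prints drained once; the unbound manager is skipped without an error.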
  • ForemenWorkManager SafeExit handling
 @Override
  public void waitToExit() {
    synchronized(this) {
      if (externalIdToForeman.isEmpty()) {
        return;
      }
 
      exitLatch = new ExtendedLatch();
    }
 
    // Wait for at most the configured graceful timeout or until the latch is released.
    exitLatch.awaitUninterruptibly(dbContext.get().getDremioConfig().getLong(
      DremioConfig.DREMIO_TERMINATION_GRACE_PERIOD_SECONDS) * 1000);
  }
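Only the waiting side appears above. To show how the latch could be released, the sketch below adds a release side, which is an assumption rather than Dremio's code: once shutdown has begun and the last running query is removed, the latch is counted down. java.util.concurrent.CountDownLatch replaces Dremio's ExtendedLatch (its awaitUninterruptibly is emulated with a retry loop), and MiniForemenManager, start, finish and gracePeriodMillis (standing in for DREMIO_TERMINATION_GRACE_PERIOD_SECONDS * 1000) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified ForemenWorkManager: tracks running queries and supports SafeExit-style waiting.
class MiniForemenManager {
  private final Map<String, Object> externalIdToForeman = new HashMap<>();
  private final long gracePeriodMillis;
  private CountDownLatch exitLatch; // set only once shutdown has begun; guarded by this

  MiniForemenManager(long gracePeriodMillis) {
    this.gracePeriodMillis = gracePeriodMillis;
  }

  synchronized void start(String queryId) {
    externalIdToForeman.put(queryId, new Object());
  }

  // Assumed release side: the last query to finish during shutdown opens the latch.
  synchronized void finish(String queryId) {
    externalIdToForeman.remove(queryId);
    if (exitLatch != null && externalIdToForeman.isEmpty()) {
      exitLatch.countDown();
    }
  }

  void waitToExit() {
    CountDownLatch latch;
    synchronized (this) {
      if (externalIdToForeman.isEmpty()) {
        return; // nothing running, exit immediately
      }
      exitLatch = new CountDownLatch(1);
      latch = exitLatch;
    }
    // Wait until the latch opens or the grace period expires, retrying after interrupts.
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(gracePeriodMillis);
    long remaining = deadline - System.nanoTime();
    boolean interrupted = false;
    while (remaining > 0) {
      try {
        if (latch.await(remaining, TimeUnit.NANOSECONDS)) {
          break; // latch released: all queries finished
        }
      } catch (InterruptedException e) {
        interrupted = true;
      }
      remaining = deadline - System.nanoTime();
    }
    if (interrupted) {
      Thread.currentThread().interrupt(); // restore the flag for the caller
    }
  }
}
```

Either way waitToExit is bounded: it returns as soon as the last query finishes, or once the grace period expires with queries still running.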

Summary

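The above is a brief overview of how Dremio handles SafeExit. The actual implementation contains many more details; this post only outlines the overall flow, so refer to the source code for specifics.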

References

sabot/kernel/src/main/java/com/dremio/exec/work/SafeExit.java
sabot/kernel/src/main/java/com/dremio/exec/server/NodeRegistration.java
dac/backend/src/main/java/com/dremio/dac/daemon/DACDaemon.java
common/legacy/src/main/java/com/dremio/service/ServiceRegistry.java
common/legacy/src/main/java/com/dremio/service/SingletonRegistry.java
sabot/kernel/src/main/java/com/dremio/exec/work/protector/Foreman.java
sabot/kernel/src/main/java/com/dremio/exec/work/protector/ForemenWorkManager.java
sabot/kernel/src/main/java/com/dremio/sabot/exec/FragmentWorkManager.java

posted on 2024-03-09 08:00  荣锋亮
