Dremio sources: some issues with the "do not remove when the source is unavailable" setting and reflections

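Dremio's reflections are interesting and quite powerful. For example, we might expect that with a reflection in place, queries keep working even when the upstream system is unavailable.
In practice, however, that is not what happens.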

Reference configuration

As follows

Problem

  • The source [s3] is currently unavailable. Metadata is not accessible; please check node health (or external storage) and permissions. Info: [com.amazonaws.SdkClientException: Unable to execute HTTP request: minio]

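Cause

Although we configured the source not to remove its metadata, one problem remains: Dremio's ManagedStoragePlugin runs checkState whenever the source plugin is fetched, and that check throws if the source is in a bad state.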

  • Reference handling
  // Checks the current state; the state itself is maintained through PluginsManager
  protected void checkState() {
    try(AutoCloseableLock l = readLock()) {
      SourceState state = this.state;
      if(state.getStatus() == SourceState.SourceStatus.bad) {
        final String msg = state.getMessages().stream()
          .map(m -> m.getMessage())
          .collect(Collectors.joining(", "));
 
        StringBuilder badStateMessage = new StringBuilder();
        badStateMessage.append("The source [").append(sourceKey).append("] is currently unavailable. Metadata is not ");
        badStateMessage.append("accessible; please check node health (or external storage) and permissions.");
        if (!Strings.isNullOrEmpty(msg)) {
          badStateMessage.append(" Info: [").append(msg).append("]");
        }
        String suggestedUserAction = this.state.getSuggestedUserAction();
        if (!Strings.isNullOrEmpty(suggestedUserAction)) {
          badStateMessage.append("\nAdditional actions: [").append(suggestedUserAction).append("]");
        }
        UserException.Builder builder = UserException.sourceInBadState().message(badStateMessage.toString());
 
        for(Message message : state.getMessages()) {
          builder.addContext(message.getLevel().name(), message.getMessage());
        }
 
        throw builder.buildSilently();
      }
    }
  }
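
To make the effect of this gate concrete, here is a minimal sketch, not Dremio's actual code. The class and method names (BadStateGateSketch, ManagedSource, getTableMetadata, markBad) are hypothetical stand-ins; the only thing taken from the excerpt above is the idea that every access first passes through a state check that throws once the source is marked bad.

```java
import java.util.ArrayList;
import java.util.List;

class BadStateGateSketch {
  enum Status { GOOD, BAD }

  /** Hypothetical stand-in for the managed plugin wrapper around a source. */
  static class ManagedSource {
    private final String name;
    private Status status = Status.GOOD;
    private final List<String> messages = new ArrayList<>();

    ManagedSource(String name) {
      this.name = name;
    }

    /** Records a failed health check, as the periodic state refresh would. */
    void markBad(String message) {
      status = Status.BAD;
      messages.add(message);
    }

    /** Same idea as checkState(): refuse any access while the source is bad. */
    void checkState() {
      if (status == Status.BAD) {
        String info = String.join(", ", messages);
        throw new IllegalStateException(
            "The source [" + name + "] is currently unavailable. Info: [" + info + "]");
      }
    }

    /** Every lookup passes through the gate first, even if metadata is still stored. */
    String getTableMetadata(String table) {
      checkState();
      return "metadata for " + name + "." + table;
    }
  }

  public static void main(String[] args) {
    ManagedSource s3 = new ManagedSource("s3");
    System.out.println(s3.getTableMetadata("orders"));
    s3.markBad("Unable to execute HTTP request: minio");
    try {
      s3.getTableMetadata("orders");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The point of the sketch is that keeping the metadata around does not help on its own: the lookup never reaches the stored metadata because the gate fails first.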

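Reference call sites

  • SourceMetadataManager: metadata refresh handling
    This is essentially built on the scheduled-task mechanism covered in earlier posts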
  if(isMaster) {
    // we can schedule on all nodes since this is a clustered singleton and will only run on a single node.
    this.wakeupTask = modifiableScheduler.schedule(
        Schedule.Builder.everyMillis(WAKEUP_FREQUENCY_MS)
          .asClusteredSingleton(METADATA_REFRESH_TASK_NAME_PREFIX + sourceKey)
          .build(),
          new WakeupWorker());
  }

State handling

private void wakeup() {
 
    monitor.onWakeup();
 
    // if we've never refreshed, initialize the refresh start times. We do this on wakeup since that will happen if this
    // node gets assigned refresh responsibilities much later than the node initially comes up. It does leave the gap
    // where we may refresh early if we do a refresh and then the task immediately migrates but that is probably okay
    // for now.
    if (!initialized) {
      initializeRefresh();
      // on first wakeup, we'll skip work so we can avoid a bunch of distracting exceptions when a plugin is first starting.
      return;
    }
 
    try { 
      // bridge is in fact the ManagedStoragePlugin
      bridge.refreshState();
    } catch (TimeoutException ex) {
      logger.debug("Source '{}' timed out while refreshing state, skipping refresh.", sourceKey, ex);
      return;
    } catch (Exception ex) {
      logger.debug("Source '{}' refresh failed as we were unable to retrieve refresh it's state.", sourceKey, ex);
      return;
    }
 
    if (!runLock.tryLock()) {
      logger.info("Source '{}' delaying refresh since an adhoc refresh is currently active.", sourceKey);
      return;
    }
 
    try (Closeable c = AutoCloseableLock.ofAlreadyOpen(runLock, true)) {
      if ( !(fullRefresh.shouldRun() || namesRefresh.shouldRun()) ) {
        return;
      }
 
      final SourceState sourceState = bridge.getState();
      if (sourceState == null || sourceState.getStatus() == SourceStatus.bad) {
        logger.info("Source '{}' skipping metadata refresh since it is currently in a bad state of {}.",
            sourceKey, sourceState);
        return;
      }
 
      final BackgroundRefresh refresh;
      if(fullRefresh.shouldRun()) {
        refresh = new BackgroundRefresh(fullRefresh, true);
      } else {
        refresh = new BackgroundRefresh(namesRefresh, false);
      }
      refresh.run();
    } catch (RuntimeException e) {
      logger.warn("Source '{}' metadata refresh failed to complete due to an exception.", sourceKey, e);
    }
 
  }
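
The order of the early returns in wakeup() is easier to follow when each exit is given a name. The following is a simplified sketch under stated assumptions: Outcome, Source, FakeSource and the refreshDue flag are hypothetical, and the real method logs and returns void instead of returning a value.

```java
import java.util.concurrent.locks.ReentrantLock;

class WakeupFlowSketch {
  /** One value per exit path of the wakeup logic (names are illustrative). */
  enum Outcome {
    INITIALIZED_ONLY, STATE_REFRESH_FAILED, ADHOC_REFRESH_ACTIVE, NOT_DUE, SOURCE_BAD, REFRESHED
  }

  /** Hypothetical view of the managed plugin ("bridge" in the excerpt). */
  interface Source {
    void refreshState() throws Exception;
    boolean isBad();
  }

  /** Test double whose behavior can be toggled. */
  static class FakeSource implements Source {
    boolean bad;
    boolean failRefresh;

    @Override
    public void refreshState() throws Exception {
      if (failRefresh) {
        throw new Exception("state refresh failed");
      }
    }

    @Override
    public boolean isBad() {
      return bad;
    }
  }

  private final Source source;
  private final ReentrantLock runLock = new ReentrantLock();
  private boolean initialized;
  boolean refreshDue = true; // stands in for fullRefresh/namesRefresh shouldRun()

  WakeupFlowSketch(Source source) {
    this.source = source;
  }

  Outcome wakeup() {
    if (!initialized) {
      initialized = true; // first wakeup only records start times
      return Outcome.INITIALIZED_ONLY;
    }
    try {
      source.refreshState(); // the state is re-evaluated on every wakeup
    } catch (Exception e) {
      return Outcome.STATE_REFRESH_FAILED;
    }
    if (!runLock.tryLock()) {
      return Outcome.ADHOC_REFRESH_ACTIVE;
    }
    try {
      if (!refreshDue) {
        return Outcome.NOT_DUE;
      }
      if (source.isBad()) {
        return Outcome.SOURCE_BAD; // metadata refresh is skipped for a bad source
      }
      return Outcome.REFRESHED;
    } finally {
      runLock.unlock();
    }
  }
}
```

Note that the state refresh runs before anything else on every wakeup after the first, so an unreachable source is re-marked as bad regardless of whether a metadata refresh is due.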

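Notes

At present, Dremio's state check is not configurable for some sources. So when a source becomes unavailable, even though we have set up reflections (and the query plan does in fact read from the reflection storage), queries still stop working after a while.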
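
The "after a while" part can be illustrated with a toy timeline. This is a sketch under assumptions, not Dremio's query path: Source, refreshState, query and in particular the lenientStateCheck flag are hypothetical. The flag models the kind of configuration the state check currently lacks; it is not an existing Dremio option.

```java
class ReflectionTimelineSketch {
  /** Hypothetical source with real availability and last recorded health. */
  static class Source {
    boolean reachable = true;  // whether the upstream system actually responds
    boolean markedBad = false; // what the last periodic state refresh recorded

    /** Stands in for the periodic state refresh driven by the wakeup task. */
    void refreshState() {
      markedBad = !reachable;
    }
  }

  /**
   * Hypothetical query path: the source state is validated first,
   * and only then are rows read from the reflection storage.
   */
  static String query(Source source, boolean lenientStateCheck) {
    if (source.markedBad && !lenientStateCheck) {
      throw new IllegalStateException("The source is currently unavailable.");
    }
    return "rows from reflection";
  }

  public static void main(String[] args) {
    Source s3 = new Source();
    s3.reachable = false;                 // upstream goes down
    System.out.println(query(s3, false)); // still answered: state not refreshed yet
    s3.refreshState();                    // next wakeup marks the source bad
    try {
      query(s3, false);
    } catch (IllegalStateException e) {
      System.out.println("query failed: " + e.getMessage());
    }
    System.out.println(query(s3, true));  // hypothetical lenient mode keeps working
  }
}
```

In this model the reflection data is never the problem. The failure appears only once the recorded state flips to bad, which matches the delay observed in practice.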

References

sabot/kernel/src/main/java/com/dremio/exec/catalog/ManagedStoragePlugin.java
sabot/kernel/src/main/java/com/dremio/exec/catalog/PluginsManager.java
sabot/kernel/src/main/java/com/dremio/exec/catalog/SourceMetadataManager.java

posted on 2024-02-18 08:00  荣锋亮
