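Dremio sources: some problems with disabled or unavailable sources, keeping metadata, and reflections
Dremio reflections are interesting and quite powerful. For example, we might expect that with a reflection in place, queries would keep working even when the upstream system is unavailable.
In practice, that is not what happens.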
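Reference configuration
The source is configured so that its dataset metadata is not removed when the source becomes unavailable.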
Problem
- The source [s3] is currently unavailable. Metadata is not accessible; please check node health (or external storage) and permissions. Info: [com.amazonaws.SdkClientException: Unable to execute HTTP request: minio]
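Cause
Although we configured the source not to remove its metadata, there is still a problem: Dremio's ManagedStoragePlugin calls checkState whenever the source plugin is fetched.
- Reference code
// Checks the cached state; the state itself is maintained through PluginsManager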
protected void checkState() {
  try(AutoCloseableLock l = readLock()) {
    SourceState state = this.state;
    if(state.getStatus() == SourceState.SourceStatus.bad) {
      final String msg = state.getMessages().stream()
        .map(m -> m.getMessage())
        .collect(Collectors.joining(", "));
      StringBuilder badStateMessage = new StringBuilder();
      badStateMessage.append("The source [").append(sourceKey).append("] is currently unavailable. Metadata is not ");
      badStateMessage.append("accessible; please check node health (or external storage) and permissions.");
      if (!Strings.isNullOrEmpty(msg)) {
        badStateMessage.append(" Info: [").append(msg).append("]");
      }
      String suggestedUserAction = this.state.getSuggestedUserAction();
      if (!Strings.isNullOrEmpty(suggestedUserAction)) {
        badStateMessage.append("\nAdditional actions: [").append(suggestedUserAction).append("]");
      }
      UserException.Builder builder = UserException.sourceInBadState().message(badStateMessage.toString());
      for(Message message : state.getMessages()) {
        builder.addContext(message.getLevel().name(), message.getMessage());
      }
      throw builder.buildSilently();
    }
  }
}
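To make the effect of this gate easier to see, here is a minimal, self-contained sketch. It is not Dremio code: `SourceStateGateSketch`, `ManagedSource`, `State`, and `getTable` are hypothetical stand-ins, and the exception type is simplified to `IllegalStateException`. It only assumes what the excerpt above shows, namely that a cached `bad` status makes the check throw before any metadata lookup can proceed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified, hypothetical model of the state gate; these are not Dremio's real classes.
final class SourceStateGateSketch {

  enum Status { good, warn, bad }

  // Hypothetical stand-in for SourceState: a status plus diagnostic messages.
  static final class State {
    final Status status;
    final List<String> messages;

    State(Status status, List<String> messages) {
      this.status = status;
      this.messages = messages;
    }
  }

  // Hypothetical stand-in for ManagedStoragePlugin.
  static final class ManagedSource {
    private final String name;
    private volatile State state;

    ManagedSource(String name, State state) {
      this.name = name;
      this.state = state;
    }

    void setState(State state) {
      this.state = state;
    }

    // Same shape as checkState(): throw as soon as the cached status is bad.
    void checkState() {
      State s = this.state;
      if (s.status == Status.bad) {
        String info = String.join(", ", s.messages);
        throw new IllegalStateException(
            "The source [" + name + "] is currently unavailable. Info: [" + info + "]");
      }
    }

    // Every lookup passes through the gate first, whatever could have served it afterwards.
    String getTable(String table) {
      checkState();
      return name + "." + table;
    }
  }

  public static void main(String[] args) {
    ManagedSource source =
        new ManagedSource("s3", new State(Status.good, Collections.emptyList()));
    System.out.println(source.getTable("orders"));

    source.setState(new State(Status.bad, Arrays.asList("Unable to execute HTTP request")));
    try {
      source.getTable("orders");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The point of the sketch is that the check depends only on the cached status, so nothing downstream of it gets a chance to answer the request.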
Call sites
- SourceMetadataManager: metadata refresh handling
This is built on the scheduled task mechanism discussed in an earlier post.
if(isMaster) {
  // we can schedule on all nodes since this is a clustered singleton and will only run on a single node.
  this.wakeupTask = modifiableScheduler.schedule(
    Schedule.Builder.everyMillis(WAKEUP_FREQUENCY_MS)
      .asClusteredSingleton(METADATA_REFRESH_TASK_NAME_PREFIX + sourceKey)
      .build(),
    new WakeupWorker());
}
State handling
private void wakeup() {
  monitor.onWakeup();
  // if we've never refreshed, initialize the refresh start times. We do this on wakeup since that will happen if this
  // node gets assigned refresh responsibilities much later than the node initially comes up. It does leave the gap
  // where we may refresh early if we do a refresh and then the task immediately migrates but that is probably okay
  // for now.
  if (!initialized) {
    initializeRefresh();
    // on first wakeup, we'll skip work so we can avoid a bunch of distracting exceptions when a plugin is first starting.
    return;
  }
  try {
    // bridge is in fact the ManagedStoragePlugin
    bridge.refreshState();
  } catch (TimeoutException ex) {
    logger.debug("Source '{}' timed out while refreshing state, skipping refresh.", sourceKey, ex);
    return;
  } catch (Exception ex) {
    logger.debug("Source '{}' refresh failed as we were unable to retrieve refresh it's state.", sourceKey, ex);
    return;
  }
  if (!runLock.tryLock()) {
    logger.info("Source '{}' delaying refresh since an adhoc refresh is currently active.", sourceKey);
    return;
  }
  try (Closeable c = AutoCloseableLock.ofAlreadyOpen(runLock, true)) {
    if ( !(fullRefresh.shouldRun() || namesRefresh.shouldRun()) ) {
      return;
    }
    final SourceState sourceState = bridge.getState();
    if (sourceState == null || sourceState.getStatus() == SourceStatus.bad) {
      logger.info("Source '{}' skipping metadata refresh since it is currently in a bad state of {}.",
        sourceKey, sourceState);
      return;
    }
    final BackgroundRefresh refresh;
    if(fullRefresh.shouldRun()) {
      refresh = new BackgroundRefresh(fullRefresh, true);
    } else {
      refresh = new BackgroundRefresh(namesRefresh, false);
    }
    refresh.run();
  } catch (RuntimeException e) {
    logger.warn("Source '{}' metadata refresh failed to complete due to an exception.", sourceKey, e);
  }
}
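Notes
At present, Dremio does not make the state check configurable for some sources. So when a source becomes unavailable, even though we have built reflections on it (and the query plan does in fact use the reflection storage), queries will still start failing after a while.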
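The "after a while" part can be illustrated with a small timeline model. This is a sketch under stated assumptions, not Dremio's implementation: `StaleStateTimelineSketch`, its `query` method, and the `BooleanSupplier` health probe are hypothetical names. It assumes only the behaviour described here: the first wakeup just initializes, later wakeups refresh the cached state, and a query is rejected purely on that cached state.

```java
import java.util.function.BooleanSupplier;

// Hypothetical timeline model; names are illustrative and not Dremio APIs.
final class StaleStateTimelineSketch {

  private final BooleanSupplier upstreamHealthy; // stands in for the plugin's health probe
  private boolean initialized = false;
  private boolean bad = false;                   // cached source status

  StaleStateTimelineSketch(BooleanSupplier upstreamHealthy) {
    this.upstreamHealthy = upstreamHealthy;
  }

  // Modeled on wakeup(): the first call only initializes, later calls refresh the cached state.
  void wakeup() {
    if (!initialized) {
      initialized = true;
      return;
    }
    bad = !upstreamHealthy.getAsBoolean();
  }

  // Modeled on the state gate: rejection depends on the cached state alone,
  // even though the result would have come from the reflection.
  String query(String sql) {
    if (bad) {
      throw new IllegalStateException("source is currently unavailable");
    }
    return "served from reflection: " + sql;
  }

  public static void main(String[] args) {
    boolean[] up = {true};
    StaleStateTimelineSketch source = new StaleStateTimelineSketch(() -> up[0]);

    source.wakeup();   // first wakeup: initialization only
    up[0] = false;     // the upstream system goes down

    // The cached state has not been refreshed yet, so the query still succeeds.
    System.out.println(source.query("SELECT 1"));

    source.wakeup();   // the periodic task notices the outage and marks the source bad
    try {
      source.query("SELECT 1");
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

In this model the reflection never stops being usable; the only thing that changes between the two queries is the cached status written by the periodic wakeup.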
References
sabot/kernel/src/main/java/com/dremio/exec/catalog/ManagedStoragePlugin.java
sabot/kernel/src/main/java/com/dremio/exec/catalog/PluginsManager.java
sabot/kernel/src/main/java/com/dremio/exec/catalog/SourceMetadataManager.java