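How Dremio handles asynchronous reads when the cache is not enabled

Dremio supports both asynchronous reads and caching for file systems, but the cache only takes effect when asynchronous reads are enabled.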

Reference configuration

  • Enable asynchronous data access

 

  • Cache configuration

 

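Internal handling

In practice, reads go through the getAsyncByteReader call in the CE cache package; getAsyncByteReader is a method of the standard FileSystem interface.

  • FileSystem interface definition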

 

  • Reference call chain
        at com.dremio.service.cachemanager.CacheFileSystemWrapper$CacheFileSystem.getAsyncByteReader(CacheFileSystemWrapper.java:312)
        at com.dremio.io.file.FilterFileSystem.getAsyncByteReader(FilterFileSystem.java:112)
        at com.dremio.exec.store.dfs.LoggedFileSystem.getAsyncByteReader(LoggedFileSystem.java:140)
        // AsyncSeekableInputStreamFactory is the CE implementation
        at com.dremio.extra.exec.store.dfs.iceberg.AsyncSeekableInputStreamFactory.getStream(AsyncSeekableInputStreamFactory.java:55)
        at com.dremio.exec.store.iceberg.DremioInputFile.newStream(DremioInputFile.java:75)
        at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:272)
        at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:266)
        at com.dremio.exec.store.iceberg.IcebergUtils.loadTableMetadata(IcebergUtils.java:1492)
        at com.dremio.exec.store.iceberg.IcebergManifestListRecordReader.setup(IcebergManifestListRecordReader.java:139)
        at com.dremio.sabot.op.scan.ScanOperator.setupReaderAsCorrectUser(ScanOperator.java:348)
  • How AsyncSeekableInputStreamFactory.getStream handles the request
public SeekableInputStream getStream(FileSystem fs, OperatorContext context, Path path, Long fileLength, Long mtime, List dataset, String datasourcePluginUID) throws IOException {
      boolean shouldUseAsync = fs.supportsAsync();
      if (shouldUseAsync && mtime != null && fileLength != null) {
         OptionManager options = context.getOptions();
         FileKey fileKey = FileKey.of(path, Long.toString(mtime), FileType.AVRO, dataset, datasourcePluginUID);
        // Here the CacheFileSystem implementation from the CE cache package is used
         AsyncByteReader reader = fs.getAsyncByteReader(fileKey, options.optionsAsMap(new String[]{ExecConstants.S3_NATIVE_ASYNC_CLIENT.getOptionName(), ExecConstants.ENABLE_STORE_PARQUET_ASYNC_TIMESTAMP_CHECK.getOptionName()}));
         if (options.getOption(READ_TIMEOUT) != Long.MAX_VALUE) {
            reader = new AsyncByteReaderWithTimeout((AsyncByteReader)reader, options.getOption(READ_TIMEOUT));
         }
 
         SlidingWindowReader slidingWindowReader = new SlidingWindowReader((AsyncByteReader)reader, context.getAllocator(), (ColumnChunkMetaData)null, Range.closedOpen(0L, fileLength), fileLength, (int)options.getOption(CHUNK_SIZE), (int)options.getOption(CHUNK_COUNT_TARGET), context.getStats(), (new EventLog(new StreamInfo(path, context.getFragmentHandle(), context.getStats().getOperatorId()), options.getOption(ENABLE_ASYNC_DEBUG_LOGGING))).newColumn());
         slidingWindowReader.initialize((ReadChunk)null);
         return this.wrap(slidingWindowReader, (AsyncByteReader)reader, context.getStats());
      } else {
         return SeekableInputStreamFactory.DEFAULT.getStream(fs, context, path, fileLength, mtime, dataset, datasourcePluginUID);
      }
}
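The two decisions at the top of getStream (whether to go async at all, and whether to add the timeout decorator) can be looked at in isolation. What follows is a minimal sketch under stated assumptions, not Dremio code: SimpleAsyncReader, TimeoutReader, useAsync and maybeWithTimeout are hypothetical names, and the timeout unit (milliseconds) is an assumption. Only the conditions themselves are taken from the listing above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

final class AsyncStreamSketch {

  /** Hypothetical stand-in for an async byte reader. */
  interface SimpleAsyncReader {
    CompletableFuture<byte[]> read(long offset, int len);
  }

  /** Decorator that fails a read if it has not completed within timeoutMs. */
  static final class TimeoutReader implements SimpleAsyncReader {
    private final SimpleAsyncReader inner;
    private final long timeoutMs; // unit is an assumption for this sketch

    TimeoutReader(SimpleAsyncReader inner, long timeoutMs) {
      this.inner = inner;
      this.timeoutMs = timeoutMs;
    }

    @Override
    public CompletableFuture<byte[]> read(long offset, int len) {
      // orTimeout (Java 9+) completes the future exceptionally on timeout.
      return inner.read(offset, len).orTimeout(timeoutMs, TimeUnit.MILLISECONDS);
    }
  }

  /** Same condition as the listing: FS support plus known mtime and file length. */
  static boolean useAsync(boolean fsSupportsAsync, Long mtime, Long fileLength) {
    return fsSupportsAsync && mtime != null && fileLength != null;
  }

  /** Wrap only when a finite timeout is configured; Long.MAX_VALUE means no timeout. */
  static SimpleAsyncReader maybeWithTimeout(SimpleAsyncReader reader, long timeoutMs) {
    return timeoutMs != Long.MAX_VALUE ? new TimeoutReader(reader, timeoutMs) : reader;
  }

  public static void main(String[] args) {
    SimpleAsyncReader reader =
        (offset, len) -> CompletableFuture.completedFuture(new byte[len]);
    // mtime is unknown here, so the caller would fall back to the default stream.
    System.out.println(useAsync(true, null, 128L));
    System.out.println(maybeWithTimeout(reader, 5_000L).read(0, 16).join().length);
  }
}
```

Keeping the timeout as a decorator means the sliding-window reader in the listing does not need to know whether a timeout is configured; it only sees a reader.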
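  • How FileSystemPlugin uses FileSystemWrapper when creating a file system
public FileSystem createFS(String userName, OperatorContext operatorContext, boolean metadata) throws IOException {
    // Uses the fileSystemWrapper utility provided by SabotContext; each file system plugin
    // instance carries its own configuration parameters, i.e. the settings shown in the UI above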
    return context.getFileSystemWrapper().wrap(newFileSystem(userName, operatorContext), name, config, operatorContext,
        isAsyncEnabledForQuery(operatorContext) && getConfig().isAsyncEnabled(), metadata);
}
  • How the CE cache's CacheFileSystemWrapper handles the wrap call
public FileSystem wrap(FileSystem fs, String storageId, AsyncStreamConf conf, OperatorContext context, boolean enableAsync, boolean isMetadataRefresh) throws IOException {
      LOGGER.debug("cache-file-system-wrapper-creator for plugin-id {}, global-cm {}, local-cm {}, plugin-cm {}, operator-cm {}, isMetadataRefresh {}", new Object[]{storageId, this.cmo.getCacheManagerEnabled(), this.dremioConfig.getBoolean("services.executor.cache.enabled"), conf.getCacheProperties().isCachingEnabled(this.cmo.getOptionManager()), "true", isMetadataRefresh});
      boolean cachingEnabled = this.cmo.getCacheManagerEnabled() && this.dremioConfig.getBoolean("services.executor.cache.enabled") && conf.getCacheProperties().isCachingEnabled(this.cmo.getOptionManager());
      boolean invalidPluginId = storageId.contains(":::");
      if (cachingEnabled && !invalidPluginId) {
         boolean isExecutor = this.dremioConfig.getBoolean("services.executor.enabled");
         if (isMetadataRefresh) {
            return new CacheFileSystemWrapper.CacheFileSystem(fs, storageId, conf.getCacheProperties());
         } else {
            if (isExecutor && enableAsync) {
               if (this.cm == null) {
                  this.startCacheManager();
               }
 
               if (this.cm != null && !this.cm.isInError() && !this.cm.isClosed()) {
                  return new CacheFileSystemWrapper.CacheFileSystem(fs, storageId, conf.getCacheProperties());
               }
            }
 
            return fs;
         }
      } else {
        // Async is enabled but the cache is not in use: the actual FileSystem implementation is returned directly
         return fs;
      }
}
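The branching in wrap is easier to check once it is reduced to a pure function over its flags. The sketch below does that; it is an illustration only. WrapDecisionSketch, Wrapped, cachingEnabled and decide are hypothetical names, and cacheManagerUsable stands in for the "cache manager started, not in error, not closed" check in the listing.

```java
final class WrapDecisionSketch {

  enum Wrapped { CACHE_FILE_SYSTEM, ORIGINAL_FILE_SYSTEM }

  /** Caching needs all three switches: global option, executor config, plugin setting. */
  static boolean cachingEnabled(boolean globalCacheManager,
                                boolean executorCacheConfig,
                                boolean pluginCache) {
    return globalCacheManager && executorCacheConfig && pluginCache;
  }

  /** Mirrors the if/else structure of the wrap listing. */
  static Wrapped decide(boolean cachingEnabled, boolean invalidPluginId,
                        boolean isMetadataRefresh, boolean isExecutor,
                        boolean enableAsync, boolean cacheManagerUsable) {
    if (!cachingEnabled || invalidPluginId) {
      // Async may still be on, but reads go straight to the real FileSystem.
      return Wrapped.ORIGINAL_FILE_SYSTEM;
    }
    if (isMetadataRefresh) {
      // Metadata refresh gets the cache wrapper without the async check.
      return Wrapped.CACHE_FILE_SYSTEM;
    }
    if (isExecutor && enableAsync && cacheManagerUsable) {
      return Wrapped.CACHE_FILE_SYSTEM;
    }
    return Wrapped.ORIGINAL_FILE_SYSTEM;
  }

  public static void main(String[] args) {
    // Async enabled, plugin-level cache switched off.
    boolean caching = cachingEnabled(true, true, false);
    System.out.println(decide(caching, false, false, true, true, true));
  }
}
```

Two things stand out in this form: outside metadata refresh, the cache wrapper is only returned when async is enabled, and turning off any one of the three cache switches is enough to get the unwrapped file system back.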
  • How DremioInputFile.newStream handles the request
    As the call chain above shows, the stream is created through DremioInputFile
public SeekableInputStream newStream() {
    try {
     // SeekableInputStreamFactory implemented in the CE kernel, i.e. the AsyncSeekableInputStreamFactory above
      SeekableInputStreamFactory factory = io.getContext() == null || io.getDataset() == null ?
          SeekableInputStreamFactory.DEFAULT :
          io.getContext().getConfig().getInstance(SeekableInputStreamFactory.KEY, SeekableInputStreamFactory.class,
              SeekableInputStreamFactory.DEFAULT);
      return factory.getStream(io.getFs(), io.getContext(),
          path, fileSize, mtime, io.getDataset(), io.getDatasourcePluginUID());
    } catch (FileNotFoundException e) {
      throw new NotFoundException(e, "Path %s not found.", path);
    } catch (IOException e) {
      throw new UncheckedIOException(String.format("Failed to create new input stream for file: %s", path), e);
    }
}
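The factory lookup in newStream follows a "configured implementation, else default" pattern. Here is a rough sketch of that pattern, assuming a plain Map in place of Dremio's config object; StreamFactory, KEY, DEFAULT and resolve are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.Map;

final class FactoryLookupSketch {

  /** Hypothetical stand-in for a stream factory. */
  interface StreamFactory {
    String describe();
  }

  // Hypothetical key; the real key is SeekableInputStreamFactory.KEY.
  static final String KEY = "hypothetical.stream.factory";
  static final StreamFactory DEFAULT = () -> "default stream";

  /**
   * Without a context or a dataset the default factory is used; otherwise the
   * configured factory is looked up, again falling back to the default.
   */
  static StreamFactory resolve(Map<String, StreamFactory> contextConfig, List<String> dataset) {
    if (contextConfig == null || dataset == null) {
      return DEFAULT;
    }
    return contextConfig.getOrDefault(KEY, DEFAULT);
  }

  public static void main(String[] args) {
    StreamFactory async = () -> "async stream";
    System.out.println(resolve(Map.of(KEY, async), List.of("space", "table")).describe());
    System.out.println(resolve(null, List.of("space", "table")).describe());
  }
}
```

With this shape, a build that lacks the CE factory still works: the lookup finds nothing and the default stream is used.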

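Notes

A good deal of Dremio's open-source code depends on the CE packages and runs into problems without them, so the two often need to be read and analyzed together.

References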

sabot/kernel/src/main/java/com/dremio/exec/store/iceberg/DremioInputFile.java
sabot/kernel/src/main/java/com/dremio/exec/store/dfs/FileSystemWrapper.java
sabot/kernel/src/main/java/com/dremio/exec/server/SabotContext.java
plugins/s3/src/main/java/com/dremio/plugins/s3/store/S3AsyncByteReader.java
common/legacy/src/main/java/com/dremio/io/AsyncByteReader.java
sabot/kernel/src/main/java/com/dremio/exec/store/dfs/FileSystemPlugin.java

posted on 2024-03-28 08:00  荣锋亮
