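Dremio: how async reads are handled when the cache is not enabled
Dremio supports both asynchronous reads and caching for file systems. Caching only takes effect when asynchronous reads are enabled.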
Reference configuration
- Enable asynchronous data access
- Cache configuration
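Internal handling
In practice, reads go through the getAsyncByteReader call in the ce cache package. getAsyncByteReader is a method declared on Dremio's standard FileSystem interface.
- FileSystem interface definition
- Reference call stack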
at com.dremio.service.cachemanager.CacheFileSystemWrapper$CacheFileSystem.getAsyncByteReader(CacheFileSystemWrapper.java:312)
at com.dremio.io.file.FilterFileSystem.getAsyncByteReader(FilterFileSystem.java:112)
at com.dremio.exec.store.dfs.LoggedFileSystem.getAsyncByteReader(LoggedFileSystem.java:140)
// AsyncSeekableInputStreamFactory is the ce implementation
at com.dremio.extra.exec.store.dfs.iceberg.AsyncSeekableInputStreamFactory.getStream(AsyncSeekableInputStreamFactory.java:55)
at com.dremio.exec.store.iceberg.DremioInputFile.newStream(DremioInputFile.java:75)
at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:272)
at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:266)
at com.dremio.exec.store.iceberg.IcebergUtils.loadTableMetadata(IcebergUtils.java:1492)
at com.dremio.exec.store.iceberg.IcebergManifestListRecordReader.setup(IcebergManifestListRecordReader.java:139)
at com.dremio.sabot.op.scan.ScanOperator.setupReaderAsCorrectUser(ScanOperator.java:348)
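The stack above shows a chain of wrappers, each passing getAsyncByteReader on to the file system it wraps. Below is a minimal sketch of that delegation pattern. `MiniFileSystem`, `BaseFs`, `FilterFs`, `CachingFs`, and `DelegationDemo` are hypothetical stand-ins, not Dremio classes, and the string return value and `trace` list exist only to make the call order visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified stand-in for a file system interface.
interface MiniFileSystem {
    // Returns a description of the reader and records every layer visited in 'trace'.
    String getAsyncByteReader(String path, List<String> trace);
}

// Innermost "real" file system: produces the reader itself.
final class BaseFs implements MiniFileSystem {
    @Override
    public String getAsyncByteReader(String path, List<String> trace) {
        trace.add("BaseFs");
        return "reader:" + path;
    }
}

// Generic pass-through wrapper: records its name, then delegates unchanged.
class FilterFs implements MiniFileSystem {
    private final String name;
    private final MiniFileSystem delegate;

    FilterFs(String name, MiniFileSystem delegate) {
        this.name = name;
        this.delegate = delegate;
    }

    @Override
    public String getAsyncByteReader(String path, List<String> trace) {
        trace.add(name);
        return delegate.getAsyncByteReader(path, trace);
    }
}

// Wrapper that decorates the reader returned by the wrapped file system.
final class CachingFs extends FilterFs {
    CachingFs(MiniFileSystem delegate) {
        super("CachingFs", delegate);
    }

    @Override
    public String getAsyncByteReader(String path, List<String> trace) {
        return "cached(" + super.getAsyncByteReader(path, trace) + ")";
    }
}

final class DelegationDemo {
    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        // Outermost wrapper first, mirroring the bottom-to-top order of the stack.
        MiniFileSystem fs = new FilterFs("LoggedFs", new CachingFs(new BaseFs()));
        System.out.println(fs.getAsyncByteReader("/warehouse/t/metadata.json", trace));
        System.out.println(trace);
    }
}
```

If the caching layer is left out of the chain, the same call reaches `BaseFs` directly. This corresponds to the case where async is enabled but the cache is not.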
- AsyncSeekableInputStreamFactory.getStream handling
public SeekableInputStream getStream(FileSystem fs, OperatorContext context, Path path, Long fileLength, Long mtime, List dataset, String datasourcePluginUID) throws IOException {
    boolean shouldUseAsync = fs.supportsAsync();
    if (shouldUseAsync && mtime != null && fileLength != null) {
        OptionManager options = context.getOptions();
        FileKey fileKey = FileKey.of(path, Long.toString(mtime), FileType.AVRO, dataset, datasourcePluginUID);
        // This call resolves to the CacheFileSystem implementation in the ce cache package when the cache wrapper is applied
        AsyncByteReader reader = fs.getAsyncByteReader(fileKey, options.optionsAsMap(new String[]{
            ExecConstants.S3_NATIVE_ASYNC_CLIENT.getOptionName(),
            ExecConstants.ENABLE_STORE_PARQUET_ASYNC_TIMESTAMP_CHECK.getOptionName()}));
        if (options.getOption(READ_TIMEOUT) != Long.MAX_VALUE) {
            reader = new AsyncByteReaderWithTimeout((AsyncByteReader)reader, options.getOption(READ_TIMEOUT));
        }
        SlidingWindowReader slidingWindowReader = new SlidingWindowReader((AsyncByteReader)reader, context.getAllocator(),
            (ColumnChunkMetaData)null, Range.closedOpen(0L, fileLength), fileLength,
            (int)options.getOption(CHUNK_SIZE), (int)options.getOption(CHUNK_COUNT_TARGET), context.getStats(),
            (new EventLog(new StreamInfo(path, context.getFragmentHandle(), context.getStats().getOperatorId()),
                options.getOption(ENABLE_ASYNC_DEBUG_LOGGING))).newColumn());
        slidingWindowReader.initialize((ReadChunk)null);
        return this.wrap(slidingWindowReader, (AsyncByteReader)reader, context.getStats());
    } else {
        return SeekableInputStreamFactory.DEFAULT.getStream(fs, context, path, fileLength, mtime, dataset, datasourcePluginUID);
    }
}
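The branching in getStream can be reduced to a small decision function. The sketch below models only the conditions visible in the excerpt above: async support, a known mtime and file length, and a read timeout different from Long.MAX_VALUE. `StreamChoice`, `Kind`, and `choose` are hypothetical names introduced for illustration and do not exist in Dremio.

```java
// Hypothetical model of the branch selection in getStream; not Dremio code.
final class StreamChoice {
    enum Kind { SYNC_DEFAULT, ASYNC, ASYNC_WITH_TIMEOUT }

    static Kind choose(boolean fsSupportsAsync, Long mtime, Long fileLength, long readTimeout) {
        // The async path needs an async-capable file system plus known mtime and length.
        boolean useAsync = fsSupportsAsync && mtime != null && fileLength != null;
        if (!useAsync) {
            // Corresponds to falling back to SeekableInputStreamFactory.DEFAULT.
            return Kind.SYNC_DEFAULT;
        }
        // A timeout wrapper is added only when the timeout is not the "unlimited" sentinel.
        return readTimeout != Long.MAX_VALUE ? Kind.ASYNC_WITH_TIMEOUT : Kind.ASYNC;
    }
}
```

For example, `StreamChoice.choose(true, null, 1024L, 5000L)` yields `SYNC_DEFAULT`, because a missing mtime is enough to rule out the async path even on an async-capable file system.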
- How FileSystemPlugin uses FileSystemWrapper when creating a file system
public FileSystem createFS(String userName, OperatorContext operatorContext, boolean metadata) throws IOException {
    // Uses the FileSystemWrapper helper provided by SabotContext. Every file system plugin instance
    // has its own configuration parameters, which are the UI settings listed in the reference configuration above.
    return context.getFileSystemWrapper().wrap(newFileSystem(userName, operatorContext), name, config, operatorContext,
        isAsyncEnabledForQuery(operatorContext) && getConfig().isAsyncEnabled(), metadata);
}
- How CacheFileSystemWrapper is handled in the ce cache package
public FileSystem wrap(FileSystem fs, String storageId, AsyncStreamConf conf, OperatorContext context, boolean enableAsync, boolean isMetadataRefresh) throws IOException {
    LOGGER.debug("cache-file-system-wrapper-creator for plugin-id {}, global-cm {}, local-cm {}, plugin-cm {}, operator-cm {}, isMetadataRefresh {}",
        new Object[]{storageId, this.cmo.getCacheManagerEnabled(), this.dremioConfig.getBoolean("services.executor.cache.enabled"),
            conf.getCacheProperties().isCachingEnabled(this.cmo.getOptionManager()), "true", isMetadataRefresh});
    boolean cachingEnabled = this.cmo.getCacheManagerEnabled()
        && this.dremioConfig.getBoolean("services.executor.cache.enabled")
        && conf.getCacheProperties().isCachingEnabled(this.cmo.getOptionManager());
    boolean invalidPluginId = storageId.contains(":::");
    if (cachingEnabled && !invalidPluginId) {
        boolean isExecutor = this.dremioConfig.getBoolean("services.executor.enabled");
        if (isMetadataRefresh) {
            return new CacheFileSystemWrapper.CacheFileSystem(fs, storageId, conf.getCacheProperties());
        } else {
            if (isExecutor && enableAsync) {
                if (this.cm == null) {
                    this.startCacheManager();
                }
                if (this.cm != null && !this.cm.isInError() && !this.cm.isClosed()) {
                    return new CacheFileSystemWrapper.CacheFileSystem(fs, storageId, conf.getCacheProperties());
                }
            }
            return fs;
        }
    } else {
        // Async is enabled but the cache is not in use: the actual FileSystem implementation is returned directly
        return fs;
    }
}
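The conditions in wrap can be summarised as a pure function over its inputs. The sketch below is a simplified model of the excerpt above, assuming the cache manager state can be collapsed into a single `cacheManagerHealthy` flag (started, not in error, not closed). `WrapDecision`, `Result`, and `decide` are hypothetical names and not part of Dremio.

```java
// Hypothetical model of the decision made in wrap(); not Dremio code.
final class WrapDecision {
    enum Result { CACHE_FS, PLAIN_FS }

    static Result decide(boolean globalCacheEnabled,
                         boolean executorCacheEnabled,
                         boolean pluginCacheEnabled,
                         String storageId,
                         boolean isExecutor,
                         boolean enableAsync,
                         boolean isMetadataRefresh,
                         boolean cacheManagerHealthy) {
        // All three cache switches must be on for caching to be considered at all.
        boolean cachingEnabled = globalCacheEnabled && executorCacheEnabled && pluginCacheEnabled;
        boolean invalidPluginId = storageId.contains(":::");
        if (!cachingEnabled || invalidPluginId) {
            // Covers "async enabled, cache disabled": the original file system is used.
            return Result.PLAIN_FS;
        }
        if (isMetadataRefresh) {
            // Metadata refresh gets the cache wrapper regardless of the async flag.
            return Result.CACHE_FS;
        }
        // Otherwise caching needs an executor node, async reads, and a usable cache manager.
        return (isExecutor && enableAsync && cacheManagerHealthy) ? Result.CACHE_FS : Result.PLAIN_FS;
    }
}
```

Under this model, turning async off with every cache switch on still yields `PLAIN_FS` for a normal query, which matches the statement that caching only applies when async reads are enabled.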
- DremioInputFile.newStream handling
As the call stack above shows, the stream is created through DremioInputFile.
public SeekableInputStream newStream() {
    try {
        // SeekableInputStreamFactory implemented in the ce kernel, i.e. the AsyncSeekableInputStreamFactory shown above
        SeekableInputStreamFactory factory = io.getContext() == null || io.getDataset() == null ?
            SeekableInputStreamFactory.DEFAULT :
            io.getContext().getConfig().getInstance(SeekableInputStreamFactory.KEY, SeekableInputStreamFactory.class,
                SeekableInputStreamFactory.DEFAULT);
        return factory.getStream(io.getFs(), io.getContext(),
            path, fileSize, mtime, io.getDataset(), io.getDatasourcePluginUID());
    } catch (FileNotFoundException e) {
        throw new NotFoundException(e, "Path %s not found.", path);
    } catch (IOException e) {
        throw new UncheckedIOException(String.format("Failed to create new input stream for file: %s", path), e);
    }
}
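The factory lookup in newStream follows a "configured instance, else default" pattern. The sketch below shows that pattern in isolation, using a plain map in place of the operator context's config. `FactorySelection`, `select`, the `DEFAULT` value, and the key string are all hypothetical and chosen only for illustration.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "use the configured factory, else the default"; not Dremio code.
final class FactorySelection {
    static final String DEFAULT = "default-sync-factory";
    // Assumed key name, standing in for SeekableInputStreamFactory.KEY.
    static final String KEY = "example.seekable.input.stream.factory";

    static String select(Map<String, String> configOrNull, List<String> datasetOrNull) {
        // Without a context (modelled here by the config map) or a dataset, use the default.
        if (configOrNull == null || datasetOrNull == null) {
            return DEFAULT;
        }
        // Otherwise prefer the configured factory and fall back to the default if none is set.
        return configOrNull.getOrDefault(KEY, DEFAULT);
    }
}
```

A build without a configured factory therefore behaves like the default branch, while a configuration that registers another implementation switches every caller over without any change to newStream itself.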
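Notes
Many of Dremio's open-source components depend on the ce packages and run into problems without them, so the two usually need to be studied together when analysing the code.
References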
sabot/kernel/src/main/java/com/dremio/exec/store/iceberg/DremioInputFile.java
sabot/kernel/src/main/java/com/dremio/exec/store/dfs/FileSystemWrapper.java
sabot/kernel/src/main/java/com/dremio/exec/server/SabotContext.java
plugins/s3/src/main/java/com/dremio/plugins/s3/store/S3AsyncByteReader.java
common/legacy/src/main/java/com/dremio/io/AsyncByteReader.java
sabot/kernel/src/main/java/com/dremio/exec/store/dfs/FileSystemPlugin.java