A brief note on Dremio's automatic promotion of metadata to physical datasets
Dremio includes a feature that automatically promotes metadata to physical datasets: for file-system sources we no longer need to format-promote files by hand, and Dremio can query them directly. The switch lives in the source's metadata settings (the Dataset Handling option that automatically formats files into physical datasets when users issue queries).
Naturally, the prerequisite is that the data format can be discovered automatically (Dremio's easy format capability). What follows is a brief note on the internal processing.
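As a rough, illustrative model of how that switch reaches the code traced below: the flag is persisted with the source's metadata policy and surfaces to planning and refresh code as DatasetRetrievalOptions.autoPromote(), the exact method the stack traces watch. Everything in this sketch is a stand-in, not Dremio's API:
// Illustrative model only, not Dremio source: all class and field names are stand-ins.
public class AutoPromoteFlow {
  // stand-in for the source's persisted metadata policy (the UI switch)
  static class MetadataPolicyModel {
    final boolean autoPromoteDatasets;
    MetadataPolicyModel(boolean autoPromoteDatasets) { this.autoPromoteDatasets = autoPromoteDatasets; }
  }
  // stand-in for com.dremio.exec.store.DatasetRetrievalOptions
  static class RetrievalOptionsModel {
    private final boolean autoPromote;
    RetrievalOptionsModel(MetadataPolicyModel policy) { this.autoPromote = policy.autoPromoteDatasets; }
    boolean autoPromote() { return autoPromote; }
  }
  public static void main(String[] args) {
    RetrievalOptionsModel options = new RetrievalOptionsModel(new MetadataPolicyModel(true));
    System.out.println("auto promote: " + options.autoPromote()); // auto promote: true
  }
}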
A reference call chain (captured with the arthas stack command):
stack com.dremio.exec.store.DatasetRetrievalOptions autoPromote
Affect(class count: 3 , method count: 1) cost in 687 ms, listenerId: 1
ts=2024-02-14 01:54:43;thread_name=1a33e29b-b1aa-abdf-ab00-a4b882830200/0:foreman-planning;id=256;is_daemon=true;priority=10;TCCL=sun.misc.Launcher$AppClassLoader@18b4aac2
@com.dremio.exec.store.DatasetRetrievalOptions.autoPromote()
at com.dremio.exec.store.dfs.FileSystemPlugin.getDatasetWithFormat(FileSystemPlugin.java:723)
at com.dremio.exec.store.dfs.FileSystemPlugin.getDatasetHandle(FileSystemPlugin.java:1772)
at com.dremio.exec.catalog.ManagedStoragePlugin.getDatasetHandle(ManagedStoragePlugin.java:1012)
at com.dremio.exec.catalog.DatasetManager.getTableFromPlugin(DatasetManager.java:428)
at com.dremio.exec.catalog.DatasetManager.getTable(DatasetManager.java:244)
at com.dremio.exec.catalog.CatalogImpl.getTableHelper(CatalogImpl.java:880)
at com.dremio.exec.catalog.CatalogImpl.getTable(CatalogImpl.java:245)
at com.dremio.exec.catalog.CatalogImpl.getTableForQuery(CatalogImpl.java:907)
at com.dremio.exec.catalog.SourceAccessChecker.lambda$getTableForQuery$5(SourceAccessChecker.java:164)
at com.dremio.exec.catalog.SourceAccessChecker.getIfVisible(SourceAccessChecker.java:114)
at com.dremio.exec.catalog.SourceAccessChecker.getTableForQuery(SourceAccessChecker.java:164)
at com.dremio.exec.catalog.DelegatingCatalog.getTableForQuery(DelegatingCatalog.java:125)
at com.dremio.exec.catalog.CachingCatalog.lambda$getTableForQuery$6(CachingCatalog.java:189)
at com.dremio.exec.catalog.CachingCatalog.timedGet(CachingCatalog.java:246)
at com.dremio.exec.catalog.CachingCatalog.getTableForQuery(CachingCatalog.java:189)
at com.dremio.exec.ops.PlannerCatalogImpl.getValidatedTableWithSchema(PlannerCatalogImpl.java:108)
at com.dremio.exec.ops.DremioCatalogReader.getTable(DremioCatalogReader.java:106)
at com.dremio.exec.ops.DremioCatalogReader.getTable(DremioCatalogReader.java:82)
at org.apache.calcite.sql.validate.DremioEmptyScope.resolveTable(DremioEmptyScope.java:44)
at org.apache.calcite.sql.validate.DremioEmptyScope.resolveTable(DremioEmptyScope.java:34)
at org.apache.calcite.sql.validate.DelegatingScope.resolveTable(DelegatingScope.java:203)
at org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl(IdentifierNamespace.java:129)
at org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:199)
at org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:982)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:963)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3212)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3194)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3471)
at org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
at org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:982)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:963)
at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:247)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:938)
at com.dremio.exec.planner.sql.SqlValidatorImpl.validate(SqlValidatorImpl.java:129)
at com.dremio.exec.planner.sql.SqlValidatorAndToRelContext.validate(SqlValidatorAndToRelContext.java:80)
at com.dremio.exec.planner.sql.handlers.SqlToRelTransformer.validateNode(SqlToRelTransformer.java:165)
at com.dremio.exec.planner.sql.handlers.SqlToRelTransformer.validateAndConvert(SqlToRelTransformer.java:140)
at com.dremio.exec.planner.sql.handlers.SqlToRelTransformer.validateAndConvert(SqlToRelTransformer.java:102)
at com.dremio.exec.planner.sql.handlers.query.NormalHandler.getPlan(NormalHandler.java:73)
at com.dremio.exec.planner.sql.handlers.commands.HandlerToExec.plan(HandlerToExec.java:59)
at com.dremio.exec.work.foreman.AttemptManager.plan(AttemptManager.java:561)
at com.dremio.exec.work.foreman.AttemptManager.lambda$run$4(AttemptManager.java:458)
at com.dremio.service.commandpool.ReleasableBoundCommandPool.lambda$getWrappedCommand$3(ReleasableBoundCommandPool.java:140)
at com.dremio.service.commandpool.CommandWrapper.run(CommandWrapper.java:70)
at com.dremio.context.RequestContext.run(RequestContext.java:109)
at com.dremio.common.concurrent.ContextMigratingExecutorService.lambda$decorate$4(ContextMigratingExecutorService.java:227)
at com.dremio.common.concurrent.ContextMigratingExecutorService$ComparableRunnable.run(ContextMigratingExecutorService.java:207)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
ts=2024-02-14 01:55:02;thread_name=metadata-refresh-modifiable-scheduler-23;id=291;is_daemon=true;priority=10;TCCL=sun.misc.Launcher$AppClassLoader@18b4aac2
@com.dremio.exec.store.DatasetRetrievalOptions.autoPromote()
at com.dremio.exec.store.dfs.FileSystemPlugin.getDatasetHandle(FileSystemPlugin.java:1758)
at com.dremio.exec.catalog.NamespaceListing$TransformingIterator.populateNextHandle(NamespaceListing.java:135)
at com.dremio.exec.catalog.NamespaceListing$TransformingIterator.hasNext(NamespaceListing.java:91)
at com.dremio.exec.catalog.MetadataSynchronizer.synchronizeDatasets(MetadataSynchronizer.java:199)
at com.dremio.exec.catalog.MetadataSynchronizer.go(MetadataSynchronizer.java:136)
at com.dremio.exec.catalog.SourceMetadataManager$RefreshRunner.refreshFull(SourceMetadataManager.java:466)
at com.dremio.exec.catalog.SourceMetadataManager$BackgroundRefresh.run(SourceMetadataManager.java:580)
at com.dremio.exec.catalog.SourceMetadataManager.wakeup(SourceMetadataManager.java:288)
at com.dremio.exec.catalog.SourceMetadataManager.access$300(SourceMetadataManager.java:100)
at com.dremio.exec.catalog.SourceMetadataManager$WakeupWorker.run(SourceMetadataManager.java:227)
at com.dremio.service.scheduler.LocalSchedulerService$CancellableTask.run(LocalSchedulerService.java:252)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Reference processing
The two traces above correspond to the two entry points: the first fires during SQL query planning (the foreman-planning thread), the second during the periodic background metadata refresh (the metadata-refresh scheduler). The actual work still happens in the catalog validation of the SQL query; for source plugins whose metadata settings enable this option, the handling of the dataset handle has to be specialized.
- Reference code
Following the call chain above, the catalog's getTable runs first, then getTable in DatasetManager.
Catalog getTable handling
public DremioTable getTable(NamespaceKey key) {
final NamespaceKey resolvedKey = resolveToDefault(key);
if (resolvedKey != null) {
final DremioTable table = getTableHelper(resolvedKey);
if (table != null) {
return table;
}
}
return getTableHelper(key);
}
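In other words the lookup is two-phase: the key resolved against the session's default context is tried first, and only on a miss is the raw key tried. A tiny self-contained model of that fallback (illustrative, not Dremio source):
import java.util.Map;
import java.util.Optional;

public class TwoStepLookup {
  // resolveToDefault analogue: prefix the key with the session's default context
  static Optional<String> getTable(Map<String, String> tables, String defaultContext, String key) {
    String resolved = defaultContext + "." + key;
    if (tables.containsKey(resolved)) {
      return Optional.of(tables.get(resolved)); // hit under the default context
    }
    return Optional.ofNullable(tables.get(key)); // fall back to the raw key
  }

  public static void main(String[] args) {
    Map<String, String> tables = Map.of("myspace.orders", "t1", "s3.sales", "t2");
    System.out.println(getTable(tables, "myspace", "orders"));   // Optional[t1]
    System.out.println(getTable(tables, "myspace", "s3.sales")); // Optional[t2], via fallback
  }
}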
getTableHelper delegates to DatasetManager's getTable; once a table is found, it may additionally update the catalog's metadata for it:
private DremioTable getTableHelper(NamespaceKey key) {
Span.current().setAttribute("dremio.namespace.key.schemapath", key.getSchemaPath());
final DremioTable table = datasets.getTable(key, options, false);
if (table != null) {
// after a hit, the catalog metadata may also be updated as needed (for view tables)
return updateTableIfNeeded(key, table);
}
return null;
}
DatasetManager getTable handling
public DremioTable getTable(
NamespaceKey key,
MetadataRequestOptions options,
boolean ignoreColumnCount
){
final DatasetConfig config = getConfig(key);
if(config != null) {
// canonicalize the path.
key = new NamespaceKey(config.getFullPathList());
}
if(isAmbiguousKey(key)) {
key = getCanonicalKey(key);
}
String pluginName = key.getRoot();
final ManagedStoragePlugin plugin = plugins.getPlugin(pluginName, false);
if (config == null) {
logger.debug("Got a null config");
} else {
logger.debug("Got config id {}", config.getId());
}
if(plugin != null) {
// if we have a plugin and the info isn't a vds (this happens in home, where VDS are intermingled with plugin datasets).
if(config == null || config.getType() != DatasetType.VIRTUAL_DATASET) {
return getTableFromPlugin(key, config, plugin, options, ignoreColumnCount);
}
}
if(config == null) {
return null;
}
// at this point, we should only be looking at virtual datasets.
if(config.getType() != DatasetType.VIRTUAL_DATASET) {
// if we're not looking at a virtual dataset, it must mean that we hit a race condition where the source has been removed but the dataset was retrieved just before.
return null;
}
return createTableFromVirtualDataset(config, options);
}
- Internal handling in getTableFromPlugin
final Optional<DatasetHandle> handle;
try {
// uses the managed storage plugin's getDatasetHandle; for a file system this lands in FileSystemPlugin's internal handling
handle = plugin.getDatasetHandle(key, datasetConfig, retrievalOptions);
} catch (ConnectorException e) {
throw UserException.validationError(e)
.message("Failure while retrieving dataset [%s].", key)
.build(logger);
}
- Internal handling in ManagedStoragePlugin, for reference:
public Optional<DatasetHandle> getDatasetHandle(
NamespaceKey key,
DatasetConfig datasetConfig,
DatasetRetrievalOptions retrievalOptions
) throws ConnectorException {
try (AutoCloseableLock ignored = readLock()) {
checkState();
final EntityPath entityPath;
if(datasetConfig != null) {
entityPath = new EntityPath(datasetConfig.getFullPathList());
} else {
entityPath = MetadataObjectsUtils.toEntityPath(key);
}
// include the full path of the dataset
// then delegates to the actual underlying FileSystemPlugin method
Span.current().setAttribute("dremio.dataset.path", PathUtils.constructFullPath(entityPath.getComponents()));
return plugin.getDatasetHandle(entityPath,
retrievalOptions.asGetDatasetOptions(datasetConfig));
}
}
- Handling in FileSystemPlugin, the file-system storage plugin
public Optional<DatasetHandle> getDatasetHandle(EntityPath datasetPath, GetDatasetOption... options)
throws ConnectorException {
BatchSchema currentSchema = CurrentSchemaOption.getSchema(options);
FileConfig fileConfig = FileConfigOption.getFileConfig(options);
List<String> sortColumns = SortColumnsOption.getSortColumns(options);
List<Field> droppedColumns = CurrentSchemaOption.getDroppedColumns(options);
List<Field> updatedColumns = CurrentSchemaOption.getUpdatedColumns(options);
boolean isSchemaLearningEnabled = CurrentSchemaOption.isSchemaLearningEnabled(options);
FormatPluginConfig formatPluginConfig = null;
if (fileConfig != null) {
formatPluginConfig = PhysicalDatasetUtils.toFormatPlugin(fileConfig, Collections.<String>emptyList());
}
InternalMetadataTableOption internalMetadataTableOption = InternalMetadataTableOption.getInternalMetadataTableOption(options);
if (internalMetadataTableOption != null) {
TimeTravelOption.TimeTravelRequest timeTravelRequest = Optional.ofNullable(TimeTravelOption.getTimeTravelOption(options))
.map(TimeTravelOption::getTimeTravelRequest)
.orElse(null);
return getDatasetHandleForInternalMetadataTable(datasetPath, formatPluginConfig, timeTravelRequest, internalMetadataTableOption);
}
Optional<DatasetHandle> handle = Optional.empty();
try {
handle = getDatasetHandleForNewRefresh(
MetadataObjectsUtils.toNamespaceKey(datasetPath),
fileConfig,
DatasetRetrievalOptions.of(options));
} catch (AccessControlException e) {
if (!DatasetRetrievalOptions.of(options).ignoreAuthzErrors()) {
logger.debug(e.getMessage());
throw UserException.permissionError(e)
.message("Not authorized to read table %s at path ", datasetPath)
.build(logger);
}
} catch (IOException e) {
logger.debug("Failed to create table {}", datasetPath, e);
}
if(handle.isPresent()) {
// handle is UnlimitedSplitsDatasetHandle, dataset is parquet
if(DatasetRetrievalOptions.of(options).autoPromote() ) {
// autoPromote will allow this handle to work, regardless whether dataset is/is-not promoted
return handle;
} else if(fileConfig != null){
// dataset has already been promoted
return handle;
} else {
// dataset not promoted, handle cannot be used without incorrectly triggering auto-promote
return Optional.empty();
}
}
// on first access this is the path actually taken: the handle obtained above is still empty, so getDatasetWithFormat is invoked
final PreviousDatasetInfo pdi = new PreviousDatasetInfo(fileConfig, currentSchema, sortColumns, droppedColumns, updatedColumns, isSchemaLearningEnabled);
try {
return Optional.ofNullable(getDatasetWithFormat(MetadataObjectsUtils.toNamespaceKey(datasetPath), pdi,
formatPluginConfig, DatasetRetrievalOptions.of(options), SystemUser.SYSTEM_USERNAME));
} catch (Exception e) {
Throwables.propagateIfPossible(e, ConnectorException.class);
throw new ConnectorException(e);
}
}
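Summarizing the gate near the end of this method: a freshly probed handle is only returned when auto promotion is enabled, or when the dataset was already promoted (fileConfig != null); otherwise returning it would implicitly trigger a promotion. A minimal self-contained model of that decision (illustrative, not Dremio source):
import java.util.Optional;

public class HandleGate {
  static <T> Optional<T> gate(Optional<T> handle, boolean autoPromote, boolean alreadyPromoted) {
    if (!handle.isPresent()) {
      return Optional.empty();           // fall through to getDatasetWithFormat
    }
    if (autoPromote || alreadyPromoted) {
      return handle;                     // safe to use the probed handle
    }
    return Optional.empty();             // would otherwise auto-promote by accident
  }

  public static void main(String[] args) {
    Optional<String> handle = Optional.of("UnlimitedSplitsDatasetHandle");
    System.out.println(gate(handle, true, false));  // present: auto promote is on
    System.out.println(gate(handle, false, false)); // empty: dataset not promoted yet
  }
}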
- Internal handling inside FileSystemPlugin getDatasetWithFormat
If autoPromote is set, the actual file format is probed and a matching format plugin is chosen:
if (datasetAccessor == null &&
retrievalOptions.autoPromote()) {
boolean formatFound = false;
for (final FormatMatcher matcher : matchers) {
try {
final FileSelectionProcessor fileSelectionProcessor = matcher.getFormatPlugin().getFileSelectionProcessor(fs, fileSelection);
if (matcher.matches(fs, fileSelection, codecFactory)) {
formatFound = true;
final DatasetType type = fs.isDirectory(Path.of(fileSelection.getSelectionRoot()))
? DatasetType.PHYSICAL_DATASET_SOURCE_FOLDER : DatasetType.PHYSICAL_DATASET_SOURCE_FILE;
final FileSelection normalizedFileSelection = fileSelectionProcessor.normalizeForPlugin(fileSelection);
final FileUpdateKey updateKey = fileSelectionProcessor.generateUpdateKey();
datasetAccessor = matcher.getFormatPlugin()
.getDatasetAccessor(type, oldConfig, fs, normalizedFileSelection, this, datasetPath,
updateKey, retrievalOptions.maxMetadataLeafColumns(), retrievalOptions.getTimeTravelRequest());
if (datasetAccessor != null) {
break;
}
}
} catch (IOException e) {
logger.debug("File read failed.", e);
}
}
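The matchers are probed in order and the first format plugin whose matcher recognizes the selection supplies the dataset accessor, hence the break. A stripped-down, self-contained model of this first-match-wins detection (the names here are hypothetical, not Dremio's FormatMatcher API):
import java.util.List;
import java.util.Optional;

public class FormatMatching {
  interface Matcher {
    boolean matches(String path);
    String format();
  }

  static Optional<String> detect(List<Matcher> matchers, String path) {
    for (Matcher m : matchers) {
      if (m.matches(path)) {
        return Optional.of(m.format()); // first match wins, mirroring the break above
      }
    }
    return Optional.empty(); // no recognizable easy format, nothing to promote
  }

  public static void main(String[] args) {
    Matcher parquet = new Matcher() {
      public boolean matches(String p) { return p.endsWith(".parquet"); }
      public String format() { return "parquet"; }
    };
    Matcher json = new Matcher() {
      public boolean matches(String p) { return p.endsWith(".json"); }
      public String format() { return "json"; }
    };
    System.out.println(detect(List.of(parquet, json), "/data/sales.json").orElse("unknown")); // json
  }
}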
Notes
The above is a brief walkthrough of automatic metadata promotion for file-system sources. On top of that processing, since a dataset handle has now been obtained, the dataset still has to be persisted and some statistics updated.
For reference, DatasetManager's getTableFromPlugin saves the dataset and then returns a NamespaceTable:
boolean opportunisticSave = (datasetConfig == null);
if (opportunisticSave) {
datasetConfig = MetadataObjectsUtils.newShallowConfig(handle.get());
}
logger.debug("Attempting inline refresh for key : {} , canonicalKey : {} ", key, canonicalKey);
try {
plugin.getSaver()
.save(datasetConfig, handle.get(), plugin.unwrap(StoragePlugin.class), opportunisticSave, retrievalOptions,
userName);
} catch (ConcurrentModificationException cme) {
// Some other query, or perhaps the metadata refresh, must have already created this dataset. Re-obtain it
// from the namespace
assert opportunisticSave : "Non-opportunistic saves should have already handled a CME";
try {
datasetConfig = userNamespaceService.getDataset(canonicalKey);
} catch (NamespaceException e) {
// We got a concurrent modification exception because a dataset existed. It shouldn't be the case that it
// no longer exists. In the very rare case of this code racing with both another update *and* a dataset deletion
// we should act as if the delete won
logger.warn("Unable to obtain dataset {}. Likely race with dataset deletion", canonicalKey);
return null;
}
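The pattern above is an opportunistic save: try to create the catalog entry, and if a ConcurrentModificationException reveals that another query or the background refresh won the race, re-read the winner's entry instead of failing. A self-contained model of the same idea built on putIfAbsent (illustrative, not Dremio source):
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class OpportunisticSave {
  // the "save" may lose the race; on conflict, act as if the other writer won
  static <K, V> V saveOrReRead(ConcurrentMap<K, V> store, K key, V fresh) {
    V existing = store.putIfAbsent(key, fresh);
    return existing != null ? existing : fresh;
  }

  public static void main(String[] args) {
    ConcurrentMap<String, String> namespace = new ConcurrentHashMap<>();
    System.out.println(saveOrReRead(namespace, "s3.sales", "config-v1")); // config-v1
    System.out.println(saveOrReRead(namespace, "s3.sales", "config-v2")); // still config-v1
  }
}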
References
sabot/kernel/src/main/java/com/dremio/exec/catalog/CatalogImpl.java
sabot/kernel/src/main/java/com/dremio/exec/ops/PlannerCatalog.java
sabot/kernel/src/main/java/com/dremio/exec/catalog/EntityExplorer.java
sabot/kernel/src/main/java/com/dremio/exec/ops/PlannerCatalogImpl.java
sabot/kernel/src/main/java/com/dremio/exec/ops/DremioCatalogReader.java
sabot/kernel/src/main/java/com/dremio/exec/catalog/DatasetManager.java
sabot/kernel/src/main/java/com/dremio/exec/store/dfs/FileSystemPlugin.java
sabot/kernel/src/main/java/com/dremio/exec/store/dfs/easy/EasyFormatPlugin.java
sabot/kernel/src/main/java/com/dremio/exec/store/easy/json/JSONFormatPlugin.java