A quick note on how Dremio auto-promotion handles partition fields

Dremio auto-promotion can turn each folder level into a column and use it to filter queries, which is a very handy feature. For example, in data-archiving applications we can partition the data by event date, and auto-promotion then makes it easy to query.
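As a concrete (hypothetical) example, suppose an archive source is laid out by event date into folders such as events/2024-02-27/. After auto-promotion, the first directory level shows up as the implicit column dir0, so pruning by date is an ordinary WHERE clause. The sketch below queries it over JDBC; the source and table names (archive.events), host, and credentials are placeholders, and Dremio's JDBC driver is assumed to be on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public final class DirColumnQuery {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
            "jdbc:dremio:direct=localhost:31010", "admin", "password");
         Statement stmt = conn.createStatement();
         // dir0 is the implicit column Dremio derives from the first folder level,
         // so filtering by event date is a plain predicate on it
         ResultSet rs = stmt.executeQuery(
             "SELECT dir0, COUNT(*) AS cnt FROM archive.events "
                 + "WHERE dir0 = '2024-02-27' GROUP BY dir0")) {
      while (rs.next()) {
        System.out.println(rs.getString("dir0") + " -> " + rs.getLong("cnt"));
      }
    }
  }
}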

Results

  • Query result (screenshot)

  • Debug view (screenshot)

Internal processing

I have covered Dremio auto-promotion before; the core of it is really the handling of the partition fields (which, on the query side, is the table metadata).

  • Save handling

The getTableFromPlugin method in DatasetManager:

try {
  plugin.getSaver()
      .save(datasetConfig, handle.get(), plugin.unwrap(StoragePlugin.class), opportunisticSave, retrievalOptions,
            userName);
} catch (ConcurrentModificationException cme) {
  // Some other query, or perhaps the metadata refresh, must have already created this dataset. Re-obtain it
  // from the namespace
  assert opportunisticSave : "Non-opportunistic saves should have already handled a CME";
  try {
    datasetConfig = userNamespaceService.getDataset(canonicalKey);
  } catch (NamespaceException e) {
    // We got a concurrent modification exception because a dataset existed. It shouldn't be the case that it
    // no longer exists. In the very rare case of this code racing with both another update *and* a dataset deletion
    // we should act as if the delete won
    logger.warn("Unable to obtain dataset {}. Likely race with dataset deletion", canonicalKey);
    return null;
  }
  final NamespaceTable namespaceTable = getTableFromNamespace(key, datasetConfig, plugin, accessUserName, options);
  options.getStatsCollector()
      .addDatasetStat(canonicalKey.getSchemaPath(), MetadataAccessType.CACHED_METADATA.name(),
          stopwatch.elapsed(TimeUnit.MILLISECONDS));
  return namespaceTable;
} catch (AccessControlException ignored) {
  return null;
}

options.getStatsCollector()
    .addDatasetStat(canonicalKey.getSchemaPath(), MetadataAccessType.PARTIAL_METADATA.name(),
        stopwatch.elapsed(TimeUnit.MILLISECONDS));

plugin.checkAccess(canonicalKey, datasetConfig, accessUserName, options);

// TODO: use MaterializedSplitsPointer if metadata is not too big!
final TableMetadata tableMetadata = new TableMetadataImpl(plugin.getId(), datasetConfig,
    accessUserName, DatasetSplitsPointer.of(userNamespaceService, datasetConfig),
    getPrimaryKey(plugin.getPlugin(), datasetConfig, schemaConfig, key,
        false /* The metadata for the table is either incomplete, missing or out of date.
               Storing in namespace can cause issues if the metadata is missing. Don't save here. */));
return new NamespaceTable(tableMetadata, optionManager.getOption(FULL_NESTED_SCHEMA_SUPPORT));
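The ConcurrentModificationException branch above is the opportunistic-save pattern: try to create the dataset entry, and if another query or the metadata refresh got there first, fall back to reading what they wrote. A generic, standalone illustration of the same idea (plain Java, not Dremio code):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class OpportunisticSave {
  private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();

  public String saveOrReuse(String key, String value) {
    // putIfAbsent plays the role of the save that may fail on concurrent modification
    String existing = store.putIfAbsent(key, value);
    // null means our save won; otherwise reuse the concurrently created entry,
    // mirroring the re-read from the namespace in getTableFromPlugin
    return existing == null ? value : existing;
  }
}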
  • How the internal save extends the DatasetConfig

The saveUsingV1Flow method in DatasetSaverImpl:

private void saveUsingV1Flow(DatasetHandle handle,
                               DatasetRetrievalOptions options,
                               DatasetConfig datasetConfig,
                               SourceMetadata sourceMetadata,
                               boolean opportunisticSave,
                               Function<DatasetConfig, DatasetConfig> datasetMutator,
                               NamespaceKey canonicalKey,
                               NamespaceAttribute... attributes) {
    if (datasetConfig.getId() == null) {
      // this is a new dataset, otherwise save will fail
      datasetConfig.setId(new EntityId(UUID.randomUUID().toString()));
    }
 
    NamespaceService.SplitCompression splitCompression = NamespaceService.SplitCompression.valueOf(optionManager.getOption(CatalogOptions.SPLIT_COMPRESSION_TYPE).toUpperCase());
    try (DatasetMetadataSaver saver = systemNamespace.newDatasetMetadataSaver(canonicalKey, datasetConfig.getId(), splitCompression,
            optionManager.getOption(CatalogOptions.SINGLE_SPLIT_PARTITION_MAX), optionManager.getOption(NamespaceOptions.DATASET_METADATA_CONSISTENCY_VALIDATE))) {
      final PartitionChunkListing chunkListing = sourceMetadata.listPartitionChunks(handle,
              options.asListPartitionChunkOptions(datasetConfig));
 
      final long recordCountFromSplits = saver == null || chunkListing == null ? 0 :
              CatalogUtil.savePartitionChunksInSplitsStores(saver, chunkListing);
      // Fetches the dataset metadata from the actual dataset handle, e.g. the
      // EasyFormatDatasetAccessor handle for txt/json formats
      final DatasetMetadata datasetMetadata = sourceMetadata.getDatasetMetadata(handle, chunkListing,
              options.asGetMetadataOptions(datasetConfig));
 
      Optional<ByteString> readSignature = Optional.empty();
      if (sourceMetadata instanceof SupportsReadSignature) {
        final BytesOutput output = ((SupportsReadSignature) sourceMetadata).provideSignature(handle, datasetMetadata);
        //noinspection ObjectEquality
        if (output != BytesOutput.NONE) {
          readSignature = Optional.of(MetadataObjectsUtils.toProtostuff(output));
        }
      }
      // This step extends the dataset config with the partition information
      MetadataObjectsUtils.overrideExtended(datasetConfig, datasetMetadata, readSignature,
              recordCountFromSplits, options.maxMetadataLeafColumns());
      datasetConfig = datasetMutator.apply(datasetConfig);
 
      saver.saveDataset(datasetConfig, opportunisticSave, attributes);
      updateListener.metadataUpdated(canonicalKey);
    } catch (DatasetMetadataTooLargeException e) {
      datasetConfig.setRecordSchema(null);
      datasetConfig.setReadDefinition(null);
      try {
        systemNamespace.addOrUpdateDataset(canonicalKey, datasetConfig);
      } catch (NamespaceException ignored) {
      }
      throw UserException.validationError(e)
              .build(logger);
    } catch (NamespaceException | IOException e) {
      throw UserException.validationError(e)
              .build(logger);
    }
  }
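What does the extended partition information end up as? My reading is that after MetadataObjectsUtils.overrideExtended runs, the partition columns discovered from the folder layout are recorded on the DatasetConfig's ReadDefinition. The sketch below reads them back; the class and accessor names follow Dremio's protostuff model as I understand it and should be treated as assumptions:

import java.util.Collections;
import java.util.List;

import com.dremio.service.namespace.dataset.proto.DatasetConfig;
import com.dremio.service.namespace.dataset.proto.ReadDefinition;

public final class PartitionColumnsPeek {
  // Returns the partition columns recorded during the save, e.g. [dir0, dir1]
  // for a two-level folder layout (accessor names are assumptions, see above)
  public static List<String> partitionColumns(DatasetConfig config) {
    ReadDefinition readDefinition = config.getReadDefinition();
    return readDefinition == null
        ? Collections.emptyList()
        : readDefinition.getPartitionColumnsList();
  }
}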
  • EasyFormatDatasetAccessor as a reference implementation

getBatchSchema is the core implementation here; this is where the schema extension is done:

private BatchSchema getBatchSchema(BatchSchema oldSchema, final FileSelection selection, final FileSystem dfs) throws Exception {
    final SabotContext context = formatPlugin.getContext();
    try (
        BufferAllocator sampleAllocator = context.getAllocator().newChildAllocator("sample-alloc", 0, Long.MAX_VALUE);
        OperatorContextImpl operatorContext = new OperatorContextImpl(context.getConfig(), context.getDremioConfig(), sampleAllocator, context.getOptionManager(), 1000, context.getExpressionSplitCache());
        SampleMutator mutator = new SampleMutator(sampleAllocator)
    ) {
      // Locate the implicit (hidden) columns: directory levels, file name, modification time
      final ImplicitFilesystemColumnFinder explorer = new ImplicitFilesystemColumnFinder(context.getOptionManager(),
          dfs, GroupScan.ALL_COLUMNS, ImplicitFilesystemColumnFinder.Mode.ALL_IMPLICIT_COLUMNS);
 
      Optional<FileAttributes> fileName = selection.getFileAttributesList().stream().filter(input -> input.size() > 0).findFirst();
      final FileAttributes file = fileName.orElse(selection.getFileAttributesList().get(0));
 
      EasyDatasetSplitXAttr dataset = EasyDatasetSplitXAttr.newBuilder()
          .setStart(0L)
          .setLength(Long.MAX_VALUE)
          .setPath(file.getPath().toString())
          .build();
      try (RecordReader reader = new AdditionalColumnsRecordReader(operatorContext, ((EasyFormatPlugin) formatPlugin)
          .getRecordReader(operatorContext, dfs, dataset, GroupScan.ALL_COLUMNS), explorer.getImplicitFieldsForSample(selection), sampleAllocator)) {
        reader.setup(mutator);
        Map<String, ValueVector> fieldVectorMap = new HashMap<>();
        int i = 0;
        for (VectorWrapper<?> vw : mutator.getContainer()) {
          fieldVectorMap.put(vw.getField().getName(), vw.getValueVector());
          if (++i > maxLeafColumns) {
            throw new ColumnCountTooLargeException(maxLeafColumns);
          }
        }
        reader.allocate(fieldVectorMap);
        reader.next();
        mutator.getContainer().buildSchema(BatchSchema.SelectionVectorMode.NONE);
        // Merge the sampled explicit fields with the implicit schema; this is where
        // the implicit folder fields discussed above come in
        return getMergedSchema(oldSchema,
          mutator.getContainer().getSchema(),
          oldConfig.isSchemaLearningEnabled(),
          oldConfig.getDropColumns(),
          oldConfig.getModifiedColumns(),
          context.getOptionManager().getOption(ExecConstants.ENABLE_INTERNAL_SCHEMA),
          tableSchemaPath.getPathComponents(),
          file.getPath().toString());
      }
    }
  }
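To make the implicit folder fields concrete: every directory level between the promoted root and the data file becomes one dirN value. A standalone sketch of that mapping (plain Java, not Dremio code):

import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public final class DirFieldSketch {
  // Maps each folder level between the table root and the file to dir0, dir1, ...
  public static Map<String, String> dirColumns(Path tableRoot, Path dataFile) {
    Path relative = tableRoot.relativize(dataFile);
    Map<String, String> columns = new LinkedHashMap<>();
    // every path component except the file name itself becomes a dirN column
    for (int i = 0; i < relative.getNameCount() - 1; i++) {
      columns.put("dir" + i, relative.getName(i).toString());
    }
    return columns;
  }

  public static void main(String[] args) {
    // prints {dir0=2024, dir1=02}
    System.out.println(dirColumns(
        Paths.get("/data/archive"),
        Paths.get("/data/archive/2024/02/events.json")));
  }
}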

A small tip

As the screenshots above suggest, a couple of other hidden fields are also available (the file name and the modification time). They are disabled by default but can be exposed through configuration; the relevant options are listed below, followed by a sketch of how to enable them.

  • Related configuration, in ImplicitFilesystemColumnFinder:
  public static final StringValidator IMPLICIT_FILE_FIELD_LABEL = new StringValidator("dremio.store.file.file-field-label", "$file");
  public static final StringValidator IMPLICIT_MOD_FIELD_LABEL = new StringValidator("dremio.store.file.mod-field-label", "$mtime");
  public static final BooleanValidator IMPLICIT_FILE_FIELD_ENABLE = new BooleanValidator("dremio.store.file.file-field-enabled", false);
  public static final BooleanValidator IMPLICIT_DIRS_FIELD_ENABLE = new BooleanValidator("dremio.store.file.dir-field-enabled", true);
  public static final BooleanValidator IMPLICIT_MOD_FIELD_ENABLE = new BooleanValidator("dremio.store.file.mod-field-enabled", false);
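These are Dremio support keys, so they can also be flipped at runtime with ALTER SYSTEM SET (or per session with ALTER SESSION). A minimal sketch over JDBC, with placeholder connection details; newly promoted or refreshed datasets should then expose the $file and $mtime columns:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public final class EnableImplicitFields {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
            "jdbc:dremio:direct=localhost:31010", "admin", "password");
         Statement stmt = conn.createStatement()) {
      // expose the source file name of each row as the $file column
      stmt.execute("ALTER SYSTEM SET \"dremio.store.file.file-field-enabled\" = true");
      // expose the source file modification time as the $mtime column
      stmt.execute("ALTER SYSTEM SET \"dremio.store.file.mod-field-enabled\" = true");
    }
  }
}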

Notes

The above is a brief overview of how auto-promotion handles partition fields; for the details it is worth studying the source code. I only walked through EasyFormatDatasetAccessor, and there are of course other implementations.

References

sabot/kernel/src/main/java/com/dremio/exec/catalog/DatasetManager.java
sabot/kernel/src/main/java/com/dremio/exec/store/dfs/implicit/AdditionalColumnsRecordReader.java
sabot/kernel/src/main/java/com/dremio/exec/store/easy/EasyFormatDatasetAccessor.java
sabot/kernel/src/main/java/com/dremio/exec/catalog/DatasetSaverImpl.java
sabot/kernel/src/main/java/com/dremio/exec/store/dfs/implicit/ImplicitFilesystemColumnFinder.java
sabot/kernel/src/main/java/com/dremio/exec/util/MetadataSupportsInternalSchema.java
