A brief look at Dremio table functions

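Dremio's handling of table functions is still built on Apache Calcite, although Dremio itself makes relatively little use of them.
At present the main uses are external queries and a set of supporting functions for Iceberg.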

Usage examples

  • External queries
 
SELECT b.customer_id, a.product_id, a.price
FROM TABLE(postgresql.external_query(
  'SELECT product_id, price FROM products')) AS a,
  source_b.sales AS b
WHERE b.product_id = a.product_id
  • Iceberg functions
    These include table_files, table_history, table_manifests, and others.
 
SELECT * 
FROM TABLE( table_files('<table_path>.<table_name>') )
 
SELECT * 
FROM TABLE( table_history('<table_path>.<table_name>') )
 
SELECT * 
FROM TABLE( table_manifests('<table_path>.<table_name>') )
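
The native query passed to external_query is an ordinary SQL string literal, so any single quote inside it has to be doubled. The following is a minimal sketch of a helper that builds such an expression. `ExternalQuerySql`, `quoteLiteral`, and `wrap` are hypothetical names that are not part of Dremio, and the sketch assumes the source name is a simple identifier that needs no quoting.

```java
// Hypothetical helper, not part of Dremio: builds a TABLE(<source>.external_query('...')) expression.
final class ExternalQuerySql {
  private ExternalQuerySql() {}

  /** Doubles single quotes so the native query can be embedded in a SQL string literal. */
  static String quoteLiteral(String nativeSql) {
    return "'" + nativeSql.replace("'", "''") + "'";
  }

  /** Wraps a native query in the external_query table function of the given source. */
  static String wrap(String sourceName, String nativeSql) {
    // Assumption: sourceName is a plain identifier; no identifier quoting is attempted here.
    return "TABLE(" + sourceName + ".external_query(" + quoteLiteral(nativeSql) + "))";
  }

  public static void main(String[] args) {
    String from = wrap("postgresql", "SELECT product_id FROM products WHERE category = 'book'");
    System.out.println("SELECT a.product_id FROM " + from + " AS a");
  }
}
```

Building the statement this way keeps the escaping in one place. It does not validate the native query itself, which is passed to the source unchanged.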

Reference implementation

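Both implementations are built on the TableMacro interface, whose apply method returns a table. Note that this method is invoked at compile time, that is, while the query is being planned rather than while it executes.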
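
To make the compile-time aspect concrete, here is a minimal, self-contained sketch of the table macro pattern. It deliberately avoids Calcite: `SketchTable`, `SketchTableMacro`, and `SketchExternalQuery` are hypothetical stand-ins for the real interfaces, reduced to column names only, and the schema builder lambda is an assumption made for illustration.

```java
import java.util.List;
import java.util.function.Function;

// Stand-in for a planner-visible table, reduced to its column names.
interface SketchTable {
  List<String> columnNames();
}

// Stand-in for a table macro: declared parameters plus an apply method that yields a table.
interface SketchTableMacro {
  List<String> parameterNames();
  SketchTable apply(List<?> arguments);
}

// One-parameter macro whose schema is derived from the query text.
final class SketchExternalQuery implements SketchTableMacro {
  private final Function<String, List<String>> schemaBuilder;

  SketchExternalQuery(Function<String, List<String>> schemaBuilder) {
    this.schemaBuilder = schemaBuilder;
  }

  @Override
  public List<String> parameterNames() {
    return List.of("query");
  }

  @Override
  public SketchTable apply(List<?> arguments) {
    String query = (String) arguments.get(0);
    // The schema is resolved here, during planning; no rows are read at this point.
    List<String> columns = schemaBuilder.apply(query);
    return () -> columns;
  }
}

final class TableMacroSketch {
  public static void main(String[] args) {
    // Hypothetical schema builder: a real one would ask the source to describe the query.
    SketchTableMacro macro = new SketchExternalQuery(q -> List.of("product_id", "price"));
    SketchTable table = macro.apply(List.of("SELECT product_id, price FROM products"));
    System.out.println(table.columnNames());
  }
}
```

The point of the pattern is that the macro's arguments are literals known to the planner, so the resulting table and its row type can be fixed before execution starts.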

  • External query handling
    sabot/kernel/src/main/java/com/dremio/exec/tablefunctions/ExternalQuery.java
 
public final class ExternalQuery implements TableMacro {
  private static final List<FunctionParameter> FUNCTION_PARAMS = new ParameterListBuilder()
    .add(String.class, "query").build();
 
  private final Function<String, BatchSchema> schemaBuilder;
  private final Function<BatchSchema, RelDataType> rowTypeBuilder;
  private final StoragePluginId pluginId;
 
  public ExternalQuery(Function<String, BatchSchema> schemaBuilder,
                       Function<BatchSchema, RelDataType> rowTypeBuilder,
                       StoragePluginId pluginId) {
    this.schemaBuilder = schemaBuilder;
    this.rowTypeBuilder = rowTypeBuilder;
    this.pluginId = pluginId;
  }
 
  @Override
  public List<FunctionParameter> getParameters() {
    return FUNCTION_PARAMS;
  }
 
  @Override
  public TranslatableTable apply(List<? extends Object> arguments) {
    // Returns a table that implements the TranslatableTable interface
    return ExternalQueryTranslatableTable.create(schemaBuilder, rowTypeBuilder, pluginId, (String) arguments.get(0));
  }
}
  • Iceberg handling
    MetadataFunctionsMacro, which is based on Dremio's own VersionedTableMacro
 
public class MetadataFunctionsMacro extends VersionedTableMacro {
 
  public enum MacroName {
    TABLE_HISTORY("table_history"),
    TABLE_MANIFESTS("table_manifests"),
    TABLE_SNAPSHOT("table_snapshot"),
    TABLE_FILES("table_files");
    private final String name;
    MacroName(String name) {
      this.name = name;
    }
    public String getName() {
      return name;
    }
  }
 
  private final TranslatableTableResolver tableResolver;
  private static final List<FunctionParameter> FUNCTION_PARAMS = new ReflectiveFunctionBase.ParameterListBuilder()
    .add(String.class, "table_name").build();
 
  public MetadataFunctionsMacro(TranslatableTableResolver tableResolver) {
    this.tableResolver = tableResolver;
  }
 
  @Override
  public List<FunctionParameter> getParameters() {
    return FUNCTION_PARAMS;
  }
 
  @Override
  public TranslatableTable apply(final List<? extends Object> arguments, TableVersionContext tableVersionContext) {
    final List<String> tablePath = splitTableIdentifier((String) arguments.get(0));
    return tableResolver.find(tablePath, tableVersionContext);
  }
}
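
MetadataFunctionsMacro receives the table as a single string and turns it into a path before resolving it. The sketch below shows one way such a split could work, treating dots inside double quotes as part of a name. It is a hypothetical stand-in for splitTableIdentifier, whose real behaviour in Dremio may differ (for example in how escaped quotes are handled, which this sketch ignores).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for splitTableIdentifier; the real Dremio helper may behave differently.
final class TablePathSketch {
  private TablePathSketch() {}

  /** Splits on dots that are outside double quotes and drops the quotes themselves. */
  static List<String> split(String identifier) {
    List<String> parts = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean quoted = false;
    for (char c : identifier.toCharArray()) {
      if (c == '"') {
        quoted = !quoted;           // toggle quoted mode; the quote is not part of the name
      } else if (c == '.' && !quoted) {
        parts.add(current.toString());
        current.setLength(0);       // start the next path component
      } else {
        current.append(c);
      }
    }
    parts.add(current.toString());  // last component
    return parts;
  }

  public static void main(String[] args) {
    // prints [s3, my.bucket, orders]
    System.out.println(split("s3.\"my.bucket\".orders"));
  }
}
```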

The SupportsExternalQuery interface

This interface is the contract a source implements in order to handle external queries.

How table functions are used

sabot/kernel/src/main/java/com/dremio/exec/catalog/CatalogImpl.java
The lookup itself happens during the SQL validation phase.

 
 @Override
  public Collection<Function> getFunctions(NamespaceKey path,
    FunctionType functionType) {
    final NamespaceKey resolvedPath = resolveSingle(path);
    switch (functionType) {
      case TABLE:
         // Performs a number of checks, e.g. for the Iceberg functions and for other kinds of table function
        return getUserDefinedTableFunctions(path, resolvedPath);
      case SCALAR:
        return getUserDefinedScalarFunctions(resolvedPath);
      default:
        return ImmutableList.of();
    }
  }
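
The excerpt above does not show what getUserDefinedTableFunctions checks. As a rough illustration only, the sketch below classifies a function path by name: a bare metadata function name (taken from the MacroName enum shown earlier) or a two-part path ending in external_query on a source that supports it. All class and method names here are hypothetical, and the actual rules in CatalogImpl may well differ.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of name-based table function resolution; not Dremio's actual code.
final class TableFunctionResolutionSketch {
  enum Kind { EXTERNAL_QUERY, ICEBERG_METADATA }

  // Names taken from the MacroName enum of MetadataFunctionsMacro.
  private static final Set<String> METADATA_FUNCTIONS =
      Set.of("table_history", "table_manifests", "table_snapshot", "table_files");

  private final Set<String> sourcesWithExternalQuery;

  TableFunctionResolutionSketch(Set<String> sourcesWithExternalQuery) {
    this.sourcesWithExternalQuery = sourcesWithExternalQuery;
  }

  /** Classifies a function path such as [postgresql, external_query] or [table_files]. */
  Optional<Kind> resolve(List<String> path) {
    if (path.isEmpty()) {
      return Optional.empty();
    }
    String name = path.get(path.size() - 1).toLowerCase(Locale.ROOT);
    if (path.size() == 1 && METADATA_FUNCTIONS.contains(name)) {
      return Optional.of(Kind.ICEBERG_METADATA);
    }
    if (path.size() == 2 && name.equals("external_query")
        && sourcesWithExternalQuery.contains(path.get(0))) {
      return Optional.of(Kind.EXTERNAL_QUERY);
    }
    return Optional.empty(); // unknown function: a validator would report an error here
  }

  public static void main(String[] args) {
    TableFunctionResolutionSketch catalog = new TableFunctionResolutionSketch(Set.of("postgresql"));
    System.out.println(catalog.resolve(List.of("postgresql", "external_query")));
    System.out.println(catalog.resolve(List.of("table_files")));
    System.out.println(catalog.resolve(List.of("s3", "external_query")));
  }
}
```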

The same applies when the physical plan is built:
ExternalQueryScanPrel.java

 
@Override
public PhysicalOperator getPhysicalOperator(PhysicalPlanCreator creator) throws IOException {
  final SupportsExternalQuery externalQueryImplementor = creator.getContext().getCatalogService().getSource(pluginId);
  return externalQueryImplementor.getExternalQueryPhysicalOperator(creator, this, batchSchema, sql);
}
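
The method above relies on the source plugin implementing SupportsExternalQuery and simply delegates to it. The sketch below illustrates that capability-interface pattern with an explicit check. The `Sketch*` types and the string used as an "operator" are hypothetical stand-ins; the real Dremio interfaces take more parameters and return a PhysicalOperator.

```java
import java.util.Map;

// Stand-in types; the real Dremio interfaces carry more parameters.
interface SketchSource {}

// Capability interface: only sources that can run native queries implement it.
interface SketchSupportsExternalQuery extends SketchSource {
  String externalQueryOperator(String sql);
}

final class SketchJdbcSource implements SketchSupportsExternalQuery {
  @Override
  public String externalQueryOperator(String sql) {
    return "JdbcGroupScan[" + sql + "]";
  }
}

// A source without the capability.
final class SketchFileSource implements SketchSource {}

final class ExternalQueryDelegationSketch {
  private final Map<String, SketchSource> sources;

  ExternalQueryDelegationSketch(Map<String, SketchSource> sources) {
    this.sources = sources;
  }

  /** Looks up the source by plugin id and delegates operator creation to it. */
  String physicalOperator(String pluginId, String sql) {
    SketchSource source = sources.get(pluginId);
    if (!(source instanceof SketchSupportsExternalQuery)) {
      throw new UnsupportedOperationException(
          "Source " + pluginId + " does not support external queries");
    }
    return ((SketchSupportsExternalQuery) source).externalQueryOperator(sql);
  }

  public static void main(String[] args) {
    ExternalQueryDelegationSketch planner = new ExternalQueryDelegationSketch(
        Map.of("postgresql", new SketchJdbcSource(), "files", new SketchFileSource()));
    System.out.println(planner.physicalOperator("postgresql", "SELECT product_id, price FROM products"));
  }
}
```

Keeping the capability in a separate interface means the planner needs no knowledge of JDBC; any source that implements the interface can supply its own operator.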

PlannerPhase is involved as well; it is where the rules for each planning phase are assembled and applied.

static final RuleSet getPhysicalRules(OptimizerRulesContext optimizerRulesContext) {
    final List<RelOptRule> ruleList = new ArrayList<>();
    final PlannerSettings ps = optimizerRulesContext.getPlannerSettings();
 
    ruleList.add(SortConvertPrule.INSTANCE);
    ruleList.add(SortPrule.INSTANCE);
    ruleList.add(ProjectPrule.INSTANCE);
    ruleList.add(FlattenPrule.INSTANCE);
    ruleList.add(ScreenPrule.INSTANCE);
    ruleList.add(ExpandConversionRule.INSTANCE);
    ruleList.add(FilterPrule.INSTANCE);
    ruleList.add(LimitPrule.INSTANCE);
    ruleList.add(SamplePrule.INSTANCE);
    ruleList.add(SampleToLimitPrule.INSTANCE);
    ruleList.add(WriterPrule.INSTANCE);
    ruleList.add(WindowPrule.INSTANCE);
    ruleList.add(PushLimitToTopN.INSTANCE);
    ruleList.add(LimitUnionExchangeTransposeRule.INSTANCE);
    ruleList.add(UnionAllPrule.INSTANCE);
    ruleList.add(ValuesPrule.INSTANCE);
    ruleList.add(EmptyPrule.INSTANCE);
    ruleList.add(ExternalQueryScanPrule.INSTANCE);
    ruleList.add(MFunctionQueryScanPrule.INSTANCE);
 
    if (ps.isHashAggEnabled()) {
      ruleList.add(HashAggPrule.INSTANCE);
    }
    // ... (excerpt; the rest of the method is omitted)

Notes

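At the moment, among the officially provided community extensions, only the JDBC one implements the SupportsExternalQuery interface. Its implementation, for reference: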

// Builds the physical operator, which is essentially a JdbcGroupScan definition
public PhysicalOperator getExternalQueryPhysicalOperator(PhysicalPlanCreator creator, ExternalQueryScanPrel prel, BatchSchema schema, String sql) {
  SchemaBuilder schemaBuilder = BatchSchema.newBuilder();
  com.google.common.collect.ImmutableSet.Builder<String> skippedColumnsBuilder = new com.google.common.collect.ImmutableSet.Builder();
  this.filterBatchSchema(schema, schemaBuilder, skippedColumnsBuilder);
  BatchSchema filteredSchema = schemaBuilder.build();
  ImmutableSet<String> skippedColumns = skippedColumnsBuilder.build();
  return new JdbcGroupScan(creator.props(prel, "$dremio$", schema, JdbcPrel.RESERVE, JdbcPrel.LIMIT), sql, (List)filteredSchema.getFields().stream().map((f) -> {
     return SchemaPath.getSimplePath(f.getName());
  }).collect(ImmutableList.toImmutableList()), this.getPluginId(), filteredSchema, skippedColumns);
}

References

https://docs.dremio.com/software/data-sources/external-queries/
https://docs.dremio.com/software/sql-reference/sql-commands/apache-iceberg-tables/apache-iceberg-select/
sabot/kernel/src/main/java/com/dremio/exec/catalog/CatalogImpl.java
sabot/kernel/src/main/java/com/dremio/exec/planner/PlannerPhase.java
https://calcite.apache.org/javadocAggregate/org/apache/calcite/schema/TableMacro.html
sabot/kernel/src/main/java/com/dremio/exec/tablefunctions/ExternalQueryScanPrel.java

posted on 2023-01-03 18:32 by 荣锋亮