A brief look at Dremio table functions
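Dremio's handling of table functions is built on Apache Calcite, although Dremio itself makes relatively little use of them.
The main uses today are external queries and a handful of Iceberg support functions.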
Usage
- External queries
SELECT b.customer_id, a.product_id, a.price
FROM table(source_a.external_query(
'SELECT product_id, price
FROM products' )) AS a,
source_b.sales AS b
WHERE b.product_id = a.product_id
- Iceberg functions
These include table_files, table_history, table_manifests, and others.
SELECT *
FROM TABLE( table_files('<table_path>.<table_name>') )
SELECT *
FROM TABLE( table_history('<table_path>.<table_name>') )
SELECT *
FROM TABLE( table_manifests('<table_path>.<table_name>') )
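Implementation
Both kinds of function are implemented on top of Calcite's TableMacro interface, whose apply method returns a table. Note that this method is invoked at compile (planning) time, not while the query executes.
- External query handling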
sabot/kernel/src/main/java/com/dremio/exec/tablefunctions/ExternalQuery.java
public final class ExternalQuery implements TableMacro {
private static final List<FunctionParameter> FUNCTION_PARAMS = new ParameterListBuilder()
.add(String.class, "query").build();
private final Function<String, BatchSchema> schemaBuilder;
private final Function<BatchSchema, RelDataType> rowTypeBuilder;
private final StoragePluginId pluginId;
public ExternalQuery(Function<String, BatchSchema> schemaBuilder,
Function<BatchSchema, RelDataType> rowTypeBuilder,
StoragePluginId pluginId) {
this.schemaBuilder = schemaBuilder;
this.rowTypeBuilder = rowTypeBuilder;
this.pluginId = pluginId;
}
@Override
public List<FunctionParameter> getParameters() {
return FUNCTION_PARAMS;
}
@Override
public TranslatableTable apply(List<? extends Object> arguments) {
// ExternalQueryTranslatableTable implements the TranslatableTable interface
return ExternalQueryTranslatableTable.create(schemaBuilder, rowTypeBuilder, pluginId, (String) arguments.get(0));
}
}
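To make the compile-time nature of a table macro more concrete, here is a minimal sketch in plain Java with no Calcite or Dremio dependency. `SimpleTableMacro`, `SimpleTable`, and `ExternalQuerySketch` are hypothetical stand-ins for `TableMacro`, `TranslatableTable`, and `ExternalQuery`, and the schema lookup is stubbed with a lambda; the real classes carry much more state.

```java
import java.util.List;
import java.util.function.Function;

/** Plain-Java sketch of the table-macro idea; not Calcite or Dremio code. */
final class TableMacroSketch {

  /** Hypothetical stand-in for a table: just its column names and the SQL to push down. */
  static final class SimpleTable {
    final List<String> columns;
    final String sql;

    SimpleTable(List<String> columns, String sql) {
      this.columns = columns;
      this.sql = sql;
    }
  }

  /** Hypothetical stand-in for TableMacro: arguments in, table out, at planning time. */
  interface SimpleTableMacro {
    List<String> parameterNames();

    SimpleTable apply(List<?> arguments);
  }

  /** Mirrors the shape of ExternalQuery: one String parameter, schema from an injected function. */
  static final class ExternalQuerySketch implements SimpleTableMacro {
    private final Function<String, List<String>> schemaBuilder;

    ExternalQuerySketch(Function<String, List<String>> schemaBuilder) {
      this.schemaBuilder = schemaBuilder;
    }

    @Override
    public List<String> parameterNames() {
      return List.of("query");
    }

    @Override
    public SimpleTable apply(List<?> arguments) {
      if (arguments.size() != 1 || !(arguments.get(0) instanceof String)) {
        throw new IllegalArgumentException("expected a single String argument: query");
      }
      String sql = (String) arguments.get(0);
      // The schema is resolved here, before any row has been read.
      return new SimpleTable(schemaBuilder.apply(sql), sql);
    }
  }

  public static void main(String[] args) {
    // Stubbed schema lookup; a real source would ask the remote database.
    SimpleTableMacro macro = new ExternalQuerySketch(sql -> List.of("product_id", "price"));
    SimpleTable table = macro.apply(List.of("SELECT product_id, price FROM products"));
    System.out.println(table.columns + " <- " + table.sql);
  }
}
```

The point of the sketch is that `apply` only describes the table (its columns and the SQL to run later); nothing is executed against the source at this stage.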
- Iceberg handling
MetadataFunctionsMacro, which extends Dremio's own VersionedTableMacro
public class MetadataFunctionsMacro extends VersionedTableMacro {
public enum MacroName {
TABLE_HISTORY("table_history"),
TABLE_MANIFESTS("table_manifests"),
TABLE_SNAPSHOT("table_snapshot"),
TABLE_FILES("table_files");
private final String name;
MacroName(String name) {
this.name = name;
}
public String getName() {
return name;
}
}
private final TranslatableTableResolver tableResolver;
private static final List<FunctionParameter> FUNCTION_PARAMS = new ReflectiveFunctionBase.ParameterListBuilder()
.add(String.class, "table_name").build();
public MetadataFunctionsMacro(TranslatableTableResolver tableResolver) {
this.tableResolver = tableResolver;
}
@Override
public List<FunctionParameter> getParameters() {
return FUNCTION_PARAMS;
}
@Override
public TranslatableTable apply(final List<? extends Object> arguments, TableVersionContext tableVersionContext) {
final List<String> tablePath = splitTableIdentifier((String) arguments.get(0));
return tableResolver.find(tablePath, tableVersionContext);
}
}
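The apply method above turns its single string argument into a list of path components before resolving the table. The source of splitTableIdentifier is not shown here, so the following is only a sketch of what such a helper might do, assuming dots separate components and double quotes protect dots inside a component. `TableIdentifierSketch` and `splitIdentifier` are hypothetical names, and Dremio's actual helper may behave differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of splitting a dotted table identifier; not Dremio's implementation. */
final class TableIdentifierSketch {

  /** Splits on dots that are outside double quotes; the quotes themselves are dropped. */
  static List<String> splitIdentifier(String identifier) {
    List<String> parts = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    for (int i = 0; i < identifier.length(); i++) {
      char c = identifier.charAt(i);
      if (c == '"') {
        inQuotes = !inQuotes;          // toggle quoted mode, do not keep the quote
      } else if (c == '.' && !inQuotes) {
        parts.add(current.toString()); // component boundary
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    if (inQuotes) {
      throw new IllegalArgumentException("unterminated quote in: " + identifier);
    }
    parts.add(current.toString());
    return parts;
  }

  public static void main(String[] args) {
    System.out.println(splitIdentifier("nessie.sales.orders"));
    System.out.println(splitIdentifier("s3.\"my.bucket\".orders"));
  }
}
```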
The SupportsExternalQuery interface
This interface defines the contract a source implements in order to handle external queries.
How table functions are used
sabot/kernel/src/main/java/com/dremio/exec/catalog/CatalogImpl.java
The lookup happens during the SQL validation phase
@Override
public Collection<Function> getFunctions(NamespaceKey path,
FunctionType functionType) {
final NamespaceKey resolvedPath = resolveSingle(path);
switch (functionType) {
case TABLE:
// Several checks happen here, e.g. for the Iceberg functions and for other modes
return getUserDefinedTableFunctions(path, resolvedPath);
case SCALAR:
return getUserDefinedScalarFunctions(resolvedPath);
default:
return ImmutableList.of();
}
}
The physical planning side is involved as well
ExternalQueryScanPrel.java
@Override
public PhysicalOperator getPhysicalOperator(PhysicalPlanCreator creator) throws IOException {
final SupportsExternalQuery externalQueryImplementor = creator.getContext().getCatalogService().getSource(pluginId);
return externalQueryImplementor.getExternalQueryPhysicalOperator(creator, this, batchSchema, sql);
}
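The method above looks up the source by plugin id and hands the work to it through the SupportsExternalQuery contract. A rough plain-Java sketch of that delegation pattern follows. `Source`, `ExternalQueryCapable`, `JdbcLikeSource`, and `FileLikeSource` are hypothetical names, and the explicit instanceof check is an assumption added for illustration; the Dremio code simply assigns the looked-up source to the interface type.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of delegating to a source through a capability interface; not Dremio code. */
final class ExternalQueryDispatchSketch {

  /** Hypothetical base type for any registered source. */
  interface Source {
    String name();
  }

  /** Hypothetical stand-in for SupportsExternalQuery. */
  interface ExternalQueryCapable {
    String planExternalQuery(String sql);
  }

  /** A source that can push raw SQL down, like a JDBC-backed plugin. */
  static final class JdbcLikeSource implements Source, ExternalQueryCapable {
    private final String name;

    JdbcLikeSource(String name) {
      this.name = name;
    }

    @Override
    public String name() {
      return name;
    }

    @Override
    public String planExternalQuery(String sql) {
      return "JdbcScan[" + name + "]: " + sql;
    }
  }

  /** A source without the capability. */
  static final class FileLikeSource implements Source {
    private final String name;

    FileLikeSource(String name) {
      this.name = name;
    }

    @Override
    public String name() {
      return name;
    }
  }

  /** Looks up the source and delegates only if it implements the capability. */
  static String plan(Map<String, Source> catalog, String sourceName, String sql) {
    Source source = catalog.get(sourceName);
    if (source == null) {
      throw new IllegalArgumentException("unknown source: " + sourceName);
    }
    if (!(source instanceof ExternalQueryCapable)) {
      throw new UnsupportedOperationException(sourceName + " does not support external queries");
    }
    return ((ExternalQueryCapable) source).planExternalQuery(sql);
  }

  public static void main(String[] args) {
    Map<String, Source> catalog = new HashMap<>();
    catalog.put("pg", new JdbcLikeSource("pg"));
    System.out.println(plan(catalog, "pg", "SELECT product_id, price FROM products"));
  }
}
```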
PlannerPhase is involved too; in practice it supplies the rule set applied in each planning phase
static final RuleSet getPhysicalRules(OptimizerRulesContext optimizerRulesContext) {
final List<RelOptRule> ruleList = new ArrayList<>();
final PlannerSettings ps = optimizerRulesContext.getPlannerSettings();
ruleList.add(SortConvertPrule.INSTANCE);
ruleList.add(SortPrule.INSTANCE);
ruleList.add(ProjectPrule.INSTANCE);
ruleList.add(FlattenPrule.INSTANCE);
ruleList.add(ScreenPrule.INSTANCE);
ruleList.add(ExpandConversionRule.INSTANCE);
ruleList.add(FilterPrule.INSTANCE);
ruleList.add(LimitPrule.INSTANCE);
ruleList.add(SamplePrule.INSTANCE);
ruleList.add(SampleToLimitPrule.INSTANCE);
ruleList.add(WriterPrule.INSTANCE);
ruleList.add(WindowPrule.INSTANCE);
ruleList.add(PushLimitToTopN.INSTANCE);
ruleList.add(LimitUnionExchangeTransposeRule.INSTANCE);
ruleList.add(UnionAllPrule.INSTANCE);
ruleList.add(ValuesPrule.INSTANCE);
ruleList.add(EmptyPrule.INSTANCE);
ruleList.add(ExternalQueryScanPrule.INSTANCE);
ruleList.add(MFunctionQueryScanPrule.INSTANCE);
if (ps.isHashAggEnabled()) {
ruleList.add(HashAggPrule.INSTANCE);
}
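The listing above assembles a rule list in which some rules are always present and others depend on planner settings. The toy sketch below shows only that assembly pattern together with a first-match conversion loop. It is a simplification: Calcite's planner is cost-based and does not pick the first matching rule, and the `Rule` interface and the node names other than ExternalQueryScanPrel are hypothetical strings chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Toy sketch of a conditionally assembled rule list; not Calcite's planner. */
final class PhysicalRuleSketch {

  /** Hypothetical rule: converts a logical node name to a physical one if it matches. */
  interface Rule {
    Optional<String> tryConvert(String logicalNode);
  }

  static Rule rename(String from, String to) {
    return node -> node.equals(from) ? Optional.of(to) : Optional.empty();
  }

  /** Some rules are always registered, others only when a setting is enabled. */
  static List<Rule> physicalRules(boolean hashAggEnabled) {
    List<Rule> rules = new ArrayList<>();
    rules.add(rename("ExternalQueryScan", "ExternalQueryScanPrel"));
    rules.add(rename("Filter", "FilterPrel"));
    if (hashAggEnabled) {
      rules.add(rename("Aggregate", "HashAggPrel"));
    }
    return rules;
  }

  /** First-match conversion; a real planner explores alternatives and compares costs. */
  static String convert(List<Rule> rules, String logicalNode) {
    for (Rule rule : rules) {
      Optional<String> result = rule.tryConvert(logicalNode);
      if (result.isPresent()) {
        return result.get();
      }
    }
    throw new IllegalStateException("no physical rule for " + logicalNode);
  }

  public static void main(String[] args) {
    System.out.println(convert(physicalRules(true), "ExternalQueryScan"));
  }
}
```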
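Notes
At present, among the officially provided community extensions, only the JDBC one implements the SupportsExternalQuery interface. Its implementation, for reference:
// Builds the physical operator, essentially a JdbcGroupScan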
public PhysicalOperator getExternalQueryPhysicalOperator(PhysicalPlanCreator creator, ExternalQueryScanPrel prel, BatchSchema schema, String sql) {
SchemaBuilder schemaBuilder = BatchSchema.newBuilder();
com.google.common.collect.ImmutableSet.Builder<String> skippedColumnsBuilder = new com.google.common.collect.ImmutableSet.Builder();
this.filterBatchSchema(schema, schemaBuilder, skippedColumnsBuilder);
BatchSchema filteredSchema = schemaBuilder.build();
ImmutableSet<String> skippedColumns = skippedColumnsBuilder.build();
return new JdbcGroupScan(creator.props(prel, "$dremio$", schema, JdbcPrel.RESERVE, JdbcPrel.LIMIT), sql, (List)filteredSchema.getFields().stream().map((f) -> {
return SchemaPath.getSimplePath(f.getName());
}).collect(ImmutableList.toImmutableList()), this.getPluginId(), filteredSchema, skippedColumns);
}
References
https://docs.dremio.com/software/data-sources/external-queries/
https://docs.dremio.com/software/sql-reference/sql-commands/apache-iceberg-tables/apache-iceberg-select/
sabot/kernel/src/main/java/com/dremio/exec/catalog/CatalogImpl.java
sabot/kernel/src/main/java/com/dremio/exec/planner/PlannerPhase.java
https://calcite.apache.org/javadocAggregate/org/apache/calcite/schema/TableMacro.html
sabot/kernel/src/main/java/com/dremio/exec/tablefunctions/ExternalQueryScanPrel.java