Dremio's InformationSchemaCatalog service, part 2

Earlier I gave a brief introduction to the InformationSchemaCatalog service; that class is more of a base capability (at its core it is used internally by Dremio).
For serving BI tools and external systems, Dremio exposes information_schema through a dedicated storage plugin (Dremio's storage-plugin mechanism is quite convenient and lets us extend Dremio flexibly).

InfoSchemaConf definition

 
@SourceType(value = "INFORMATION_SCHEMA", configurable = false)
public class InfoSchemaConf extends ConnectionConf<InfoSchemaConf, InfoSchemaStoragePlugin> {
 
  @Override
  public InfoSchemaStoragePlugin newPlugin(SabotContext context, String name, Provider<StoragePluginId> pluginIdProvider) {
   // InfoSchemaStoragePlugin is the storage plugin that implements the actual queries; the source name is fixed to INFORMATION_SCHEMA by the source type, much like the source name configured for a plugin we develop ourselves
    return new InfoSchemaStoragePlugin(context, name);
  }
 
  @Override
  public boolean isInternal() {
    return true;
  }
}
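
A user-defined source follows exactly the same pattern. For comparison, a minimal sketch where MySourceConf and MyStoragePlugin are hypothetical names, not Dremio classes:

 
// Hypothetical custom source configuration mirroring InfoSchemaConf above
@SourceType(value = "MY_SOURCE", configurable = true) // user-visible, unlike INFORMATION_SCHEMA
public class MySourceConf extends ConnectionConf<MySourceConf, MyStoragePlugin> {

  @Override
  public MyStoragePlugin newPlugin(SabotContext context, String name, Provider<StoragePluginId> pluginIdProvider) {
    // name is whatever the user called the source when creating it
    return new MyStoragePlugin(context, name);
  }

  // isInternal() is left at its default, so the source behaves as a regular one
}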

InfoSchemaStoragePlugin handling

The core of InfoSchemaStoragePlugin is the read path, because the actual data has already been handled directly by the different catalogs (which in turn rely on the InformationSchemaCatalog service).
Class diagram

(class diagram image not reproduced here)

The core part is the handling of DatasetHandles:

 
  @Override
  public DatasetHandleListing listDatasetHandles(GetDatasetOption... options) {
    // every INFORMATION_SCHEMA table is exposed as an InformationSchemaTable enum constant
    return () -> Arrays.stream(InformationSchemaTable.values()).iterator();
  }

  @Override
  public Optional<DatasetHandle> getDatasetHandle(EntityPath datasetPath, GetDatasetOption... options) {
    // a valid path has exactly two components: INFORMATION_SCHEMA.<table>
    if (datasetPath.size() != 2) {
      return Optional.empty();
    }

    // case-insensitive lookup of the table name against the enum constants
    final InformationSchemaTable table = TABLE_MAP.get(datasetPath.getName().toLowerCase());
    if (table == null) {
      return Optional.empty();
    }

    return Optional.of(table);
  }

For the DatasetHandle handling, the information schema wraps itself: each table is an enum constant (the InformationSchemaTable class) that doubles as its own DatasetHandle, roughly as sketched below.
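
The real enum also carries per-table record schemas and scan logic; here is a simplified sketch of the pattern (the TABLE_MAP construction is hypothetical but mirrors the lowercase lookup in getDatasetHandle above):

 
// Simplified sketch, not the actual Dremio source: each enum constant is its own DatasetHandle
public enum InformationSchemaTable implements DatasetHandle {
  CATALOGS, SCHEMATA, TABLES, VIEWS, COLUMNS;

  @Override
  public EntityPath getDatasetPath() {
    // e.g. INFORMATION_SCHEMA.TABLES
    return new EntityPath(ImmutableList.of("INFORMATION_SCHEMA", name()));
  }
}

// hypothetical construction of the lowercase-name lookup used by getDatasetHandle
static final ImmutableMap<String, InformationSchemaTable> TABLE_MAP =
    Arrays.stream(InformationSchemaTable.values())
        .collect(ImmutableMap.toImmutableMap(t -> t.name().toLowerCase(), Function.identity()));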
The actual reads are implemented by InformationSchemaRecordReader, which internally calls the wrapped RPC service (InformationSchemaServiceGrpc),
generated in the catalog-api module.
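
These are server-streaming RPCs, so the blocking stub hands back a plain Iterator (visible in the reader code below). A minimal sketch of driving the stub directly, assuming a standalone channel; the host/port and username are made up for illustration (inside Dremio the stub is provided by the runtime):

 
// Illustrative standalone client; io.grpc.ManagedChannel(Builder) plus the generated
// classes from the catalog-api module. Host/port below are hypothetical.
ManagedChannel channel = ManagedChannelBuilder.forAddress("localhost", 9090)
    .usePlaintext()
    .build();
InformationSchemaServiceGrpc.InformationSchemaServiceBlockingStub stub =
    InformationSchemaServiceGrpc.newBlockingStub(channel);

ListTableSchemataRequest request = ListTableSchemataRequest.newBuilder()
    .setUsername("someUser") // hypothetical user
    .build();

// server-streaming response: one TableSchema message per table
Iterator<TableSchema> schemata = stub.listTableSchemata(request);
while (schemata.hasNext()) {
  TableSchema schema = schemata.next();
  System.out.println(schema.getCatalogName() + "." + schema.getSchemaName() + "." + schema.getTableName());
}
channel.shutdown();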


Code

 
public class InformationSchemaRecordReader extends AbstractRecordReader {
  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(InformationSchemaRecordReader.class);
  // the gRPC InformationSchemaServiceBlockingStub
  private final InformationSchemaServiceBlockingStub catalogStub;
  // the InformationSchemaTable being read
  private final InformationSchemaTable table;
  private final String catalogName;
  private final String username;
  private final SearchQuery searchQuery;
  private final boolean complexTypeSupport;
 
  private Context.CancellableContext context;
  private TableWriter<?> tableWriter;
 
  public InformationSchemaRecordReader(
    OperatorContext context,
    List<SchemaPath> fields,
    InformationSchemaServiceBlockingStub catalogStub,
    InformationSchemaTable table,
    String catalogName,
    String username,
    SearchQuery searchQuery,
    boolean complexTypeSupport
  ) {
    super(context, fields);
    this.catalogStub = catalogStub;
    this.table = table;
    this.catalogName = catalogName;
    this.username = username;
    this.searchQuery = searchQuery;
    this.complexTypeSupport = complexTypeSupport;
  }
 
  @Override
  public void setup(OutputMutator output) {
    context = Context.current().withCancellation();
    context.run(() -> {
      tableWriter = createTableWriter();
      tableWriter.init(output);
    });
  }
 
  @Override
  public int next() {
    Preconditions.checkNotNull(tableWriter, "Reader must be #setup first");
    return tableWriter.write(numRowsPerBatch);
  }
 
  @Override
  public void close() throws Exception {
    if (context != null) {
      context.close();
    }
 
    context = null;
    tableWriter = null;
  }
 
  @Override
  protected boolean supportsSkipAllQuery() {
    return true;
  }
 
  private Set<String> getGivenFields() {
    return getColumns()
      .stream()
      .peek(path -> Preconditions.checkArgument(path.isSimplePath()))
      .map(path -> path.getAsUnescapedPath().toUpperCase())
      .collect(Collectors.toSet());
  }
 
  private TableWriter<?> createTableWriter() {
    final Set<String> selectedFields =
      isStarQuery() ? InformationSchemaMetadata.getAllFieldNames(table.getRecordSchema()) : getGivenFields();
 
    switch (table) {
 
      case CATALOGS: {
        final ListCatalogsRequest.Builder catalogsRequest = ListCatalogsRequest.newBuilder()
          .setUsername(username);
        if (searchQuery != null) {
          catalogsRequest.setQuery(searchQuery);
        }
 
        // start Catalog stream from catalog service
        return new CatalogsTableWriter(catalogStub.listCatalogs(catalogsRequest.build()), selectedFields, catalogName);
      }
 
      case SCHEMATA: {
        final ListSchemataRequest.Builder schemataRequest = ListSchemataRequest.newBuilder()
          .setUsername(username);
        if (searchQuery != null) {
          schemataRequest.setQuery(searchQuery);
        }
 
        // start Schema stream from catalog service
        return new SchemataTableWriter(catalogStub.listSchemata(schemataRequest.build()), selectedFields, catalogName);
      }
 
      case TABLES: {
        final ListTablesRequest.Builder tablesRequest = ListTablesRequest.newBuilder()
          .setUsername(username);
        if (searchQuery != null) {
          tablesRequest.setQuery(searchQuery);
        }
 
        // start Table stream from catalog service
        return new TablesTableWriter(catalogStub.listTables(tablesRequest.build()), selectedFields, catalogName);
      }
 
      case VIEWS: {
        final ListViewsRequest.Builder viewsRequest = ListViewsRequest.newBuilder()
          .setUsername(username);
        if (searchQuery != null) {
          viewsRequest.setQuery(searchQuery);
        }
 
        // start View stream from catalog service
        return new ViewsTableWriter(catalogStub.listViews(viewsRequest.build()), selectedFields, catalogName);
      }
 
      case COLUMNS: {
        final ListTableSchemataRequest.Builder columnsRequest = ListTableSchemataRequest.newBuilder()
          .setUsername(username);
        if (searchQuery != null) {
          columnsRequest.setQuery(searchQuery);
        }
        // start TableSchema stream from catalog service
        final Iterator<TableSchema> tableSchemata = catalogStub.listTableSchemata(columnsRequest.build());
 
        // For each TableSchema, iterates over #flatMap of batch_schema field, which represents the records in the
        // "COLUMNS" table, and not the TableSchema message itself (unlike other tables).
        final Iterator<Column> columnIterator = new AbstractIterator<Column>() {
          Iterator<Column> currentIterator = null;
 
          @Override
          protected Column computeNext() {
            while (true) {
              if (currentIterator != null && currentIterator.hasNext()) {
                return currentIterator.next();
              }
 
              if (!tableSchemata.hasNext()) {
                return endOfData();
              }
 
              // Gets next TableSchema from the catalog service only after exhausting the current one. See comment in
              // TableWriter#write.
              final TableSchema currentSchema = tableSchemata.next();
              BatchSchema bs = BatchSchema.deserialize(currentSchema.getBatchSchema().toByteArray());
              //If an inconsistency is detected don't attempt converting to Arrow format since it will cause an assertion failure.  Put out a warning and move on to next row.
              if (bs.getFieldCount() == 0) {
                // Add a warning message to indicate this table has missing fields
                logger.warn("{}.{}.{} has missing fields or incorrect format. ", currentSchema.getCatalogName(), currentSchema.getSchemaName(), currentSchema.getTableName());
                continue;
              }
              final RelDataType rowType =
                CalciteArrowHelper.wrap(bs)
                  .toCalciteRecordType(JavaTypeFactoryImpl.INSTANCE, complexTypeSupport);
              //noinspection ConstantConditions
              currentIterator = Iterators.transform(rowType.getFieldList().iterator(),
                field -> new Column(Strings.isNullOrEmpty(catalogName) ? currentSchema.getCatalogName() : catalogName,
                  currentSchema.getSchemaName(),
                  currentSchema.getTableName(),
                  field));
            }
          }
        };
        return new ColumnsTableWriter(columnIterator, selectedFields, catalogName);
      }
      default:
        throw UserException.unsupportedError()
          .message("InformationSchemaRecordReader does not support table of '%s' type", table)
          .buildSilently();
    }
  }
}
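
The reader follows the standard AbstractRecordReader lifecycle: setup creates the TableWriter and binds it to the output vectors, each next call writes at most numRowsPerBatch rows, and close cancels the gRPC CancellableContext. A rough driver sketch; in Dremio the scan operator does this wiring, and outputMutator/consumeBatch here are hypothetical stand-ins:

 
// Illustrative only: the execution engine normally drives this loop
reader.setup(outputMutator);      // creates the TableWriter and initializes vectors
int written;
while ((written = reader.next()) > 0) {
  // each batch streams up to numRowsPerBatch records from the gRPC iterator
  consumeBatch(written);          // hypothetical downstream consumer
}
reader.close();                   // cancels the CancellableContext, ending the RPC stream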

To study this call chain, you can refer to my earlier article (linked in the references below).
Reference call chain

(call-chain diagram image not reproduced here)
Notes

If you know query engines such as Presto and Trino, their information_schema is implemented at the connector (plugin) level, not at the global level as in tools like Dremio and Drill.

References

sabot/kernel/src/main/java/com/dremio/exec/store/ischema/InfoSchemaConf.java
sabot/kernel/src/main/java/com/dremio/exec/store/ischema/InformationSchemaTable.java
sabot/kernel/src/main/java/com/dremio/exec/store/ischema/InfoSchemaStoragePlugin.java
sabot/kernel/src/main/java/com/dremio/exec/store/ischema/InformationSchemaRecordReader.java
services/catalog-api/target/generated-sources/protobuf/com/dremio/service/catalog/InformationSchemaServiceGrpc.java
https://www.cnblogs.com/rongfengliang/p/16463884.html
