Dremio metadata handling

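Dremio's metadata affects both query execution and how BI tools work with it, so there are two sides to it: reading and writing. On the write side, Dremio refreshes metadata on a schedule, and also ad hoc when a source is first created.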

Reference diagram

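Write path (saving): the core is the save method implemented by DatasetSaver, which page-driven (UI) operations also rely on. The trace below was captured on a scheduled metadata refresh thread.
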
Affect(class count: 1 , method count: 3) cost in 314 ms, listenerId: 4
ts=2022-10-12 06:27:29;thread_name=metadata-refresh-modifiable-scheduler-17;id=a3;is_daemon=true;priority=10;TCCL=sun.misc.Launcher$AppClassLoader@18b4aac2
    @com.dremio.exec.catalog.DatasetSaverImpl.save()
        at com.dremio.exec.catalog.DatasetSaverImpl.save(DatasetSaverImpl.java:137)
        at com.dremio.exec.catalog.MetadataSynchronizer.tryHandleExistingDataset(MetadataSynchronizer.java:311)
        at com.dremio.exec.catalog.MetadataSynchronizer.handleExistingDataset(MetadataSynchronizer.java:229)
        at com.dremio.exec.catalog.MetadataSynchronizer.synchronizeDatasets(MetadataSynchronizer.java:201)
        at com.dremio.exec.catalog.MetadataSynchronizer.go(MetadataSynchronizer.java:134)
        at com.dremio.exec.catalog.SourceMetadataManager$RefreshRunner.refreshFull(SourceMetadataManager.java:441)
        at com.dremio.exec.catalog.SourceMetadataManager$BackgroundRefresh.run(SourceMetadataManager.java:555)
        at com.dremio.exec.catalog.SourceMetadataManager.wakeup(SourceMetadataManager.java:264)
        at com.dremio.exec.catalog.SourceMetadataManager.access$300(SourceMetadataManager.java:96)
        at com.dremio.exec.catalog.SourceMetadataManager$WakeupWorker.run(SourceMetadataManager.java:203)
        at com.dremio.service.scheduler.LocalSchedulerService$CancellableTask.run(LocalSchedulerService.java:226)
        at com.jprofiler.agent.callee.RunnableTracking.run(ejt:19)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
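
Read bottom-up, the trace shows a scheduler thread waking up, running a full refresh, walking the datasets in the synchronizer, and finally calling save for an existing dataset. The following is a minimal sketch of that shape, not Dremio's code: `RefreshSketch`, `Saver`, `Synchronizer`, and `refreshOnce` are hypothetical names, metadata is reduced to a path-to-schema map, and the rule "save only when an already-known dataset's schema differs" is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RefreshSketch {

    /** Hypothetical stand-in for DatasetSaver: persists one dataset's metadata. */
    interface Saver {
        void save(String datasetPath, String schema);
    }

    /** Hypothetical stand-in for MetadataSynchronizer. */
    static final class Synchronizer {
        private final Map<String, String> stored;
        private final Saver saver;

        Synchronizer(Map<String, String> stored, Saver saver) {
            this.stored = stored;
            this.saver = saver;
        }

        /** Walks the source listing and saves existing datasets whose schema differs. */
        void go(Map<String, String> sourceListing) {
            for (Map.Entry<String, String> entry : sourceListing.entrySet()) {
                String known = stored.get(entry.getKey());
                // Assumption of this sketch: only changed, already-known datasets are saved.
                if (known != null && !known.equals(entry.getValue())) {
                    saver.save(entry.getKey(), entry.getValue());
                }
            }
        }
    }

    /** Runs one refresh on a scheduler thread and returns the paths that were saved. */
    static List<String> refreshOnce(Map<String, String> stored, Map<String, String> sourceListing) {
        List<String> savedPaths = new ArrayList<>();
        Saver saver = (path, schema) -> {
            stored.put(path, schema);
            savedPaths.add(path);
        };
        Synchronizer synchronizer = new Synchronizer(stored, saver);
        Runnable wakeup = () -> synchronizer.go(sourceListing);

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            // A real refresh would recur (e.g. scheduleWithFixedDelay); one run keeps the sketch deterministic.
            scheduler.schedule(wakeup, 0, TimeUnit.MILLISECONDS).get();
        } catch (Exception e) {
            throw new IllegalStateException("refresh failed", e);
        } finally {
            scheduler.shutdown();
        }
        return savedPaths;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new HashMap<>();
        stored.put("pg.public.orders", "id INT");
        Map<String, String> source = new HashMap<>();
        source.put("pg.public.orders", "id INT, total DOUBLE");
        System.out.println(refreshOnce(stored, source));
    }
}
```

Keeping the saver behind a small interface mirrors how the trace separates the synchronizer (deciding what changed) from the saver (persisting it).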

Read path (the core is listTableSchemata, implemented on top of InformationSchemaCatalog)

ts=2022-10-12 06:23:25;thread_name=grpc-default-executor-31;id=385;is_daemon=true;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@18b4aac2
    @com.dremio.exec.catalog.InformationSchemaCatalogImpl.listTableSchemata()
        at com.dremio.exec.catalog.CatalogImpl.listTableSchemata(CatalogImpl.java:1720)
        at com.dremio.exec.catalog.SourceAccessChecker.listTableSchemata(SourceAccessChecker.java:514)
        at com.dremio.exec.catalog.DelegatingCatalog.listTableSchemata(DelegatingCatalog.java:365)
        at com.dremio.exec.catalog.InformationSchemaServiceImpl.listTableSchemata(InformationSchemaServiceImpl.java:164)
        at com.dremio.service.catalog.InformationSchemaServiceGrpc$MethodHandlers.invoke(InformationSchemaServiceGrpc.java:663)
        at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
        at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
        at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
        at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
        at io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)
        at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
        at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
        at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
        at io.opentracing.contrib.grpc.TracingServerInterceptor$2.onHalfClose(TracingServerInterceptor.java:231)
        at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
        at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
        at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
        at io.grpc.util.TransmitStatusRuntimeExceptionInterceptor$1.onHalfClose(TransmitStatusRuntimeExceptionInterceptor.java:74)
        at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:340)
        at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866)
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at com.jprofiler.agent.callee.RunnableTracking.run(ejt:19)
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
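
The read trace passes through a chain of catalog wrappers (DelegatingCatalog, SourceAccessChecker, CatalogImpl) before it reaches InformationSchemaCatalogImpl. A minimal sketch of that wrapper pattern follows. All names in it (`CatalogChainSketch`, `SchemaLister`, `buildChain`, and so on) are hypothetical, and the filtering rule in the access-checking layer is an assumption for illustration, not Dremio's actual check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class CatalogChainSketch {

    /** Minimal slice of a catalog: just the listing call seen in the trace. */
    interface SchemaLister {
        List<String> listTableSchemata();
    }

    /** Stands in for the store-backed implementation at the bottom of the chain. */
    static final class StoreBackedLister implements SchemaLister {
        private final List<String> rows;

        StoreBackedLister(List<String> rows) {
            this.rows = new ArrayList<>(rows);
        }

        @Override
        public List<String> listTableSchemata() {
            return new ArrayList<>(rows);
        }
    }

    /** Plain forwarding wrapper, the role a delegating catalog plays. */
    static class DelegatingLister implements SchemaLister {
        protected final SchemaLister delegate;

        DelegatingLister(SchemaLister delegate) {
            this.delegate = delegate;
        }

        @Override
        public List<String> listTableSchemata() {
            return delegate.listTableSchemata();
        }
    }

    /** Wrapper that drops rows the caller should not see (hypothetical rule). */
    static final class AccessCheckingLister extends DelegatingLister {
        private final Predicate<String> visible;

        AccessCheckingLister(SchemaLister delegate, Predicate<String> visible) {
            super(delegate);
            this.visible = visible;
        }

        @Override
        public List<String> listTableSchemata() {
            List<String> out = new ArrayList<>();
            for (String schema : delegate.listTableSchemata()) {
                if (visible.test(schema)) {
                    out.add(schema);
                }
            }
            return out;
        }
    }

    /** Stacks the wrappers in the same outer-to-inner order as the trace. */
    static SchemaLister buildChain(List<String> rows, Predicate<String> visible) {
        return new DelegatingLister(new AccessCheckingLister(new StoreBackedLister(rows), visible));
    }

    public static void main(String[] args) {
        SchemaLister chain = buildChain(
            Arrays.asList("pg.public", "__hidden.tmp", "s3.data"),
            schema -> !schema.startsWith("__"));
        System.out.println(chain.listTableSchemata());
    }
}
```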
 
 

Metadata refresh by scheduled task

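The handler that lists JDBC datasets lives in the JDBC storage plugin (shipped with the community edition, but not open-sourced for now). Its core implementation depends on a fetcher service, JdbcSchemaFetcherImpl.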

 
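    // Decompiled excerpt: both classes are inner classes of JdbcStoragePlugin
    // (hence the JdbcStoragePlugin.this references).
    public class JdbcDatasetHandle implements DatasetHandle {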
        private final EntityPath entityPath;
        private final String catalog;
        private final String schema;
        private final String table;
        private JdbcFetcherProto.GetTableMetadataResponse tableMetadataResponse = null;
 
        JdbcDatasetHandle(String catalog, String schema, String table) {
            this.catalog = catalog;
            this.schema = schema;
            this.table = table;
            ImmutableList.Builder<String> builder = ImmutableList.builder();
            builder.add(JdbcStoragePlugin.this.config.getSourceName());
            if (!Strings.isNullOrEmpty(catalog)) {
                builder.add(catalog);
            }
 
            if (!Strings.isNullOrEmpty(schema)) {
                builder.add(schema);
            }
 
            builder.add(table);
            this.entityPath = new EntityPath(builder.build());
        }
 
        public EntityPath getDatasetPath() {
            return this.entityPath;
        }
 
        // Table metadata is fetched lazily on first access and cached afterwards.
        JdbcFetcherProto.GetTableMetadataResponse getTableMetadataResponse() {
            if (this.tableMetadataResponse == null) {
                this.tableMetadataResponse = JdbcStoragePlugin.this.fetcher.getTableMetadata(
                    GetTableMetadataRequest.newBuilder()
                        .setCatalog(this.catalog)
                        .setSchema(this.schema)
                        .setTable(this.table)
                        .build());
            }
 
            return this.tableMetadataResponse;
        }
    }
 
    class JdbcIteratorListing implements DatasetHandleListing {
        private final Set<CloseableIterator<JdbcFetcherProto.CanonicalizeTablePathResponse>> references = new HashSet<>();
 
        JdbcIteratorListing() {
        }
 
        public Iterator<DatasetHandle> iterator() {
            CloseableIterator<JdbcFetcherProto.CanonicalizeTablePathResponse> iterator = JdbcStoragePlugin.this.fetcher.listTableNames(ListTableNamesRequest.newBuilder().build());
            this.references.add(iterator);
            return Iterators.transform(iterator, (input) -> {
                return JdbcStoragePlugin.this.new JdbcDatasetHandle(input.getCatalog(), input.getSchema(), input.getTable());
            });
        }
 
        public void close() {
            try {
                AutoCloseables.close(this.references);
            } catch (Exception var2) {
                JdbcStoragePlugin.LOGGER.warn("Error closing iterators when listing JDBC datasets.", var2);
            }
 
        }
    }
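
Two details of the handle above are easy to try in isolation: the entity path skips empty catalog and schema levels, and the table metadata is fetched only on first use. A minimal, self-contained sketch of both follows. `DatasetHandleSketch`, `entityPath`, and `LazyMetadata` are hypothetical names, and a `Supplier` stands in for the fetcher call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class DatasetHandleSketch {

    /** Builds [source, catalog?, schema?, table], skipping null or empty levels. */
    static List<String> entityPath(String source, String catalog, String schema, String table) {
        List<String> path = new ArrayList<>();
        path.add(source);
        if (catalog != null && !catalog.isEmpty()) {
            path.add(catalog);
        }
        if (schema != null && !schema.isEmpty()) {
            path.add(schema);
        }
        path.add(table);
        return path;
    }

    /** Fetch-on-first-use cache; like the decompiled handle, it is not thread-safe. */
    static final class LazyMetadata<T> {
        private final Supplier<T> fetcher;
        private T value;

        LazyMetadata(Supplier<T> fetcher) {
            this.fetcher = fetcher;
        }

        T get() {
            if (value == null) {
                value = fetcher.get(); // only reached until a non-null value is cached
            }
            return value;
        }
    }

    public static void main(String[] args) {
        System.out.println(entityPath("pg", null, "public", "orders"));

        LazyMetadata<String> metadata = new LazyMetadata<>(() -> {
            System.out.println("fetching metadata");
            return "id INT, total DOUBLE";
        });
        metadata.get();
        metadata.get(); // second call reuses the cached value
    }
}
```

Deferring the fetch fits the listing flow: the iterator can hand out a handle for every table cheaply, and the per-table metadata request is only paid for handles that are actually inspected.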

Scheduling service

  • Reference implementation

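Notes

Metadata is central to Dremio, so it is worth having at least a rough understanding of how it is processed. To make things easier for BI and other tools, Dremio also includes InformationSchemaCatalog, which provides the information schema capability.
For the JDBC plugin and SourceMetadataManager, see my earlier posts (the cnblogs links at the end of this post).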

References

https://github.com/dremio/dremio-oss/blob/d41cb52143b6b0289fc8ed4d970bfcf410a669e8/sabot/kernel/src/main/java/com/dremio/exec/store/ischema/Column.java
https://github.com/dremio/dremio-oss/blob/d41cb52143b6b0289fc8ed4d970bfcf410a669e8/sabot/vector-tools/src/main/java/com/dremio/common/expression/SqlTypeNameVisitor.java
https://github.com/dremio/dremio-oss/blob/d41cb52143b6b0289fc8ed4d970bfcf410a669e8/sabot/kernel/src/main/java/com/dremio/exec/catalog/DatasetSaverImpl.java
https://github.com/dremio/dremio-oss/blob/d41cb52143b6b0289fc8ed4d970bfcf410a669e8/sabot/kernel/src/main/java/com/dremio/exec/catalog/InformationSchemaCatalogImpl.java
https://github.com/dremio/dremio-oss/blob/d41cb52143b6b0289fc8ed4d970bfcf410a669e8/sabot/kernel/src/main/java/com/dremio/exec/catalog/InformationSchemaCatalog.java
https://github.com/dremio/dremio-oss/blob/d41cb52143b6b0289fc8ed4d970bfcf410a669e8/sabot/kernel/src/main/java/com/dremio/exec/catalog/SourceMetadataManager.java
https://github.com/dremio/dremio-oss/blob/d41cb52143b6b0289fc8ed4d970bfcf410a669e8/sabot/kernel/src/main/java/com/dremio/exec/catalog/MetadataSynchronizer.java
https://www.cnblogs.com/rongfengliang/p/15978769.html
https://www.cnblogs.com/rongfengliang/p/15961890.html
https://www.cnblogs.com/rongfengliang/p/16486252.html

posted on 2022-10-15 21:38  荣锋亮