A brief introduction to Dremio Operators

A short description from the official source

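An Operator is the basic unit of execution. It consumes one or more input data streams and produces output, and it is single-threaded.
There are several types of Operator, each with its own set of states, but every state can be described in terms of a shared set of MasterStates.
There are four main types of operator: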

  • DualInputOperator: a producer, e.g. HashJoinOperator, UnionOperator
  • SingleInputOperator: a producer and a single consumer, e.g. FilterOperator, SortOperator
  • ProducerOperator: a producer, e.g. ScanOperator, UnorderedReceiverOperator
  • TerminalOperator: a single consumer, e.g. SingleSenderOperator, ScreenOperator

Dremio's pipeline is assembled from Operators. The Operator creators are found through a discovery mechanism and registered at startup (see OperatorCreatorRegistry).
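
To make the producer and consumer roles in the list above concrete, here is a minimal sketch in plain Java. It is not Dremio code: the interfaces are simplified stand-ins, the method names outputData and consumeData and the classes ListScan, UpperCase and CollectingScreen are assumptions made for illustration, and batches are plain lists of strings rather than vectors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Illustrative only: simplified stand-ins, not Dremio's real interfaces.
class OperatorRolesSketch {
  interface Producer { List<String> outputData(); }   // null means "no more data"
  interface SingleConsumer { void consumeData(List<String> batch); }

  interface ProducerOperator extends Producer {}
  interface SingleInputOperator extends Producer, SingleConsumer {}
  interface TerminalOperator extends SingleConsumer {}
  // A DualInputOperator would be a Producer fed by a left and a right input; omitted here.

  // Source: hands out pre-built batches, playing the role of a scan.
  static class ListScan implements ProducerOperator {
    private final Iterator<List<String>> batches;
    ListScan(List<List<String>> batches) { this.batches = batches.iterator(); }
    @Override public List<String> outputData() {
      return batches.hasNext() ? batches.next() : null;
    }
  }

  // Middle: consumes one batch, then produces the transformed batch.
  static class UpperCase implements SingleInputOperator {
    private List<String> pending = new ArrayList<>();
    @Override public void consumeData(List<String> batch) {
      pending = new ArrayList<>();
      for (String s : batch) {
        pending.add(s.toUpperCase(Locale.ROOT));
      }
    }
    @Override public List<String> outputData() { return pending; }
  }

  // Sink: only consumes, playing the role of a screen that returns rows.
  static class CollectingScreen implements TerminalOperator {
    final List<String> rows = new ArrayList<>();
    @Override public void consumeData(List<String> batch) { rows.addAll(batch); }
  }

  // Pushes every batch from the source through the middle operator into the sink.
  static List<String> run(ProducerOperator source, SingleInputOperator middle, CollectingScreen sink) {
    List<String> batch;
    while ((batch = source.outputData()) != null) {
      middle.consumeData(batch);
      sink.consumeData(middle.outputData());
    }
    return sink.rows;
  }

  public static void main(String[] args) {
    List<List<String>> input = Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"));
    System.out.println(run(new ListScan(input), new UpperCase(), new CollectingScreen()));
  }
}
```

The point of the sketch is only the shape: a source has no input, a sink has no output, and everything in between is both a consumer and a producer.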

ProducerOperator interface definition

 
public interface ProducerOperator extends Producer {
 
  enum State implements OperatorState<State> {
    NEEDS_SETUP(MasterState.NEEDS_SETUP),
    CAN_PRODUCE(MasterState.CAN_PRODUCE),
    BLOCKED(MasterState.BLOCKED),
    DONE(MasterState.DONE);
 
    final MasterState master;
    State(MasterState master){
      this.master = master;
    }
 
    @Override
    public void is(State expected) {
      assert expected == this : String.format(Operator.ERROR_STRING, expected.name(), this.name());
    }
 
    @Override
    public MasterState getMasterState() {
      return master;
    }
  }
 
  ProducerOperator.State getState();
 
  /**
   * Setups operator. Can only be called once. SqlOperatorImpl must be in NEEDS_SETUP state.
   * @return The VectorAccessible to be used for result return.
   */
  VectorAccessible setup() throws Exception;
 
  // This uses a helper similar to Guice's assisted injection to make creating a ProducerOperator easier; see https://www.cnblogs.com/rongfengliang/p/16748323.html for details
  interface Creator<T extends PhysicalOperator> {
    ProducerOperator create(FragmentExecutionContext fragmentExecContext, OperatorContext context, T config) throws ExecutionSetupException;
  }
 
  interface ReceiverCreator<T extends PhysicalOperator> {
    ProducerOperator create(BatchStreamProvider streams, OperatorContext context, T config) throws ExecutionSetupException;
  }
}
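
The state handling in the interface above can be tried out in isolation. The following is a minimal sketch under stated assumptions, not Dremio code: MasterState and State are cut-down copies of the enums shown above, CountingProducer and its outputData method are hypothetical, and the state check throws an exception instead of using assert so that it also fires when assertions are disabled.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: simplified stand-ins, not Dremio's real classes.
class ProducerStateSketch {
  enum MasterState { NEEDS_SETUP, CAN_PRODUCE, BLOCKED, DONE }

  enum State {
    NEEDS_SETUP(MasterState.NEEDS_SETUP),
    CAN_PRODUCE(MasterState.CAN_PRODUCE),
    DONE(MasterState.DONE);

    final MasterState master;
    State(MasterState master) { this.master = master; }

    // Explicit check rather than `assert`, so it fires even without -ea.
    void is(State expected) {
      if (expected != this) {
        throw new IllegalStateException("Expected " + expected.name() + " but was " + name());
      }
    }
  }

  // Hypothetical producer emitting 0..limit-1 in batches of at most batchSize.
  static class CountingProducer {
    private final int limit;
    private final int batchSize;
    private int next = 0;
    private State state = State.NEEDS_SETUP;

    CountingProducer(int limit, int batchSize) {
      if (batchSize <= 0) {
        throw new IllegalArgumentException("batchSize must be positive");
      }
      this.limit = limit;
      this.batchSize = batchSize;
    }

    State getState() { return state; }

    // May only be called once, while the operator is still in NEEDS_SETUP.
    void setup() {
      state.is(State.NEEDS_SETUP);
      state = limit > 0 ? State.CAN_PRODUCE : State.DONE;
    }

    // Emits the next batch and moves to DONE once the range is exhausted.
    List<Integer> outputData() {
      state.is(State.CAN_PRODUCE);
      List<Integer> batch = new ArrayList<>();
      while (batch.size() < batchSize && next < limit) {
        batch.add(next++);
      }
      if (next >= limit) {
        state = State.DONE;
      }
      return batch;
    }
  }

  // Drives the producer purely by looking at its master state.
  static List<List<Integer>> drain(CountingProducer producer) {
    producer.setup();
    List<List<Integer>> batches = new ArrayList<>();
    while (producer.getState().master == MasterState.CAN_PRODUCE) {
      batches.add(producer.outputData());
    }
    return batches;
  }

  public static void main(String[] args) {
    System.out.println(drain(new CountingProducer(5, 2)));
  }
}
```

Calling setup() a second time, or outputData() after the producer has reached DONE, fails the is(...) check, which is the same guard the real enum expresses with an assertion.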

Discovery and loading (Dremio's own hand-written implementation; using Guice would be more convenient)

private <T> ImmutableMap<Class<?>, T> getImplementors(ScanResult scanResult, Class<T> baseInterface) {
    final Map<Class<?>, T> map = new HashMap<>();
 
    Set<Class<? extends T>> providerClasses = scanResult.getImplementations(baseInterface);
    for (Class<?> c : providerClasses) {
      Class<?> operatorClass = c;
      boolean interfaceFound = false;
      while (!interfaceFound && !(c.equals(java.lang.Object.class))) {
        final Type[] ifaces = c.getGenericInterfaces(); // never returns null
        for (Type iface : ifaces) {
          if (!(iface instanceof ParameterizedType && ((ParameterizedType) iface).getRawType().equals(baseInterface))) {
            continue;
          }
          final Type[] args = ((ParameterizedType) iface).getActualTypeArguments();
          interfaceFound = true;
          boolean constructorFound = false;
          for (Constructor<?> constructor : operatorClass.getConstructors()) {
            Class<?>[] params = constructor.getParameterTypes();
            if (params.length == 0) {
              try {
                T newInstance = (T) constructor.newInstance();
                Object old = map.put((Class<?>) args[0], newInstance);
                if (old != null) {
                  throw UserException.functionError()
                      .message("Duplicate OperatorCreator [%s, %s] found for PhysicalOperator %s",
                          old.getClass().getCanonicalName(), operatorClass.getCanonicalName(),
                          ((Class<?>) args[0]).getCanonicalName())
                      .build(logger);
                }
                constructorFound = true;
              } catch (Exception ex) {
                logger.warn("Failure while creating OperatorCreator. Constructor declaring class {}.",
                    constructor.getDeclaringClass().getName(), ex);
              }
            }
          }
          if (!constructorFound) {
            logger.debug("Skipping registration of OperatorCreator {} as it doesn't have a default constructor",
                operatorClass.getCanonicalName());
          }
        }
        c = c.getSuperclass();
      }
    }
    return ImmutableMap.copyOf(map);
  }
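
The core trick in the method above is recovering the type argument of a generic interface through reflection and using it as the map key. Below is a stripped-down sketch of just that step, using only JDK reflection. It rests on several assumptions: classpath scanning (ScanResult) is replaced by an explicit list of classes, and Creator, ScanConfig, FilterConfig and createFor are hypothetical names that do not exist in Dremio.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: Creator, ScanConfig and FilterConfig are hypothetical stand-ins.
class CreatorRegistrySketch {
  interface Creator<T> { String create(T config); }

  static class ScanConfig {
    final String table;
    ScanConfig(String table) { this.table = table; }
  }

  static class FilterConfig {
    final String expr;
    FilterConfig(String expr) { this.expr = expr; }
  }

  public static class ScanCreator implements Creator<ScanConfig> {
    public ScanCreator() {}
    @Override public String create(ScanConfig c) { return "scan(" + c.table + ")"; }
  }

  public static class FilterCreator implements Creator<FilterConfig> {
    public FilterCreator() {}
    @Override public String create(FilterConfig c) { return "filter(" + c.expr + ")"; }
  }

  // Maps each config class to an instance of the creator declared as Creator<ThatConfig>.
  static Map<Class<?>, Creator<?>> register(Class<?>... candidates) {
    Map<Class<?>, Creator<?>> registry = new HashMap<>();
    for (Class<?> impl : candidates) {
      // Walk up the hierarchy in case the interface is declared on a superclass.
      for (Class<?> c = impl; c != null && c != Object.class; c = c.getSuperclass()) {
        for (Type iface : c.getGenericInterfaces()) {
          if (!(iface instanceof ParameterizedType)
              || !((ParameterizedType) iface).getRawType().equals(Creator.class)) {
            continue;
          }
          // Assumes the type argument is a plain class, as in Creator<ScanConfig>.
          Type arg = ((ParameterizedType) iface).getActualTypeArguments()[0];
          try {
            Creator<?> instance = (Creator<?>) impl.getDeclaredConstructor().newInstance();
            if (registry.put((Class<?>) arg, instance) != null) {
              throw new IllegalStateException("Duplicate creator for " + arg.getTypeName());
            }
          } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + impl.getName(), e);
          }
        }
      }
    }
    return registry;
  }

  // Looks up the creator by the runtime class of the config and invokes it.
  @SuppressWarnings("unchecked")
  static <T> String createFor(Map<Class<?>, Creator<?>> registry, T config) {
    Creator<T> creator = (Creator<T>) registry.get(config.getClass());
    if (creator == null) {
      throw new IllegalArgumentException("No creator for " + config.getClass().getName());
    }
    return creator.create(config);
  }

  public static void main(String[] args) {
    Map<Class<?>, Creator<?>> registry = register(ScanCreator.class, FilterCreator.class);
    System.out.println(createFor(registry, new ScanConfig("orders")));
  }
}
```

Unlike the original, this sketch fails fast when a creator cannot be instantiated instead of logging and skipping it; that is a simplification for readability, not a recommendation.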

Reference implementation classes

 

 

How the pipeline uses Operators

FragmentExecutor.java

pipeline = PipelineCreator.get(
    new FragmentExecutionContext(major.getForeman(), sources, cancelled, major.getContext()),
    buffers,
    opCreator,
    contextCreator,
    functionLookupContextToUse,
    rootOperator,
    tunnelProvider,
    new SharedResourcesContextImpl(sharedResources)
    );
 
pipeline.setup();

How the JDBC plugin uses ProducerOperator

public class JdbcBatchCreator implements ProducerOperator.Creator<JdbcSubScan> {
    public JdbcBatchCreator() {
    }
 
    public ProducerOperator create(FragmentExecutionContext fragmentExecContext, OperatorContext context, JdbcSubScan subScan) throws ExecutionSetupException {
        // Each scan carries the ID of the storage plugin it uses; the schema fetcher and the reader are wired together through that plugin,
        // so this lookup is essentially a required step for any plugin.
        // JDBC plugin
        JdbcStoragePlugin plugin = (JdbcStoragePlugin)fragmentExecContext.getStoragePlugin(subScan.getPluginId());
        // schema fetcher
        JdbcSchemaFetcherImpl schemaFetcher = (JdbcSchemaFetcherImpl)plugin.getFetcher();
        JdbcPluginConfig config = plugin.getConfig();
        // reader
        JdbcRecordReader innerReader = new JdbcRecordReader(context, schemaFetcher.getSource(), subScan.getSql(), config, subScan.getColumns(), fragmentExecContext.cancelled(), subScan.getPluginId().getCapabilities(), plugin.getDialect().getDataTypeMapper(config), subScan.getReferencedTables(), subScan.getSkippedColumns());
        CoercionReader reader = new CoercionReader(context, subScan.getColumns(), innerReader, subScan.getFullSchema());
        // wrap the reader in a ScanOperator
        return new ScanOperator(subScan, context, RecordReaderIterator.from(reader));
    }
}

A reference call chain

I have covered this in an earlier post: execution depends on the Operators that were created.

 

 

MasterState definition

Operator.java

 
  public enum MasterState {
    NEEDS_SETUP,
    CAN_CONSUME,
    CAN_CONSUME_L,
    CAN_CONSUME_R,
    CAN_PRODUCE,
    BLOCKED,
    DONE
  }
  interface OperatorState<T> {
    MasterState getMasterState();
    void is(T expected);
    String name();
  }

Each concrete Operator type provides its own OperatorState implementation.
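
One plausible benefit of mapping every operator-specific state onto MasterState is that a driver can decide what to do next without knowing the operator type. The sketch below illustrates that idea for a single-input operator. It is an assumption-laden illustration rather than Dremio's pipeline code: FilterState, EvenFilter, the method names and the run loop are hypothetical, and MasterState is reduced to the values the example needs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Illustrative only: simplified stand-ins, not Dremio's real classes.
class MasterStateSketch {
  enum MasterState { NEEDS_SETUP, CAN_CONSUME, CAN_PRODUCE, BLOCKED, DONE }

  interface OperatorState<T> {
    MasterState getMasterState();
    void is(T expected);
    String name();   // satisfied by Enum.name()
  }

  // Hypothetical state set for a single-input operator.
  enum FilterState implements OperatorState<FilterState> {
    NEEDS_SETUP(MasterState.NEEDS_SETUP),
    CAN_CONSUME(MasterState.CAN_CONSUME),
    CAN_PRODUCE(MasterState.CAN_PRODUCE),
    DONE(MasterState.DONE);

    private final MasterState master;
    FilterState(MasterState master) { this.master = master; }

    @Override public MasterState getMasterState() { return master; }

    @Override public void is(FilterState expected) {
      if (expected != this) {
        throw new IllegalStateException("Expected " + expected.name() + " but was " + name());
      }
    }
  }

  // Keeps even numbers; alternates between consuming a batch and producing the result.
  static class EvenFilter {
    private FilterState state = FilterState.NEEDS_SETUP;
    private List<Integer> pending = new ArrayList<>();

    FilterState getState() { return state; }

    void setup() {
      state.is(FilterState.NEEDS_SETUP);
      state = FilterState.CAN_CONSUME;
    }

    void consumeData(List<Integer> batch) {
      state.is(FilterState.CAN_CONSUME);
      pending = new ArrayList<>();
      for (int v : batch) {
        if (v % 2 == 0) {
          pending.add(v);
        }
      }
      state = FilterState.CAN_PRODUCE;
    }

    List<Integer> outputData() {
      state.is(FilterState.CAN_PRODUCE);
      state = FilterState.CAN_CONSUME;
      return pending;
    }

    void noMoreToConsume() {
      state.is(FilterState.CAN_CONSUME);
      state = FilterState.DONE;
    }
  }

  // Generic driver: every decision is based on the master state alone.
  static List<Integer> run(Deque<List<Integer>> input) {
    EvenFilter op = new EvenFilter();
    List<Integer> sink = new ArrayList<>();
    while (op.getState().getMasterState() != MasterState.DONE) {
      switch (op.getState().getMasterState()) {
        case NEEDS_SETUP:
          op.setup();
          break;
        case CAN_CONSUME:
          if (input.isEmpty()) {
            op.noMoreToConsume();
          } else {
            op.consumeData(input.poll());
          }
          break;
        case CAN_PRODUCE:
          sink.addAll(op.outputData());
          break;
        default:
          throw new IllegalStateException("Unhandled state " + op.getState().name());
      }
    }
    return sink;
  }

  public static void main(String[] args) {
    Deque<List<Integer>> input = new ArrayDeque<>();
    input.add(Arrays.asList(1, 2, 3));
    input.add(Arrays.asList(4, 5, 6));
    System.out.println(run(input));
  }
}
```

The BLOCKED state is left unhandled on purpose: in this single-threaded toy there is nothing to wait for, whereas a real driver would have to yield and retry.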

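Summary

The Operator is an important part of Dremio. Understanding the general Operator flow helps when learning Dremio and its internals. For example, execution of the JDBC plugin relies on this mechanism, which is what lets Dremio query relational databases directly, and the MongoDB plugin works in a similar way. The above is only a brief overview; interested readers can dig deeper into the source.

References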

https://github.com/dremio/dremio-oss/blob/d41cb52143b6b0289fc8ed4d970bfcf410a669e8/sabot/kernel/src/main/java/com/dremio/sabot/driver/PipelineCreator.java
https://github.com/dremio/dremio-oss/blob/d41cb52143b6b0289fc8ed4d970bfcf410a669e8/sabot/kernel/src/main/java/com/dremio/sabot/driver/OperatorCreatorRegistry.java
https://www.cnblogs.com/rongfengliang/p/16463884.html
https://github.com/dremio/dremio-oss/blob/d41cb52143b6b0289fc8ed4d970bfcf410a669e8/sabot/kernel/src/main/java/com/dremio/sabot/op/spi/Operator.java

posted on 2022-10-17 17:30  荣锋亮