Dremio's Arrow Flight protocol server implementation: a brief look at the DremioFlightProducer code

DremioFlightProducer contains the core of Dremio's Arrow Flight implementation.

FlightProducer interface definition

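Implementing a producer mainly means implementing the FlightProducer interface, which declares the following methods.

[Figure: the methods declared by the FlightProducer interface]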

What each method means
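
As a rough text summary of the interface, the sketch below maps the eight FlightProducer methods discussed in this post to the RPC names defined in Flight.proto, with a one-line meaning for each. The class name FlightRpcMappingSketch and the helper rpcFor are hypothetical names made up for illustration; they are not part of Arrow or Dremio, and newer Arrow releases may declare additional methods.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative lookup table only; not part of Arrow or Dremio. */
final class FlightRpcMappingSketch {

  /** FlightProducer method name -> RPC name in Flight.proto. */
  static final Map<String, String> METHOD_TO_RPC = new LinkedHashMap<>();

  static {
    METHOD_TO_RPC.put("getStream", "DoGet");             // return the data stream for a ticket
    METHOD_TO_RPC.put("listFlights", "ListFlights");     // list available flights matching criteria
    METHOD_TO_RPC.put("getFlightInfo", "GetFlightInfo"); // describe a flight: schema and endpoints
    METHOD_TO_RPC.put("getSchema", "GetSchema");         // return only the schema for a descriptor
    METHOD_TO_RPC.put("acceptPut", "DoPut");             // accept a stream uploaded by the client
    METHOD_TO_RPC.put("doExchange", "DoExchange");       // bidirectional data exchange
    METHOD_TO_RPC.put("doAction", "DoAction");           // run an application-defined action
    METHOD_TO_RPC.put("listActions", "ListActions");     // list the supported action types
  }

  /** Returns the RPC name for a producer method, or fails for an unknown name. */
  static String rpcFor(String producerMethod) {
    String rpc = METHOD_TO_RPC.get(producerMethod);
    if (rpc == null) {
      throw new IllegalArgumentException("Unknown FlightProducer method: " + producerMethod);
    }
    return rpc;
  }

  public static void main(String[] args) {
    METHOD_TO_RPC.forEach((method, rpc) -> System.out.println(method + " -> " + rpc));
  }
}
```

The Handshake RPC is absent on purpose: in the Java library it is handled by the authentication layer rather than by the producer.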

 

 

How Dremio implements FlightProducer

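Because Dremio is essentially a query engine (although it does support CREATE TABLE on certain storage such as NAS, HDFS, and S3), several Flight protocol methods currently just report that they are unimplemented.
The unimplemented methods are listed below. Note that this does not hold for every producer: one that supports more of the Flight protocol has to implement the other methods too, CoordinatorFlightProducer being an example.
In DremioFlightProducer the location is simply the machine being accessed, so the implementation is fairly simple. CoordinatorFlightProducer has to implement considerably more; see its implementation for the details.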

 
@Override
public Runnable acceptPut(CallContext callContext, FlightStream flightStream, StreamListener<PutResult> streamListener) {
  throw CallStatus.UNIMPLEMENTED.withDescription("acceptPut is unimplemented").toRuntimeException();
}
 
@Override
public void doAction(CallContext callContext, Action action, StreamListener<Result> streamListener) {
  throw CallStatus.UNIMPLEMENTED.withDescription("doAction is unimplemented").toRuntimeException();
}
 
@Override
public void listActions(CallContext callContext, StreamListener<ActionType> streamListener) {
  throw CallStatus.UNIMPLEMENTED.withDescription("listActions is unimplemented").toRuntimeException();
}
 
@Override
public void listFlights(CallContext callContext, Criteria criteria, StreamListener<FlightInfo> streamListener) {
  throw CallStatus.UNIMPLEMENTED.withDescription("listFlights is unimplemented").toRuntimeException();
}

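That leaves four methods for Dremio to handle, but doExchange (not supported for now) and getSchema are declared as default methods in the interface, so only two really have to be implemented: getStream and getFlightInfo. Going by the API definition, these mainly deal with the data stream information.
Both methods depend on session handling. The core of the session is the account information, which is used for authorization and for looking up other configuration such as engine settings and routing; see sabot/kernel/src/main/java/com/dremio/sabot/rpc/user/UserSession.java. Both methods also rely on FlightWorkManager, Dremio's own wrapper.

Code for getStream and getFlightInfo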

 
@Override
public void getStream(CallContext callContext, Ticket ticket, ServerStreamListener serverStreamListener) {
  try {
    final CallHeaders headers = retrieveHeadersFromCallContext(callContext);
    final UserSession session = sessionsManager.getUserSession(callContext.peerIdentity(), headers);
    final TicketContent.PreparedStatementTicket preparedStatementTicket = TicketContent.PreparedStatementTicket.parseFrom(ticket.getBytes());
    // This method wraps an actual execution on top of UserWorker and receives the data through callbacks
    flightWorkManager.runPreparedStatement(preparedStatementTicket, serverStreamListener, allocator, session);
  } catch (InvalidProtocolBufferException ex) {
    final RuntimeException error = CallStatus.INVALID_ARGUMENT.withCause(ex).withDescription("Invalid ticket used in getStream").toRuntimeException();
    serverStreamListener.error(error);
    throw error;
  }
}
 
@Override
public FlightInfo getFlightInfo(CallContext callContext, FlightDescriptor flightDescriptor) {
  final CallHeaders headers = retrieveHeadersFromCallContext(callContext);
  final UserSession session = sessionsManager.getUserSession(callContext.peerIdentity(), headers);
  final FlightPreparedStatement flightPreparedStatement = flightWorkManager
    .createPreparedStatement(flightDescriptor, callContext::isCancelled, session);
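  // Fetching the FlightInfo is also an RPC call to UserWorker, it just completes quickly. The wrapper blocks while waiting (a while loop with timeout handling) so that the required data is available on return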
  return flightPreparedStatement.getFlightInfo(location);
}
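
The two methods are tied together by the ticket: getFlightInfo packs the query and the server-side handle into the ticket bytes, and getStream parses those bytes back, reporting an invalid-argument error when they cannot be parsed. The sketch below illustrates only that round trip. It is a minimal stand-in under stated assumptions: Dremio serializes the ticket with protobuf (PreparedStatementTicket), whereas this sketch uses java.io data streams, and the names TicketRoundTripSketch, StatementTicket, encode, and decode are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Illustrative ticket round trip; NOT Dremio's protobuf-based format. */
final class TicketRoundTripSketch {

  /** Hypothetical stand-in for the content carried inside a Flight ticket. */
  static final class StatementTicket {
    final String query;
    final byte[] handle;

    StatementTicket(String query, byte[] handle) {
      this.query = query;
      this.handle = handle.clone();
    }
  }

  /** What getFlightInfo does conceptually: turn query + handle into opaque ticket bytes. */
  static byte[] encode(StatementTicket ticket) {
    try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeUTF(ticket.query); // writeUTF is limited to 65535 encoded bytes; fine for a sketch
      out.writeInt(ticket.handle.length);
      out.write(ticket.handle);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new IllegalStateException("Unexpected failure writing to memory", e);
    }
  }

  /** What getStream does conceptually: parse the bytes, rejecting malformed tickets. */
  static StatementTicket decode(byte[] ticketBytes) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(ticketBytes))) {
      String query = in.readUTF();
      int length = in.readInt();
      if (length < 0 || length > in.available()) {
        throw new IllegalArgumentException("Invalid ticket: bad handle length");
      }
      byte[] handle = new byte[length];
      in.readFully(handle);
      return new StatementTicket(query, handle);
    } catch (IOException e) {
      // Plays the role of the INVALID_ARGUMENT status in the real producer
      throw new IllegalArgumentException("Invalid ticket", e);
    }
  }

  public static void main(String[] args) {
    byte[] bytes = encode(new StatementTicket("SELECT 1", new byte[] {1, 2, 3}));
    StatementTicket parsed = decode(bytes);
    System.out.println(parsed.query + " / handle bytes: " + parsed.handle.length);
  }
}
```

Because the client treats the ticket as opaque bytes, the server is free to put whatever it needs to resume the prepared statement inside it.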

How FlightPreparedStatement blocks while fetching the data

public FlightInfo getFlightInfo(Location location) {
    // Blocks until the response to the create-prepared-statement request is available
    final UserProtos.CreatePreparedStatementArrowResp createPreparedStatementResp = responseHandler.get();
    final Schema schema = buildSchema(createPreparedStatementResp.getPreparedStatement().getArrowSchema());
 
    final PreparedStatementTicket preparedStatementTicketContent = PreparedStatementTicket.newBuilder()
      .setQuery(query)
      .setHandle(createPreparedStatementResp.getPreparedStatement().getServerHandle())
      .build();
 
    final Ticket ticket = new Ticket(preparedStatementTicketContent.toByteArray());
 
    final FlightEndpoint flightEndpoint = new FlightEndpoint(ticket, location);
    return new FlightInfo(schema, flightDescriptor, ImmutableList.of(flightEndpoint), -1, -1);
  }
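
The blocking wait described above (a loop that waits with a timeout and keeps re-checking whether the request was cancelled) can be sketched with the JDK alone. This is an illustration of the pattern, not Dremio's CreatePreparedStatementResponseHandler: the class BlockingResponseSketch, its methods, and the poll interval are all assumptions made for the example.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Illustrative blocking handler; a sketch of the pattern, not Dremio's class. */
final class BlockingResponseSketch<T> {
  private final CompletableFuture<T> future = new CompletableFuture<>();
  private final Supplier<Boolean> isCancelled;
  private final long pollMillis;

  BlockingResponseSketch(Supplier<Boolean> isCancelled, long pollMillis) {
    this.isCancelled = isCancelled;
    this.pollMillis = pollMillis;
  }

  /** Called by the worker thread when the response arrives. */
  void completed(T response) {
    future.complete(response);
  }

  /** Called by the worker thread when the request fails. */
  void failed(Throwable cause) {
    future.completeExceptionally(cause);
  }

  /** Blocks the caller until a response, a failure, or a client cancellation. */
  T get() {
    while (true) {
      if (isCancelled.get()) {
        throw new CancellationException("Request cancelled by the client");
      }
      try {
        // Wait a bounded time so the cancellation flag is re-checked regularly
        return future.get(pollMillis, TimeUnit.MILLISECONDS);
      } catch (TimeoutException notReadyYet) {
        // Response not available yet: loop again
      } catch (ExecutionException e) {
        throw new IllegalStateException("Request failed", e.getCause());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting", e);
      }
    }
  }

  public static void main(String[] args) {
    BlockingResponseSketch<String> handler = new BlockingResponseSketch<>(() -> false, 10);
    new Thread(() -> handler.completed("schema + server handle")).start();
    System.out.println(handler.get());
  }
}
```

Waiting in short slices rather than with one unbounded get() is what lets a check such as callContext::isCancelled interrupt a wait whose response never arrives.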

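FlightWorkManager depends on UserWorker and OptionManager. UserWorker mainly handles job submission (a later post will cover its implementation), and OptionManager mainly holds the Flight-related configuration. FlightWorkManager mainly contains two methods: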

 
public FlightPreparedStatement createPreparedStatement(FlightDescriptor flightDescriptor,
                                                       Supplier<Boolean> isRequestCancelled, UserSession userSession)

public void runPreparedStatement(TicketContent.PreparedStatementTicket ticket, FlightProducer.ServerStreamListener listener,
                                 BufferAllocator allocator, UserSession userSession)

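runPreparedStatement executes the statement directly and uses a callback mechanism to process the results: RunQueryResponseHandler implements UserResponseHandler, and for a submitted job the result data is handled through the UserResponseHandler callbacks.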

 
public void runPreparedStatement(TicketContent.PreparedStatementTicket ticket, FlightProducer.ServerStreamListener listener,
                                   BufferAllocator allocator, UserSession userSession) {
    final UserBitShared.ExternalId runExternalId = ExternalIdHelper.generateExternalId();
    final UserRequest userRequest =
      new UserRequest(UserProtos.RpcType.RUN_QUERY,
        UserProtos.RunQuery.newBuilder()
          .setType(UserBitShared.QueryType.PREPARED_STATEMENT)
          .setPriority(UserProtos.QueryPriority.newBuilder()
            .setWorkloadType(UserBitShared.WorkloadType.FLIGHT)
            .setWorkloadClass(UserBitShared.WorkloadClass.GENERAL))
          .setSource(UserProtos.SubmissionSource.FLIGHT)
          .setPreparedStatementHandle(ticket.getHandle())
          .build());
    // Wrap the listener in a responseHandler
    final UserResponseHandler responseHandler = runQueryResponseHandlerFactory.getHandler(runExternalId, userSession,
      workerProvider, optionManagerProvider, listener, allocator);
    workerProvider.get().submitWork(runExternalId, userSession, responseHandler, userRequest, TerminationListenerRegistry.NOOP);
  }
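
The callback mechanism amounts to an adapter: the worker reports batches and completion to a response handler, and the handler forwards them to the stream listener that writes to the client. Below is a minimal sketch of that shape using only the JDK. Every type in it (ResponseHandler, StreamListener, RecordingListener, submitWork) is a simplified, hypothetical stand-in rather than the Dremio or Arrow interface of similar name, batches are plain strings, and the work runs synchronously here although the real submission is asynchronous.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative callback adapter; all types are simplified stand-ins. */
final class CallbackStreamingSketch {

  /** Plays the role of UserResponseHandler: the worker calls back into it. */
  interface ResponseHandler {
    void sendData(String batch);
    void completed(Exception failureOrNull);
  }

  /** Plays the role of ServerStreamListener: it writes to the Flight client. */
  interface StreamListener {
    void putNext(String batch);
    void completed();
    void error(Throwable cause);
  }

  /** Wraps a listener in a handler, as runPreparedStatement does via its factory. */
  static ResponseHandler forwardTo(StreamListener listener) {
    return new ResponseHandler() {
      @Override
      public void sendData(String batch) {
        listener.putNext(batch);
      }

      @Override
      public void completed(Exception failureOrNull) {
        if (failureOrNull == null) {
          listener.completed();
        } else {
          listener.error(failureOrNull);
        }
      }
    };
  }

  /** Stand-in for submitting work: produces batches and reports them via callbacks. */
  static void submitWork(List<String> batches, ResponseHandler handler) {
    try {
      for (String batch : batches) {
        handler.sendData(batch);
      }
      handler.completed(null);
    } catch (RuntimeException e) {
      handler.completed(e);
    }
  }

  /** Listener that records what it receives, to make the flow visible. */
  static final class RecordingListener implements StreamListener {
    final List<String> events = new ArrayList<>();

    @Override
    public void putNext(String batch) {
      events.add(batch);
    }

    @Override
    public void completed() {
      events.add("completed");
    }

    @Override
    public void error(Throwable cause) {
      events.add("error: " + cause.getMessage());
    }
  }

  public static void main(String[] args) {
    RecordingListener listener = new RecordingListener();
    submitWork(Arrays.asList("batch-1", "batch-2"), forwardTo(listener));
    System.out.println(listener.events);
  }
}
```

With this shape the method that submits the job can return straight away, and the data reaches the client as the callbacks fire.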

createPreparedStatement is in fact also a job that gets scheduled and executed; it is just wrapped in a FlightPreparedStatement.

public FlightPreparedStatement createPreparedStatement(FlightDescriptor flightDescriptor,
                                                       Supplier<Boolean> isRequestCancelled, UserSession userSession) {
  final String query = getQuery(flightDescriptor);
 
  final UserProtos.CreatePreparedStatementArrowReq createPreparedStatementReq =
    UserProtos.CreatePreparedStatementArrowReq.newBuilder()
      .setSqlQuery(query)
      .build();
 
  final UserBitShared.ExternalId prepareExternalId = ExternalIdHelper.generateExternalId();
  final UserRequest userRequest =
    new UserRequest(UserProtos.RpcType.CREATE_PREPARED_STATEMENT_ARROW, createPreparedStatementReq);
 
  final CreatePreparedStatementResponseHandler createPreparedStatementResponseHandler =
    new CreatePreparedStatementResponseHandler(prepareExternalId, userSession, workerProvider, isRequestCancelled);
 
  workerProvider.get().submitWork(prepareExternalId, userSession, createPreparedStatementResponseHandler,
    userRequest, TerminationListenerRegistry.NOOP);
 
  return new FlightPreparedStatement(flightDescriptor, query, createPreparedStatementResponseHandler);
}

Notes

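DremioFlightProducer is a fairly important piece of Dremio's implementation of the Flight service protocol, even though it does not contain much code. Dremio actually has quite a few classes that implement the Flight protocol.
The figure below shows the classes that currently implement Flight.

[Figure: classes in Dremio that currently implement Flight]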

 

 

References

https://arrow.apache.org/blog/2022/02/16/introducing-arrow-flight-sql/
https://github.com/apache/arrow/blob/master/java/flight/flight-sql/src/main/java/org/apache/arrow/flight/sql/example/FlightSqlClientDemoApp.java
https://arrow.apache.org/docs/java/reference/index.html
https://github.com/apache/arrow/blob/master/format/Flight.proto

posted on 2022-03-01 18:44 by 荣锋亮
