dremio jdbc 访问最好使用链接池工具

昨天在碰到一个dremio jdbc 比较奇怪的问题,按照了标准的jdbc 操作(建立链接,创建Statement,处理数据,关闭Statement,关闭连接)
当进行多次执行(多次建立连接操作)发现dremio 有一个操作异常,造成数据表创建有问题(事务不完整)

参考代码

private static void v3() {
        try {
            String sql = " select * from pg.public.sensor";
            String url = "jdbc:dremio:direct=localhost:31011;schema=sys";
            Connection conn1 = DriverManager.getConnection(url, "xxxx", "xxxxx");
            //3.获取Statement对象
            Statement statement1 = conn1.createStatement();
            statement1.executeQuery(sql);
            statement1.close();
            conn1.close();
 
            String sql2 = " select * from pg.public.sensor";
            Connection conn2 = DriverManager.getConnection(url, "dalong", "dalong123");
            Statement statement2 = conn2.createStatement();
            statement2.executeQuery(sql2);
            statement2.close();
            conn2.close();
 
 
            String sql3 = " select * from pg.public.sensor";
            Connection conn3 = DriverManager.getConnection(url, "dalong", "dalong123");
            Statement statement3 = conn3.createStatement();
            statement3.executeQuery(sql3);
            statement3.close();
            conn3.close();
 
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
}

注意上边我们没有执行进行数据的处理statement.executeQuery 部分

错误信息

Query cancelled by user 'xxxxx'

解决方法

通过简单跟进代码发现,提示信息是正确的,因为我们进行了 statement 的close 内部使用了游标,dremio 发现如果内部执行还没完成的时候执行
进行close 会通过dremio client 发送一个cancel query 的rpc 请求(dremio 内部是基于job 调度的,具体任务什么时候完成不太确定),如果进行了
数据查询获取,这样dremio jdbc 就可以知道啥时数据是完成的(内部通过游标遍历),然后close 肯定也没有问题,最好的实践方法我们是通过连接池
工具,这样可以进行连接的复用,确保连接不会出现问题,同时也有利于资源的管理

关于连接的处理

  • statement close 部分
    因为dremio 的statement 实际上是包装的avatica 的,内部数据处理使用了游标具体类是DremioResultSetImpl以及DremioCursor
    代码部分AvaticaResultSet
public void close() {
    this.closed = true;
    // dremio 部分会传递自己的DremioCursor
    Cursor cursor = this.cursor;
    if (cursor != null) {
        this.cursor = null;
        cursor.close();
    }
 
    if (this.statement != null) {
        this.statement.onResultSetClose(this);
    }
}

DremioCursor close 部分 (client/jdbc/src/main/java/com/dremio/jdbc/impl/DremioCursor.java)

  public void close() {
    // Clean up result set (to deallocate any buffers).
    cleanup();
    // TODO:  CHECK:  Something might need to set statement.openResultSet to
    // null.  Also, AvaticaResultSet.close() doesn't check whether already
    // closed and skip calls to cursor.close(), statement.onResultSetClose()
  }

cleanup 部分 (client/jdbc/src/main/java/com/dremio/jdbc/impl/DremioCursor.java)

  synchronized void cleanup() {
   // 可以看出如果resultsListener 没有完成,同时调用了一个close 操作,那个jdbc 会调用DremioClient 的一个cancelQuery rpc 请求
    if (resultsListener.getQueryId() != null && ! resultsListener.completed) {
      connection.getClient().cancelQuery(resultsListener.getQueryId());
    }
    resultsListener.close();
    currentBatchHolder.clear();
  }

cancelQuery 处理(client/base/src/main/java/com/dremio/exec/client/DremioClient.java)

  public RpcFuture<Ack> cancelQuery(QueryId id) {
    if(logger.isDebugEnabled()) {
      logger.debug("Cancelling query {}", QueryIdHelper.getQueryId(id));
    }
    // 会发送CANCEL_QUERY,内部调用的是UserClient
    return client.send(RpcType.CANCEL_QUERY, id, Ack.class);
  }

UserClient 内部会基于一个QueryResultHandler进行数据处理(一个包装类,同时还包含了一些事件监听)
QueryResultHandler 包含数据处理的一些事件比如成功,失败
UserRPCServer 内部cancel 处理实际有自己的包装,会进行一些操作的取消(执行计划之类的)
ForemenWorkManager 部分(sabot/kernel/src/main/java/com/dremio/exec/work/protector/ForemenWorkManager.java)

public Ack cancelQuery(ExternalId query, String username) {
  cancel(query, String.format("Query cancelled by user '%s'", username), true, false);
  return Acks.OK;
}

调用的是Foreman,这个属于dremio 比较核心的一个类了

  public synchronized void cancel(String reason, boolean clientCancelled, boolean runTimeExceeded) {
    cancel(reason, clientCancelled, null, false, runTimeExceeded);
  }
 
  public synchronized void cancel(String reason, boolean clientCancelled, String cancelContext,
                                  boolean isCancelledByHeapMonitor, boolean runTimeExceeded) {
    if (!canceled) {
      canceled = true;
 
      if (attemptManager != null) {
        //   内部基于了AttemptManager
        attemptManager.cancel(reason, clientCancelled, cancelContext, isCancelledByHeapMonitor, runTimeExceeded);
      }
    } else {
      logger.debug("Cancel of queryId:{} was already attempted before. Ignoring cancelling request now.",
        QueryIdHelper.getQueryId(attemptId.toQueryId()));
    }
  }

AttemptManager 的处理

  public void cancel(String reason, boolean clientCancelled, String cancelContext, boolean isCancelledByHeapMonitor, boolean runTimeExceeded) {
    // Note this can be called from outside of run() on another thread, or after run() completes
    profileTracker.setCancelReason(reason);
    this.clientCancelled = clientCancelled;
    this.runTimeExceeded = runTimeExceeded;
    // Set the cancelFlag, so that query in planning phase will be canceled
    // by super.checkCancel() in DremioVolcanoPlanner and DremioHepPlanner 
   //  后续的一些查询计划会取消
    queryContext.getPlannerSettings().cancelPlanning(reason,
                                                     queryContext.getCurrentEndpoint(),
                                                     cancelContext,
                                                     isCancelledByHeapMonitor);
    // Do not cancel queries in running or queued state when canceled by coordinator heap monitor
    if (!isCancelledByHeapMonitor) {
      // interrupt execution immediately if query is blocked in any of the stages where it can get blocked.
      // For instance, in ENGINE START stage, query could be blocked for engine to start or in QUEUED stage,
      // the query could be blocked on a slot to become available when number of concurrent queries exceeds number of
      // available slots (max concurrency).
      maestroService.interruptExecutionInWaitStates(queryId, currentExecutionStage);
      // Put the cancel in the event queue:
      // Note: Since the event processor only processes events after the attempt manager has completed all coordinator
      // stages (including maestro's executeQuery), it is assumed that the interruptions done above will make the
      // meastro end the executeQuery prematurely so that the state machine gets started and all pending events
      // including this cancel gets processed.
      addToEventQueue(QueryState.CANCELED, null);
    }
  }

连接池使用

可以通过hikari 包, 当然实际使用中我们一般都是集成服务集成的(比如spring boot) 基于datasource 是更好的选择

说明

以上碰到的问题,实际上一般如果是基于查询,然后获取数据的问题并不大,但是如果我们对于执行结果不太关注的场景,就会有上边的问题了,基于连接池是比较好的选择

参考资料

https://github.com/brettwooldridge/HikariCP
client/base/src/main/java/com/dremio/sabot/rpc/user/QueryResultHandler.java
client/base/src/main/java/com/dremio/exec/client/DremioClient.java
client/jdbc/src/main/java/com/dremio/jdbc/impl/DremioCursor.java
client/base/src/main/java/com/dremio/sabot/rpc/user/UserClient.java
sabot/kernel/src/main/java/com/dremio/sabot/rpc/user/UserRPCServer.java
client/base/src/main/java/com/dremio/sabot/rpc/user/QueryResultHandler.java
client/base/src/main/java/com/dremio/sabot/rpc/user/UserResultsListener.java
sabot/kernel/src/main/java/com/dremio/exec/work/foreman/AttemptManager.java
sabot/kernel/src/main/java/com/dremio/exec/work/protector/Foreman.java
sabot/kernel/src/main/java/com/dremio/exec/work/protector/ForemenWorkManager.java
sabot/kernel/src/main/java/com/dremio/exec/work/protector/UserWorker.java
https://github.com/dremio/dremio-oss/blob/master/client/jdbc/src/main/java/com/dremio/jdbc/impl/DremioCursor.java#L427
https://github.com/dremio/dremio-oss/blob/master/client/jdbc/src/main/java/com/dremio/jdbc/impl/DremioResultSetImpl.java

posted on 2024-01-17 10:43  荣锋亮  阅读(31)  评论(0编辑  收藏  举报

导航