dremio jdbc 访问最好使用链接池工具
昨天在碰到一个dremio jdbc 比较奇怪的问题,按照了标准的jdbc 操作(建立链接,创建Statement,处理数据,关闭Statement,关闭连接)
当进行多次执行(多次建立连接操作)发现dremio 有一个操作异常,造成数据表创建有问题(事务不完整)
参考代码
private static void v3() {
try {
String sql = " select * from pg.public.sensor";
String url = "jdbc:dremio:direct=localhost:31011;schema=sys";
Connection conn1 = DriverManager.getConnection(url, "xxxx", "xxxxx");
//3.获取Statement对象
Statement statement1 = conn1.createStatement();
statement1.executeQuery(sql);
statement1.close();
conn1.close();
String sql2 = " select * from pg.public.sensor";
Connection conn2 = DriverManager.getConnection(url, "dalong", "dalong123");
Statement statement2 = conn2.createStatement();
statement2.executeQuery(sql2);
statement2.close();
conn2.close();
String sql3 = " select * from pg.public.sensor";
Connection conn3 = DriverManager.getConnection(url, "dalong", "dalong123");
Statement statement3 = conn3.createStatement();
statement3.executeQuery(sql3);
statement3.close();
conn3.close();
} catch (SQLException e) {
throw new RuntimeException(e);
}
}
注意上边我们没有执行进行数据的处理statement.executeQuery 部分
错误信息
Query cancelled by user 'xxxxx'
解决方法
通过简单跟进代码发现,提示信息是正确的,因为我们进行了 statement 的close 内部使用了游标,dremio 发现如果内部执行还没完成的时候执行
进行close 会通过dremio client 发送一个cancel query 的rpc 请求(dremio 内部是基于job 调度的,具体任务什么时候完成不太确定),如果进行了
数据查询获取,这样dremio jdbc 就可以知道啥时数据是完成的(内部通过游标遍历),然后close 肯定也没有问题,最好的实践方法我们是通过连接池
工具,这样可以进行连接的复用,确保连接不会出现问题,同时也有利于资源的管理
关于连接的处理
- statement close 部分
因为dremio 的statement 实际上是包装的avatica 的,内部数据处理使用了游标具体类是DremioResultSetImpl以及DremioCursor
代码部分AvaticaResultSet
public void close() {
this.closed = true;
// dremio 部分会传递自己的DremioCursor
Cursor cursor = this.cursor;
if (cursor != null) {
this.cursor = null;
cursor.close();
}
if (this.statement != null) {
this.statement.onResultSetClose(this);
}
}
DremioCursor close 部分 (client/jdbc/src/main/java/com/dremio/jdbc/impl/DremioCursor.java)
public void close() {
// Clean up result set (to deallocate any buffers).
cleanup();
// TODO: CHECK: Something might need to set statement.openResultSet to
// null. Also, AvaticaResultSet.close() doesn't check whether already
// closed and skip calls to cursor.close(), statement.onResultSetClose()
}
cleanup 部分 (client/jdbc/src/main/java/com/dremio/jdbc/impl/DremioCursor.java)
synchronized void cleanup() {
// 可以看出如果resultsListener 没有完成,同时调用了一个close 操作,那个jdbc 会调用DremioClient 的一个cancelQuery rpc 请求
if (resultsListener.getQueryId() != null && ! resultsListener.completed) {
connection.getClient().cancelQuery(resultsListener.getQueryId());
}
resultsListener.close();
currentBatchHolder.clear();
}
cancelQuery 处理(client/base/src/main/java/com/dremio/exec/client/DremioClient.java)
public RpcFuture<Ack> cancelQuery(QueryId id) {
if(logger.isDebugEnabled()) {
logger.debug("Cancelling query {}", QueryIdHelper.getQueryId(id));
}
// 会发送CANCEL_QUERY,内部调用的是UserClient
return client.send(RpcType.CANCEL_QUERY, id, Ack.class);
}
UserClient 内部会基于一个QueryResultHandler进行数据处理(一个包装类,同时还包含了一些事件监听)
QueryResultHandler 包含数据处理的一些事件比如成功,失败
UserRPCServer 内部cancel 处理实际有自己的包装,会进行一些操作的取消(执行计划之类的)
ForemenWorkManager 部分(sabot/kernel/src/main/java/com/dremio/exec/work/protector/ForemenWorkManager.java)
public Ack cancelQuery(ExternalId query, String username) {
cancel(query, String.format("Query cancelled by user '%s'", username), true, false);
return Acks.OK;
}
调用的是Foreman,这个属于dremio 比较核心的一个类了
public synchronized void cancel(String reason, boolean clientCancelled, boolean runTimeExceeded) {
cancel(reason, clientCancelled, null, false, runTimeExceeded);
}
public synchronized void cancel(String reason, boolean clientCancelled, String cancelContext,
boolean isCancelledByHeapMonitor, boolean runTimeExceeded) {
if (!canceled) {
canceled = true;
if (attemptManager != null) {
// 内部基于了AttemptManager
attemptManager.cancel(reason, clientCancelled, cancelContext, isCancelledByHeapMonitor, runTimeExceeded);
}
} else {
logger.debug("Cancel of queryId:{} was already attempted before. Ignoring cancelling request now.",
QueryIdHelper.getQueryId(attemptId.toQueryId()));
}
}
AttemptManager 的处理
public void cancel(String reason, boolean clientCancelled, String cancelContext, boolean isCancelledByHeapMonitor, boolean runTimeExceeded) {
// Note this can be called from outside of run() on another thread, or after run() completes
profileTracker.setCancelReason(reason);
this.clientCancelled = clientCancelled;
this.runTimeExceeded = runTimeExceeded;
// Set the cancelFlag, so that query in planning phase will be canceled
// by super.checkCancel() in DremioVolcanoPlanner and DremioHepPlanner
// 后续的一些查询计划会取消
queryContext.getPlannerSettings().cancelPlanning(reason,
queryContext.getCurrentEndpoint(),
cancelContext,
isCancelledByHeapMonitor);
// Do not cancel queries in running or queued state when canceled by coordinator heap monitor
if (!isCancelledByHeapMonitor) {
// interrupt execution immediately if query is blocked in any of the stages where it can get blocked.
// For instance, in ENGINE START stage, query could be blocked for engine to start or in QUEUED stage,
// the query could be blocked on a slot to become available when number of concurrent queries exceeds number of
// available slots (max concurrency).
maestroService.interruptExecutionInWaitStates(queryId, currentExecutionStage);
// Put the cancel in the event queue:
// Note: Since the event processor only processes events after the attempt manager has completed all coordinator
// stages (including maestro's executeQuery), it is assumed that the interruptions done above will make the
// meastro end the executeQuery prematurely so that the state machine gets started and all pending events
// including this cancel gets processed.
addToEventQueue(QueryState.CANCELED, null);
}
}
连接池使用
可以通过hikari 包, 当然实际使用中我们一般都是集成服务集成的(比如spring boot) 基于datasource 是更好的选择
说明
以上碰到的问题,实际上一般如果是基于查询,然后获取数据的问题并不大,但是如果我们对于执行结果不太关注的场景,就会有上边的问题了,基于连接池是比较好的选择
参考资料
https://github.com/brettwooldridge/HikariCP
client/base/src/main/java/com/dremio/sabot/rpc/user/QueryResultHandler.java
client/base/src/main/java/com/dremio/exec/client/DremioClient.java
client/jdbc/src/main/java/com/dremio/jdbc/impl/DremioCursor.java
client/base/src/main/java/com/dremio/sabot/rpc/user/UserClient.java
sabot/kernel/src/main/java/com/dremio/sabot/rpc/user/UserRPCServer.java
client/base/src/main/java/com/dremio/sabot/rpc/user/QueryResultHandler.java
client/base/src/main/java/com/dremio/sabot/rpc/user/UserResultsListener.java
sabot/kernel/src/main/java/com/dremio/exec/work/foreman/AttemptManager.java
sabot/kernel/src/main/java/com/dremio/exec/work/protector/Foreman.java
sabot/kernel/src/main/java/com/dremio/exec/work/protector/ForemenWorkManager.java
sabot/kernel/src/main/java/com/dremio/exec/work/protector/UserWorker.java
https://github.com/dremio/dremio-oss/blob/master/client/jdbc/src/main/java/com/dremio/jdbc/impl/DremioCursor.java#L427
https://github.com/dremio/dremio-oss/blob/master/client/jdbc/src/main/java/com/dremio/jdbc/impl/DremioResultSetImpl.java