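Notes on a date type problem with Dremio, Hive JDBC, and ARP
A brief record of some problems I ran into.
Analysis
- Use arthas stack to inspect the call path
The Hive case is similar; I tested against MySQL.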
stack com.mysql.cj.jdbc.result.ResultSetImpl getDate
Output
Affect(class count: 2 , method count: 4) cost in 329 ms, listenerId: 11
ts=2023-12-26 06:18:17;thread_name=e3 - 1a758fd6-4c6d-9baa-6d8f-31fa8220ee00:frag:0:0;id=c4;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@18b4aac2
@com.mysql.cj.jdbc.result.ResultSetImpl.getDate()
at org.apache.commons.dbcp2.DelegatingResultSet.getDate(DelegatingResultSet.java:682)
at org.apache.commons.dbcp2.DelegatingResultSet.getDate(DelegatingResultSet.java:682)
at com.dremio.exec.store.jdbc.JdbcRecordReader$DateCopier.copy(JdbcRecordReader.java:688)
at com.dremio.exec.store.jdbc.JdbcRecordReader.next(JdbcRecordReader.java:291)
at com.dremio.exec.store.CoercionReader.next(CoercionReader.java:187)
at com.dremio.sabot.op.scan.ScanOperator.outputData(ScanOperator.java:365)
at com.dremio.sabot.driver.SmartOp$SmartProducer.outputData(SmartOp.java:551)
at com.dremio.sabot.driver.StraightPipe.pump(StraightPipe.java:56)
at com.dremio.sabot.driver.Pipeline.doPump(Pipeline.java:124)
at com.dremio.sabot.driver.Pipeline.pumpOnce(Pipeline.java:114)
at com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run(FragmentExecutor.java:565)
at com.dremio.sabot.exec.fragment.FragmentExecutor.run(FragmentExecutor.java:480)
at com.dremio.sabot.exec.fragment.FragmentExecutor.access$1700(FragmentExecutor.java:109)
at com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run(FragmentExecutor.java:1016)
at com.dremio.sabot.task.AsyncTaskWrapper.run(AsyncTaskWrapper.java:122)
at com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop(SlicingThread.java:249)
at com.dremio.sabot.task.slicing.SlicingThread.run(SlicingThread.java:171)
ts=2023-12-26 06:57:01;thread_name=e6 - 1a7586c2-583b-6a25-56f0-01038d824a00:frag:0:0;id=c7;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@18b4aac2
@com.mysql.cj.jdbc.result.ResultSetImpl.getDate()
at org.apache.commons.dbcp2.DelegatingResultSet.getDate(DelegatingResultSet.java:682)
at org.apache.commons.dbcp2.DelegatingResultSet.getDate(DelegatingResultSet.java:682)
at com.dremio.exec.store.jdbc.JdbcRecordReader$DateCopier.copy(JdbcRecordReader.java:688)
at com.dremio.exec.store.jdbc.JdbcRecordReader.next(JdbcRecordReader.java:291)
at com.dremio.exec.store.CoercionReader.next(CoercionReader.java:187)
at com.dremio.sabot.op.scan.ScanOperator.outputData(ScanOperator.java:365)
at com.dremio.sabot.driver.SmartOp$SmartProducer.outputData(SmartOp.java:551)
at com.dremio.sabot.driver.StraightPipe.pump(StraightPipe.java:56)
at com.dremio.sabot.driver.Pipeline.doPump(Pipeline.java:124)
at com.dremio.sabot.driver.Pipeline.pumpOnce(Pipeline.java:114)
at com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run(FragmentExecutor.java:565)
at com.dremio.sabot.exec.fragment.FragmentExecutor.run(FragmentExecutor.java:480)
at com.dremio.sabot.exec.fragment.FragmentExecutor.access$1700(FragmentExecutor.java:109)
at com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run(FragmentExecutor.java:1016)
at com.dremio.sabot.task.AsyncTaskWrapper.run(AsyncTaskWrapper.java:122)
at com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop(SlicingThread.java:249)
at com.dremio.sabot.task.slicing.SlicingThread.run(SlicingThread.java:171)
- Decompile the class
jad com.dremio.exec.store.jdbc.JdbcRecordReader$DateCopier
Output
ClassLoader:
+-sun.misc.Launcher$AppClassLoader@18b4aac2
+-sun.misc.Launcher$ExtClassLoader@5614c340
Location:
/opt/dremio/jars/dremio-ce-jdbc-plugin-24.3.0-202312190021150029-52db2faf.jar
/*
* Decompiled with CFR.
*/
package com.dremio.exec.store.jdbc;
import com.dremio.exec.store.jdbc.JdbcRecordReader;
import java.sql.Date;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Calendar;
import java.util.TimeZone;
import org.apache.arrow.vector.DateMilliVector;
private static class JdbcRecordReader.DateCopier
extends JdbcRecordReader.Copier<DateMilliVector> {
    private final Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
    JdbcRecordReader.DateCopier(int columnIndex, ResultSet result, DateMilliVector vector) {
/*683*/ super(columnIndex, result, vector);
    }
    @Override
    void copy(int index) throws SQLException {
        // The Calendar is passed to the driver here
        Date date = this.getResult().getDate(this.getColumnIndex(), this.calendar);
/*689*/ if (date != null) {
/*690*/     ((DateMilliVector)this.getValueVector()).setSafe(index, date.getTime());
        }
    }
}
- How Hive's getDate handles the Calendar overload
The HiveBaseResultSet class
Code
public Date getDate(int columnIndex, Calendar cal) throws SQLException {
    logger.trace("{}, {}", this.traceInfo(), columnIndex);
    throw new SQLException("Method not supported");
}
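At this point the cause is clear: Dremio passes a Calendar, but Hive's getDate does not support the Calendar overload, so date columns are bound to cause problems.
Solutions
- Modify Dremio
Not recommended: the change is too invasive and would have side effects.
- Modify the Hive JDBC driver directly
For the getDate overload that takes a Calendar, reuse the implementation of public Date getDate(int columnIndex) and ignore the Calendar.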
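The driver-side fix described above can be sketched roughly as follows. This is a minimal illustration, not the actual patched driver: PatchedResultSetSketch is a hypothetical stand-in for HiveBaseResultSet that models only the two getDate overloads, and the real class implements the full java.sql.ResultSet interface.

```java
import java.sql.Date;
import java.sql.SQLException;
import java.util.Calendar;

/**
 * Hypothetical stand-in for HiveBaseResultSet (assumption for illustration).
 * Only the two getDate overloads that matter here are modelled.
 */
abstract class PatchedResultSetSketch {

    /** The overload the driver already implements. */
    public abstract Date getDate(int columnIndex) throws SQLException;

    /**
     * Patched overload: instead of throwing "Method not supported",
     * ignore the Calendar and reuse the single-argument implementation.
     */
    public Date getDate(int columnIndex, Calendar cal) throws SQLException {
        return getDate(columnIndex);
    }
}
```

One trade-off to keep in mind: because the Calendar is ignored, the UTC calendar that Dremio passes has no effect, so any time zone handling is whatever getDate(int) already does.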
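Notes
For the build, if you do not want to compile the whole driver yourself, you can decompile the class and replace the class file in the jar. This is not recommended when you have the source, but it is usable when the source is missing.
For an actual modified version of the code, see the GitHub repository below.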
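The class-replacement approach from the notes above could be done with the jar tool, or in Java using the built-in zip file system. The following is a sketch under assumptions: JarClassReplacer and its method are hypothetical names, the entry is assumed to already exist in the jar, and the recompiled class file is assumed to be on disk already.

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper (not part of any driver or of Dremio). */
final class JarClassReplacer {
    private JarClassReplacer() {}

    /**
     * Overwrites one existing entry inside a jar with the contents of a file on disk.
     *
     * @param jar         the jar to patch, modified in place
     * @param entryName   entry path inside the jar, e.g. a package path ending in .class
     * @param replacement the recompiled class file on disk
     */
    static void replaceEntry(Path jar, String entryName, Path replacement) throws IOException {
        // Open the jar as a zip file system; closing it writes the change back to the jar.
        try (FileSystem zip = FileSystems.newFileSystem(jar, (ClassLoader) null)) {
            Path target = zip.getPath(entryName);
            Files.copy(replacement, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Working on a copy of the driver jar is the safer choice, since the jar is modified in place.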
References
https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveBaseResultSet.java#L360
https://github.com/rongfengliang/inceptor-sdk-transwarp-fix