Customizing the StreamSets SQL Parser Component - Adding Support for Hidden and Virtual Columns
I have recently been using StreamSets for data synchronization and found that the SQL Parser processor cannot parse hidden (invisible) and virtual columns of a table, so I decided to dig into the source code and modify it.
The error reported by SQL Parser points to the corresponding Java class:
com.streamsets.pipeline.api.base.OnRecordErrorException: JDBC_402 - Columns 'XX_XXX' are not present in XX_XXXXXXX
    at com.streamsets.pipeline.stage.processor.parser.sql.SqlParserProcessor.process(SqlParserProcessor.java:203)
    at com.streamsets.pipeline.stage.processor.parser.sql.SqlParserProcessor.process(SqlParserProcessor.java:152)
After downloading the source code, I asked a developer colleague to help with the change.
Modified file: datacollector-3.15\jdbc-protolib\src\main\java\com\streamsets\pipeline\stage\processor\parser\sql\SqlParserProcessor.java, line 290
The resolveSchema method
Original implementation:
private void resolveSchema(SchemaAndTable schemaAndTable) throws StageException {
  Map<String, Integer> columns = new HashMap<>();
  String schema = schemaAndTable.getSchema();
  String table = schemaAndTable.getTable();
  try (Statement s = connection.createStatement()) {
    // Runs an empty query and reads the column list from its ResultSetMetaData.
    ResultSetMetaData md = s.executeQuery(
        Utils.format(
            "SELECT * FROM {}{} WHERE 1 = 0",
            StringUtils.isNotEmpty(schema) ? "\"" + schema + "\"." : "",
            "\"" + table + "\""
        )).getMetaData();
    int colCount = md.getColumnCount();
    for (int i = 1; i <= colCount; i++) {
      int colType = md.getColumnType(i);
      String colName = md.getColumnName(i);
      if (!configBean.caseSensitive) {
        colName = colName.toUpperCase();
      }
      if (colType == Types.DATE || colType == Types.TIME || colType == Types.TIMESTAMP) {
        dateTimeColumns.computeIfAbsent(schemaAndTable, k -> new HashMap<>());
        dateTimeColumns.get(schemaAndTable).put(colName, md.getColumnTypeName(i));
      }
      if (colType == Types.DECIMAL || colType == Types.NUMERIC) {
        decimalColumns
            .computeIfAbsent(schemaAndTable, k -> new HashMap<>())
            .put(colName, new PrecisionAndScale(md.getPrecision(i), md.getScale(i)));
      }
      columns.put(md.getColumnName(i), md.getColumnType(i));
    }
    tableSchemas.put(schemaAndTable, columns);
  } catch (SQLException ex) {
    throw new StageException(JDBC_00, configBean.hikariConfigBean.connectionString);
  }
}
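The original approach derives the column list from the metadata of an empty query result, so any column that does not appear in a plain SELECT * (for example an Oracle INVISIBLE column) never reaches tableSchemas, which is what triggers the JDBC_402 error above. A minimal standalone sketch of that behaviour, assuming an Oracle JDBC driver on the classpath and using placeholder connection details and a placeholder table SCOTT.ORDERS:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class VisibleColumnsCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details - adjust to your own environment.
    try (Connection conn = DriverManager.getConnection(
            "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1", "user", "password");
         Statement s = conn.createStatement()) {
      // Same trick as the original resolveSchema: run an empty query and read its metadata.
      // Only columns that a plain SELECT * returns are described here, so a hidden
      // (INVISIBLE) column will not show up in this list.
      ResultSetMetaData md = s.executeQuery(
          "SELECT * FROM \"SCOTT\".\"ORDERS\" WHERE 1 = 0").getMetaData();
      for (int i = 1; i <= md.getColumnCount(); i++) {
        System.out.println(md.getColumnName(i) + " : " + md.getColumnTypeName(i));
      }
    }
  }
}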
Enhanced implementation, reading the column list from DatabaseMetaData.getColumns() instead:
private void resolveSchema(SchemaAndTable schemaAndTable) throws StageException {
  Map<String, Integer> columns = new HashMap<>();
  String schema = schemaAndTable.getSchema();
  String table = schemaAndTable.getTable();
  try (ResultSet mdRs = connection.getMetaData().getColumns(
      null,
      // Pass null when no schema is configured so the schema pattern is not restricted.
      StringUtils.isNotEmpty(schema) ? schema.toUpperCase() : null,
      table.toUpperCase(),
      null)) {
    while (mdRs.next()) {
      int colType = mdRs.getInt("DATA_TYPE");
      String colName = mdRs.getString("COLUMN_NAME");
      int scale = mdRs.getInt("DECIMAL_DIGITS");
      int prec = mdRs.getInt("COLUMN_SIZE");
      if (!configBean.caseSensitive) {
        colName = colName.toUpperCase();
      }
      if (colType == Types.DATE || colType == Types.TIME || colType == Types.TIMESTAMP) {
        dateTimeColumns.computeIfAbsent(schemaAndTable, k -> new HashMap<>());
        // Store the column type name, as the original implementation did.
        dateTimeColumns.get(schemaAndTable).put(colName, mdRs.getString("TYPE_NAME"));
      }
      if (colType == Types.DECIMAL || colType == Types.NUMERIC) {
        decimalColumns
            .computeIfAbsent(schemaAndTable, k -> new HashMap<>())
            .put(colName, new PrecisionAndScale(prec, scale));
      }
      columns.put(colName, colType);
    }
    tableSchemas.put(schemaAndTable, columns);
  } catch (SQLException ex) {
    throw new StageException(JDBC_00, configBean.hikariConfigBean.connectionString);
  }
}
With this change, the metadata of hidden and virtual columns can be retrieved, and those columns can now be parsed.
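As a quick check outside of StreamSets, the same DatabaseMetaData.getColumns() call can be run standalone to see exactly which columns the enhanced resolveSchema will resolve. A minimal sketch, again with placeholder connection details and the placeholder table SCOTT.ORDERS:

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class AllColumnsCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details - adjust to your own environment.
    try (Connection conn = DriverManager.getConnection(
            "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1", "user", "password")) {
      DatabaseMetaData dbmd = conn.getMetaData();
      // Same call as the enhanced resolveSchema: the column list comes from the
      // database metadata instead of from a query result set.
      try (ResultSet rs = dbmd.getColumns(null, "SCOTT", "ORDERS", null)) {
        while (rs.next()) {
          System.out.println(rs.getString("COLUMN_NAME")
              + " : " + rs.getString("TYPE_NAME")
              + " (JDBC type " + rs.getInt("DATA_TYPE") + ")");
        }
      }
    }
  }
}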
This change is based on StreamSets Data Collector 3.15 and has passed testing.
I don't know how to attach the rebuilt streamsets-datacollector-jdbc-protolib-3.15.0.jar to this article; if you need it, please leave a comment.