Customizing the StreamSets SQL Parser Component - Adding Support for Hidden and Virtual Columns
I have recently been using StreamSets for data synchronization and found that the SQL Parser processor cannot parse hidden (invisible) and virtual columns of a table, so I decided to dig into the source code and modify it.
The error reported by SQL Parser points to the corresponding Java class:
com.streamsets.pipeline.api.base.OnRecordErrorException: JDBC_402 - Columns 'XX_XXX' are not present in XX_XXXXXXX
    at com.streamsets.pipeline.stage.processor.parser.sql.SqlParserProcessor.process(SqlParserProcessor.java:203)
    at com.streamsets.pipeline.stage.processor.parser.sql.SqlParserProcessor.process(SqlParserProcessor.java:152)
After downloading the source code, I asked a developer colleague to help with the change.
Modified file: datacollector-3.15\jdbc-protolib\src\main\java\com\streamsets\pipeline\stage\processor\parser\sql\SqlParserProcessor.java, line 290
The resolveSchema method
Original implementation:
private void resolveSchema(SchemaAndTable schemaAndTable) throws StageException {
  Map<String, Integer> columns = new HashMap<>();
  String schema = schemaAndTable.getSchema();
  String table = schemaAndTable.getTable();
  try (Statement s = connection.createStatement()) {
    // Runs an empty query and reads the column list from its ResultSetMetaData.
    ResultSetMetaData md = s.executeQuery(
        Utils.format(
            "SELECT * FROM {}{} WHERE 1 = 0",
            StringUtils.isNotEmpty(schema) ? "\"" + schema + "\"." : "",
            "\"" + table + "\""
        )).getMetaData();
    int colCount = md.getColumnCount();
    for (int i = 1; i <= colCount; i++) {
      int colType = md.getColumnType(i);
      String colName = md.getColumnName(i);
      if (!configBean.caseSensitive) {
        colName = colName.toUpperCase();
      }
      if (colType == Types.DATE || colType == Types.TIME || colType == Types.TIMESTAMP) {
        dateTimeColumns.computeIfAbsent(schemaAndTable, k -> new HashMap<>());
        dateTimeColumns.get(schemaAndTable).put(colName, md.getColumnTypeName(i));
      }
      if (colType == Types.DECIMAL || colType == Types.NUMERIC) {
        decimalColumns
            .computeIfAbsent(schemaAndTable, k -> new HashMap<>())
            .put(colName, new PrecisionAndScale(md.getPrecision(i), md.getScale(i)));
      }
      columns.put(md.getColumnName(i), md.getColumnType(i));
    }
    tableSchemas.put(schemaAndTable, columns);
  } catch (SQLException ex) {
    throw new StageException(JDBC_00, configBean.hikariConfigBean.connectionString);
  }
}
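The original approach derives the column list from the metadata of an empty query result, so any column that does not appear in a plain SELECT * (for example an Oracle INVISIBLE column) never reaches tableSchemas, which is what triggers the JDBC_402 error above. A minimal standalone sketch of that behaviour, assuming an Oracle JDBC driver on the classpath and using placeholder connection details and a placeholder table SCOTT.ORDERS:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class VisibleColumnsCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details - adjust to your own environment.
    try (Connection conn = DriverManager.getConnection(
            "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1", "user", "password");
         Statement s = conn.createStatement()) {
      // Same trick as the original resolveSchema: run an empty query and read its metadata.
      // Only columns that a plain SELECT * returns are described here, so a hidden
      // (INVISIBLE) column will not show up in this list.
      ResultSetMetaData md = s.executeQuery(
          "SELECT * FROM \"SCOTT\".\"ORDERS\" WHERE 1 = 0").getMetaData();
      for (int i = 1; i <= md.getColumnCount(); i++) {
        System.out.println(md.getColumnName(i) + " : " + md.getColumnTypeName(i));
      }
    }
  }
}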
Enhanced implementation, reading the column list from DatabaseMetaData.getColumns() instead:
private void resolveSchema(SchemaAndTable schemaAndTable) throws StageException {
  Map<String, Integer> columns = new HashMap<>();
  String schema = schemaAndTable.getSchema();
  String table = schemaAndTable.getTable();
  try (ResultSet mdRs = connection.getMetaData().getColumns(
      null,
      // Pass null when no schema is configured so the schema pattern is not restricted.
      StringUtils.isNotEmpty(schema) ? schema.toUpperCase() : null,
      table.toUpperCase(),
      null)) {
    while (mdRs.next()) {
      int colType = mdRs.getInt("DATA_TYPE");
      String colName = mdRs.getString("COLUMN_NAME");
      int scale = mdRs.getInt("DECIMAL_DIGITS");
      int prec = mdRs.getInt("COLUMN_SIZE");
      if (!configBean.caseSensitive) {
        colName = colName.toUpperCase();
      }
      if (colType == Types.DATE || colType == Types.TIME || colType == Types.TIMESTAMP) {
        dateTimeColumns.computeIfAbsent(schemaAndTable, k -> new HashMap<>());
        // Store the column type name, as the original implementation did.
        dateTimeColumns.get(schemaAndTable).put(colName, mdRs.getString("TYPE_NAME"));
      }
      if (colType == Types.DECIMAL || colType == Types.NUMERIC) {
        decimalColumns
            .computeIfAbsent(schemaAndTable, k -> new HashMap<>())
            .put(colName, new PrecisionAndScale(prec, scale));
      }
      columns.put(colName, colType);
    }
    tableSchemas.put(schemaAndTable, columns);
  } catch (SQLException ex) {
    throw new StageException(JDBC_00, configBean.hikariConfigBean.connectionString);
  }
}
With this change, the metadata of hidden and virtual columns can be retrieved, and those columns can now be parsed.
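As a quick check outside of StreamSets, the same DatabaseMetaData.getColumns() call can be run standalone to see exactly which columns the enhanced resolveSchema will resolve. A minimal sketch, again with placeholder connection details and the placeholder table SCOTT.ORDERS:

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class AllColumnsCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details - adjust to your own environment.
    try (Connection conn = DriverManager.getConnection(
            "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1", "user", "password")) {
      DatabaseMetaData dbmd = conn.getMetaData();
      // Same call as the enhanced resolveSchema: the column list comes from the
      // database metadata instead of from a query result set.
      try (ResultSet rs = dbmd.getColumns(null, "SCOTT", "ORDERS", null)) {
        while (rs.next()) {
          System.out.println(rs.getString("COLUMN_NAME")
              + " : " + rs.getString("TYPE_NAME")
              + " (JDBC type " + rs.getInt("DATA_TYPE") + ")");
        }
      }
    }
  }
}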
This change is based on StreamSets Data Collector 3.15 and has passed testing.
I don't know how to attach the rebuilt streamsets-datacollector-jdbc-protolib-3.15.0.jar to this article; if you need it, please leave a comment.