[Debezium/FlinkCDC] Custom column value converter (`CustomConverter<SchemaBuilder, RelationalColumn>`)

Requirements

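  • You want precise, uniform control over the values that Debezium produces for MySQL datetime / timestamp columns after capturing and converting them from the binlog (CDC).

  • Or you want MySQL int, tinyint, varchar and similar columns to come out of Debezium's binlog CDC capture and conversion as one specific Java class or value format.

  • In all of these cases, implementing the extension interface that Debezium provides, io.debezium.spi.converter.CustomConverter, is enough.

Flink CDC applications are also built on Debezium, so the same code can be used there directly.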

  • io.debezium.spi.converter.CustomConverter

https://github.com/debezium/debezium/blob/v1.4.1.Final/debezium-api/src/main/java/io/debezium/spi/converter/CustomConverter.java

public interface CustomConverter<S, F extends ConvertedField> {

    @FunctionalInterface
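    interface Converter { // converts data from one type to another
        Object convert(Object input);
    }

    public interface ConverterRegistration<S> { // callback used to register a converter
        void register(S fieldSchema, Converter converter); // registers the given schema and converter for the current field; should not be called more than once for the same field
    }

    // Passes the properties specified in the connector configuration to the converter instance. configure runs when the connector is initialized. A converter can be used with multiple connectors and can adapt its behavior to each connector's property settings.
    void configure(Properties props);

    // Registers the converter to handle specific columns or fields of the data source. Debezium calls converterFor(...) to prompt the converter to call the registration. converterFor runs once for each column.
    void converterFor(F field, ConverterRegistration<S> registration); // registers a custom value and schema converter for use with a specific field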
}
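As a rough illustration of how properties reach configure(Properties): Debezium's documented convention is to list symbolic converter names under `converters`, set `<name>.type` to the implementation class, and hand the remaining `<name>.*` keys to the converter with the prefix removed. The sketch below only mimics that prefix handling with plain java.util.Properties. The symbolic name `datetime`, the package `com.example`, and the `subset` helper are assumptions made for this example, not Debezium code.

```java
import java.util.Properties;

public class ConverterConfigSketch {

    /** Builds connector properties that register one custom converter. */
    static Properties connectorProps() {
        Properties props = new Properties();
        // "datetime" is an arbitrary symbolic name chosen for this sketch.
        props.setProperty("converters", "datetime");
        // Hypothetical package; point this at your own CustomConverter implementation.
        props.setProperty("datetime.type", "com.example.MySqlDateTimeConverter");
        // Converter-specific options, to be read inside configure(Properties).
        props.setProperty("datetime.format.date", "yyyy-MM-dd");
        props.setProperty("datetime.format.datetime", "yyyy-MM-dd HH:mm:ss");
        return props;
    }

    /** Mimics what the converter receives: keys under "<name>." with that prefix removed. */
    static Properties subset(Properties all, String name) {
        String prefix = name + ".";
        Properties sub = new Properties();
        for (String key : all.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                sub.setProperty(key.substring(prefix.length()), all.getProperty(key));
            }
        }
        return sub;
    }

    public static void main(String[] args) {
        Properties seen = subset(connectorProps(), "datetime");
        System.out.println(seen.getProperty("format.date")); // yyyy-MM-dd
    }
}
```

With this layout, a converter would look up keys such as `format.date` rather than `datetime.format.date` inside configure.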
  • Versions
  • debezium : 1.4.1.Final
  • flink cdc : 1.3.0
  • flink : 1.12.6

How it works

How Debezium's TableSchemaBuilder calls the custom column value converter

  • io.debezium.relational.TableSchemaBuilder

https://github.com/debezium/debezium/blob/v1.4.1.Final/debezium-core/src/main/java/io/debezium/relational/TableSchemaBuilder.java

    public TableSchema create(String schemaPrefix, String envelopSchemaName, Table table, Tables.ColumnNameFilter filter, ColumnMappers mappers, Key.KeyMapper keysMapper) {
		...
        Schema valSchema = valSchemaBuilder.optional().build();
        Schema keySchema = hasPrimaryKey.get() ? keySchemaBuilder.build() : null;
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("Mapped primary key for table '{}' to schema: {}", tableId, SchemaUtil.asDetailedString(keySchema));
            LOGGER.debug("Mapped columns for table '{}' to schema: {}", tableId, SchemaUtil.asDetailedString(valSchema));
        }

        Envelope envelope = Envelope.defineSchema().withName(this.schemaNameAdjuster.adjust(envelopSchemaName)).withRecord(valSchema).withSource(this.sourceInfoSchema).build();
        StructGenerator keyGenerator = this.createKeyGenerator(keySchema, tableId, tableKey.keyColumns());
        StructGenerator valueGenerator = this.createValueGenerator(valSchema, tableId, table.columns(), filter, mappers); // key method: this is where the custom column value converter gets called
        return new TableSchema(tableId, keySchema, keyGenerator, envelope, valSchema, valueGenerator);
    }

    protected StructGenerator createValueGenerator(Schema schema, TableId tableId, List<Column> columns, Tables.ColumnNameFilter filter, ColumnMappers mappers) {
        if (schema != null) {
            List<Column> columnsThatShouldBeAdded = (List)columns.stream().filter((column) -> {
                return filter == null || filter.matches(tableId.catalog(), tableId.schema(), tableId.table(), column.name());
            }).collect(Collectors.toList());
            int[] recordIndexes = this.indexesForColumns(columnsThatShouldBeAdded);
            Field[] fields = this.fieldsForColumns(schema, columnsThatShouldBeAdded);
            int numFields = recordIndexes.length;
            ValueConverter[] converters = this.convertersForColumns(schema, tableId, columnsThatShouldBeAdded, mappers);
            return (row) -> {
                Struct result = new Struct(schema);

                for(int i = 0; i != numFields; ++i) {
                    this.validateIncomingRowToInternalMetadata(recordIndexes, fields, converters, row, i);
                    Object value = row[recordIndexes[i]];
                    ValueConverter converter = converters[i];
                    if (converter != null) {
                        LOGGER.trace("converter for value object: *** {} ***", converter);
                    } else {
                        LOGGER.trace("converter is null...");
                    }

                    if (converter != null) {
                        Column col;
                        try {
                            col = (Column)columns.get(i);
                            
                            // The commented-out lines below are debug code added by the author and can be ignored
                            //Object firstColumnValue = row[0];
                            //if(firstColumnValue.toString().equalsIgnoreCase("38")){// only print the target row
                            //    Object newValue = converter.convert(value);
                            //    LOGGER.info(
                            //        "columnName : {} | typeName : {}, jdbcType : {} | converterClass:{} | oldValue: {}(class:{}) , newValue:{}(class:{})"
                            //        , col.name(), col.typeName(), col.jdbcType(), converter.getClass().getCanonicalName()
                            //        , value, value.getClass().getCanonicalName(), newValue , newValue.getClass().getCanonicalName()
                            //    );
                            //}
                            value = converter.convert(value);
                            result.put(fields[i], value);
                        } catch (IllegalArgumentException | DataException var15) {
                            col = (Column)columns.get(i);
                            LOGGER.error("Failed to properly convert data value for '{}.{}' of type {} for row {}:", new Object[]{tableId, col.name(), col.typeName(), row, var15});
                        } catch (Exception var16) {
                            col = (Column)columns.get(i);
                            LOGGER.error("Failed to properly convert data value for '{}.{}' of type {} for row {}:", new Object[]{tableId, col.name(), col.typeName(), row, var16});
                        }
                    }
                }

                return result;
            };
        } else {
            return null;
        }
    }

    protected ValueConverter createValueConverterFor(TableId tableId, Column column, Field fieldDefn) {
        // this.valueConverterProvider.converter(column, fieldDefn): actually calls io.debezium.connector.mysql.MySqlValueConverters#converter
        return (ValueConverter)this.customConverterRegistry.getValueConverter(tableId, column).orElse(this.valueConverterProvider.converter(column, fieldDefn));
    }

In other words, if no custom column value converter is registered for the target column, Debezium's default implementation is used.
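A minimal stand-alone sketch of this lookup-with-fallback, using plain JDK types instead of Debezium's classes (the class, the map, and the counter are hypothetical stand-ins). It also makes one detail of the decompiled snippet visible: `Optional.orElse(...)` evaluates its argument eagerly, so the default converter is still looked up even when a custom one is registered.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class ConverterLookupSketch {

    // Stand-in for the custom converter registry: full column name -> converter.
    private final Map<String, Function<Object, Object>> custom = new HashMap<>();

    // Counts how often the default converter is built, to make eager evaluation visible.
    int defaultLookups = 0;

    void register(String fullColumnName, Function<Object, Object> converter) {
        custom.put(fullColumnName, converter);
    }

    // Stand-in for the built-in provider: here it simply passes values through.
    Function<Object, Object> defaultConverter(String fullColumnName) {
        defaultLookups++;
        return value -> value;
    }

    // Same shape as createValueConverterFor: custom converter first, default otherwise.
    Function<Object, Object> converterFor(String fullColumnName) {
        return Optional.ofNullable(custom.get(fullColumnName))
                .orElse(defaultConverter(fullColumnName)); // argument is evaluated even if a custom converter exists
    }

    public static void main(String[] args) {
        ConverterLookupSketch lookup = new ConverterLookupSketch();
        lookup.register("db.t.createTime", value -> "custom:" + value);

        System.out.println(lookup.converterFor("db.t.createTime").apply(1)); // custom converter wins
        System.out.println(lookup.converterFor("db.t.id").apply(7));         // falls back to the default
        System.out.println(lookup.defaultLookups);                           // default was built for both lookups
    }
}
```

Swapping `orElse(...)` for `orElseGet(() -> ...)` would make the fallback lazy; whether that matters in practice depends on how expensive the default lookup is.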

Debezium ValueConverter: the top-level interface for column value converters

  • io.debezium.relational.ValueConverter
//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by FernFlower decompiler)
//

package io.debezium.relational;

@FunctionalInterface
public interface ValueConverter {
    Object convert(Object var1); // core method: converts a column value

    default ValueConverter or(ValueConverter fallback) {
        return fallback == null ? this : (data) -> {
            Object result = this.convert(data);
            return result == null && data != null ? fallback.convert(data) : result;
        };
    }

    default ValueConverter and(ValueConverter delegate) {
        return delegate == null ? this : (data) -> {
            return delegate.convert(this.convert(data));
        };
    }

    default ValueConverter nullOr() {
        return (data) -> {
            return data == null ? null : this.convert(data);
        };
    }

    static ValueConverter passthrough() {
        return (data) -> {
            return data;
        };
    }
}
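To make the combinators concrete, here is a small sketch that re-declares or / and / nullOr on a local interface (named Conv here, an assumption so that the example runs without Debezium on the classpath) and chains three toy converters. The null checks on the argument of or / and are left out for brevity.

```java
public class ValueConverterComposition {

    // Local copy of the combinators shown above; not the Debezium interface itself.
    @FunctionalInterface
    interface Conv {
        Object convert(Object data);

        // Use the fallback only when this converter returns null for a non-null input.
        default Conv or(Conv fallback) {
            return data -> {
                Object result = convert(data);
                return result == null && data != null ? fallback.convert(data) : result;
            };
        }

        // Run this converter first, then feed its result to the delegate.
        default Conv and(Conv delegate) {
            return data -> delegate.convert(convert(data));
        }

        // Short-circuit null inputs so the wrapped converter never sees them.
        default Conv nullOr() {
            return data -> data == null ? null : convert(data);
        }
    }

    // Toy converters, made up for this example.
    static final Conv PARSE_INT = data -> {
        try {
            return Integer.valueOf(data.toString().trim());
        } catch (NumberFormatException e) {
            return null; // signals "could not convert"
        }
    };
    static final Conv DOUBLE_IT = data -> ((Integer) data) * 2;
    static final Conv MINUS_ONE = data -> Integer.valueOf(-1);

    public static void main(String[] args) {
        System.out.println(PARSE_INT.and(DOUBLE_IT).convert("21")); // 42
        System.out.println(PARSE_INT.or(MINUS_ONE).convert("abc")); // -1
        System.out.println(PARSE_INT.nullOr().convert(null));       // null
    }
}
```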
  • Places where the ValueConverter interface is implemented (not exhaustive)
  • io.debezium.relational.TableSchemaBuilder#wrapInMappingConverterIfNeeded(ColumnMappers mappers, TableId tableId, Column column, ValueConverter converter)
    /**
     * Obtain the array of converters for each column in a row. A converter might be null if the column is not be included in the records.
     *
     * @param schema the schema; may not be null
     * @param tableId the identifier of the table that contains the columns
     * @param columns the columns in the row; may not be null
     * @param mappers the mapping functions for columns; may be null if none of the columns are to be mapped to different values
     * @return the converters for each column in the rows; never null
     */
    // Note: this method is called from:
    //io.debezium.relational.TableSchemaBuilder#createKeyGenerator
    //  ValueConverter[] converters = this.convertersForColumns(schema, columnSetName, columns, (ColumnMappers)null);
    //io.debezium.relational.TableSchemaBuilder#createValueGenerator
    //  ValueConverter[] converters = this.convertersForColumns(schema, tableId, columnsThatShouldBeAdded, mappers);
    protected ValueConverter[] convertersForColumns(Schema schema, TableId tableId, List<Column> columns, ColumnMappers mappers) {

        ValueConverter[] converters = new ValueConverter[columns.size()];

        for (int i = 0; i < columns.size(); i++) {
            Column column = columns.get(i);

            ValueConverter converter = createValueConverterFor(tableId, column, schema.field(fieldNamer.fieldNameFor(column)));
            converter = wrapInMappingConverterIfNeeded(mappers, tableId, column, converter);

            if (converter == null) {
                LOGGER.warn(
                        "No converter found for column {}.{} of type {}. The column will not be part of change events for that table.",
                        tableId, column.name(), column.typeName());
            }

            // may be null if no converter found
            converters[i] = converter;
        }

        return converters;
    }

    private ValueConverter wrapInMappingConverterIfNeeded(ColumnMappers mappers, TableId tableId, Column column, ValueConverter converter) {
        if (mappers == null || converter == null) {
            return converter;
        }

        ValueConverter mappingConverter = mappers.mappingConverterFor(tableId, column);
        if (mappingConverter == null) {
            return converter;
        }

        return (value) -> mappingConverter.convert(converter.convert(value));
    }
  • io.debezium.relational.CustomConverterRegistry#getValueConverter
    public Optional<ValueConverter> getValueConverter(TableId table, Column column) {
        final ConverterDefinition<SchemaBuilder> converterDefinition = conversionFunctionMap.get(fullColumnName(table, column));
        if (converterDefinition == null) {
            return Optional.empty();
        }
        return Optional.of(x -> {
            return converterDefinition.converter.convert(x);
        });
    }

Debezium CustomConverterRegistry: the registry of custom column value converters

io.debezium.relational.CustomConverterRegistry: the registry of custom column value converters.
It is held as a field (customConverterRegistry) of io.debezium.relational.TableSchemaBuilder.
TableSchemaBuilder#createValueConverterFor uses the customConverterRegistry field to look up the custom column value converter. The upstream call chain of createValueConverterFor:

//io.debezium.relational.TableSchemaBuilder
protected ValueConverter[] convertersForColumns(Schema schema, TableId tableId, List<Column> columns, ColumnMappers mappers) // obtains the value converter for each of the columns
	...
	ValueConverter converter = this.createValueConverterFor(tableId, column, schema.field(this.fieldNamer.fieldNameFor(column)));
	converter = this.wrapInMappingConverterIfNeeded(mappers, tableId, column, converter);
	...
  • public Optional<ValueConverter> getValueConverter(TableId table, Column column): looks up the custom converter registered for a column
    /**
     * Obtain a pre-registered converter for a given column.
     *
     * @param table the table that contains the column
     * @param column the column metadata
     * @return the the value converter or empty if converter does not support the column
     */
    public Optional<ValueConverter> getValueConverter(TableId table, Column column) { // looks up the custom value converter for the target column
        final ConverterDefinition<SchemaBuilder> converterDefinition = conversionFunctionMap.get(fullColumnName(table, column));
        if (converterDefinition == null) {
            return Optional.empty();
        }
        return Optional.of(x -> { // the ValueConverter implementation
            // the commented-out lines below are debug code added by the author
            //Converter converter = converterDefinition.converter;
            //log.info("getValueConverter | columnName:{} ,typeName:{} ,jdbcType:{} | converterClass:{} | x:{}"
            //    , column.name(), column.typeName(), column.jdbcType()
            //    , converter.getClass().getCanonicalName()
            //    , x
            //);
            return converterDefinition.converter.convert(x);
        });
    }
  • CustomConverterRegistry.ConverterDefinition: an inner class of CustomConverterRegistry
//io.debezium.relational.CustomConverterRegistry.ConverterDefinition
    /**
     * Class binding together the schema of the conversion result and the converter code.
     *
     * @param <S> schema describing the output type, usually {@link org.apache.kafka.connect.data.SchemaBuilder}
     */
    public class ConverterDefinition<S> {
        public final S fieldSchema;
        public final CustomConverter.Converter converter;

        public ConverterDefinition(S fieldSchema, CustomConverter.Converter converter) {
            this.fieldSchema = fieldSchema;
            this.converter = converter;
        }
    }

Debezium's MySqlValueConverters extends JdbcValueConverters: Debezium's default implementation for converting database columns to Java column values

Debezium JdbcValueConverters

  • io.debezium.jdbc.JdbcValueConverters implements io.debezium.relational.ValueConverterProvider

https://github.com/debezium/debezium/blob/v1.4.1.Final/debezium-core/src/main/java/io/debezium/jdbc/JdbcValueConverters.java

  • public ValueConverter converter(Column column, Field fieldDefn)
  • public SchemaBuilder schemaBuilder(Column column)
    /**
     * Create a new instance that always uses UTC for the default time zone when converting values without timezone information
     * to values that require timezones, and uses adapts time and timestamp values based upon the precision of the database
     * columns.
     */
    public JdbcValueConverters() {
        this(null, TemporalPrecisionMode.ADAPTIVE, ZoneOffset.UTC, null, null, null);
    }

    /**
     * Create a new instance, and specify the time zone offset that should be used only when converting values without timezone
     * information to values that require timezones. This default offset should not be needed when values are highly-correlated
     * with the expected SQL/JDBC types.
     *
     * @param decimalMode how {@code DECIMAL} and {@code NUMERIC} values should be treated; may be null if
     *            {@link DecimalMode#PRECISE} is to be used
     * @param temporalPrecisionMode temporal precision mode based on {@link io.debezium.jdbc.TemporalPrecisionMode}
     * @param defaultOffset the zone offset that is to be used when converting non-timezone related values to values that do
     *            have timezones; may be null if UTC is to be used
     * @param adjuster the optional component that adjusts the local date value before obtaining the epoch day; may be null if no
     *            adjustment is necessary
     * @param bigIntUnsignedMode how {@code BIGINT UNSIGNED} values should be treated; may be null if
     *            {@link BigIntUnsignedMode#PRECISE} is to be used
     * @param binaryMode how binary columns should be represented
     */
    public JdbcValueConverters(DecimalMode decimalMode, TemporalPrecisionMode temporalPrecisionMode, ZoneOffset defaultOffset,
                               TemporalAdjuster adjuster, BigIntUnsignedMode bigIntUnsignedMode, BinaryHandlingMode binaryMode) {
        this.defaultOffset = defaultOffset != null ? defaultOffset : ZoneOffset.UTC; // time zone related
        this.adaptiveTimePrecisionMode = temporalPrecisionMode.equals(TemporalPrecisionMode.ADAPTIVE);
        this.adaptiveTimeMicrosecondsPrecisionMode = temporalPrecisionMode.equals(TemporalPrecisionMode.ADAPTIVE_TIME_MICROSECONDS);
        this.decimalMode = decimalMode != null ? decimalMode : DecimalMode.PRECISE;
        this.adjuster = adjuster;
        this.bigIntUnsignedMode = bigIntUnsignedMode != null ? bigIntUnsignedMode : BigIntUnsignedMode.PRECISE;
        this.binaryMode = binaryMode != null ? binaryMode : BinaryHandlingMode.BYTES;

        this.fallbackTimestampWithTimeZone = ZonedTimestamp.toIsoString( // time zone related
                OffsetDateTime.of(LocalDate.ofEpochDay(0), LocalTime.MIDNIGHT, defaultOffset),
                defaultOffset,
                adjuster);
        this.fallbackTimeWithTimeZone = ZonedTime.toIsoString( // time zone related
                OffsetTime.of(LocalTime.MIDNIGHT, defaultOffset),
                defaultOffset,
                adjuster);
    }

    public ValueConverter converter(Column column, Field fieldDefn) {
        switch (column.jdbcType()) {
            ...
            // Date and time values
            case Types.DATE:
                if (adaptiveTimePrecisionMode || adaptiveTimeMicrosecondsPrecisionMode) {
                    return (data) -> convertDateToEpochDays(column, fieldDefn, data);
                }
                return (data) -> convertDateToEpochDaysAsDate(column, fieldDefn, data);
            case Types.TIME:
                return (data) -> convertTime(column, fieldDefn, data);
            case Types.TIMESTAMP:
                if (adaptiveTimePrecisionMode || adaptiveTimeMicrosecondsPrecisionMode) {
                    if (getTimePrecision(column) <= 3) {
                        return data -> convertTimestampToEpochMillis(column, fieldDefn, data); //dbz.Timestamp => long
                    }
                    if (getTimePrecision(column) <= 6) {
                        return data -> convertTimestampToEpochMicros(column, fieldDefn, data);//dbz.MicroTimestamp => long
                    }
                    return (data) -> convertTimestampToEpochNanos(column, fieldDefn, data);//dbz.NanoTimestamp => long
                }
                return (data) -> convertTimestampToEpochMillisAsDate(column, fieldDefn, data);//dbz.Timestamp => java.util.Date
            case Types.TIME_WITH_TIMEZONE:
                return (data) -> convertTimeWithZone(column, fieldDefn, data);
            case Types.TIMESTAMP_WITH_TIMEZONE:
                return (data) -> convertTimestampWithZone(column, fieldDefn, data);

            // Other types ...
    }

    // MySQL TIMESTAMP type => java.sql.Types.TIMESTAMP_WITH_TIMEZONE (2014) => java.lang.String: convertTimestampWithZone(column, fieldDefn, data)
    protected Object convertTimestampWithZone(Column column, Field fieldDefn, Object data) {
        return convertValue(column, fieldDefn, data, fallbackTimestampWithTimeZone, (r) -> {
            try {
                r.deliver(ZonedTimestamp.toIsoString(data, defaultOffset, adjuster)); // returns java.lang.String
            }
            catch (IllegalArgumentException e) {
            }
        });
    }

    // MySQL DATETIME type => java.sql.Types.TIMESTAMP (93) => [case 1] java.lang.Long: convertTimestampToEpochMillis(column, fieldDefn, data)
    protected Object convertTimestampToEpochMillis(Column column, Field fieldDefn, Object data) {
        // epoch is the fallback value
        return convertValue(column, fieldDefn, data, 0L, (r) -> {
            try {
                r.deliver(Timestamp.toEpochMillis(data, adjuster)); // Timestamp: io.debezium.time.Timestamp; returns long
            }
            catch (IllegalArgumentException e) {
            }
        });
    }

    // MySQL DATETIME type => java.sql.Types.TIMESTAMP (93) => [case 2] java.lang.Long: convertTimestampToEpochMicros(column, fieldDefn, data)
    protected Object convertTimestampToEpochMicros(Column column, Field fieldDefn, Object data) {
        // epoch is the fallback value
        return convertValue(column, fieldDefn, data, 0L, (r) -> {
            try {
                r.deliver(MicroTimestamp.toEpochMicros(data, adjuster)); // MicroTimestamp: io.debezium.time.MicroTimestamp; returns long
            }
            catch (IllegalArgumentException e) {
            }
        });
    }

    // MySQL DATETIME type => java.sql.Types.TIMESTAMP (93) => [case 3] java.lang.Long: convertTimestampToEpochNanos(column, fieldDefn, data)
    protected Object convertTimestampToEpochNanos(Column column, Field fieldDefn, Object data) {
        // epoch is the fallback value
        return convertValue(column, fieldDefn, data, 0L, (r) -> {
            try {
                r.deliver(NanoTimestamp.toEpochNanos(data, adjuster)); // NanoTimestamp: io.debezium.time.NanoTimestamp; returns long
            }
            catch (IllegalArgumentException e) {
            }
        });
    }

    // MySQL DATETIME type => java.sql.Types.TIMESTAMP (93) => [case 4] java.util.Date: convertTimestampToEpochMillisAsDate(column, fieldDefn, data)
    protected Object convertTimestampToEpochMillisAsDate(Column column, Field fieldDefn, Object data) {
        // epoch is the fallback value
        return convertValue(column, fieldDefn, data, new java.util.Date(0L), (r) -> {
            try {
                r.deliver(new java.util.Date(Timestamp.toEpochMillis(data, adjuster))); // Timestamp: io.debezium.time.Timestamp; returns java.util.Date
            }
            catch (IllegalArgumentException e) {
            }
        });
    }

    protected Object convertValue(Column column, Field fieldDefn, Object data, Object fallback, ValueConversionCallback callback) {
        if (data == null) {
            if (column.isOptional()) {
                return null;
            } else {
                Object schemaDefault = fieldDefn.schema().defaultValue();
                return schemaDefault != null ? schemaDefault : fallback;
            }
        } else {
            this.logger.trace("Value from data object: *** {} ***", data);
            ResultReceiver r = ResultReceiver.create();
            callback.convert(r);
            this.logger.trace("Callback is: {}", callback);
            this.logger.trace("Value from ResultReceiver: {}", r);
            return r.hasReceived() ? r.get() : this.handleUnknownData(column, fieldDefn, data);
        }
    }
  • io.debezium.time.Timestamp

https://github.com/debezium/debezium/blob/v1.4.1.Final/debezium-core/src/main/java/io/debezium/time/Timestamp.java

    public static long toEpochMillis(Object value, TemporalAdjuster adjuster) {
        if (value instanceof Long) {
            return (Long) value;
        }
        LocalDateTime dateTime = Conversions.toLocalDateTime(value);
        if (adjuster != null) {
            dateTime = dateTime.with(adjuster);
        }

        return dateTime.toInstant(ZoneOffset.UTC).toEpochMilli();
    }
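Because toEpochMillis reads the LocalDateTime at ZoneOffset.UTC, the wall-clock value of a DATETIME column is treated as if it were UTC, whatever time zone it was written in. A quick sketch with java.time only (the class and method names are made up for illustration), using a DATETIME(3) value of 2024-01-31 17:56:43.717 that was written in a UTC+8 session:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class DatetimeEpochSketch {

    // Interprets a zone-less wall-clock value at the given offset and returns epoch milliseconds.
    static long epochMillisAt(LocalDateTime wallClock, ZoneOffset offset) {
        return wallClock.toInstant(offset).toEpochMilli();
    }

    public static void main(String[] args) {
        LocalDateTime createTime = LocalDateTime.parse("2024-01-31T17:56:43.717");

        // Same interpretation as Timestamp.toEpochMillis: wall clock read as UTC.
        System.out.println(epochMillisAt(createTime, ZoneOffset.UTC));        // 1706723803717

        // What the writer in UTC+8 actually meant.
        System.out.println(epochMillisAt(createTime, ZoneOffset.ofHours(8))); // 1706695003717
    }
}
```

The two results differ by exactly 8 hours (28,800,000 ms).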
  • io.debezium.time.MicroTimestamp

https://github.com/debezium/debezium/blob/v1.4.1.Final/debezium-core/src/main/java/io/debezium/time/MicroTimestamp.java

    public static long toEpochMicros(Object value, TemporalAdjuster adjuster) {
        LocalDateTime dateTime = Conversions.toLocalDateTime(value);
        if (adjuster != null) {
            dateTime = dateTime.with(adjuster);
        }
        return Conversions.toEpochMicros(dateTime.toInstant(ZoneOffset.UTC)); // UTC+0 time zone
    }
  • io.debezium.time.NanoTimestamp

https://github.com/debezium/debezium/blob/v1.4.1.Final/debezium-core/src/main/java/io/debezium/time/NanoTimestamp.java
https://github.com/debezium/debezium/blob/v1.4.1.Final/debezium-core/src/main/java/io/debezium/time/Conversions.java

    public static long toEpochNanos(Object value, TemporalAdjuster adjuster) {
        LocalDateTime dateTime = Conversions.toLocalDateTime(value);
        if (adjuster != null) {
            dateTime = dateTime.with(adjuster);
        }
        return toEpochNanos(dateTime);
    }

    private static long toEpochNanos(LocalDateTime timestamp) {
        long nanoInDay = timestamp.toLocalTime().toNanoOfDay();
        long nanosOfDay = toEpochNanos(timestamp.toLocalDate());
        return nanosOfDay + nanoInDay;
    }

    private static long toEpochNanos(LocalDate date) {
        long epochDay = date.toEpochDay();
        return epochDay * Conversions.NANOSECONDS_PER_DAY;
    }
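The same arithmetic, written out as a small sketch with java.time only (NANOS_PER_DAY is assumed to mirror Conversions.NANOSECONDS_PER_DAY; the class name is hypothetical). No time zone is applied at any point: the date part contributes whole days since 1970-01-01 and the time part contributes the nanosecond of the day.

```java
import java.time.LocalDateTime;

public class EpochNanosSketch {

    // Assumed to match Conversions.NANOSECONDS_PER_DAY: 86,400 seconds times 10^9.
    static final long NANOS_PER_DAY = 24L * 60 * 60 * 1_000_000_000L;

    // Days since the epoch scaled to nanoseconds, plus the nanosecond-of-day.
    static long toEpochNanos(LocalDateTime ts) {
        long dayPart = ts.toLocalDate().toEpochDay() * NANOS_PER_DAY;
        long timePart = ts.toLocalTime().toNanoOfDay();
        return dayPart + timePart;
    }

    public static void main(String[] args) {
        System.out.println(toEpochNanos(LocalDateTime.parse("1970-01-02T00:00:00.000000001"))); // 86400000000001
        System.out.println(toEpochNanos(LocalDateTime.parse("2024-01-31T17:56:43.717")));       // 1706723803717000000
    }
}
```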

Debezium MySqlValueConverters

  • io.debezium.connector.mysql.MySqlValueConverters extends JdbcValueConverters

https://github.com/debezium/debezium/blob/v1.4.1.Final/debezium-connector-mysql/src/main/java/io/debezium/connector/mysql/MySqlValueConverters.java

...
import java.sql.Timestamp;
...
   public MySqlValueConverters(DecimalMode decimalMode, TemporalPrecisionMode temporalPrecisionMode, BigIntUnsignedMode bigIntUnsignedMode,
                                BinaryHandlingMode binaryMode,
                                TemporalAdjuster adjuster, ParsingErrorHandler parsingErrorHandler) {
        super(decimalMode, temporalPrecisionMode, ZoneOffset.UTC, adjuster, bigIntUnsignedMode, binaryMode); // defaultOffset is hard-coded here to ZoneOffset.UTC (UTC+0)
        this.parsingErrorHandler = parsingErrorHandler;
    }

    @Override
    public ValueConverter converter(Column column, Field fieldDefn) {
        ...

        // We have to convert bytes encoded in the column's character set ...
        switch (column.jdbcType()) {
            // Types here is java.sql.Types
            case Types.CHAR: // variable-length
            case Types.VARCHAR: // variable-length
            case Types.LONGVARCHAR: // variable-length
            case Types.CLOB: // variable-length
            case Types.NCHAR: // fixed-length
            case Types.NVARCHAR: // fixed-length
            case Types.LONGNVARCHAR: // fixed-length
            case Types.NCLOB: // fixed-length
            case Types.DATALINK:
            case Types.SQLXML:
                Charset charset = charsetFor(column);
                if (charset != null) {
                    logger.debug("Using {} charset by default for column: {}", charset, column);
                    return (data) -> convertString(column, fieldDefn, charset, data);
                }
                logger.warn("Using UTF-8 charset by default for column without charset: {}", column);
                return (data) -> convertString(column, fieldDefn, StandardCharsets.UTF_8, data);
            case Types.TIME: // java.sql.Types#TIME (92)
                if (adaptiveTimeMicrosecondsPrecisionMode) {
                    return data -> convertDurationToMicroseconds(column, fieldDefn, data);
                }
            case Types.TIMESTAMP: // java.sql.Types#TIMESTAMP (93)
                return ((ValueConverter) (data -> convertTimestampToLocalDateTime(column, fieldDefn, data))).and(super.converter(column, fieldDefn));
                // calls the convertTimestampToLocalDateTime method
            default:
                break;
        }

        // Otherwise, let the base class handle it ...
        return super.converter(column, fieldDefn);
    }

    protected Object convertTimestampToLocalDateTime(Column column, Field fieldDefn, Object data) {
        if (data == null && !fieldDefn.schema().isOptional()) {
            return null;
        }
        if (!(data instanceof Timestamp)) {
            return data;
        }

        return ((Timestamp) data).toLocalDateTime();
    }
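A simplified sketch of this two-step chain for a DATETIME column, using java.util.function.Function in place of ValueConverter (the constant names are hypothetical, and step 2 is reduced to the millisecond case of the base class):

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.function.Function;

public class MySqlTimestampChainSketch {

    // Step 1 (MySQL-specific): unwrap java.sql.Timestamp, pass anything else through unchanged.
    static final Function<Object, Object> TO_LOCAL_DATE_TIME =
            data -> data instanceof Timestamp ? ((Timestamp) data).toLocalDateTime() : data;

    // Step 2 (generic JDBC step, simplified): epoch millis with the wall clock read as UTC.
    static final Function<Object, Object> TO_EPOCH_MILLIS =
            data -> data instanceof LocalDateTime
                    ? (Object) ((LocalDateTime) data).toInstant(ZoneOffset.UTC).toEpochMilli()
                    : data;

    // Equivalent in spirit to step1.and(super.converter(...)).
    static final Function<Object, Object> CHAIN = TO_LOCAL_DATE_TIME.andThen(TO_EPOCH_MILLIS);

    public static void main(String[] args) {
        Object raw = Timestamp.valueOf("2024-01-31 17:56:43.717");
        System.out.println(CHAIN.apply(raw)); // 1706723803717
    }
}
```

Since toLocalDateTime() keeps only the wall-clock fields, no zone information reaches step 2.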

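Worked example

Example: MySqlDateTimeConverter, a column value converter for MySQL date/time columns

Step0 Requirements analysis: Debezium's default handling of date/time columns

  • Default conversion strategy for MySQL

When the MySQL connector starts, the converters are initialized during the snapshot phase and are initialized once more during the binlog phase (different classes are used).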

| MySQL column type | Snapshot type (jdbcType): raw type before Debezium TableSchemaBuilder conversion | Conversion in Debezium JdbcValueConverters | Column value type in SourceRecord |
|---|---|---|---|
| DATE | java.time.LocalDate (java.sql.Types#TIMESTAMP/93)? To be verified | Unknown | Unknown |
| TIME | java.time.Duration (java.sql.Types#TIME/92)? To be verified | Unknown | Unknown |
| DATETIME | java.sql.Timestamp (java.sql.Types#TIMESTAMP/93) | io.debezium.time.Timestamp => long; io.debezium.time.MicroTimestamp => long; io.debezium.time.NanoTimestamp => long; java.util.Date => java.util.Date | java.lang.Long |
| TIMESTAMP | java.sql.Timestamp (java.sql.Types#TIMESTAMP_WITH_TIMEZONE/2014) | io.debezium.time.ZonedTimestamp => String | java.lang.String |
  • Sample data from a sample MySQL table

MySQL time zone settings: system_time_zone +08 | time_zone SYSTEM

id[bigint(20)]=38
createTime[datetime(3) type] = '2024-01-31 17:56:43.717000000' 
    => converted to an epoch timestamp in the UTC+8 time zone: 1706695003717
createTimeTs[timestamp type] = '2024-01-31 17:56:44' 
    => converted to an epoch timestamp in the UTC+8 time zone: 1706695004000 (millisecond timestamp)
	select UNIX_TIMESTAMP(createTimeTs) = 1706695004 (second timestamp)
  • Experiment 1: MySQL time zone (time_zone) = UTC+8 | Debezium/FlinkCDC MySQLSource.serverTimeZone = utc+8, no custom date/time column value converter configured
[2024/12/04 20:42:26.248] [TRACE] [debezium-mysqlconnector-mysql_binlog_source-snapshot] [io.debezium.connector.mysql.MySqlValueConverters            :1289 convertValue] Value from ResultReceiver: [received = true, object = 1706723803717]

[2024/12/04 20:42:26.248] [INFO ] [debezium-mysqlconnector-mysql_binlog_source-snapshot] [io.debezium.relational.TableSchemaBuilder                   :162 lambda$createValueGenerator$5] columnName : createTime | typeName : DATETIME, jdbcType : 93 | converterClass:io.debezium.relational.ValueConverter$$Lambda$986/143348969 | oldValue: 2024-01-31T17:56:43.717+0800(class:java.sql.Timestamp) , newValue:1706723803717(class:java.lang.Long)


[2024/12/04 20:42:26.305] [TRACE] [debezium-mysqlconnector-mysql_binlog_source-snapshot] [io.debezium.connector.mysql.MySqlValueConverters            :1289 convertValue] Value from ResultReceiver: [received = true, object = 2024-01-31T09:56:44Z]

[2024/12/04 20:42:26.305] [INFO ] [debezium-mysqlconnector-mysql_binlog_source-snapshot] [io.debezium.relational.TableSchemaBuilder                   :162 lambda$createValueGenerator$5] columnName : createTimeTs | typeName : TIMESTAMP, jdbcType : 2014 | converterClass:io.debezium.jdbc.JdbcValueConverters$$Lambda$942/460149074 | oldValue: 2024-01-31T17:56:44.000+0800(class:java.sql.Timestamp) , newValue:2024-01-31T09:56:44Z(class:java.lang.String)

[2024/12/04 21:29:03.868] [INFO ] [debezium-engine] [com.xxx.cdc.mysql.MysqlCdcDeserializationSchema         :71 deserialize] id: 38, createTime: 1706723803717, type: java.lang.Long
[2024/12/04 21:29:03.869] [TRACE] [debezium-mysqlconnector-mysql_binlog_source-snapshot] [io.debezium.connector.mysql.MySqlValueConverters            :1288 convertValue] Callback is: io.debezium.jdbc.JdbcValueConverters$$Lambda$1015/2001322212@30fe60b4
[2024/12/04 21:29:03.869] [INFO ] [debezium-engine] [com.xxx.cdc.mysql.MysqlCdcDeserializationSchema         :72 deserialize] id: 38, createTimeTs: 2024-01-31T09:56:44Z, type: java.lang.String
  • Experiment 2: MySQL time zone (time_zone) = UTC+8 | Debezium/FlinkCDC MySQLSource.serverTimeZone = utc, no custom date/time column value converter configured
[2024/12/04 20:58:19.316] [TRACE] [debezium-mysqlconnector-mysql_binlog_source-snapshot] [io.debezium.connector.mysql.MySqlValueConverters            :1289 convertValue] Value from ResultReceiver: [received = true, object = 1706723803717]

[2024/12/04 20:58:19.317] [INFO ] [debezium-mysqlconnector-mysql_binlog_source-snapshot] [io.debezium.relational.TableSchemaBuilder                   :162 lambda$createValueGenerator$5] columnName : createTime | typeName : DATETIME, jdbcType : 93 | converterClass:io.debezium.relational.ValueConverter$$Lambda$958/157024793 | oldValue: 2024-01-31T17:56:43.717+0800(class:java.sql.Timestamp) , newValue:1706723803717(class:java.lang.Long)

[2024/12/04 20:58:19.342] [TRACE] [debezium-mysqlconnector-mysql_binlog_source-snapshot] [io.debezium.connector.mysql.MySqlValueConverters            :1289 convertValue] Value from ResultReceiver: [received = true, object = 2024-01-31T17:56:44Z]

[2024/12/04 20:58:19.342] [INFO ] [debezium-mysqlconnector-mysql_binlog_source-snapshot] [io.debezium.relational.TableSchemaBuilder                   :162 lambda$createValueGenerator$5] columnName : createTimeTs | typeName : TIMESTAMP, jdbcType : 2014 | converterClass:io.debezium.jdbc.JdbcValueConverters$$Lambda$915/1619603243 | oldValue: 2024-02-01T01:56:44.000+0800(class:java.sql.Timestamp) , newValue:2024-01-31T17:56:44Z(class:java.lang.String)

[2024/12/04 21:16:56.082] [INFO ] [debezium-engine] [com.xxx.cdc.mysql.MysqlCdcDeserializationSchema         :71 deserialize] id: 38, createTime: 1706723803717, type: java.lang.Long

[2024/12/04 21:16:56.082] [TRACE] [debezium-mysqlconnector-mysql_binlog_source-snapshot] [io.debezium.connector.mysql.MySqlValueConverters            :1288 convertValue] Callback is: io.debezium.jdbc.JdbcValueConverters$$Lambda$1075/1910762610@4f2229a4

[2024/12/04 21:16:56.082] [INFO ] [debezium-engine] [com.xxx.cdc.mysql.MysqlCdcDeserializationSchema         :72 deserialize] id: 38, createTimeTs: 2024-01-31T17:56:44Z, type: java.lang.String
  • SQL Server conversion

See: https://cloud.tencent.com/developer/article/2216144 (for reference only)
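The Long emitted for createTime in the Experiment 2 logs above is easiest to interpret by decoding it in two zones. The following is a minimal sketch using only java.time; the class and method names (EpochMillisDemo, wallClock) are illustrative and not part of Debezium. Read as UTC, the value reproduces the wall-clock digits stored in the DATETIME column, while a consumer that renders the same value in Asia/Shanghai sees a time eight hours later.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class EpochMillisDemo {

    /** Renders an epoch-millisecond value as a wall-clock time in the given zone. */
    static LocalDateTime wallClock(long epochMillis, ZoneId zone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
    }

    public static void main(String[] args) {
        long createTime = 1706723803717L; // newValue of createTime in the logs above

        // Decoded as UTC: same digits as the DATETIME stored in MySQL
        System.out.println(wallClock(createTime, ZoneOffset.UTC));
        // Decoded in the database's own zone: shifted by eight hours
        System.out.println(wallClock(createTime, ZoneId.of("Asia/Shanghai")));
    }
}
```

This is why a downstream job that treats the Long as a true instant and formats it in UTC+8 ends up eight hours ahead of the value in the table.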

Step1 Add the Debezium dependencies

<dependency>
    <groupId>io.debezium</groupId>
    <artifactId>debezium-api</artifactId>
    <version>${debezium.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>connect-api</artifactId>
    <version>${kafka.version}</version>
</dependency>
  • debezium.version = 1.4.1.Final (the same Debezium version that is bundled with flink cdc 1.3.0)
  • kafka.version = 2.6.1

Step2 Implement a custom Debezium CustomConverter


import io.debezium.spi.converter.CustomConverter;
import io.debezium.spi.converter.RelationalColumn;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.time.*;
import java.time.format.DateTimeFormatter;
import java.util.Properties;
import java.util.function.Consumer;

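/**
 * Handles Debezium's date/time conversion: emits DATE, TIME, DATETIME and TIMESTAMP
 * columns as formatted strings instead of Debezium's default encodings.
 */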
public class MySqlDateTimeConverter implements CustomConverter<SchemaBuilder, RelationalColumn> {

    private final static Logger logger = LoggerFactory.getLogger(MySqlDateTimeConverter.class);

    private DateTimeFormatter dateFormatter = DateTimeFormatter.ISO_DATE;
    private DateTimeFormatter timeFormatter = DateTimeFormatter.ISO_TIME;
    private DateTimeFormatter datetimeFormatter = DateTimeFormatter.ISO_DATE_TIME;
    private DateTimeFormatter timestampFormatter = DateTimeFormatter.ISO_DATE_TIME;

    private ZoneId timestampZoneId = ZoneId.systemDefault();

    @Override
    public void configure(Properties props) {
        readProps(props, "format.date", p -> dateFormatter = DateTimeFormatter.ofPattern(p));
        readProps(props, "format.time", p -> timeFormatter = DateTimeFormatter.ofPattern(p));
        readProps(props, "format.datetime", p -> datetimeFormatter = DateTimeFormatter.ofPattern(p));
        readProps(props, "format.timestamp", p -> timestampFormatter = DateTimeFormatter.ofPattern(p));
        readProps(props, "format.timestamp.zone", z -> timestampZoneId = ZoneId.of(z));
    }

    private void readProps(Properties properties, String settingKey, Consumer<String> callback) {
        String settingValue = (String) properties.get(settingKey);
        if (settingValue == null || settingValue.length() == 0) {
            return;
        }
        try {
            callback.accept(settingValue.trim());
        } catch (IllegalArgumentException | DateTimeException e) {
            logger.error("The {} setting is illegal: {}", settingKey, settingValue);
            throw e;
        }
    }

    @Override
    public void converterFor(RelationalColumn column, ConverterRegistration<SchemaBuilder> registration) {
        String sqlType = column.typeName().toUpperCase();
        SchemaBuilder schemaBuilder = null;
        Converter converter = null;
        if ("DATE".equals(sqlType)) {
            schemaBuilder = SchemaBuilder.string().optional().name("com.unicdata.debezium.date.string");
            converter = this::convertDate;
        }
        if ("TIME".equals(sqlType)) {
            schemaBuilder = SchemaBuilder.string().optional().name("com.unicdata.debezium.time.string");
            converter = this::convertTime;
        }
        if ("DATETIME".equals(sqlType)) {
            schemaBuilder = SchemaBuilder.string().optional().name("com.unicdata.debezium.datetime.string");
            converter = this::convertDateTime;
        }
        if ("TIMESTAMP".equals(sqlType)) {
            schemaBuilder = SchemaBuilder.string().optional().name("com.unicdata.debezium.timestamp.string");
            converter = this::convertTimestamp;
        }
        if (schemaBuilder != null) {
            registration.register(schemaBuilder, converter);
        }
    }

    private String convertDate(Object input) {
        if (input == null) {
            return null;
        }
        if (input instanceof LocalDate) {
            return dateFormatter.format((LocalDate) input);
        }
        if (input instanceof Integer) {
            LocalDate date = LocalDate.ofEpochDay((Integer) input);
            return dateFormatter.format(date);
        }
        return String.valueOf(input);
    }

    private String convertTime(Object input) {
        if (input == null) {
            return null;
        }
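        if (input instanceof Duration) {
            Duration duration = (Duration) input;
            long seconds = duration.getSeconds();
            int nano = duration.getNano();
            // MySQL TIME ranges from -838:59:59 to 838:59:59, but LocalTime only covers one day;
            // fall back to the ISO-8601 duration text instead of throwing DateTimeException
            if (seconds < 0 || seconds >= 86_400) {
                return duration.toString();
            }
            LocalTime time = LocalTime.ofSecondOfDay(seconds).withNano(nano);
            return timeFormatter.format(time);
        }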
        return String.valueOf(input);
    }

    private String convertDateTime(Object input) {
        if (input == null) {
            return null;
        }
        if (input instanceof LocalDateTime) {
            return datetimeFormatter.format((LocalDateTime) input);
        }
        return String.valueOf(input);
    }

    private String convertTimestamp(Object input) {
        if (input == null) {
            return null;
        }
        if (input instanceof ZonedDateTime) {
            // MySQL stores TIMESTAMP values in UTC, so the ZonedDateTime received here is always in UTC
            ZonedDateTime zonedDateTime = (ZonedDateTime) input;
            LocalDateTime localDateTime = zonedDateTime.withZoneSameInstant(timestampZoneId).toLocalDateTime();
            return timestampFormatter.format(localDateTime);
        }
        return String.valueOf(input);
    }
}
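To see what the converter methods above produce without starting a connector, the core logic can be exercised with plain java.time. The following is a minimal sketch under stated assumptions: ConverterLogicDemo and its method names are hypothetical, the patterns are the ones configured in Step3, and the inputs are the sample values listed in the reference table at the end of this article. It does not depend on Debezium or Kafka Connect.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class ConverterLogicDemo {

    static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HH:mm:ss");
    static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** DATE arriving as days since 1970-01-01. */
    static String date(int epochDay) {
        return DATE.format(LocalDate.ofEpochDay(epochDay));
    }

    /** TIME arriving as a Duration; values outside one day keep their ISO-8601 text. */
    static String time(Duration d) {
        long seconds = d.getSeconds();
        if (seconds < 0 || seconds >= 86_400) {
            return d.toString();
        }
        return TIME.format(LocalTime.ofSecondOfDay(seconds).withNano(d.getNano()));
    }

    /** TIMESTAMP arriving as a UTC ZonedDateTime, shifted into the target zone. */
    static String timestamp(ZonedDateTime utc, ZoneId target) {
        return TS.format(utc.withZoneSameInstant(target).toLocalDateTime());
    }

    public static void main(String[] args) {
        System.out.println(date(18655));
        System.out.println(time(Duration.parse("PT17H29M4S")));
        System.out.println(timestamp(ZonedDateTime.parse("2021-01-28T09:29:04Z"), ZoneId.of("UTC+8")));
    }
}
```

The range check in time(...) is a design choice for this sketch: MySQL TIME can hold negative values and values beyond 24 hours, which LocalTime cannot represent.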

Step3 Register the custom column value converter in the Debezium properties

  • Add this configuration when building the source
public static Properties getDebeziumProperties() {
	Properties properties = new Properties();
	properties.setProperty("converters", "dateConverters");
	// adjust to the package in which the converter class actually lives
	properties.setProperty("dateConverters.type", "com.xxx.bdz.schema.MySqlDateTimeConverter");
	properties.setProperty("dateConverters.database.type", "mysql");
	properties.setProperty("dateConverters.format.date", "yyyy-MM-dd");
	properties.setProperty("dateConverters.format.time", "HH:mm:ss");
	properties.setProperty("dateConverters.format.datetime", "yyyy-MM-dd HH:mm:ss");
	properties.setProperty("dateConverters.format.timestamp", "yyyy-MM-dd HH:mm:ss");
	properties.setProperty("dateConverters.format.timestamp.zone", "UTC+8");
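	properties.setProperty("snapshot.locking.mode", "none"); // skip the global read lock, which may affect online workloads; keys passed through debeziumProperties(...) reach Debezium as-is, so no "debezium." prefix is used here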
	properties.setProperty("bigint.unsigned.handling.mode", "long");
	properties.setProperty("decimal.handling.mode", "string");

	return properties;
}

// usage in flink cdc
//MySqlSource<String> mySqlSource = MySqlSource.<String>builder()
SourceFunction<String> sourceCdc = MySQLSource.<String>builder()
	.hostname( appArgs.getHost())
	.port(Integer.parseInt( appArgs.getPort()))
	.databaseList( appArgs.getDatabaseName()) // set captured database
	.tableList( tableList) // set captured table
	.username( appArgs.getUserName())
	.password( appArgs.getPassword())
	//.includeSchemaChanges(true)
	.debeziumProperties( getDebeziumProperties())
	.deserializer( new JsonDebeziumDeserializationSchema()) // converts SourceRecord to JSON String
	.startupOptions( getStartUpMode(appArgs))
	.serverTimeZone( "Asia/Shanghai" )
	.build();
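The keys in getDebeziumProperties() carry the `dateConverters.` prefix, while configure(Properties) in Step2 reads `format.date`, `format.time` and so on. This works on the assumption that Debezium hands each converter only the keys under its own name, with that prefix removed. The following sketch imitates this with a hypothetical subset(...) helper; it illustrates the naming convention only and is not Debezium's actual code.

```java
import java.util.Properties;

public class ConverterPropsDemo {

    /** Hypothetical helper: keeps keys starting with "<name>." and strips that prefix. */
    static Properties subset(Properties all, String name) {
        String prefix = name + ".";
        Properties result = new Properties();
        for (String key : all.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                result.setProperty(key.substring(prefix.length()), all.getProperty(key));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties all = new Properties();
        all.setProperty("converters", "dateConverters");
        all.setProperty("dateConverters.type", "com.xxx.bdz.schema.MySqlDateTimeConverter");
        all.setProperty("dateConverters.format.date", "yyyy-MM-dd");
        all.setProperty("decimal.handling.mode", "string");

        // What the converter named in "converters" would see in configure(...)
        Properties forConverter = subset(all, all.getProperty("converters"));
        System.out.println(forConverter.getProperty("format.date"));
        System.out.println(forConverter.containsKey("decimal.handling.mode"));
    }
}
```

A practical consequence: a typo in the converter name (for example `dateConverter.format.date`) would leave configure(...) with no matching keys, and the ISO defaults declared in the class would be used.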

X References

  • openjdk

https://github.com/halfye/Debezium-Converter/blob/master/src/main/java/org/util/DebeziumConverter.java [Recommended]

public class MySqlDateTimeConverter implements CustomConverter<SchemaBuilder, RelationalColumn>
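By default, Debezium converts MySQL datetime columns into a UTC epoch timestamp ({@link io.debezium.time.Timestamp}); the time zone is hard-coded and cannot be changed.
As a result, a value stored in a database configured as UTC+8 reaches Kafka as a long timestamp that is eight hours ahead.
By default, Debezium converts MySQL timestamp columns into a time string.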

| mysql                           | mysql-binlog-connector               | debezium                      |
|---------------------------------|--------------------------------------|-------------------------------|
| date (2021-01-28)               | LocalDate (2021-01-28)               | Integer (18655)               |
| time (17:29:04)                 | Duration (PT17H29M4S)                | Long (62944000000)            |
| timestamp (2021-01-28 17:29:04) | ZonedDateTime (2021-01-28T09:29:04Z) | String (2021-01-28T09:29:04Z) |
| Datetime (2021-01-28 17:29:04)  | LocalDateTime (2021-01-28T17:29:04)  | Long (1611854944000)          |
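The numbers in the debezium column of the table above can be reproduced with plain java.time arithmetic. The following sketch assumes the encodings that the sample values imply: a date as days since 1970-01-01, a time as microseconds since midnight, and a datetime as epoch milliseconds with the wall clock read as UTC. DefaultEncodingDemo and its method names are hypothetical; the code mirrors the table values and is not Debezium's own implementation.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class DefaultEncodingDemo {

    /** date -> days since 1970-01-01 */
    static long epochDay(LocalDate date) {
        return date.toEpochDay();
    }

    /** time -> microseconds since midnight */
    static long microsOfDay(Duration time) {
        return time.toNanos() / 1_000;
    }

    /** datetime -> epoch milliseconds, treating the wall clock as UTC */
    static long epochMillisAsUtc(LocalDateTime dateTime) {
        return dateTime.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(epochDay(LocalDate.parse("2021-01-28")));
        System.out.println(microsOfDay(Duration.parse("PT17H29M4S")));
        System.out.println(epochMillisAsUtc(LocalDateTime.parse("2021-01-28T17:29:04")));
    }
}
```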
posted @ 2024-12-04 20:43  千千寰宇