蓝天

Flink MySQL CDC incremental sync requires the source table to have a primary key

Flink CDC version: 2.2

If the source table has no primary key, the job fails at runtime with the following error:

2023-03-13 21:28:25,244 INFO  [679] [com.ververica.cdc.connectors.mysql.source.assigners.ChunkSplitter.generateSplits(ChunkSplitter.java:78)]  - Start splitting table test_db.test_table into chunks...
2023-03-13 21:28:25,283 INFO  [681] [io.debezium.jdbc.JdbcConnection.lambda$doClose$3(JdbcConnection.java:946)]  - Connection gracefully closed
2023-03-13 21:28:25,284 ERROR [678] [org.apache.flink.runtime.source.coordinator.SourceCoordinator.lambda$runInEventLoop$9(SourceCoordinator.java:415)]  - Uncaught exception in the SplitEnumerator for Source Source: cdc_test_table[1] while handling operator event RequestSplitEvent (host='3.27.15.22') from subtask 0. Triggering job failover.
org.apache.flink.util.FlinkRuntimeException: Chunk splitting has encountered exception
        at com.ververica.cdc.connectors.mysql.source.assigners.MySqlSnapshotSplitAssigner.checkSplitterErrors(MySqlSnapshotSplitAssigner.java:412)
        at com.ververica.cdc.connectors.mysql.source.assigners.MySqlSnapshotSplitAssigner.getNext(MySqlSnapshotSplitAssigner.java:206)
        at com.ververica.cdc.connectors.mysql.source.assigners.MySqlSnapshotSplitAssigner.getNext(MySqlSnapshotSplitAssigner.java:224)
        at com.ververica.cdc.connectors.mysql.source.assigners.MySqlHybridSplitAssigner.getNext(MySqlHybridSplitAssigner.java:125)
        at com.ververica.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator.assignSplits(MySqlSourceEnumerator.java:203)
        at com.ververica.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator.handleSplitRequest(MySqlSourceEnumerator.java:119)
        at org.apache.flink.runtime.source.coordinator.SourceCoordinator.lambda$handleEventFromOperator$2(SourceCoordinator.java:230)
        at org.apache.flink.runtime.source.coordinator.SourceCoordinator.lambda$runInEventLoop$9(SourceCoordinator.java:406)
        at org.apache.flink.util.ThrowableCatchingRunnable.run(ThrowableCatchingRunnable.java:40)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkRuntimeException: Generate Splits for table test_db.test_table error
        at com.ververica.cdc.connectors.mysql.source.assigners.ChunkSplitter.generateSplits(ChunkSplitter.java:115)
        at com.ververica.cdc.connectors.mysql.source.assigners.MySqlSnapshotSplitAssigner.splitChunksForRemainingTables(MySqlSnapshotSplitAssigner.java:390)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        ... 3 more
Caused by: org.apache.flink.table.api.ValidationException: Incremental snapshot for tables requires primary key, but table test_db.test_table doesn't have primary key.
        at com.ververica.cdc.connectors.mysql.source.utils.ChunkUtils.getSplitColumn(ChunkUtils.java:63)

The relevant connector code:

public static Column getSplitColumn(Table table) {
    List<Column> primaryKeys = table.primaryKeyColumns();
    if (primaryKeys.isEmpty()) {
        throw new ValidationException(
                String.format(
                        "Incremental snapshot for tables requires primary key,"
                                + " but table %s doesn't have primary key.",
                        table.id()));
    }

    // use first field in primary key as the split key
    return primaryKeys.get(0);
}
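The check above depends only on the table's primary-key metadata. A minimal, self-contained sketch of the same control flow (with plain strings standing in for Debezium's Table/Column types, so the names here are illustrative, not the connector's API):

```java
import java.util.Collections;
import java.util.List;

// Sketch of the primary-key validation done by ChunkUtils.getSplitColumn.
// "tableId" and the String columns are simplified stand-ins for the Debezium
// types used by the real connector; only the control flow is reproduced.
public class SplitColumnCheck {

    static String getSplitColumn(String tableId, List<String> primaryKeys) {
        if (primaryKeys.isEmpty()) {
            throw new IllegalStateException(String.format(
                    "Incremental snapshot for tables requires primary key,"
                            + " but table %s doesn't have primary key.", tableId));
        }
        // like the real code, use the first primary-key column as the split key
        return primaryKeys.get(0);
    }

    public static void main(String[] args) {
        // a table with a primary key yields a split key
        System.out.println(getSplitColumn("test_db.with_pk", List.of("id")));
        // a table without one reproduces the error message from the stack trace
        try {
            getSplitColumn("test_db.test_table", Collections.emptyList());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that the real connector takes the first primary-key column only: for a composite primary key, only its leading column is used as the split key.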

/** Generates all snapshot splits (chunks) for the given table path. */
public Collection<MySqlSnapshotSplit> generateSplits(TableId tableId) {
    try (JdbcConnection jdbc = openJdbcConnection(sourceConfig)) {

        LOG.info("Start splitting table {} into chunks...", tableId);
        long start = System.currentTimeMillis();

        Table table = mySqlSchema.getTableSchema(jdbc, tableId).getTable();
        Column splitColumn = ChunkUtils.getSplitColumn(table); // reached when the source table has no primary key; the ValidationException is thrown here
        final List<ChunkRange> chunks;
        try {
            chunks = splitTableIntoChunks(jdbc, tableId, splitColumn);
        } catch (SQLException e) {
            throw new FlinkRuntimeException("Failed to split chunks for table " + tableId, e);
        }

        // convert chunks into splits
        List<MySqlSnapshotSplit> splits = new ArrayList<>();
        RowType splitType = ChunkUtils.getSplitType(splitColumn);
        for (int i = 0; i < chunks.size(); i++) {
            ChunkRange chunk = chunks.get(i);
            MySqlSnapshotSplit split =
                    createSnapshotSplit(
                            jdbc,
                            tableId,
                            i,
                            splitType,
                            chunk.getChunkStart(),
                            chunk.getChunkEnd());
            splits.add(split);
        }

        long end = System.currentTimeMillis();
        LOG.info( // reached on success
                "Split table {} into {} chunks, time cost: {}ms.",
                tableId,
                splits.size(),
                end - start);
        return splits;
    } catch (Exception e) {
        throw new FlinkRuntimeException( // reached when the source table has no primary key
                String.format("Generate Splits for table %s error", tableId), e);
    }
}
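splitTableIntoChunks itself is not shown above. For an evenly distributed integer split key, the core loop can be sketched as follows; ChunkRange here is a simplified stand-in for the connector's class, and following the same convention, a null bound means "unbounded" on that side (the first and last chunks are open-ended):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of evenly-sized chunking over an integer split key, in the spirit of
// ChunkSplitter.splitEvenlySizedChunks. This is an illustration of the idea,
// not the connector's implementation.
public class EvenChunks {

    record ChunkRange(Long start, Long end) {}

    static List<ChunkRange> splitEvenlySizedChunks(long min, long max, long chunkSize) {
        List<ChunkRange> chunks = new ArrayList<>();
        Long chunkStart = null;                       // first chunk: (-inf, min + chunkSize)
        long chunkEnd = min + chunkSize;
        while (chunkEnd <= max) {
            chunks.add(new ChunkRange(chunkStart, chunkEnd));
            chunkStart = chunkEnd;
            chunkEnd += chunkSize;
        }
        chunks.add(new ChunkRange(chunkStart, null)); // last chunk: [chunkStart, +inf)
        return chunks;
    }

    public static void main(String[] args) {
        // key range [1, 100] with chunk size 40 -> 3 chunks
        List<ChunkRange> chunks = splitEvenlySizedChunks(1, 100, 40);
        System.out.println(chunks.size());
        System.out.println(chunks);
    }
}
```

Each ChunkRange then becomes one MySqlSnapshotSplit in the loop above, which is why a usable split column is mandatory before any chunking can happen.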

Logs from a successful run:

2023-03-13 22:10:39,704 INFO  [47] [org.apache.hudi.common.table.HoodieTableConfig.<init>(HoodieTableConfig.java:295)]  - Loading table properties from hdfs://test/user/root/warehouse/hudi_test.db/hudi_test_table/.hoodie/hoodie.properties
2023-03-13 22:10:40,005 INFO  [36] [org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:968)]  - Retrying connect to server: qy-flink13-rm1.tianqiong.woa.com/11.149.49.26:18030. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2023-03-13 22:10:40,110 INFO  [620] [com.ververica.cdc.connectors.mysql.source.assigners.ChunkSplitter.calculateDistributionFactor(ChunkSplitter.java:300)]  - The distribution factor of table test_db.test_table is 34301.9489 according to the min split key 1, max split key 33547306 and approximate row count 978
2023-03-13 22:10:40,111 INFO  [620] [com.ververica.cdc.connectors.mysql.source.assigners.ChunkSplitter.splitUnevenlySizedChunks(ChunkSplitter.java:204)]  - Use unevenly-sized chunks for table test_db.test_table, the chunk size is 8096
2023-03-13 22:10:40,145 INFO  [47] [org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:144)]  - Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from hdfs://test/user/root/warehouse/hudi_test.db/hudi_test_table
2023-03-13 22:10:40,156 INFO  [620] [com.ververica.cdc.connectors.mysql.source.assigners.ChunkSplitter.generateSplits(ChunkSplitter.java:107)]  - Split table test_db.test_table into 1 chunks, time cost: 720ms.
2023-03-13 22:10:40,157 INFO  [626] [io.debezium.jdbc.JdbcConnection.lambda$doClose$3(JdbcConnection.java:946)]  - Connection gracefully closed
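The "distribution factor" in the log above is consistent with (max split key − min split key + 1) / approximate row count; the exact formula lives in ChunkSplitter.calculateDistributionFactor, so treat this as a sanity check of the logged numbers rather than the connector's code:

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Recomputes the distribution-factor number from the log lines above:
// factor = (max split key - min split key + 1) / approximate row count.
// min = 1, max = 33547306, row count = 978 are taken directly from the log.
public class DistributionFactorCheck {
    public static void main(String[] args) {
        BigDecimal min = BigDecimal.ONE;
        BigDecimal max = BigDecimal.valueOf(33547306);
        BigDecimal approximateRowCount = BigDecimal.valueOf(978);

        BigDecimal factor = max.subtract(min).add(BigDecimal.ONE)
                .divide(approximateRowCount, new MathContext(10));
        System.out.println(factor); // close to the logged value 34301.9489
    }
}
```

A factor this large means the key space is far sparser than the row count suggests, which is why the connector falls back to "unevenly-sized chunks" in the next log line instead of stepping through the key range arithmetically.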

posted on 2023-03-14 09:39 by 蓝天