Flink MySQL CDC incremental sync requires the source table to have a primary key
Version: 2.2
If the source table has no primary key, the job fails at runtime with the following error:
2023-03-13 21:28:25,244 INFO [679] [com.ververica.cdc.connectors.mysql.source.assigners.ChunkSplitter.generateSplits(ChunkSplitter.java:78)] - Start splitting table test_db.test_table into chunks...
2023-03-13 21:28:25,283 INFO [681] [io.debezium.jdbc.JdbcConnection.lambda$doClose$3(JdbcConnection.java:946)] - Connection gracefully closed
2023-03-13 21:28:25,284 ERROR [678] [org.apache.flink.runtime.source.coordinator.SourceCoordinator.lambda$runInEventLoop$9(SourceCoordinator.java:415)] - Uncaught exception in the SplitEnumerator for Source Source: cdc_test_table[1] while handling operator event RequestSplitEvent (host='3.27.15.22') from subtask 0. Triggering job failover.
org.apache.flink.util.FlinkRuntimeException: Chunk splitting has encountered exception
at com.ververica.cdc.connectors.mysql.source.assigners.MySqlSnapshotSplitAssigner.checkSplitterErrors(MySqlSnapshotSplitAssigner.java:412)
at com.ververica.cdc.connectors.mysql.source.assigners.MySqlSnapshotSplitAssigner.getNext(MySqlSnapshotSplitAssigner.java:206)
at com.ververica.cdc.connectors.mysql.source.assigners.MySqlSnapshotSplitAssigner.getNext(MySqlSnapshotSplitAssigner.java:224)
at com.ververica.cdc.connectors.mysql.source.assigners.MySqlHybridSplitAssigner.getNext(MySqlHybridSplitAssigner.java:125)
at com.ververica.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator.assignSplits(MySqlSourceEnumerator.java:203)
at com.ververica.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator.handleSplitRequest(MySqlSourceEnumerator.java:119)
at org.apache.flink.runtime.source.coordinator.SourceCoordinator.lambda$handleEventFromOperator$2(SourceCoordinator.java:230)
at org.apache.flink.runtime.source.coordinator.SourceCoordinator.lambda$runInEventLoop$9(SourceCoordinator.java:406)
at org.apache.flink.util.ThrowableCatchingRunnable.run(ThrowableCatchingRunnable.java:40)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkRuntimeException: Generate Splits for table test_db.test_table error
at com.ververica.cdc.connectors.mysql.source.assigners.ChunkSplitter.generateSplits(ChunkSplitter.java:115)
at com.ververica.cdc.connectors.mysql.source.assigners.MySqlSnapshotSplitAssigner.splitChunksForRemainingTables(MySqlSnapshotSplitAssigner.java:390)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
Caused by: org.apache.flink.table.api.ValidationException: Incremental snapshot for tables requires primary key, but table test_db.test_table doesn't have primary key.
at com.ververica.cdc.connectors.mysql.source.utils.ChunkUtils.getSplitColumn(ChunkUtils.java:63)
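For context, flink-cdc 2.2 uses the incremental snapshot source by default, and chunk splitting runs inside the SplitEnumerator on the JobManager, which is why the failover is triggered from the SourceCoordinator. A minimal DataStream job along these lines reaches the same code path (hostname, credentials, and class name below are placeholders, not taken from the failing job):

import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CdcTestTableJob {
    public static void main(String[] args) throws Exception {
        // Incremental snapshot source; chunk splitting happens in the enumerator
        MySqlSource<String> source =
                MySqlSource.<String>builder()
                        .hostname("mysql-host")          // placeholder
                        .port(3306)
                        .databaseList("test_db")
                        .tableList("test_db.test_table") // the table from the log above
                        .username("user")                // placeholder
                        .password("password")            // placeholder
                        .deserializer(new JsonDebeziumDeserializationSchema())
                        .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "cdc_test_table").print();
        env.execute("cdc_test_table");
    }
}

As far as I know, the SQL connector can fall back to the legacy Debezium source via 'scan.incremental.snapshot.enabled' = 'false', which does not require a primary key, at the cost of losing parallel, lock-free snapshotting.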
Relevant code:
public static Column getSplitColumn(Table table) {
    List<Column> primaryKeys = table.primaryKeyColumns();
    if (primaryKeys.isEmpty()) {
        throw new ValidationException(
                String.format(
                        "Incremental snapshot for tables requires primary key,"
                                + " but table %s doesn't have primary key.",
                        table.id()));
    }

    // use first field in primary key as the split key
    return primaryKeys.get(0);
}
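Note the last line: only the first primary-key column becomes the split key, so with a composite key (a, b) the chunks are ranged over a alone. A rough sketch using Debezium's table model (the column names and types are hypothetical, and I'm assuming the Table/Column editor API behaves as shown):

import io.debezium.relational.Column;
import io.debezium.relational.Table;
import io.debezium.relational.TableId;

// Hypothetical table with composite primary key (a, b)
Table table =
        Table.editor()
                .tableId(new TableId("test_db", null, "test_table"))
                .addColumn(Column.editor().name("a").jdbcType(java.sql.Types.BIGINT).create())
                .addColumn(Column.editor().name("b").jdbcType(java.sql.Types.BIGINT).create())
                .setPrimaryKeyNames("a", "b")
                .create();

// Only the first key column is chosen as the split key
System.out.println(ChunkUtils.getSplitColumn(table).name()); // "a"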
/** Generates all snapshot splits (chunks) for the given table path. */
public Collection<MySqlSnapshotSplit> generateSplits(TableId tableId) {
    try (JdbcConnection jdbc = openJdbcConnection(sourceConfig)) {
        LOG.info("Start splitting table {} into chunks...", tableId);
        long start = System.currentTimeMillis();

        Table table = mySqlSchema.getTableSchema(jdbc, tableId).getTable();
        Column splitColumn = ChunkUtils.getSplitColumn(table); // a source table without a primary key reaches this line and throws
        final List<ChunkRange> chunks;
        try {
            chunks = splitTableIntoChunks(jdbc, tableId, splitColumn);
        } catch (SQLException e) {
            throw new FlinkRuntimeException("Failed to split chunks for table " + tableId, e);
        }

        // convert chunks into splits
        List<MySqlSnapshotSplit> splits = new ArrayList<>();
        RowType splitType = ChunkUtils.getSplitType(splitColumn);
        for (int i = 0; i < chunks.size(); i++) {
            ChunkRange chunk = chunks.get(i);
            MySqlSnapshotSplit split =
                    createSnapshotSplit(
                            jdbc,
                            tableId,
                            i,
                            splitType,
                            chunk.getChunkStart(),
                            chunk.getChunkEnd());
            splits.add(split);
        }
        long end = System.currentTimeMillis();
        LOG.info( // reached on success
                "Split table {} into {} chunks, time cost: {}ms.",
                tableId,
                splits.size(),
                end - start);
        return splits;
    } catch (Exception e) {
        throw new FlinkRuntimeException( // a source table without a primary key ends up here, wrapping the ValidationException
                String.format("Generate Splits for table %s error", tableId), e);
    }
}
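Because getSplitColumn is only called once the enumerator starts splitting, the job fails after submission. A pre-flight check with plain JDBC metadata can surface the problem earlier; a minimal sketch, with URL and credentials as placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;

// Fail fast if the source table has no primary key, before submitting the CDC job
static void assertHasPrimaryKey() throws SQLException {
    try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://mysql-host:3306/test_db", "user", "password"); // placeholders
            ResultSet rs = conn.getMetaData().getPrimaryKeys("test_db", null, "test_table")) {
        if (!rs.next()) {
            throw new IllegalStateException(
                    "test_db.test_table has no primary key; incremental snapshot will fail");
        }
    }
}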
Logs from a successful run:
2023-03-13 22:10:39,704 INFO [47] [org.apache.hudi.common.table.HoodieTableConfig.<init>(HoodieTableConfig.java:295)] - Loading table properties from hdfs://test/user/root/warehouse/hudi_test.db/hudi_test_table/.hoodie/hoodie.properties
2023-03-13 22:10:40,005 INFO [36] [org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:968)] - Retrying connect to server: qy-flink13-rm1.tianqiong.woa.com/11.149.49.26:18030. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2023-03-13 22:10:40,110 INFO [620] [com.ververica.cdc.connectors.mysql.source.assigners.ChunkSplitter.calculateDistributionFactor(ChunkSplitter.java:300)] - The distribution factor of table test_db.test_table is 34301.9489 according to the min split key 1, max split key 33547306 and approximate row count 978
2023-03-13 22:10:40,111 INFO [620] [com.ververica.cdc.connectors.mysql.source.assigners.ChunkSplitter.splitUnevenlySizedChunks(ChunkSplitter.java:204)] - Use unevenly-sized chunks for table test_db.test_table, the chunk size is 8096
2023-03-13 22:10:40,145 INFO [47] [org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:144)] - Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from hdfs://test/user/root/warehouse/hudi_test.db/hudi_test_table
2023-03-13 22:10:40,156 INFO [620] [com.ververica.cdc.connectors.mysql.source.assigners.ChunkSplitter.generateSplits(ChunkSplitter.java:107)] - Split table test_db.test_table into 1 chunks, time cost: 720ms.
2023-03-13 22:10:40,157 INFO [626] [io.debezium.jdbc.JdbcConnection.lambda$doClose$3(JdbcConnection.java:946)] - Connection gracefully closed
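The distribution-factor line can be reproduced by hand: calculateDistributionFactor divides the split-key span by the approximate row count, and since 34301.9489 is far above the even-distribution upper bound (1000.0 by default, if I read the 2.2 defaults correctly), the splitter picks unevenly-sized chunks with the default chunk size 8096, matching the two ChunkSplitter INFO lines above:

// factor = (max - min + 1) / approximateRowCount, as logged by calculateDistributionFactor
double min = 1d;            // min split key from the log
double max = 33_547_306d;   // max split key from the log
double approxRows = 978d;   // approximate row count from the log
double factor = (max - min + 1) / approxRows;
System.out.printf("distribution factor = %.4f%n", factor); // 34301.9489, matching the log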