[Flink] Flink CDC FAQ
Flink CDC Connectors FAQ
Recently I have run into quite a few Flink CDC problems, so, building on the FAQ document in reference [1], this post gives a systematic summary of these issues.
MySQL CDC
Q: The job fails with ConnectException: A slave with the same server_uuid/server_id as this slave has connected to the master. What should I do?
Flink CDC official FAQ:
- This error means the server id used by the job conflicts with a server id used by another job or another sync tool. A server id is an int-typed integer and must be globally unique.
- In CDC 2.x each parallel instance of the source needs its own server id, so plan server ids carefully. For example, if the job's source parallelism is 4, you can configure 'server-id' = '5001-5004' so that the source tasks do not conflict (see the sketch below).
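A minimal Flink SQL sketch of this setup (my addition; the database app_db, table orders, and connection details are hypothetical):
SET 'parallelism.default' = '4';  -- four parallel source tasks
CREATE TABLE orders_source (
    order_id INT,
    order_status STRING,
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'localhost',            -- hypothetical connection details
    'port' = '3306',
    'username' = 'flink_cdc',
    'password' = '******',
    'database-name' = 'app_db',
    'table-name' = 'orders',
    'server-id' = '5001-5004'            -- one id per source subtask; range size >= parallelism
);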
Recommended reading
flink version: flink-1.13.5; cdc version: 2.1.1
- Key error log
org.apache.flink.runtime.JobException: Recovery is suppressed by FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=3, backoffTimeMS=10000)
Caused by: com.ververica.cdc.connectors.shaded.org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.
Caused by: io.debezium.DebeziumException: A slave with the same server_uuid/server_id as this slave has connected to the master; the first event '' at 4, the last event read from '/data/mysql/storage/logs/bin_log/bin.001086' at 426321679, the last byte read from '/data/mysql/storage/logs/bin_log/bin.001086' at 426321679. Error code: 1236; SQLSTATE: HY000.
Caused by: com.github.shyiko.mysql.binlog.network.ServerException: A slave with the same server_uuid/server_id as this slave has connected to the master; the first event '' at 4, the last event read from '/data/mysql/storage/logs/bin_log/bin.001086' at 426321679, the last byte read from '/data/mysql/storage/logs/bin_log/bin.001086' at 426321679
- Root cause analysis
Flink CDC implements real-time MySQL synchronization on top of Debezium, and Debezium reads MySQL's binlog by connecting as a slave server.
By default, a random number between 5400 and 6400 is generated as the server-id of this Debezium client. This id must be unique within the MySQL cluster, and the error means a duplicate server-id exists. It is therefore recommended to set the 'server-id' option explicitly, either as a single number or as a range.
In addition, when scan.incremental.snapshot.enabled is true (the default), configure a range: during the incremental snapshot phase the source runs in parallel, and each of these parallel clients also needs a unique server-id. The parallelism of incremental snapshot reading is controlled by the 'parallelism.default' setting, and the server-id range must be at least as large as the parallelism.
For details see:
https://ververica.github.io/flink-cdc-connectors/master/content/connectors/mysql-cdc.html#connector-options
https://nightlies.apache.org/flink/flink-cdc-docs-release-3.1/docs/faq/faq/
(see the explanations of server-id and scan.incremental.snapshot.enabled on the connector options page)
- Key log
com.github.shyiko.mysql.binlog.network.ServerException: A slave with the same server_uuid/server_id as this slave has connected to the master.
- Solution:
Random server-id generation has since been added. For existing jobs, if a server-id was explicitly specified in the MySQL advanced parameters, it is recommended to remove it: multiple jobs may use the same data source, and identical server-id settings will conflict.
- [Database] A brief look at MySQL serverId/serverUuid - cnblogs/千千寰宇 [Recommended]
- MySQL replication error: A slave with the same server_uuid/server_id as this slave has connected to the master - cnblogs
- Error: A slave with the same server_uuid/server_id as this slave has connected to the master - Alibaba Cloud
Q: The job fails with The connector is trying to read binlog starting at GTIDs xxx and binlog file 'binlog.000064', pos=89887992, skipping 4 events plus 1 rows, but this is no longer available on the server. Reconfigure the connector to use a snapshot when needed. What should I do?
Flink CDC official FAQ:
This error occurs in two cases:
- Case 1: the binlog file the job is reading has already been purged on the MySQL server, usually because the binlog retention on the server is too short. Increase it, for example to 7 days:
mysql> show variables like 'expire_logs_days';
mysql> set global expire_logs_days=7;
- Case 2: the Flink CDC job consumes the binlog too slowly; in that case, allocating sufficient resources is usually enough.
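Side note (my addition, not part of the original FAQ): on MySQL 8.0+, expire_logs_days is deprecated in favor of binlog_expire_logs_seconds, so the equivalent 7-day setting is:
mysql> show variables like 'binlog_expire_logs_seconds';
mysql> set global binlog_expire_logs_seconds=604800; -- 7 days = 7 * 24 * 3600 seconds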
Recommended reading
- Key log
Caused by: org.apache.kafka.connect.errors.ConnectException: The connector is trying to read binlog starting at GTIDs xxx and binlog file 'binlog.xxx', pos=xxx, skipping 4 events plus 1 rows, but this is no longer available on the server.
Reconfigure the connector to use a snapshot when needed.
- Cause:
The error is raised when the binlog file the job is reading has already been purged on the MySQL server. Binlog purging can happen for several reasons: the binlog retention may simply be set too short, or the job may process events slower than the binlog is produced for longer than the maximum binlog retention, so the server purges the files and the position being read becomes invalid.
- Solution:
If the job cannot keep up with binlog production, consider increasing the binlog retention time, or tune the job to reduce backpressure and speed up source consumption. If the job itself looks healthy, some other operation on the database side may have caused the binlog to be purged; in that case, combine information from the MySQL side to determine why the binlog was removed.
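To confirm on the MySQL side, you can compare the position in the error message against the binlog files the server still keeps (standard MySQL commands):
mysql> show binary logs;    -- binlog files still available on the server
mysql> show master status;  -- current write position and executed GTID set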
Q: The job fails with The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. What should I do?
- Flink CDC official FAQ:
This happens when the full (snapshot) phase of the job reads too slowly: by the time the snapshot phase finishes, the GTID position recorded at the start of the snapshot phase has already been purged by MySQL. Either increase the binlog retention time on the MySQL server, or increase the source parallelism so that the snapshot phase finishes faster.
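A quick check for this case (standard MySQL commands): if the GTID set the job wants to resume from is contained in gtid_purged, the required binlog is gone and the job has to re-snapshot.
mysql> show global variables like 'gtid_purged';    -- GTID sets whose binlog has been purged
mysql> show global variables like 'gtid_executed';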
Q: Does mysql-cdc support reading from a replica (slave)? How should the replica be configured?
- Yes. The replica must set log-slave-updates = 1 so that the data it replicates from the primary is also written to the replica's own binlog. If the primary has GTID mode enabled, the replica must enable it as well:
log-slave-updates = 1
gtid_mode = on
enforce_gtid_consistency = on
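You can verify these settings on the replica as follows (standard MySQL commands); all three should report ON (or 1):
mysql> show variables like 'log_slave_updates';
mysql> show variables like 'gtid_mode';
mysql> show variables like 'enforce_gtid_consistency';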
Q: The job fails with ConnectException: Received DML '…' for processing, binlog probably contains events generated with statement or mixed based replication format. What should I do?
This error means the MySQL server is misconfigured: check whether binlog_format is ROW. You can verify it with the following command:
mysql> show variables like '%binlog_format%';
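If it is STATEMENT or MIXED, switch it to ROW (note: set global only affects sessions opened afterwards, so existing connections must reconnect; on managed cloud databases this is usually changed through the instance parameter console instead):
mysql> set global binlog_format='ROW';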
Q: Flink CDC (mysql-cdc) fails to start, and the JobManager log reports: FlinkRuntimeException: Failed to discovery tables to capture
/ IllegalArgumentException: Can't find any matched tables, please check your configured database-name: [xxx_devicecenter] and table-name: [xxx_device]
- Problem description
A Flink CDC MySQL job (2.4.0 | mysql-cdc) fails to start, and the JobManager log reports:
FlinkRuntimeException: Failed to discovery tables to capture
/IllegalArgumentException: Can't find any matched tables, please check your configured database-name: [xxx_devicecenter] and table-name: [xxx_device]
org.apache.flink.util.FlinkException: Global failure triggered by OperatorCoordinator for 'Source: CDC From DimDevice' (operator bc764cd8ddf7a0cff126f51c16239658).
at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:556) ~[flink-dist-1.15.0-h0.cbu.dli.330.r34.jar:1.15.0-h0.cbu.dli.330.r34]
at org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$QuiesceableContext.failJob(RecreateOnResetOperatorCoordinator.java:236) ~[flink-dist-1.15.0-h0.cbu.dli.330.r34.jar:1.15.0-h0.cbu.dli.330.r34]
at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.failJob(SourceCoordinatorContext.java:321) ~[flink-dist-1.15.0-h0.cbu.dli.330.r34.jar:1.15.0-h0.cbu.dli.330.r34]
at org.apache.flink.runtime.source.coordinator.SourceCoordinator.lambda$runInEventLoop$9(SourceCoordinator.java:429) ~[flink-dist-1.15.0-h0.cbu.dli.330.r34.jar:1.15.0-h0.cbu.dli.330.r34]
at org.apache.flink.util.ThrowableCatchingRunnable.run(ThrowableCatchingRunnable.java:40) ~[flink-core-1.15.0-h0.cbu.dli.330.r34.jar:1.15.0-h0.cbu.dli.330.r34]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_412]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_412]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_412]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_412]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_412]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_412]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_412]
Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to discovery tables to capture // key log 1
at com.ververica.cdc.connectors.mysql.source.assigners.MySqlSnapshotSplitAssigner.discoveryCaptureTables(MySqlSnapshotSplitAssigner.java:186) ~[flink-sql-connector-mysql-cdc-2.4.0-h0.cbu.mrs.330.r1-SNAPSHOT.jar:2.4.0-h0.cbu.mrs.330.r1-SNAPSHOT]
at com.ververica.cdc.connectors.mysql.source.assigners.MySqlSnapshotSplitAssigner.open(MySqlSnapshotSplitAssigner.java:171) ~[flink-sql-connector-mysql-cdc-2.4.0-h0.cbu.mrs.330.r1-SNAPSHOT.jar:2.4.0-h0.cbu.mrs.330.r1-SNAPSHOT]
at com.ververica.cdc.connectors.mysql.source.assigners.MySqlHybridSplitAssigner.open(MySqlHybridSplitAssigner.java:93) ~[flink-sql-connector-mysql-cdc-2.4.0-h0.cbu.mrs.330.r1-SNAPSHOT.jar:2.4.0-h0.cbu.mrs.330.r1-SNAPSHOT]
at com.ververica.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator.start(MySqlSourceEnumerator.java:92) ~[flink-sql-connector-mysql-cdc-2.4.0-h0.cbu.mrs.330.r1-SNAPSHOT.jar:2.4.0-h0.cbu.mrs.330.r1-SNAPSHOT]
at org.apache.flink.runtime.source.coordinator.SourceCoordinator.lambda$start$1(SourceCoordinator.java:217) ~[flink-dist-1.15.0-h0.cbu.dli.330.r34.jar:1.15.0-h0.cbu.dli.330.r34]
at org.apache.flink.runtime.source.coordinator.SourceCoordinator.lambda$runInEventLoop$9(SourceCoordinator.java:415) ~[flink-dist-1.15.0-h0.cbu.dli.330.r34.jar:1.15.0-h0.cbu.dli.330.r34]
... 8 more
Caused by: java.lang.IllegalArgumentException: Can't find any matched tables, please check your configured database-name: [xxx_devicecenter] and table-name: [xxx_device] // key log 2
at com.ververica.cdc.connectors.mysql.debezium.DebeziumUtils.discoverCapturedTables(DebeziumUtils.java:196) ~[flink-sql-connector-mysql-cdc-2.4.0-h0.cbu.mrs.330.r1-SNAPSHOT.jar:2.4.0-h0.cbu.mrs.330.r1-SNAPSHOT]
at com.ververica.cdc.connectors.mysql.source.assigners.MySqlSnapshotSplitAssigner.discoveryCaptureTables(MySqlSnapshotSplitAssigner.java:182) ~[flink-sql-connector-mysql-cdc-2.4.0-h0.cbu.mrs.330.r1-SNAPSHOT.jar:2.4.0-h0.cbu.mrs.330.r1-SNAPSHOT]
at com.ververica.cdc.connectors.mysql.source.assigners.MySqlSnapshotSplitAssigner.open(MySqlSnapshotSplitAssigner.java:171) ~[flink-sql-connector-mysql-cdc-2.4.0-h0.cbu.mrs.330.r1-SNAPSHOT.jar:2.4.0-h0.cbu.mrs.330.r1-SNAPSHOT]
at com.ververica.cdc.connectors.mysql.source.assigners.MySqlHybridSplitAssigner.open(MySqlHybridSplitAssigner.java:93) ~[flink-sql-connector-mysql-cdc-2.4.0-h0.cbu.mrs.330.r1-SNAPSHOT.jar:2.4.0-h0.cbu.mrs.330.r1-SNAPSHOT]
at com.ververica.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator.start(MySqlSourceEnumerator.java:92) ~[flink-sql-connector-mysql-cdc-2.4.0-h0.cbu.mrs.330.r1-SNAPSHOT.jar:2.4.0-h0.cbu.mrs.330.r1-SNAPSHOT]
at org.apache.flink.runtime.source.coordinator.SourceCoordinator.lambda$start$1(SourceCoordinator.java:217) ~[flink-dist-1.15.0-h0.cbu.dli.330.r34.jar:1.15.0-h0.cbu.dli.330.r34]
at org.apache.flink.runtime.source.coordinator.SourceCoordinator.lambda$runInEventLoop$9(SourceCoordinator.java:415) ~[flink-dist-1.15.0-h0.cbu.dli.330.r34.jar:1.15.0-h0.cbu.dli.330.r34]
... 8 more
- Problem analysis
- Source code walkthrough
package com.ververica.cdc.connectors.mysql.source.assigners;
...
import static com.ververica.cdc.connectors.mysql.debezium.DebeziumUtils.discoverCapturedTables;
...
class MySqlSnapshotSplitAssigner implements MySqlSplitAssigner {
    @Override
    public void open() {
        chunkSplitter.open();
        discoveryCaptureTables(); // problem entry point / problem line 1
        captureNewlyAddedTables();
        startAsynchronouslySplit();
    }

    private void discoveryCaptureTables() { // problem line 2 <-- called from problem line 1
        // discovery the tables lazily
        if (needToDiscoveryTables()) {
            long start = System.currentTimeMillis();
            LOG.debug("The remainingTables is empty, start to discovery tables");
            try (JdbcConnection jdbc = openJdbcConnection(sourceConfig)) {
                // problem line 3 | calls com.ververica.cdc.connectors.mysql.debezium.DebeziumUtils.discoverCapturedTables
                final List<TableId> discoverTables = discoverCapturedTables(jdbc, sourceConfig);
                this.remainingTables.addAll(discoverTables);
                this.isTableIdCaseSensitive = DebeziumUtils.isTableIdCaseSensitive(jdbc);
            } catch (Exception e) {
                throw new FlinkRuntimeException("Failed to discovery tables to capture", e); // key log 1
            }
            LOG.debug(
                    "Discovery tables success, time cost: {} ms.",
                    System.currentTimeMillis() - start);
        }
        // when restore the job from legacy savepoint, the legacy state may haven't snapshot
        // remaining tables, discovery remaining table here
        else if (!isRemainingTablesCheckpointed && !isSnapshotAssigningFinished(assignerStatus)) {
            try (JdbcConnection jdbc = openJdbcConnection(sourceConfig)) {
                final List<TableId> discoverTables = discoverCapturedTables(jdbc, sourceConfig);
                discoverTables.removeAll(alreadyProcessedTables);
                this.remainingTables.addAll(discoverTables);
                this.isTableIdCaseSensitive = DebeziumUtils.isTableIdCaseSensitive(jdbc);
            } catch (Exception e) {
                throw new FlinkRuntimeException(
                        "Failed to discover remaining tables to capture", e);
            }
        }
    }
}
package com.ververica.cdc.connectors.mysql.debezium;
...
import static com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils.listTables;
...
public class DebeziumUtils {
    public static List<TableId> discoverCapturedTables( // problem line 4 <-- called from problem line 3
            JdbcConnection jdbc, MySqlSourceConfig sourceConfig) {
        final List<TableId> capturedTableIds;
        try {
            capturedTableIds = listTables(jdbc, sourceConfig.getTableFilters()); // problem line 5
        } catch (SQLException e) {
            throw new FlinkRuntimeException("Failed to discover captured tables", e); // key log 1
        }
        if (capturedTableIds.isEmpty()) { // problem line 6
            throw new IllegalArgumentException(
                    String.format(
                            "Can't find any matched tables, please check your configured database-name: %s and table-name: %s",
                            sourceConfig.getDatabaseList(), sourceConfig.getTableList())); // key log 2
        }
        return capturedTableIds;
    }
}
package com.ververica.cdc.connectors.mysql.source.utils;

public class TableDiscoveryUtils {
    ...
    public static List<TableId> listTables(JdbcConnection jdbc, RelationalTableFilters tableFilters) // problem line 7 <-- called from problem line 5
            throws SQLException {
        final List<TableId> capturedTableIds = new ArrayList<>();
        // -------------------
        // READ DATABASE NAMES
        // -------------------
        // Get the list of databases ...
        LOG.info("Read list of available databases");
        final List<String> databaseNames = new ArrayList<>();
        // problem line 8 | first list all databases; the loop below then checks, per database,
        // whether the configured tables are visible to this user, e.g.
        // SQL: SHOW FULL TABLES IN xxx_devicecenter where Table_Type = 'BASE TABLE'; -- xxx_device
        jdbc.query(
                "SHOW DATABASES",
                rs -> {
                    while (rs.next()) {
                        databaseNames.add(rs.getString(1));
                    }
                });
        LOG.info("\t list of available databases is: {}", databaseNames);

        // ----------------
        // READ TABLE NAMES
        // ----------------
        // Get the list of table IDs for each database. We can't use a prepared statement with
        // MySQL, so we have to build the SQL statement each time. Although in other cases this
        // might lead to SQL injection, in our case we are reading the database names from the
        // database and not taking them from the user ...
        LOG.info("Read list of available tables in each database");
        for (String dbName : databaseNames) {
            try {
                jdbc.query(
                        "SHOW FULL TABLES IN " + quote(dbName) + " where Table_Type = 'BASE TABLE'",
                        rs -> {
                            while (rs.next()) {
                                TableId tableId = new TableId(dbName, null, rs.getString(1));
                                if (tableFilters.dataCollectionFilter().isIncluded(tableId)) {
                                    capturedTableIds.add(tableId);
                                    LOG.info("\t including '{}' for further processing", tableId);
                                } else {
                                    LOG.info("\t '{}' is filtered out of capturing", tableId);
                                }
                            }
                        });
            } catch (SQLException e) {
                // We were unable to execute the query or process the results, so skip this ...
                LOG.warn(
                        "\t skipping database '{}' due to error reading tables: {}",
                        dbName,
                        e.getMessage());
            }
        }
        return capturedTableIds;
    }
    ...
}
- Root causes (summary)
- Wrong database connection configuration: check the connection settings in the Flink CDC job (hostname, port, username, password, etc.).
- Incompatible Flink CDC version: make sure the Flink CDC version you use is compatible with your MySQL version; upgrade or downgrade if necessary.
- Misspelled table or database name: check that the database-name and table-name configured for Flink CDC are spelled correctly.
- Case sensitivity of database/table names | SQL:
show variables like '%case%';
- Insufficient privileges: make sure the user Flink CDC connects with can access the configured database and tables; otherwise run Flink CDC with a sufficiently privileged user (see the GRANT sketch after this list).
- Check the user's grants (including binlog/replication privileges) | SQL:
SHOW GRANTS FOR 'xxx_cdc'@'%';
- Check whether the configured tables are visible in the database | SQL:
SHOW FULL TABLES IN xxx_devicecenter where Table_Type = 'BASE TABLE'; -- xxx_device
- Corrupted binlog files: if a binlog file is corrupted, Flink CDC may be unable to read correct data; regenerate the binlog or restore from a backup.
- Network problems: check that the network between the Flink CDC process and MySQL is stable; an unstable network can prevent Flink CDC from reading data.
- Other Flink CDC configuration problems: check the remaining settings in the Flink CDC configuration, such as filters and transformation logic.
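For the privilege case, the mysql-cdc connector documentation requires at least SELECT, SHOW DATABASES, REPLICATION SLAVE and REPLICATION CLIENT. A sketch using the hypothetical user 'xxx_cdc' from the logs above:
mysql> GRANT SELECT, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'xxx_cdc'@'%';
mysql> FLUSH PRIVILEGES;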
- Final conclusion for this incident
After investigation, it turned out the environments had been mixed up: the configuration of the Flink job for environment A pointed at the MySQL host of environment B.
- References
show variables like '%case%';
With the default lower_case_table_names=0, table names are strictly case-sensitive; if a query gets the case wrong, MySQL reports that the table does not exist.
In Flink CDC: Can't find any matched tables, please check your configured database-name: [demo] and table-name: [test]. Flink is configured with the root user to read the binlog but keeps failing to find the table, even though the table clearly exists in the database. Why isn't it captured?
This problem may be caused by:
- Wrong database connection configuration
- Incompatible Flink CDC version
- Misspelled table or database name
- Insufficient privileges
- Corrupted binlog files
- Network problems
- Flink CDC configuration problems
For more answers to this question, see the original thread: https://developer.aliyun.com/ask/590881
X References
- Flink CDC official website
- Data Integration real-time sync FAQ - Tencent Cloud [Recommended] TODO