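Debezium pitfalls
How to snapshot a table that already contains data after adding it to table.include.list on a connector that is already running
> Development environment: Debezium 1.3.Final
As the title says, the parameter to introduce here is "snapshot.new.tables". It is a slightly odd one: it works, but it is deliberately left out of the official documentation. The explanation given in the official issue tracker is at https://issues.redhat.com/browse/DBZ-1977
Example: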
{
  "connector.class": "io.debezium.connector.mysql.MySqlConnector",
  "database.user": "debezium_mysql",
  "tasks.max": "1",
  "database.history.kafka.bootstrap.servers": "cdh04:9092,cdh05:9092,cdh06:9092",
  "database.history.kafka.topic": "ninth_studio_connector_history",
  "database.history.kafka.recovery.poll.interval.ms": "5000",
  "database.server.name": "ninth_studio_connector",
  "database.port": "3306",
  "tombstones.on.delete": "false",
  "snapshot.new.tables": "parallel",
  "database.hostname": "common.mysql.test.local",
  "database.password": "********",
  "database.serverTimezone": "UTC",
  "table.include.list": "ninth_studio.wehub_action_logs,ninth_studio.ab_py_assistant_binds",
  "database.include.list": "ninth_studio"
}
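ps: In version 1.3, with snapshot.mode = when_needed and snapshot.include.collection.list configured, the connector occasionally stops picking up data. It only resumes reading after the configuration is updated. Any change will do, since the point is just to make Connect restart the connector; a plain restart through the REST API has no effect.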
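The "update the configuration" step can be scripted against the Kafka Connect REST API, which accepts the full connector config as a flat JSON map on PUT /connectors/{name}/config. Below is a minimal sketch using only the Python standard library. The function names and the Connect address in the usage comment (localhost:8083) are assumptions for illustration, not something taken from the setup above.

```python
import json
import urllib.request


def build_config_update(connect_url, connector_name, config):
    """Build a PUT /connectors/{name}/config request for Kafka Connect.

    `config` is the flat key/value map (no "name"/"config" wrapper).
    """
    url = f"{connect_url.rstrip('/')}/connectors/{connector_name}/config"
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status


# Usage (assumed address; pass the complete connector config, not a diff):
#   req = build_config_update("http://localhost:8083", "ninth_studio_connector", config)
#   send(req)
```

Note that this endpoint replaces the whole configuration, so the map passed in should be the complete config with the one changed value, not just the changed key.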
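Time zone issues
Reference for the settings: https://my.oschina.net/dacoolbaby/blog/3096451
The author of that post has shared many Debezium pitfalls, and a lot of it is worth borrowing.
Decimal data type conversion
Reference link: https://blog.csdn.net/u012551524/article/details/83546765
With the default decimal.handling.mode of precise, a decimal value shows up as a string like "F3A=" (the binary representation, base64-encoded by the JSON converter). Setting it to double makes the value display normally, at the cost of possible precision loss.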
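If precise has to stay on, the encoded value can still be decoded on the consumer side. The string is the base64 form of the unscaled integer as big-endian two's-complement bytes, and the scale is carried in the field's schema parameters. A minimal sketch follows; the function name is hypothetical, and the scale of 2 in the usage comment is an assumption, since the original column definition is not shown here.

```python
import base64
from decimal import Decimal


def decode_connect_decimal(encoded, scale):
    """Decode a base64-encoded Kafka Connect Decimal into a Python Decimal.

    `encoded` holds the unscaled value as big-endian two's-complement bytes;
    `scale` is the number of digits after the decimal point (from the schema).
    """
    raw = base64.b64decode(encoded)
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


# Usage, assuming the column was declared with scale 2 (e.g. DECIMAL(10,2)):
#   decode_connect_decimal("F3A=", 2)   # unscaled value 6000 -> 60.00
```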
Exception thrown when a binlog file has expired
The exception looks like this:
Connector requires binlog file 'mysql-bin.000003', but MySQL only has mysql-bin.000041, mysql-bin.000042 (io.debezium.connector.mysql.MySqlConnectorTask:330)
[2018-10-03 12:48:43,752] INFO Stopping MySQL connector task (io.debezium.connector.mysql.MySqlConnectorTask:245)
[2018-10-03 12:48:43,752] INFO WorkerSourceTask{id=debezium-mysql-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:397)
[2018-10-03 12:48:43,752] INFO WorkerSourceTask{id=debezium-mysql-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:414)
[2018-10-03 12:48:43,753] ERROR WorkerSourceTask{id=debezium-mysql-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:177)
org.apache.kafka.connect.errors.ConnectException: The connector is trying to read binlog starting at binlog file 'mysql-bin.000003', pos=715349422, skipping 982 events plus 40 rows, but this is no longer available on the server. Reconfigure the connector to use a snapshot when needed.
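The simplest fix in this situation is to set "snapshot.mode" to when_needed. To avoid the problem at the root, consider increasing MySQL's binlog expiration time, expire_logs_days (on MySQL 8.0, binlog_expire_logs_seconds).
ps: Sometimes no exception is thrown directly; instead the log keeps printing `Couldn't commit processed log positions with the source database due to a concurrent connector shutdown or restart`
In that case, try restarting the Connect service and then check the task's running status.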
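Checking the task status after a restart can be done through GET /connectors/{name}/status on the Kafka Connect REST API, which reports a state for the connector and for each task. The sketch below picks out tasks in the FAILED state. The helper names and the address in the usage comment are assumptions for illustration.

```python
import json
import urllib.request


def fetch_status(connect_url, connector_name):
    """Fetch the status document for one connector from Kafka Connect."""
    url = f"{connect_url.rstrip('/')}/connectors/{connector_name}/status"
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def failed_tasks(status):
    """Return the ids of all tasks whose state is FAILED."""
    return [
        task["id"]
        for task in status.get("tasks", [])
        if task.get("state") == "FAILED"
    ]


# Usage (assumed address):
#   status = fetch_status("http://localhost:8083", "ninth_studio_connector")
#   print(status["connector"]["state"], failed_tasks(status))
```

A failed task entry also carries a trace field with the stack trace, which is usually where a message like the binlog error above ends up.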