debezium+kafka实现mysql数据同步(debezium-connector-mysql)

1.情景展示

在企业当中,往往会存在不同数据库之间的表的数据需要保持一致的情况(数据同步)。

如何将A库a表的数据同步至B库a表当中呢?(包含:新增、修改和删除)

往往不仅仅需要保持数据的一致性,还要保证数据的即时性,即:A库a表的数据发生变化后,B库a表也能立刻同步变化。

实时保持两表数据的一致性。

如何实现?

2.具体分析

要想及时知晓A库a表的数据变化,我们需要读取数据库的操作日志,从日志当中提取a表的操作日志。(至少要包含:新增、修改和删除)。

以mysql为例,就是读取mysql的binlog日志,而读取日志提取a表操作记录,我们可以通过组件来完成,无需自己手动解析日志。

3.解决方案

debezium-connector-mysql插件,可以很好的完成这个任务。

Debezium是一个开源的数据库事件捕捉和发布平台,旨在提供可靠的实时数据流。它基于分布式日志(如Apache Kafka)来捕获并传输数据库的变更事件,从而实现高效的数据同步和分发。通过使用Debezium的mysql连接器,可以轻松地将mysql数据库中的DML操作(包括INSERT、UPDATE、DELETE)的变更事件提取出来,并以实时的方式推送到Kafka消息队列中。

由于debezium-connector-mysql插件是结合kafka来实现的,我们自然需要用到kafka。

debezium-connector-mysql插件最终实现的效果是:监听mysql库的binlog日志,实时捕获mysql数据变更记录,并将变化的数据发布到kafka的主题当中

在开始之前,我们需要先了解一下kafka connect,通过它我们可以将其它系统与kakfa进行连接,完成主题的发布与订阅。

具体就是:通过该服务,我们可以使用REST API的方式调用kafka服务器来完成消息的发布与订阅。

更多关于kafka connect的用法,见文末推荐。

使用debezium官方提供的source connector,部署到apache kafka connect中,debezium的connector捕获到源数据库数据更新,发送到kafka集群中。

插件下载

地址1:CONFLUNT官网

https://www.confluent.io/hub/debezium/debezium-connector-mysql

2.21版具体地址:https://d1i4a15mxbxib1.cloudfront.net/api/plugins/debezium/debezium-connector-mysql/versions/2.2.1/debezium-debezium-connector-mysql-2.2.1.zip

插件源码地址:https://github.com/debezium/debezium

地址2:DEBEZIUM官网(推荐使用)

https://debezium.io/releases/

这个页面会展示当前debezium的最新版本,一般情况下,我们直接采用最新版就可以了。

以2.5版本为例,进行举例说明:

我们点击“More info”按钮,会跳转到此版本详情页:https://debezium.io/releases/2.5/

首先,映入眼帘的是:运行此插件所需的java版本,kafka版本,以及其所支持的数据库类型、版本号和驱动版本。

往下走,看到的是:Documentation

也就是说明文档,点击“Documentation”按钮,将会跳转到当前版本对应的说明文档页:https://debezium.io/documentation/reference/2.5/

然后找到:Getting Started-->点击"Installation",会跳转到插件安装界面:https://debezium.io/documentation/reference/2.5/install.html

debezium插件列表如下:

mysql插件:https://repo1.maven.org/maven2/io/debezium/debezium-connector-mysql/2.5.0.Final/debezium-connector-mysql-2.5.0.Final-plugin.tar.gz

oracle插件:https://repo1.maven.org/maven2/io/debezium/debezium-connector-oracle/2.5.0.Final/debezium-connector-oracle-2.5.0.Final-plugin.tar.gz

SQLserver插件:https://repo1.maven.org/maven2/io/debezium/debezium-connector-sqlserver/2.5.0.Final/debezium-connector-sqlserver-2.5.0.Final-plugin.tar.gz

jdbc插件:https://repo1.maven.org/maven2/io/debezium/debezium-connector-jdbc/$2.5.0.Final/debezium-connector-jdbc-2.5.0.Final-plugin.tar.gz

插件下载说明:

当你发现插件下载失败的时候,需要检查下载地址当中是否存在$,如果存在将其删掉,才是正确的地址。

如上面的jdbc插件,由于多了一个$,导致下载失败,我们把它去掉再下载就可以了:https://repo1.maven.org/maven2/io/debezium/debezium-connector-jdbc/2.5.0.Final/debezium-connector-jdbc-2.5.0.Final-plugin.tar.gz

插件用法参数说明

点击不同的数据库,将会跳转到对应的参数说明页。

mysql:https://debezium.io/documentation/reference/2.5/connectors/mysql.html

oracle:https://debezium.io/documentation/reference/2.5/connectors/oracle.html

jdbc:https://debezium.io/documentation/reference/2.5/connectors/jdbc.html

如何下载历史插件版本?

在说明文档页,我们点击切换说明文档的版本号,就能看到历史版本信息。

以2.0进行举例说明

如何下载2.0版的插件呢?

我们点击“2.0”,将会切换到2.0版的说明页:https://debezium.io/documentation/reference/2.0/index.html

我们点击"Mysql Connector plugin archive",将会自动下载debezium-mysql-2.0.1.Final,下载地址为:https://repo1.maven.org/maven2/io/debezium/debezium-connector-mysql/2.0.1.Final/debezium-connector-mysql-2.0.1.Final-plugin.tar.gz

插件安装

下载成功后,进行解压。

来到KAFKA_HOME目录下,创建一个plugins目录。

并将刚才解压的插件移到plugins目录下。

由于名称里面带有两个debezium,可以去掉一个。

参数说明

说明文档:https://docs.confluent.io/kafka-connectors/debezium-mysql-source/current/mysql_source_connector_config.html

https://access.redhat.com/documentation/zh-cn/red_hat_build_of_debezium/2.3.4/html/debezium_user_guide/deployment-of-debezium-mysql-connectors#mysql-required-connector-configuration-properties

https://debezium.io/documentation/reference/2.2/connectors/mysql.html

这里我只讲自己用的的参数,其余参数说明及用法,可通过上述说明文档自行查阅。

2.5.0版本参数说明

具体的数据结构,下面有。 

name属性:代表的是连接器的名称,该名称具有唯一性!(名字随便起,但必须唯一)。

名字最好能让人望文生义,如:debezium-connector-source-mysql-63,这一看就知道:创建的是Source Connector,用的插件是:debezium-connector-mysql。

connector.class属性:Source Connector的实现类,在这里我们需要填:io.debezium.connector.mysql.MySqlConnector(它是debezium-connector-mysql的源连接器)。

database.hostname属性:数据库服务器所属IP,如:127.0.0.1。

database.port属性:数据库端口号,如:3306。

database.user属性:数据库用户名,如:scott。

database.password属性:数据库密码,如:123456。

database.server.id属性:给数据库设置id,该值具有唯一性,默认取值范围:[5400,6400],数据类型:int。

database.connectionTimeZone属性:指定MySQL数据库的时区,默认值:UTC。需将此值设为东八区,即:GMT+8。

否则,会出现timestamp类型比实际时间早6个小时的问题。

database.include.list属性:要监控的数据库名称列表,可以同时监控多个数据库,但通常情况下,我们只监控一个数据库,如:test。

table.include.list属性:要捕获数据变更记录的表名称列表,可以同时监控多张表。构成:databaseName.tableName,多个使用逗号隔开,如:test.t_patient。

schema.history.internal.store.only.captured.tables.ddl属性:默认值:false,代表的含义是:debezium会将被监控数据库下所有的表的变更记录,都进行捕获。

而实际上,我们只需捕获table.include.list里面设置的表,而不是所有的表。需将此值设为:true。

schema.history.internal.store.only.captured.databases.ddl属性:默认值:false,代表的含义是:debezium会将被监控当前用户下属所有数据库的变更记录,都进行捕获。

而实际上,我们只需捕获database.include.list里面设置的数据库,而不是所有的数据库。需将此值设为:true。

column.include.list属性:可以设置只捕获表中的部分字段变更记录,可以监控多张表的部分字段变更记录。构成:databaseName.tableName.columnName,多个请使用逗号隔开,如:test.t_patient.create_time,test.t_patient.id。

schema.history.internal.kafka.topic属性:连接器将在其中存储数据库模式历史记录的Kafka主题的全名,如:schema-history-test-63。

schema.history.internal.kafka.bootstrap.servers属性:kafka服务器地址,如:localhost:9092。

event.processing.failure.handling.mode属性:默认值:error,表示的是:进程错误处理模式。可选值:[fail,error,warn,skip]。

我们选择warn,而不是直接报错,更容易发现问题。

errors.log.enable属性:默认值:false,是否输出错误日志。需将此值设为:true。

errors.log.include.messages属性:默认值:false,错误日志是否包含错误信息。需将此值设为:true。

topic.prefix属性:将要发布的主题的名称前缀,该值具有唯一性(kafka会根据此主题前缀来生成主题名称。消费者需要根据topic名称来订阅数据)。

snapshot.mode属性:快照模式,默认值:initial,可选值:[initial,initial_only,when_needed,never,schema_only,schema_only_recovery]。

initial(默认)(初始全量,后续增量):连接器执行数据库的初始一致性快照,快照完成后,连接器开始为后续数据库更改流式传输事件记录。

initial_only(只全量,不增量):连接器只执行数据库的初始一致性快照,不允许捕获任何后续更改的事件。

schema_only(只增量,不全量):连接器只捕获所有相关表的表结构,不捕获初始数据,但是会同步后续数据库的更改记录。 

skipped.operations属性:默认值:t,不需要监控的操作。可选值:c(insert/create),u(update),d(delete),t(truncate),none。

如果不需要跳过truncate操作,需将其值设为:none。

4.运行

准备工作

启动服务

启动zookeeper,启动kafka,启动kafka connect。

如果没有安装kafka或者不知道怎么启动,具体操作见文末推荐。

日志授权

binlog是Mysql sever层维护的一种二进制日志,与innodb引擎中的redo/undo log是完全不同的日志;

其主要是用来记录对mysql数据更新或潜在发生更新的SQL语句,并以“事务”的形式保存在磁盘中。

第一步:开启binlog

SHOW VARIABLES LIKE 'log_bin';

 确保binlog是开启状态,如果结果是"OFF",则表示binlog未启用。

 如果未启用,需运行以下命令:

SET GLOBAL log_bin = 'ON';

 可以通过修改mysql的配置文件来管理binlog的存储位置和格式等。

log_bin = /path/to/binlog/file.log
server_id = unique_server_id
expire_logs_days = 1
binlog_row_image = full
binlog_format = row

保存并关闭配置文件,如果修改了配置文件,需要重新启动MySQL服务以使更改生效。

sudo service mysql restart

第二步:授权

查看当前用户的权限:

SHOW GRANTS FOR CURRENT_USER;
SHOW GRANTS FOR '用户名'@'%';

确认要进行cdc的用户,是否有REPLICATION SLAVE和RELOAD/FLUSH_TABLES权限。

授予用户读取binlog的权限

GRANT REPLICATION SLAVE ON *.* TO 'username'@'hostname';

将 'username' 替换为你要授予权限的用户名,'hostname' 替换为该用户的主机名或IP地址。 

或者

GRANT REPLICATION SLAVE ON *.* TO 'username'@'%';

刷新权限使更改生效:

FLUSH PRIVILEGES;

 授予用户RELOAD权限或者FLUSH_TABLES权限

如果没有权限,在启动source connector后,报错信息如下:

GRANT RELOAD, FLUSH_TABLES ON *.* TO 'current_user'@'localhost';

然后,刷新权限使更改生效。

发布主题

接口地址:

http://localhost:8083/connectors

请求数据:

{
	"name" : "debezium-connector-source-mysql-63",
	"config" : {
		"connector.class" : "io.debezium.connector.mysql.MySqlConnector",
		"database.user" : "username",
		"database.server.id" : 63,
		"schema.history.internal.kafka.bootstrap.servers" : "localhost:9092",
		"column.include.list" : "test.t_patient.create_time,test.t_patient.id,test.t_patient.jz_time,test.t_patient.patient_id_no,test.t_patient.patient_name,test.t_patient.patient_sex,test.t_patient.update_time,test.t_patient.zy_time",
		"database.port" : "3306",
		"schema.history.internal.store.only.captured.tables.ddl" : true,
		"inconsistent.schema.handling.mode" : "warn",
		"topic.prefix" : "topic-63",
		"schema.history.internal.kafka.topic" : "schema-history-63",
		"database.hostname" : "192.168.0.1",
		"database.password" : "password",
		"table.include.list" : "test.t_patient",
		"database.include.list" : "test",
		"snapshot.mode" : "initial"
	}
}

响应数据:

{
	"name" : "debezium-connector-source-mysql-63",
	"type" : "source",
	"config" : {
		"connector.class" : "io.debezium.connector.mysql.MySqlConnector",
		"database.user" : "username",
		"database.server.id" : "63",
		"schema.history.internal.kafka.bootstrap.servers" : "localhost:9092",
		"column.include.list" : "test.t_patient.create_time,test.t_patient.id,test.t_patient.jz_time,test.t_patient.patient_id_no,test.t_patient.patient_name,test.t_patient.patient_sex,test.t_patient.update_time,test.t_patient.zy_time",
		"database.port" : "3306",
		"schema.history.internal.store.only.captured.tables.ddl" : "true",
		"inconsistent.schema.handling.mode" : "warn",
		"topic.prefix" : "topic-63",
		"schema.history.internal.kafka.topic" : "schema-history-63",
		"database.hostname" : "192.168.0.142",
		"database.password" : "password",
		"table.include.list" : "test.t_patient",
		"database.include.list" : "test",
		"snapshot.mode" : "initial",
		"name" : "debezium-connector-source-mysql-63"
	},
	"tasks" : [{
			"connector" : "debezium-connector-source-mysql-63",
			"task" : 0
		}
	]
}

主题发布成功后,debezium将会自动捕获mysql数据变更日志,并将数据推送到kafka当中。

说明:

如果出现一直没有出现类似上面的响应结果,很有可能是你在切换到kafka connect命令窗口时,误操作将其暂停了运行,按Enter键,命令就会继续执行了。

查看kafka connect命令窗口,是否出现报错信息。

只要没有出现报错信息,耐心等待就可以了(30分钟以内会有执行结果)。

查询主题

主题的生成条件说明:

debezium source connector启动之后,当指定监控的表数据发生变更时,它才会创建主题(也就是往kafka当中推送数据)。

 

如果处于监控中的表一直没有数据变更,那这个主题就一直不会产生。

具体就是:

当snapshot.mode=schema_only时,source connector启动后,不会立马创建主题,而是直到源表数据发生变化时,才会创建主题(也就是:debezium-mysql插件通过读取日志,监控到该表发生了变化,会自动将变化的数据推送到kafka上,这时,主题自然就生成了)。

当snapshot.mode=initial时,source connector启动后,如果源表已经有数据,它会立马创建主题;

反之,如果源表一条数据都没有,不会立马创建主题,直到源表有新增数据时,才会创建主题。 

mysql主题的生成规则:topic.prefix.databaseName.tableName。

查看已发布的主题

请求地址:

http://localhost:8083/connectors/debezium-connector-source-mysql-63/topics

响应数据: 

{"debezium-connector-source-mysql-63":{"topics":["topic-test-63.test.t_patient","topic-test-63"]}}

主题生成规则:

一张表对应一个主题。

这里t_patient表对应的主题是:topic-test-63.t_patient。

说明:

在kafka connect中,如果连接器不存在,调用获取对应topics接口信息,是不会报404的。这可以视为一个bug。

事实上,这个连接器已经不存在了。

消费数据

通过产生的主题消费数据。

切换到KAFKA_HOME\bin\windows目录下,输入cmd,按Enter键打开cmd窗口。

输入以下命令并执行

kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic topic-63.数据库名.t_patient  --from-beginning

执行结果如下:

{"schema":{"type":"struct","fields":[{"type":"struct","fields":[{"type":"int64","optional":false,"field":"id"},{"type":"string","optional":true,"field":"patient_name"},{"type":"string","optional":true,"field":"patient_sex"},{"type":"string","optional":true,"field":"patient_id_no"},{"type":"int64","optional":true,"name":"io.debezium.time.Timestamp","version":1,"field":"create_time"},{"type":"int32","optional":true,"name":"io.debezium.time.Date","version":1,"field":"update_time"},{"type":"string","optional":true,"name":"io.debezium.time.ZonedTimestamp","version":1,"field":"jz_time"},{"type":"int64","optional":true,"name":"io.debezium.time.MicroTime","version":1,"field":"zy_time"}],"optional":true,"name":"topic-dbName-63.dbName.t_patient.Value","field":"before"},{"type":"struct","fields":[{"type":"int64","optional":false,"field":"id"},{"type":"string","optional":true,"field":"patient_name"},{"type":"string","optional":true,"field":"patient_sex"},{"type":"string","optional":true,"field":"patient_id_no"},{"type":"int64","optional":true,"name":"io.debezium.time.Timestamp","version":1,"field":"create_time"},{"type":"int32","optional":true,"name":"io.debezium.time.Date","version":1,"field":"update_time"},{"type":"string","optional":true,"name":"io.debezium.time.ZonedTimestamp","version":1,"field":"jz_time"},{"type":"int64","optional":true,"name":"io.debezium.time.MicroTime","version":1,"field":"zy_time"}],"optional":true,"name":"topic-dbName-63.dbName.t_patient.Value","field":"after"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"version"},{"type":"string","optional":false,"field":"connector"},{"type":"string","optional":false,"field":"name"},{"type":"int64","optional":false,"field":"ts_ms"},{"type":"string","optional":true,"name":"io.debezium.data.Enum","version":1,"parameters":{"allowed":"true,last,false,incremental"},"default":"false","field":"snapshot"},{"type":"string","optional":false,"field":"db"},{"type":"string","optional":true,"field":"sequence"},{"type":"string","optional":true,"field":"table"},{"type":"int64","optional":false,"field":"server_id"},{"type":"string","optional":true,"field":"gtid"},{"type":"string","optional":false,"field":"file"},{"type":"int64","optional":false,"field":"pos"},{"type":"int32","optional":false,"field":"row"},{"type":"int64","optional":true,"field":"thread"},{"type":"string","optional":true,"field":"query"}],"optional":false,"name":"io.debezium.connector.mysql.Source","field":"source"},{"type":"string","optional":false,"field":"op"},{"type":"int64","optional":true,"field":"ts_ms"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"id"},{"type":"int64","optional":false,"field":"total_order"},{"type":"int64","optional":false,"field":"data_collection_order"}],"optional":true,"name":"event.block","version":1,"field":"transaction"}],"optional":false,"name":"topic-dbName-63.dbName.t_patient.Envelope","version":1},"payload":{"before":null,"after":{"id":1,"patient_name":"Lily","patient_sex":"女","patient_id_no":"123123123","create_time":1701941356000,"update_time":19698,"jz_time":"2023-12-07T15:29:27Z","zy_time":34173000000},"source":{"version":"2.2.1.Final","connector":"mysql","name":"topic-dbName-63","ts_ms":1703893892000,"snapshot":"first","db":"dbName","sequence":null,"table":"t_patient","server_id":0,"gtid":null,"file":"binlog.003274","pos":140774141,"row":0,"thread":null,"query":null},"op":"r","ts_ms":1703843838705,"transaction":null}}
{"schema":{"type":"struct","fields":[{"type":"struct","fields":[{"type":"int64","optional":false,"field":"id"},{"type":"string","optional":true,"field":"patient_name"},{"type":"string","optional":true,"field":"patient_sex"},{"type":"string","optional":true,"field":"patient_id_no"},{"type":"int64","optional":true,"name":"io.debezium.time.Timestamp","version":1,"field":"create_time"},{"type":"int32","optional":true,"name":"io.debezium.time.Date","version":1,"field":"update_time"},{"type":"string","optional":true,"name":"io.debezium.time.ZonedTimestamp","version":1,"field":"jz_time"},{"type":"int64","optional":true,"name":"io.debezium.time.MicroTime","version":1,"field":"zy_time"}],"optional":true,"name":"topic-dbName-63.dbName.t_patient.Value","field":"before"},{"type":"struct","fields":[{"type":"int64","optional":false,"field":"id"},{"type":"string","optional":true,"field":"patient_name"},{"type":"string","optional":true,"field":"patient_sex"},{"type":"string","optional":true,"field":"patient_id_no"},{"type":"int64","optional":true,"name":"io.debezium.time.Timestamp","version":1,"field":"create_time"},{"type":"int32","optional":true,"name":"io.debezium.time.Date","version":1,"field":"update_time"},{"type":"string","optional":true,"name":"io.debezium.time.ZonedTimestamp","version":1,"field":"jz_time"},{"type":"int64","optional":true,"name":"io.debezium.time.MicroTime","version":1,"field":"zy_time"}],"optional":true,"name":"topic-dbName-63.dbName.t_patient.Value","field":"after"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"version"},{"type":"string","optional":false,"field":"connector"},{"type":"string","optional":false,"field":"name"},{"type":"int64","optional":false,"field":"ts_ms"},{"type":"string","optional":true,"name":"io.debezium.data.Enum","version":1,"parameters":{"allowed":"true,last,false,incremental"},"default":"false","field":"snapshot"},{"type":"string","optional":false,"field":"db"},{"type":"string","optional":true,"field":"sequence"},{"type":"string","optional":true,"field":"table"},{"type":"int64","optional":false,"field":"server_id"},{"type":"string","optional":true,"field":"gtid"},{"type":"string","optional":false,"field":"file"},{"type":"int64","optional":false,"field":"pos"},{"type":"int32","optional":false,"field":"row"},{"type":"int64","optional":true,"field":"thread"},{"type":"string","optional":true,"field":"query"}],"optional":false,"name":"io.debezium.connector.mysql.Source","field":"source"},{"type":"string","optional":false,"field":"op"},{"type":"int64","optional":true,"field":"ts_ms"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"id"},{"type":"int64","optional":false,"field":"total_order"},{"type":"int64","optional":false,"field":"data_collection_order"}],"optional":true,"name":"event.block","version":1,"field":"transaction"}],"optional":false,"name":"topic-dbName-63.dbName.t_patient.Envelope","version":1},"payload":{"before":null,"after":{"id":2,"patient_name":"anny","patient_sex":"男","patient_id_no":"123456789","create_time":1702027779000,"update_time":19699,"jz_time":"2023-12-08T15:29:50Z","zy_time":34194000000},"source":{"version":"2.2.1.Final","connector":"mysql","name":"topic-dbName-63","ts_ms":1703893892000,"snapshot":"true","db":"dbName","sequence":null,"table":"t_patient","server_id":0,"gtid":null,"file":"binlog.003274","pos":140774141,"row":0,"thread":null,"query":null},"op":"r","ts_ms":1703843838722,"transaction":null}}
{"schema":{"type":"struct","fields":[{"type":"struct","fields":[{"type":"int64","optional":false,"field":"id"},{"type":"string","optional":true,"field":"patient_name"},{"type":"string","optional":true,"field":"patient_sex"},{"type":"string","optional":true,"field":"patient_id_no"},{"type":"int64","optional":true,"name":"io.debezium.time.Timestamp","version":1,"field":"create_time"},{"type":"int32","optional":true,"name":"io.debezium.time.Date","version":1,"field":"update_time"},{"type":"string","optional":true,"name":"io.debezium.time.ZonedTimestamp","version":1,"field":"jz_time"},{"type":"int64","optional":true,"name":"io.debezium.time.MicroTime","version":1,"field":"zy_time"}],"optional":true,"name":"topic-dbName-63.dbName.t_patient.Value","field":"before"},{"type":"struct","fields":[{"type":"int64","optional":false,"field":"id"},{"type":"string","optional":true,"field":"patient_name"},{"type":"string","optional":true,"field":"patient_sex"},{"type":"string","optional":true,"field":"patient_id_no"},{"type":"int64","optional":true,"name":"io.debezium.time.Timestamp","version":1,"field":"create_time"},{"type":"int32","optional":true,"name":"io.debezium.time.Date","version":1,"field":"update_time"},{"type":"string","optional":true,"name":"io.debezium.time.ZonedTimestamp","version":1,"field":"jz_time"},{"type":"int64","optional":true,"name":"io.debezium.time.MicroTime","version":1,"field":"zy_time"}],"optional":true,"name":"topic-dbName-63.dbName.t_patient.Value","field":"after"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"version"},{"type":"string","optional":false,"field":"connector"},{"type":"string","optional":false,"field":"name"},{"type":"int64","optional":false,"field":"ts_ms"},{"type":"string","optional":true,"name":"io.debezium.data.Enum","version":1,"parameters":{"allowed":"true,last,false,incremental"},"default":"false","field":"snapshot"},{"type":"string","optional":false,"field":"db"},{"type":"string","optional":true,"field":"sequence"},{"type":"string","optional":true,"field":"table"},{"type":"int64","optional":false,"field":"server_id"},{"type":"string","optional":true,"field":"gtid"},{"type":"string","optional":false,"field":"file"},{"type":"int64","optional":false,"field":"pos"},{"type":"int32","optional":false,"field":"row"},{"type":"int64","optional":true,"field":"thread"},{"type":"string","optional":true,"field":"query"}],"optional":false,"name":"io.debezium.connector.mysql.Source","field":"source"},{"type":"string","optional":false,"field":"op"},{"type":"int64","optional":true,"field":"ts_ms"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"id"},{"type":"int64","optional":false,"field":"total_order"},{"type":"int64","optional":false,"field":"data_collection_order"}],"optional":true,"name":"event.block","version":1,"field":"transaction"}],"optional":false,"name":"topic-dbName-63.dbName.t_patient.Envelope","version":1},"payload":{"before":null,"after":{"id":3,"patient_name":"张三丰","patient_sex":"男","patient_id_no":"123987345","create_time":1702027860000,"update_time":19699,"jz_time":"2023-12-08T15:31:11Z","zy_time":34275000000},"source":{"version":"2.2.1.Final","connector":"mysql","name":"topic-dbName-63","ts_ms":1703893892000,"snapshot":"true","db":"dbName","sequence":null,"table":"t_patient","server_id":0,"gtid":null,"file":"binlog.003274","pos":140774141,"row":0,"thread":null,"query":null},"op":"r","ts_ms":1703843838723,"transaction":null}}
{"schema":{"type":"struct","fields":[{"type":"struct","fields":[{"type":"int64","optional":false,"field":"id"},{"type":"string","optional":true,"field":"patient_name"},{"type":"string","optional":true,"field":"patient_sex"},{"type":"string","optional":true,"field":"patient_id_no"},{"type":"int64","optional":true,"name":"io.debezium.time.Timestamp","version":1,"field":"create_time"},{"type":"int32","optional":true,"name":"io.debezium.time.Date","version":1,"field":"update_time"},{"type":"string","optional":true,"name":"io.debezium.time.ZonedTimestamp","version":1,"field":"jz_time"},{"type":"int64","optional":true,"name":"io.debezium.time.MicroTime","version":1,"field":"zy_time"}],"optional":true,"name":"topic-dbName-63.dbName.t_patient.Value","field":"before"},{"type":"struct","fields":[{"type":"int64","optional":false,"field":"id"},{"type":"string","optional":true,"field":"patient_name"},{"type":"string","optional":true,"field":"patient_sex"},{"type":"string","optional":true,"field":"patient_id_no"},{"type":"int64","optional":true,"name":"io.debezium.time.Timestamp","version":1,"field":"create_time"},{"type":"int32","optional":true,"name":"io.debezium.time.Date","version":1,"field":"update_time"},{"type":"string","optional":true,"name":"io.debezium.time.ZonedTimestamp","version":1,"field":"jz_time"},{"type":"int64","optional":true,"name":"io.debezium.time.MicroTime","version":1,"field":"zy_time"}],"optional":true,"name":"topic-dbName-63.dbName.t_patient.Value","field":"after"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"version"},{"type":"string","optional":false,"field":"connector"},{"type":"string","optional":false,"field":"name"},{"type":"int64","optional":false,"field":"ts_ms"},{"type":"string","optional":true,"name":"io.debezium.data.Enum","version":1,"parameters":{"allowed":"true,last,false,incremental"},"default":"false","field":"snapshot"},{"type":"string","optional":false,"field":"db"},{"type":"string","optional":true,"field":"sequence"},{"type":"string","optional":true,"field":"table"},{"type":"int64","optional":false,"field":"server_id"},{"type":"string","optional":true,"field":"gtid"},{"type":"string","optional":false,"field":"file"},{"type":"int64","optional":false,"field":"pos"},{"type":"int32","optional":false,"field":"row"},{"type":"int64","optional":true,"field":"thread"},{"type":"string","optional":true,"field":"query"}],"optional":false,"name":"io.debezium.connector.mysql.Source","field":"source"},{"type":"string","optional":false,"field":"op"},{"type":"int64","optional":true,"field":"ts_ms"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"id"},{"type":"int64","optional":false,"field":"total_order"},{"type":"int64","optional":false,"field":"data_collection_order"}],"optional":true,"name":"event.block","version":1,"field":"transaction"}],"optional":false,"name":"topic-dbName-63.dbName.t_patient.Envelope","version":1},"payload":{"before":null,"after":{"id":4,"patient_name":"李丽丽","patient_sex":"女","patient_id_no":"126736790","create_time":1702027934000,"update_time":19699,"jz_time":"2023-12-08T15:32:19Z","zy_time":34341000000},"source":{"version":"2.2.1.Final","connector":"mysql","name":"topic-dbName-63","ts_ms":1703893892000,"snapshot":"last","db":"dbName","sequence":null,"table":"t_patient","server_id":0,"gtid":null,"file":"binlog.003274","pos":140774141,"row":0,"thread":null,"query":null},"op":"r","ts_ms":1703843838723,"transaction":null}}

说明:

消费的数据,如果含有中文,中文会出现乱码,不用理会;(因为windows中文版,cmd的编码集默认为:GBK,而mysql数据库一般采用的是utf8mb4)。

消费的数据,如果含有数值类型,会被自动编码成base64格式,这个也不用管。(这是使用kafka-consume-consumer.bat造成的,它本身还是数值类型)

源库源表数据

5.同步数据

这样一来,我们就实现了数据同步的前半部分:抽取数据 push。 

示例:

{
	"name": "debezium-connector-source-mysql-122",
	"config": {
		"connector.class": "io.debezium.connector.mysql.MySqlConnector",
		"errors.log.include.messages": true,
		"database.user": "marydon",
		"database.server.id": 122,
		"schema.history.internal.kafka.bootstrap.servers": "localhost:9092",
		"event.processing.failure.handling.mode": "warn",
		"column.include.list": "test.tb_project.CREATE_BY,test.tb_project.CREATE_TIME,test.tb_project.ID,test.tb_project.PROJECT_BEGIN_TIME,test.tb_project.PROJECT_CODE,test.tb_project.PROJECT_COMPANY,test.tb_project.PROJECT_END_TIME,test.tb_project.PROJECT_MANAGER,test.tb_project.PROJECT_NAME,test.tb_project.PROJECT_SUBMIT_TIME,test.tb_project.UPDATE_BY,test.tb_project.UPDATE_TIME",
		"database.port": "3306",
		"schema.history.internal.store.only.captured.tables.ddl": true,
		"schema.history.internal.store.only.captured.databases.ddl": true,
		"topic.prefix": "topic-test-122",
		"schema.history.internal.kafka.topic": "schema-history-test-122",
		"database.hostname": "192.168.0.11",
		"database.connectionTimeZone": "GMT+8",
		"database.password": "marydon123",
		"table.include.list": "test.tb_project",
		"skipped.operations": "none",
		"errors.log.enable": true,
		"database.include.list": "test",
		"snapshot.mode": "schema_only"
	}
}

这样一来,我们就实现了数据同步的前半部分:抽取数据 push。 

示例:

至于将kafka中存储的数据更新到别的数据库(同步数据 pull),有三种实现方式:

第一种:使用插件io.confluent.connect.jdbc.JdbcSinkConnector;

第二种:使用插件io.debezium.connector.jdbc.JdbcSinkConnector(推荐使用)。

说明:

网上看到的几乎都是使用第一种方式来完成数据同步,这其实是不对的,也有使用第3种方式来完成数据同步的。

但,最好的还是使用第2种方式,因为它可以和io.debezium.connector.mysql.OracleConnector插件无缝对接,不需要进行额外的数据转换操作。

当我们使用主键字段进行数据同步时(通常情况下),目标表字段的主键的名称需要与源表的主键名称完全一致(大小写也必须保持一致),否则,使用JdbcSinkConnector会报错:

主键字段不存在。

第三种:自定义开发消费组件;

当然,官方的jdbc插件也不是万能的,例如:

源表表名与目标表表名不一致;

目标表字段B需要对应源表表字段A,而不是A-->A;

在进行数据同步时,目标表需要增加时间戳字段或者实现假删除等等个性化需求,官方插件就不能用了。

我们只能自己开发kafka connect组件了。

2024-08-29 15:09:22

6.相关问题

启动没有报错,但没有将源表数据上传至kafka

情形1:源表为空表(没有数据);

情形2:设置了增量同步;

snapshot.mode=schema_only

设置成了增量同步,只有当连接器启动成功后,源表产生新数据时才会将其上传至kafka当中。

如果源表一直没有产生新数据,自然在kafka中无法查到数据。

情形3:只设置了表名,没有设置库名。

"table.include.list": "base_news_info"

如果你去查看,就会发现该连接只有一个主题,而且该主题存的是表结构。

这是错误的,table.include.list的参数值格式为:数据库名称.表名。

"table.include.list": "test.base_news_info"

情形4:使用了source connector并不存在的参数。

启动报错

错误1

The db history topic is missing. You may attempt to recover it by reconfiguring the connector to SCHEMA_ONLY_RECOVERY。

请求参数

解决办法:

第一步:删除此连接器。

第二步:增加配置参数。

在原有参数的基础上,增加下面这个配置参数。

"snapshot.mode": "schema_only_recovery"

说明:

snapshot.mode的格式默认为initial,即:全量+增量同步。

第三步:重新创建连接器。

 

不出意外的话,就会启动成功。

第四步:查看连接器的主题。

我们可以看到:

此连接器的主题有且只有一个,而且该主题只是存储了表结构。

后面,当我将snapshot.mode的值设置为initial时,不知道为何,是按照增量同步的(应该是全量+增量同步才对),不知道是怎么回事。

7.补充说明

关于连接器主题的说明

前提:当要监控的表只有一张时,以下结论成立

table.include.list参数配置说明

只设置了表名,没有设置库名。

"table.include.list": "base_news_info"

如果你去查看,就会发现该连接器只有一个主题,而且该主题存的是表结构。

snapshot.mode参数配置说明

当snapshot.mode的值设置为schema_only_recovery时该连接器将不会生成主题信息

当snapshot.mode的值设置为schema_only时,该连接器至多会生成一个主题信息(表数据对应的主题)。

schema_only表示增量同步,只有在连接器启动后,源表后续产生的数据才会被连接器发布到kafka对应的主题当中。

这就会出现下面这种情况:

在连接器启动后,源表的数据并未发生变化,那么,该连接器将永远不会创建主题(主题信息为空)。

直到源表数据发生变化时(新增、修改或删除),才会生成源表对应的主题信息。

并且,此时,只有表数据对应的主题,没有表结构对应的主题

当snapshot.mode的值设置为initial时(默认值),连接器会生成两个主题信息:

一个是表结构对应的主题,一个是表数据对应的主题

posted @ 2023-12-29 18:08  Marydon  阅读(2385)  评论(0编辑  收藏  举报