Flink CDC 实时同步 MySQL

系列文章

准备工作

  • MySQL 数据库(version: 5.7.25),注意,MySQL 数据库版本必须大于 5.6,否则不支持。

  • 开启 MySQL 的 log-bin:

    [mysqld]
    # Binary Logging.
    log-bin=mysql-bin
    server-id=1
    
  • Flink (version : 1.18.1)

  • 添加以下 jar 包到 flink/lib

    • flink-sql-connector-mysql-cdc.jar 负责 source
    • flink-connector-jdbc.jar 负责 sink
    • mysql-connector-java.jar sink 端所需的 jdbc 驱动

  • 准备待同步源端表

    源端表:cdc_test_source.player_source

    CREATE TABLE `player_source`  (
      `id` int(11),
      `name` varchar(255),
      PRIMARY KEY (`id`)
    );
    
  • 准备目标端表

    在目标端创建和源端同构的表:cdc_test_target.player_target

    CREATE TABLE `player_target`  (
      `id` int(11),
      `name` varchar(255),
      PRIMARY KEY (`id`)
    );
    

SQL-Client 实现数据同步

使用 Flink 的 sql-client 实现数据同步

./bin/start-cluster.sh

启动 sql-client

sudo ./bin/sql-client.sh

数据同步任务创建

  • 创建源端表对应的逻辑表,参照 MySQL 字段类型和 FlinkSQL 字段类型的映射关系

    CREATE TABLE source_dest (
        `id` INT	,
        `name` STRING,
        PRIMARY KEY (`id`) NOT ENFORCED
    ) WITH (
        'connector' = 'mysql-cdc',
        'hostname' = '10.4.45.207',
        'port' = '3306',
        'username' = 'username',
        'password' = 'password',
        'database-name' = 'cdc_test_source',
        'table-name' = 'player_source'
    );
    
  • 创建目标端表对应的逻辑表

    CREATE TABLE sink_dest (
        `id` INT,
        `name` STRING,
        PRIMARY KEY (`id`) NOT ENFORCED
    ) WITH (
        'connector' = 'jdbc',
        'url' = 'jdbc:mysql://10.4.45.207:3306/cdc_test_target',
        'username' = 'username',
        'password' = 'password',
        'table-name' = 'player_target',
        'sink.parallelism' = '1'
    );
    
  • 建立源端逻辑表和目标端逻辑表的连接

    INSERT INTO
        sink_dest (id, name)
    SELECT
        id,
        name
    FROM
        source_dest;
    
  • 任务创建成功:

  • Flink Web 查看提交的任务:

  • 源端表中进行更新、删除操作,查看目标端表是否自动完成同步

TableAPI 实现数据同步

引入 maven 依赖:

<properties>
    <scala.binary.version>2.12</scala.binary.version>
    <flink.version>1.15.4</flink.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-runtime</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
    	<groupId>org.apache.flink</groupId>
	  	<artifactId>flink-table-planner-loader</artifactId>
	  	<version>${flink.version}</version>
	</dependency>
	<dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-runtime-web</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
		<groupId>org.apache.flink</groupId>
	    <artifactId>flink-connector-base</artifactId>
	    <version>${flink.version}</version>
	</dependency>
    <dependency>
    	<groupId>org.apache.flink</groupId>
	    <artifactId>flink-connector-jdbc</artifactId>
	    <version>${flink.version}</version>
	</dependency>
	<dependency>
        <groupId>com.ververica</groupId>
        <artifactId>flink-sql-connector-mysql-cdc</artifactId>
        <version>2.4.0</version>
    </dependency>
<dependencies>

程序实现:

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.StatementSet;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class FlinkSQL {

	public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        configuration.setInteger("rest.port", 9091);
        // configuration.setString("execution.checkpointing.interval", "3min");
        StreamExecutionEnvironment env = StreamExecutionEnvironment
                .createLocalEnvironmentWithWebUI(configuration);
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        String createSourceTableSQL = "CREATE TABLE source_dest (" +
        			"`id` INT," +
        			"`name` STRING," +
        			"PRIMARY KEY (`id`) NOT ENFORCED" +
                ") WITH (" +
                	"'connector' = 'mysql-cdc'," +
                	"'hostname' = '10.4.45.207'," +
                	"'username' = 'username'," +
                	"'password' = 'password'," +
                	"'database-name' = 'cdc_test_source'," +
                	"'table-name' = 'player_source'," +
                	"'scan.startup.mode' = 'latest-offset'" +
                ");";
        tableEnv.executeSql(createSourceTableSQL);

        String createSinkTableSQL = "CREATE TABLE sink_dest (" +
        			"`id` INT," +
        			"`name` STRING," +
        			"PRIMARY KEY (`id`) NOT ENFORCED" +
                ") WITH (" +
                	"'connector' = 'jdbc'," +
                	"'url' = 'jdbc:mysql://10.4.45.207:3306/cdc_test_target'," +
                	"'username' = 'username'," +
                	"'password' = 'password'," +
                	"'table-name' = 'player_target'" +
                ");";
        tableEnv.executeSql(createSinkTableSQL);

        String insertSQL = "INSERT INTO sink_dest SELECT * FROM source_dest;";
        StatementSet statementSet = tableEnv.createStatementSet();
        statementSet.addInsertSql(insertSQL);
        statementSet.execute();
    }
}

我们在 Configuration 中设置了 rest.port = 9091, 程序启动成功后,可以在浏览器打开 localhost:9091 看到提交运行的任务。

scan.startup.mode 可取值为:

initial:当没有指定 `scan.startup.mode` 时,默认取值为 `initial`,官网对 initial 的 [说明原文](https://nightlies.apache.org/flink/flink-cdc-docs-master/zh/docs/connectors/cdc-connectors/mysql-cdc/#startup-reading-position) 是 `Performs an initial snapshot on the monitored database tables upon first startup, and continue to read the latest binlog.`
earliest-offset:可访问的最早的 binlog 偏移量。
latest-offset:从 binlog 的末尾位置开始读取,这意味着只接收 connector 启动之后的变更事件流。
specific-offset:跳过快照阶段,从特定偏移量开始读取 binlog 事件。该偏移量可以使用二进制日志文件名和位置指定。
timestamp:从一个指定的时间戳位置开始读取 binlog,时间的设定很方便,但是根据 timestamp 定位到具体的 offset 需要经过一点儿时间。

遇到的问题

执行 FlinkSQL 报错

[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.ValidationException: 
Could not find any factory for identifier 'mysql-cdc' that 
implements 'org.apache.flink.table.factories.DynamicTableFactory' in the classpath.

Available factory identifiers are:

blackhole
datagen
filesystem
jdbc
oracle-cdc
print
python-input-format

错误原因

不识别源端定义的 mysql-cdc ,缺少 `flink-sql-connector-mysql-cdc.jar。

解决方案

添加 flink-connector-jdbc.jarflink-sql-connector-mysql-cdc.jar 重启后解决。

MySQL 开启 bin-log 报错

[ERROR] You have enabled the binary log, but you haven’t provided the mandatory server-id. 
Please refer to the proper server start-up parameters documentation 
2016-09-03T03:17:51.815890Z 0 [ERROR] Aborting

报错原因

在设置 bin-log 日志的时候,没有设置 server_id 参数。server-id 参数用于在复制中,为主库和备库提供一个独立的 ID,以区分主库和备库;

开启二进制文件的时候,需要设置这个参数。

解决方案

修改 MySQL 配置文件 my.ini (windows) / my.cnf (linux)

[mysqld]
# Binary Logging.
log-bin=mysql-bin
server-id=1

重启 MySQL 服务。

posted @ 2024-04-16 07:06  watermark's  阅读(329)  评论(0编辑  收藏  举报