Canal实战

疑惑点

本教程使用版本

  • Canal版本:1.1.5
  • Canal-Adapter版本:1.1.5
  • Eleasticsearch版本:7.10.2
  • MySQL版本:5.7.32

背景


MySQL表数据达到百万级后,复杂的查询导致响应时间很慢,计划用ES充当二级缓存。

整体架构如下:

整体架构

canal可以通过监听MySQL的binlog日志来实现数据库增量的日志解析,再通过ClientAdapter把数据实时同步给ES。

Canal工作流程图:

Canal工作流程图

Canal工作原理:

  • canal模拟MySQL slave的交互协议,伪装自己为MySQL slave,向MySQL master发送dump协议;
  • MySQL master收到dump请求,开始推送binary log给slave(即canal);
  • canal解析binary log对象(原始为byte流)。

操作流程


  1. 准备工作:
    1. MySQL:存放原始数据;
    2. canal:解析数据库日志,同步获取到的增量变更;
    3. canal-adater:解析转化数据到ES;
    4. ES:接收增量数据;
  2. 在ES中创建索引
    1. 要求Mapping中定义的字段名称和类型与待同步数据保持一致;
  3. 安装JDK:Canal基于Java开发,且版本≥1.8.0
  4. 安装并启动canal-server
    1. 安装canal-server,然后修改配置文件关联MySQL。canal-server模拟MySQL集群的一个slave,获取MySQL集群Master节点的二进制日志(binary log),并将日志推送到Canal-adapter。
  5. 安装并启动canal-adapter
    1. 安装canal-adapter,然后修改配置文件关联MySQL和ES,以及定义MySQL数据到ES数据的映射字段,用来将数据同步到ES。
  6. 验证增量数据同步

配置MySQL

查看MySQL是否开启了binlog:

SHOW VARIABLES LIKE 'log_bin';

需要先开启Binlog写入功能,配置binlog-format为ROW模式,my.cnf配置如下:

[mysqld]
log-bin=mysql-bin # 开启 binlog
binlog-format=ROW # 选择 ROW 模式
server_id=1 # 配置 MySQL replaction 需要定义,不要和 canal 的 slaveId 重复

授权 canal 链接 MySQL 账号具有作为 MySQL slave 的权限, 如果已有账户可直接 grant

CREATE USER canal IDENTIFIED BY 'canal';  
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
-- GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%' ;
FLUSH PRIVILEGES;

Canal安装配置

  1. 下载Canal

https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.deployer-1.1.5.tar.gz

canal.deployer对应的是canal的server端,负责订阅并解析MySQL-Binlog。

解压后的文件目录

canal1.1.5
├── bin
│   │── restart.sh    # 重启
│   │── startup.sh    # 启动canal
│   │── stop.sh       # 停止canal
│   └── ...
├── conf
│   │── canal.properties  # 配置文件
│   │── example
│   │   ├── h2.mv.db
│   │   ├── instance.properties  # 实例配置文件
│   │   └── meta.dat
│   │── logback.xml
│   └── ...
├── lib
├── logs
│   ├── canal
│   │   ├── canal.log
│   │   └── canal_stdout.log
│   └── example
│       ├── meta.log
│       └── example.log
└── plugin
  1. 配置Canal

修改conf/example/instance.properties文件

# position info
canal.instance.master.address=127.0.0.1:3306    # 数据库地址
canal.instance.master.journal.name=
canal.instance.master.position=
canal.instance.master.timestamp=
canal.instance.master.gtid=

canal.instance.dbUsername=root                   # 连接数据库用户名
canal.instance.dbPassword=123456                 # 连接数据库密码
canal.instance.connectionCharset = UTF-8

canal.instance.filter.regex=blog.user
  1. 启动
./bin/startup.sh

Canal-adapter安装配置

  1. 下载Canal adapter

https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.adapter-1.1.5.tar.gz

解压后的文件目录

canal.adapter-1.1.5
├── bin
│   │── restart.sh    # 重启
│   │── startup.sh    # 启动canal
│   │── stop.sh       # 停止canal
│   └── ...
├── conf
│   │── application.yml  # 配置文件
│   │── bootstrap.yml
│   │── es7
│   │   ├── customer.yml
│   │   └── ...
│   └── ...
├── lib
├── logs
└── plugin
  1. 配置adapter

修改conf/application.yml文件

server:
  port: 8081
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8
    default-property-inclusion: non_null

canal.conf:
  mode: tcp #tcp kafka rocketMQ rabbitMQ
  flatMessage: true
  zookeeperHosts:
  syncBatchSize: 1000
  retries: 0
  timeout:
  accessKey:
  secretKey:
  consumerProperties:
  srcDataSources:
    defaultDS:
      url: jdbc:mysql://127.0.0.1:3306/blog?useUnicode=true
      username: root
      password: 123456
  canalAdapters:
    - instance: example # canal instance Name or mq topic name
      groups:
        - groupId: g1
          outerAdapters:
            - name: logger
            - name: es7
              hosts: http://127.0.0.1:9200,http://10.192.8.204:9200,http://10.192.8.209:9200
              properties:
                mode: rest # or transport
                # security.auth: test:123456 #  only used for rest mode
                cluster.name: dev-cluster

删除conf/es7/下的所有文件,并新建自己的yml文件。

dataSourceKey: defaultDS        # 源数据源的key, 对应上面配置的srcDataSources中的值
outerAdapterKey: exampleKey     # 对应application.yml中es配置的key 
destination: example            # cannal的instance或者MQ的topic
groupId:                        # 对应MQ模式下的groupId, 只会同步对应groupId的数据
esMapping:
  _index: mytest_user           # es 的索引名称
  _type: _doc                   # es 的type名称, es7下无需配置此项
  _id: _id                      # es 的_id, 如果不配置该项必须配置下面的pk项_id则会由es自动分配
#  pk: id                       # 如果不需要_id, 则需要指定一个属性为主键属性
  # sql映射
  sql: "select a.id as _id, a.name as _name, a.role_id as _role_id, b.role_name as _role_name,
        a.c_time as _c_time, c.labels as _labels from user a
        left join role b on b.id=a.role_id
        left join (select user_id, group_concat(label order by id desc separator ';') as labels from label
        group by user_id) c on c.user_id=a.id"
#  objFields:
#    _labels: array:;           # 数组或者对象属性, array:; 代表以;字段里面是以;分隔的
#    _obj: object               # json对象
  etlCondition: "where a.c_time>='{0}'"     # etl 的条件参数
  commitBatch: 3000                         # 提交批大小
  1. 启动ES
./bin/startup.sh

启动错误信息

1. cannot be cast to com.alibaba.druid.pool.DruidDataSource


2022-02-28 14:52:57.684 [main] INFO  c.a.o.c.client.adapter.es.core.config.ESSyncConfigLoader - ## Start loading es mapping config ...
2022-02-28 14:52:57.734 [main] INFO  c.a.o.c.client.adapter.es.core.config.ESSyncConfigLoader - ## ES mapping config loaded
2022-02-28 14:52:58.037 [main] ERROR c.a.o.canal.adapter.launcher.loader.CanalAdapterLoader - Load canal adapter: es7 failed
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassCastException: com.alibaba.druid.pool.DruidDataSource cannot be cast to com.alibaba.druid.pool.DruidDataSource
        at com.alibaba.otter.canal.client.adapter.es7x.ES7xAdapter.init(ES7xAdapter.java:54) ~[client-adapter.es7x-1.1.5-jar-with-dependencies.jar:na]
        at com.alibaba.otter.canal.adapter.launcher.loader.CanalAdapterLoader.loadAdapter(CanalAdapterLoader.java:225) [client-adapter.launcher-1.1.5.jar:na]
        at com.alibaba.otter.canal.adapter.launcher.loader.CanalAdapterLoader.init(CanalAdapterLoader.java:56) [client-adapter.launcher-1.1.5.jar:na]
        ...
        at com.alibaba.otter.canal.adapter.launcher.CanalAdapterApplication.main(CanalAdapterApplication.java:19) ~[client-adapter.launcher-1.1.5.jar:na]
Caused by: java.lang.RuntimeException: java.lang.ClassCastException: com.alibaba.druid.pool.DruidDataSource cannot be cast to com.alibaba.druid.pool.DruidDataSource
        at com.alibaba.otter.canal.client.adapter.es.core.ESAdapter.init(ESAdapter.java:83) ~[client-adapter.es7x-1.1.5-jar-with-dependencies.jar:na]
        at com.alibaba.otter.canal.client.adapter.es7x.ES7xAdapter.init(ES7xAdapter.java:52) ~[client-adapter.es7x-1.1.5-jar-with-dependencies.jar:na]
        ... 42 common frames omitted
Caused by: java.lang.ClassCastException: com.alibaba.druid.pool.DruidDataSource cannot be cast to com.alibaba.druid.pool.DruidDataSource
        at com.alibaba.otter.canal.client.adapter.es.core.ESAdapter.addSyncConfigToCache(ESAdapter.java:146) ~[client-adapter.es7x-1.1.5-jar-with-dependencies.jar:na]
        at com.alibaba.otter.canal.client.adapter.es.core.ESAdapter.init(ESAdapter.java:75) ~[client-adapter.es7x-1.1.5-jar-with-dependencies.jar:na]
        ... 43 common frames omitted

在Github issues中找到#3717 解决方案

下载源码压缩包canal-canal-1.1.5.zip,解压后用IDE打开项目,修改canal-canal-1.1.5/client-adapter/escore/pom.xml依赖

<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>druid</artifactId>
    <scope>provided</scope>
</dependency>

然后在项目根目录下执行mvn clean package -Dmaven.test.skip=true,然后到canal-canal-1.1.5\client-adapter\es7x\target\目录下找到client-adapter.es7x-1.1.5-jar-with-dependencies.jar

将这个文件复制到./canal_adapter/plugin/下,重启adapter。

2. NullPointerException

2022-03-01 09:45:29.443 [main] INFO  c.a.o.canal.adapter.launcher.loader.CanalAdapterLoader - Load canal adapter: es7 succeed
2022-03-01 09:45:29.464 [main] INFO  c.alibaba.otter.canal.connector.core.spi.ExtensionLoader - extension classpath dir: /usr/local/canal/canal.adapter-1.1.5/plugin
2022-03-01 09:45:29.524 [main] ERROR c.a.o.canal.adapter.launcher.loader.CanalAdapterService - ## something goes wrong when starting up the canal client adapters:
java.lang.NullPointerException: null
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:782) ~[guava-22.0.jar:na]
        ...
        at com.alibaba.otter.canal.common.zookeeper.ZkClientx.getZkClient(ZkClientx.java:28) ~[canal.common-1.1.5.jar:na]
        at com.alibaba.otter.canal.connector.tcp.consumer.CanalTCPConsumer.init(CanalTCPConsumer.java:57) ~[connector.tcp-1.1.5-jar-with-dependencies.jar:na]
        at com.alibaba.otter.canal.adapter.launcher.loader.AdapterProcessor.<init>(AdapterProcessor.java:74) ~[client-adapter.launcher-1.1.5.jar:na]
        at com.alibaba.otter.canal.adapter.launcher.loader.CanalAdapterLoader.lambda$init$0(CanalAdapterLoader.java:65) ~[client-adapter.launcher-1.1.5.jar:na]
        at java.util.HashMap.computeIfAbsent(HashMap.java:1127) ~[na:1.8.0_161]
        at com.alibaba.otter.canal.adapter.launcher.loader.CanalAdapterLoader.init(CanalAdapterLoader.java:60) ~[client-adapter.launcher-1.1.5.jar:na]
        at com.alibaba.otter.canal.adapter.launcher.loader.CanalAdapterService.init(CanalAdapterService.java:60) ~[client-adapter.launcher-1.1.5.jar:na]

Google了半天也没找到网上类似的问题,所以怀疑是自己哪里配置文件写错了。经过对比,发现是多删除了application.yml文件的配置信息。

3. Connection refused

2022-03-01 11:21:49.823 [Thread-4] INFO  c.a.otter.canal.adapter.launcher.loader.AdapterProcessor - =============> Start to connect destination: example <=============
2022-03-01 11:21:49.824 [Thread-4] ERROR c.a.otter.canal.adapter.launcher.loader.AdapterProcessor - process error!
com.alibaba.otter.canal.protocol.exception.CanalClientException: java.net.ConnectException: Connection refused
        at com.alibaba.otter.canal.client.impl.SimpleCanalConnector.doConnect(SimpleCanalConnector.java:198) ~[na:na]
        at com.alibaba.otter.canal.client.impl.SimpleCanalConnector.connect(SimpleCanalConnector.java:115) ~[na:na]
        at com.alibaba.otter.canal.connector.tcp.consumer.CanalTCPConsumer.connect(CanalTCPConsumer.java:63) ~[na:na]
        at com.alibaba.otter.canal.adapter.launcher.loader.AdapterProcessor.process(AdapterProcessor.java:184) ~[client-adapter.launcher-1.1.5.jar:na]
        at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_161]
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_161]
        at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_161]
        at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_161]
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_161]
        at com.alibaba.otter.canal.client.impl.SimpleCanalConnector.doConnect(SimpleCanalConnector.java:150) ~[na:na]

canal server中的canal.serverMode配置的是kafka,改为tcp后,一切正常。

解决思路

  1. 提示拒绝连接,要分析哪个连接被拒绝了;
  2. 查看服务是否开放了该端口。命令:netstat -lntp

Kafka模式

配置canal.serverModel为kafka模式后,需要修改application.yml文件和

canal.conf:
  mode: kafka #tcp kafka rocketMQ rabbitMQ
    kafka.bootstrap.servers: 10.192.1.5:9092,10.192.1.6:9092,10.192.1.7:9092,10.192.1.8:9092
  canalAdapters:
  - instance: topic_dev  # mq topic name

修改映射文件,如vim es7/user.yml

dataSourceKey: defaultDS
destination: topic_dev      # mq topic name
groupId: g1
esMapping: 

数据初始化导入

全量导入:

curl http://127.0.0.1:8081/etl/es7/crm_opportunity.yml -X POST

统计导入结果:

curl http://127.0.0.1:8081/count/es7/crm_opportunity.yml

参考资料

posted @ 2022-03-22 21:25  LukeBlog  阅读(949)  评论(0编辑  收藏  举报