Canal实战
疑惑点
本教程使用版本
- Canal版本:1.1.5
- Canal-Adapter版本:1.1.5
- Eleasticsearch版本:7.10.2
- MySQL版本:5.7.32
背景
MySQL表数据达到百万级后,复杂的查询导致响应时间很慢,计划用ES充当二级缓存。
整体架构如下:
canal可以通过监听MySQL的binlog日志来实现数据库增量的日志解析,再通过ClientAdapter把数据实时同步给ES。
Canal工作流程图:
Canal工作原理:
- canal模拟MySQL slave的交互协议,伪装自己为MySQL slave,向MySQL master发送dump协议;
- MySQL master收到dump请求,开始推送binary log给slave(即canal);
- canal解析binary log对象(原始为byte流)。
操作流程
- 准备工作:
- MySQL:存放原始数据;
- canal:解析数据库日志,同步获取到的增量变更;
- canal-adater:解析转化数据到ES;
- ES:接收增量数据;
- 在ES中创建索引
- 要求Mapping中定义的字段名称和类型与待同步数据保持一致;
- 安装JDK:Canal基于Java开发,且版本≥1.8.0
- 安装并启动canal-server
- 安装canal-server,然后修改配置文件关联MySQL。canal-server模拟MySQL集群的一个slave,获取MySQL集群Master节点的二进制日志(binary log),并将日志推送到Canal-adapter。
- 安装并启动canal-adapter
- 安装canal-adapter,然后修改配置文件关联MySQL和ES,以及定义MySQL数据到ES数据的映射字段,用来将数据同步到ES。
- 验证增量数据同步
配置MySQL
查看MySQL是否开启了binlog:
SHOW VARIABLES LIKE 'log_bin';
需要先开启Binlog写入功能,配置binlog-format为ROW模式,my.cnf配置如下:
[mysqld]
log-bin=mysql-bin # 开启 binlog
binlog-format=ROW # 选择 ROW 模式
server_id=1 # 配置 MySQL replaction 需要定义,不要和 canal 的 slaveId 重复
授权 canal 链接 MySQL 账号具有作为 MySQL slave 的权限, 如果已有账户可直接 grant
CREATE USER canal IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
-- GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%' ;
FLUSH PRIVILEGES;
Canal安装配置
- 下载Canal
https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.deployer-1.1.5.tar.gz
canal.deployer对应的是canal的server端,负责订阅并解析MySQL-Binlog。
解压后的文件目录
canal1.1.5
├── bin
│ │── restart.sh # 重启
│ │── startup.sh # 启动canal
│ │── stop.sh # 停止canal
│ └── ...
├── conf
│ │── canal.properties # 配置文件
│ │── example
│ │ ├── h2.mv.db
│ │ ├── instance.properties # 实例配置文件
│ │ └── meta.dat
│ │── logback.xml
│ └── ...
├── lib
├── logs
│ ├── canal
│ │ ├── canal.log
│ │ └── canal_stdout.log
│ └── example
│ ├── meta.log
│ └── example.log
└── plugin
- 配置Canal
修改conf/example/instance.properties
文件
# position info
canal.instance.master.address=127.0.0.1:3306 # 数据库地址
canal.instance.master.journal.name=
canal.instance.master.position=
canal.instance.master.timestamp=
canal.instance.master.gtid=
canal.instance.dbUsername=root # 连接数据库用户名
canal.instance.dbPassword=123456 # 连接数据库密码
canal.instance.connectionCharset = UTF-8
canal.instance.filter.regex=blog.user
- 启动
./bin/startup.sh
Canal-adapter安装配置
- 下载Canal adapter
https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.adapter-1.1.5.tar.gz
解压后的文件目录
canal.adapter-1.1.5
├── bin
│ │── restart.sh # 重启
│ │── startup.sh # 启动canal
│ │── stop.sh # 停止canal
│ └── ...
├── conf
│ │── application.yml # 配置文件
│ │── bootstrap.yml
│ │── es7
│ │ ├── customer.yml
│ │ └── ...
│ └── ...
├── lib
├── logs
└── plugin
- 配置adapter
修改conf/application.yml
文件
server:
port: 8081
spring:
jackson:
date-format: yyyy-MM-dd HH:mm:ss
time-zone: GMT+8
default-property-inclusion: non_null
canal.conf:
mode: tcp #tcp kafka rocketMQ rabbitMQ
flatMessage: true
zookeeperHosts:
syncBatchSize: 1000
retries: 0
timeout:
accessKey:
secretKey:
consumerProperties:
srcDataSources:
defaultDS:
url: jdbc:mysql://127.0.0.1:3306/blog?useUnicode=true
username: root
password: 123456
canalAdapters:
- instance: example # canal instance Name or mq topic name
groups:
- groupId: g1
outerAdapters:
- name: logger
- name: es7
hosts: http://127.0.0.1:9200,http://10.192.8.204:9200,http://10.192.8.209:9200
properties:
mode: rest # or transport
# security.auth: test:123456 # only used for rest mode
cluster.name: dev-cluster
删除conf/es7/
下的所有文件,并新建自己的yml文件。
dataSourceKey: defaultDS # 源数据源的key, 对应上面配置的srcDataSources中的值
outerAdapterKey: exampleKey # 对应application.yml中es配置的key
destination: example # cannal的instance或者MQ的topic
groupId: # 对应MQ模式下的groupId, 只会同步对应groupId的数据
esMapping:
_index: mytest_user # es 的索引名称
_type: _doc # es 的type名称, es7下无需配置此项
_id: _id # es 的_id, 如果不配置该项必须配置下面的pk项_id则会由es自动分配
# pk: id # 如果不需要_id, 则需要指定一个属性为主键属性
# sql映射
sql: "select a.id as _id, a.name as _name, a.role_id as _role_id, b.role_name as _role_name,
a.c_time as _c_time, c.labels as _labels from user a
left join role b on b.id=a.role_id
left join (select user_id, group_concat(label order by id desc separator ';') as labels from label
group by user_id) c on c.user_id=a.id"
# objFields:
# _labels: array:; # 数组或者对象属性, array:; 代表以;字段里面是以;分隔的
# _obj: object # json对象
etlCondition: "where a.c_time>='{0}'" # etl 的条件参数
commitBatch: 3000 # 提交批大小
- 启动ES
./bin/startup.sh
启动错误信息
1. cannot be cast to com.alibaba.druid.pool.DruidDataSource
2022-02-28 14:52:57.684 [main] INFO c.a.o.c.client.adapter.es.core.config.ESSyncConfigLoader - ## Start loading es mapping config ...
2022-02-28 14:52:57.734 [main] INFO c.a.o.c.client.adapter.es.core.config.ESSyncConfigLoader - ## ES mapping config loaded
2022-02-28 14:52:58.037 [main] ERROR c.a.o.canal.adapter.launcher.loader.CanalAdapterLoader - Load canal adapter: es7 failed
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassCastException: com.alibaba.druid.pool.DruidDataSource cannot be cast to com.alibaba.druid.pool.DruidDataSource
at com.alibaba.otter.canal.client.adapter.es7x.ES7xAdapter.init(ES7xAdapter.java:54) ~[client-adapter.es7x-1.1.5-jar-with-dependencies.jar:na]
at com.alibaba.otter.canal.adapter.launcher.loader.CanalAdapterLoader.loadAdapter(CanalAdapterLoader.java:225) [client-adapter.launcher-1.1.5.jar:na]
at com.alibaba.otter.canal.adapter.launcher.loader.CanalAdapterLoader.init(CanalAdapterLoader.java:56) [client-adapter.launcher-1.1.5.jar:na]
...
at com.alibaba.otter.canal.adapter.launcher.CanalAdapterApplication.main(CanalAdapterApplication.java:19) ~[client-adapter.launcher-1.1.5.jar:na]
Caused by: java.lang.RuntimeException: java.lang.ClassCastException: com.alibaba.druid.pool.DruidDataSource cannot be cast to com.alibaba.druid.pool.DruidDataSource
at com.alibaba.otter.canal.client.adapter.es.core.ESAdapter.init(ESAdapter.java:83) ~[client-adapter.es7x-1.1.5-jar-with-dependencies.jar:na]
at com.alibaba.otter.canal.client.adapter.es7x.ES7xAdapter.init(ES7xAdapter.java:52) ~[client-adapter.es7x-1.1.5-jar-with-dependencies.jar:na]
... 42 common frames omitted
Caused by: java.lang.ClassCastException: com.alibaba.druid.pool.DruidDataSource cannot be cast to com.alibaba.druid.pool.DruidDataSource
at com.alibaba.otter.canal.client.adapter.es.core.ESAdapter.addSyncConfigToCache(ESAdapter.java:146) ~[client-adapter.es7x-1.1.5-jar-with-dependencies.jar:na]
at com.alibaba.otter.canal.client.adapter.es.core.ESAdapter.init(ESAdapter.java:75) ~[client-adapter.es7x-1.1.5-jar-with-dependencies.jar:na]
... 43 common frames omitted
在Github issues中找到#3717 解决方案
下载源码压缩包canal-canal-1.1.5.zip,解压后用IDE打开项目,修改canal-canal-1.1.5/client-adapter/escore/pom.xml
依赖
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>druid</artifactId>
<scope>provided</scope>
</dependency>
然后在项目根目录下执行mvn clean package -Dmaven.test.skip=true
,然后到canal-canal-1.1.5\client-adapter\es7x\target\
目录下找到client-adapter.es7x-1.1.5-jar-with-dependencies.jar
将这个文件复制到./canal_adapter/plugin/
下,重启adapter。
2. NullPointerException
2022-03-01 09:45:29.443 [main] INFO c.a.o.canal.adapter.launcher.loader.CanalAdapterLoader - Load canal adapter: es7 succeed
2022-03-01 09:45:29.464 [main] INFO c.alibaba.otter.canal.connector.core.spi.ExtensionLoader - extension classpath dir: /usr/local/canal/canal.adapter-1.1.5/plugin
2022-03-01 09:45:29.524 [main] ERROR c.a.o.canal.adapter.launcher.loader.CanalAdapterService - ## something goes wrong when starting up the canal client adapters:
java.lang.NullPointerException: null
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:782) ~[guava-22.0.jar:na]
...
at com.alibaba.otter.canal.common.zookeeper.ZkClientx.getZkClient(ZkClientx.java:28) ~[canal.common-1.1.5.jar:na]
at com.alibaba.otter.canal.connector.tcp.consumer.CanalTCPConsumer.init(CanalTCPConsumer.java:57) ~[connector.tcp-1.1.5-jar-with-dependencies.jar:na]
at com.alibaba.otter.canal.adapter.launcher.loader.AdapterProcessor.<init>(AdapterProcessor.java:74) ~[client-adapter.launcher-1.1.5.jar:na]
at com.alibaba.otter.canal.adapter.launcher.loader.CanalAdapterLoader.lambda$init$0(CanalAdapterLoader.java:65) ~[client-adapter.launcher-1.1.5.jar:na]
at java.util.HashMap.computeIfAbsent(HashMap.java:1127) ~[na:1.8.0_161]
at com.alibaba.otter.canal.adapter.launcher.loader.CanalAdapterLoader.init(CanalAdapterLoader.java:60) ~[client-adapter.launcher-1.1.5.jar:na]
at com.alibaba.otter.canal.adapter.launcher.loader.CanalAdapterService.init(CanalAdapterService.java:60) ~[client-adapter.launcher-1.1.5.jar:na]
Google了半天也没找到网上类似的问题,所以怀疑是自己哪里配置文件写错了。经过对比,发现是多删除了application.yml文件的配置信息。
3. Connection refused
2022-03-01 11:21:49.823 [Thread-4] INFO c.a.otter.canal.adapter.launcher.loader.AdapterProcessor - =============> Start to connect destination: example <=============
2022-03-01 11:21:49.824 [Thread-4] ERROR c.a.otter.canal.adapter.launcher.loader.AdapterProcessor - process error!
com.alibaba.otter.canal.protocol.exception.CanalClientException: java.net.ConnectException: Connection refused
at com.alibaba.otter.canal.client.impl.SimpleCanalConnector.doConnect(SimpleCanalConnector.java:198) ~[na:na]
at com.alibaba.otter.canal.client.impl.SimpleCanalConnector.connect(SimpleCanalConnector.java:115) ~[na:na]
at com.alibaba.otter.canal.connector.tcp.consumer.CanalTCPConsumer.connect(CanalTCPConsumer.java:63) ~[na:na]
at com.alibaba.otter.canal.adapter.launcher.loader.AdapterProcessor.process(AdapterProcessor.java:184) ~[client-adapter.launcher-1.1.5.jar:na]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_161]
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_161]
at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_161]
at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_161]
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_161]
at com.alibaba.otter.canal.client.impl.SimpleCanalConnector.doConnect(SimpleCanalConnector.java:150) ~[na:na]
canal server中的canal.serverMode配置的是kafka,改为tcp后,一切正常。
解决思路
- 提示拒绝连接,要分析哪个连接被拒绝了;
- 查看服务是否开放了该端口。命令:
netstat -lntp
Kafka模式
配置canal.serverModel为kafka模式后,需要修改application.yml文件和
canal.conf:
mode: kafka #tcp kafka rocketMQ rabbitMQ
kafka.bootstrap.servers: 10.192.1.5:9092,10.192.1.6:9092,10.192.1.7:9092,10.192.1.8:9092
canalAdapters:
- instance: topic_dev # mq topic name
修改映射文件,如vim es7/user.yml
:
dataSourceKey: defaultDS
destination: topic_dev # mq topic name
groupId: g1
esMapping:
数据初始化导入
全量导入:
curl http://127.0.0.1:8081/etl/es7/crm_opportunity.yml -X POST
统计导入结果:
curl http://127.0.0.1:8081/count/es7/crm_opportunity.yml