Learning canal: data synchronization (Part 1)
I. The story behind canal
1. In Alibaba's B2B business, sellers are concentrated in China while buyers are concentrated overseas, which created the need for cross-region data centers in Hangzhou and the United States. Starting in 2010, Alibaba companies began experimenting with parsing database logs to capture incremental changes for synchronization, and this gave rise to the incremental subscribe-and-consume business.
2. canal is middleware written in Java that parses a database's incremental logs and provides subscription and consumption of incremental data. Currently canal mainly supports parsing MySQL's binlog; once parsing is done, a canal client processes the resulting change data. (Full database-to-database synchronization uses Alibaba's otter middleware, which is built on canal.)
3. Put simply, canal can be understood as a tool for synchronizing incremental data:
canal obtains change data through the binlog and delivers it to one or more storage destinations, such as MySQL, Kafka, or Elasticsearch.
II. canal use cases
1. The original scenario: as part of Alibaba's otter middleware
2. Updating caches
3. Capturing inserts and changes on business tables to build zipper tables. (A zipper table records the lifecycle of each piece of information: once a record's lifecycle ends, a new record is started, with the current date as its effective start date.)
4. Capturing inserts and changes on business tables to build real-time statistics.
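The zipper-table idea in point 3 can be sketched in a few lines. This is a minimal illustration (not canal code, and the row layout is an assumption): each row carries a start_date/end_date lifecycle, and when a key's value changes, the open row is closed and a new row is opened dated today.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # sentinel meaning "still the current record"

def apply_change(zipper, key, new_value, today):
    """Close the open row for `key` if its value changed, then open a new row."""
    for row in zipper:
        if row["key"] == key and row["end_date"] == OPEN_END:
            if row["value"] == new_value:
                return zipper  # nothing changed, keep the open row
            row["end_date"] = today  # end the old record's lifecycle
    zipper.append({"key": key, "value": new_value,
                   "start_date": today, "end_date": OPEN_END})
    return zipper

table = []
apply_change(table, "user_1", "bronze", date(2021, 8, 1))
apply_change(table, "user_1", "gold", date(2021, 9, 1))
# table now holds two rows: the closed "bronze" row and the open "gold" row
```

In a real pipeline, the `apply_change` calls would be driven by the insert/update events canal captures from the binlog.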
III. MySQL master-slave replication
1. The master writes changes to its binary log (binlog).
2. The slave sends a dump request to the master and copies the master's binary log events into its relay log.
3. The slave reads and replays the events in the relay log, applying the changes to its own database.
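On the MySQL side, the three steps above correspond roughly to the following setup; this is a hedged sketch, and the host, user, password, and binlog file name are placeholders:

```sql
-- On the master: binlog must be enabled (my.cnf: log-bin=mysql-bin, server-id=1)
-- On the slave: point it at the master, then start the I/O (dump) and SQL (replay) threads
CHANGE MASTER TO
  MASTER_HOST='master-host',
  MASTER_USER='repl',
  MASTER_PASSWORD='repl-password',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=4;
START SLAVE;
SHOW SLAVE STATUS\G  -- Slave_IO_Running / Slave_SQL_Running should both be Yes
```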
IV. How canal works
1. canal's working principle is simple: it masquerades as a MySQL slave, sends the master a dump request, and receives the binlog stream as if it were replicating.
V. Installation and usage
1. canal can run standalone or as a cluster (for high availability); a cluster deployment uses ZooKeeper for coordination and can deliver messages through an MQ such as RocketMQ.
2. The canal project is hosted at https://github.com/alibaba/canal; downloads for each version are listed under Releases on the right side of the GitHub page.
3. Download the package
# Download the package
wangting@ops03:/opt/software >wget https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.deployer-1.1.5.tar.gz
wangting@ops03:/opt/software >ll | grep canal
-rw-r--r-- 1 wangting wangting 60205298 Aug 17 11:23 canal.deployer-1.1.5.tar.gz
4. Extract and install
# Create a directory to extract into
# [Note]: the official tarball has no top-level canal directory, so create one first
wangting@ops03:/opt/software >mkdir -p /opt/module/canal
wangting@ops03:/opt/software >tar -xf canal.deployer-1.1.5.tar.gz -C /opt/module/canal/
5. Edit the main configuration file canal.properties
# tcp bind ip
# canal server id (currently has no practical effect)
canal.ip =
# register ip to zookeeper
canal.register.ip =
# socket port the canal server listens on; client code connects to this port
canal.port = 11111
canal.metrics.pull.port = 11112
# canal admin console config
canal.admin.manager = <custom>:8089
canal.admin.port = 11110
canal.admin.user = admin
canal.admin.passwd = <custom>
# zookeeper addresses and ports, comma separated
canal.zkServers = xx1:2181,xx2:2181,xx3:2181
# how often canal persists its state to zookeeper, in milliseconds
canal.zookeeper.flush.period = 1000
canal.withoutNetty = false
# tcp, kafka, rocketMQ, rabbitMQ
canal.serverMode = rocketMQ
# flush meta cursor/parse position to file
canal.file.data.dir = ${canal.conf.dir}
canal.file.flush.period = 1000
## memory store RingBuffer size, should be Math.pow(2,n)
canal.instance.memory.buffer.size = 16384
## memory store RingBuffer memory unit size, default 1kb
canal.instance.memory.buffer.memunit = 1024
## memory store gets mode, MEMSIZE or ITEMSIZE
canal.instance.memory.batch.mode = MEMSIZE
canal.instance.memory.rawEntry = true
## detecting config
# whether to enable heartbeat checks
canal.instance.detecting.enable = true
#canal.instance.detecting.sql = insert into retl.xdual values(1,now()) on duplicate key update x=now()
canal.instance.detecting.sql = select 1
# heartbeat check interval, in seconds
canal.instance.detecting.interval.time = 3
# retry threshold for failed heartbeat checks
canal.instance.detecting.retry.threshold = 3
# whether to fail over to another MySQL automatically after heartbeat failures
canal.instance.detecting.heartbeatHaEnable = true
# maximum supported transaction size; larger transactions are split into multiple deliveries
canal.instance.transaction.size = 1024
# mysql fallback interval when reconnecting to a new master, in seconds
canal.instance.fallbackIntervalInSeconds = 60
# network config
canal.instance.network.receiveBufferSize = 16384
canal.instance.network.sendBufferSize = 16384
canal.instance.network.soTimeout = 30
# binlog filter config
# whether to use druid to parse all DDL and extract database/table names
canal.instance.filter.druid.ddl = true
# whether to ignore DCL statements
canal.instance.filter.query.dcl = true
# whether to ignore DML statements
canal.instance.filter.query.dml = true
# whether to ignore DDL statements
canal.instance.filter.query.ddl = false
canal.instance.filter.table.error = false
canal.instance.filter.rows = false
canal.instance.filter.transaction.entry = false
canal.instance.filter.dml.insert = false
canal.instance.filter.dml.update = false
canal.instance.filter.dml.delete = false
# binlog format/image check
canal.instance.binlog.format = ROW,STATEMENT,MIXED
canal.instance.binlog.image = FULL,MINIMAL,NOBLOB
# binlog ddl isolation
canal.instance.get.ddl.isolation = false
# parallel parser config
canal.instance.parser.parallel = true
## concurrent thread number, default 60% of available processors; suggest not to exceed Runtime.getRuntime().availableProcessors()
#canal.instance.parser.parallelThreadSize = 16
## disruptor ringbuffer size, must be a power of 2
canal.instance.parser.parallelBufferSize = 256
# table meta tsdb info (stores canal's table metadata)
canal.instance.tsdb.enable = false
#canal.instance.tsdb.dir = ${canal.file.data.dir:../conf}/${canal.instance.destination:}
canal.instance.tsdb.url = jdbc:mysql://ip:3306/canal_manager
canal.instance.tsdb.dbUsername = canal
canal.instance.tsdb.dbPassword = canal
# dump snapshot interval, default 24 hours
canal.instance.tsdb.snapshot.interval = 1
# purge snapshot expire, default 360 hours (15 days)
canal.instance.tsdb.snapshot.expire = 360

#################################################
######### destinations #############
#################################################
canal.destinations =
# conf root dir
canal.conf.dir = ../conf
canal.log.dir = /home/ops/logs/canal/
# auto scan instance dir add/remove and start/stop instance
canal.auto.scan = true
canal.auto.scan.interval = 5
# set this value to 'true' means that when binlog pos not found, skip to latest.
# WARN: please keep 'false' in production env, unless you know what you are doing.
canal.auto.reset.latest.pos.mode = false
#canal.instance.tsdb.spring.xml = classpath:spring/tsdb/h2-tsdb.xml
canal.instance.tsdb.spring.xml = classpath:spring/tsdb/mysql-tsdb.xml
canal.instance.global.mode = manager
canal.instance.global.lazy = false
canal.instance.global.manager.address = ${canal.admin.manager}
#canal.instance.global.spring.xml = classpath:spring/memory-instance.xml
#canal.instance.global.spring.xml = classpath:spring/file-instance.xml
canal.instance.global.spring.xml = classpath:spring/default-instance.xml

##################################################
######### MQ Properties #############
##################################################
# aliyun ak/sk, support rds/mq
canal.aliyun.accessKey =
canal.aliyun.secretKey =
canal.aliyun.uid =
canal.mq.flatMessage = true
canal.mq.canalBatchSize = 50
canal.mq.canalGetTimeout = 100
# Set this value to "cloud" to enable the message trace feature in aliyun.
canal.mq.accessChannel = local
canal.mq.database.hash = true
canal.mq.send.thread.size = 30
canal.mq.build.thread.size = 8

##################################################
######### Kafka #############
##################################################
kafka.bootstrap.servers = 127.0.0.1:6667
kafka.acks = all
kafka.compression.type = none
kafka.batch.size = 16384
kafka.linger.ms = 1
kafka.max.request.size = 1048576
kafka.buffer.memory = 33554432
kafka.max.in.flight.requests.per.connection = 1
kafka.retries = 0
kafka.kerberos.enable = false
kafka.kerberos.krb5.file = "../conf/kerberos/krb5.conf"
kafka.kerberos.jaas.file = "../conf/kerberos/jaas.conf"

##################################################
######### RocketMQ #############
##################################################
rocketmq.producer.group = canal-producer
rocketmq.enable.message.trace = false
rocketmq.customized.trace.topic =
rocketmq.namespace =
rocketmq.namesrv.addr = ip:9876;ip:9876;ip:9876
rocketmq.retry.times.when.send.failed = 0
rocketmq.vip.channel.enabled = false
rocketmq.tag =

##################################################
######### RabbitMQ #############
##################################################
rabbitmq.host =
rabbitmq.virtual.host =
rabbitmq.exchange =
rabbitmq.username =
rabbitmq.password =
rabbitmq.deliveryMode =
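With canal.serverMode = rocketMQ and canal.mq.flatMessage = true as configured above, each change event reaches the MQ as a JSON "flat message". The sketch below shows how a consumer might decode one; the sample payload is illustrative and hand-written for this example, not captured from a real canal server, though the field names (data, database, table, type, old, isDdl, ts) follow canal's flat-message shape.

```python
import json

# Illustrative flat-message payload (values are made up for the example)
payload = """{
  "data": [{"id": "1", "name": "store-A"}],
  "database": "isv_ali_prod_start_store",
  "table": "qualification",
  "type": "UPDATE",
  "old": [{"name": "store-B"}],
  "isDdl": false,
  "ts": 1629200000000
}"""

def handle_message(raw):
    """Decode one flat message into (database.table, change type, changed rows)."""
    msg = json.loads(raw)
    source = f'{msg["database"]}.{msg["table"]}'
    return source, msg["type"], msg.get("data") or []

source, op, rows = handle_message(payload)
# a real consumer would now route by (source, op), e.g. refresh a cache on UPDATE
```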
VI. Console configuration
1. Open http://ip:8089/#/canalServer/nodeServers in a browser and log in.
2. Create a new server
3. Create a new instance
#################################################
## mysql serverId, v1.0.26+ will autoGen
# canal.instance.mysql.slaveId=0

# enable gtid true/false (use MySQL GTIDs)
canal.instance.gtidon=false

# position info: address of the source database to watch
canal.instance.master.address= ip:3306
canal.instance.master.journal.name=
canal.instance.master.position=
# controls the starting position when reading the binlog
canal.instance.master.timestamp=
canal.instance.master.gtid=

#canal.instance.standby.address = ip:3306
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =

# rds oss binlog
canal.instance.rds.accesskey=
canal.instance.rds.secretkey=
canal.instance.rds.instanceId=

# table meta tsdb info: where canal stores the watched tables' schemas and users, to resolve conflicts caused by schema changes
canal.instance.tsdb.enable=true
canal.instance.tsdb.url=jdbc:mysql://ip:3306/canal_tsdb
canal.instance.tsdb.dbUsername=canal
canal.instance.tsdb.dbPassword=canal

#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
#canal.instance.standby.gtid=

# username/password
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset = UTF-8
# enable druid Decrypt database password
canal.instance.enableDruid=false
#canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==

# table regex: the tables to watch
canal.instance.filter.regex=isv_ali_prod_start_store\\.qualification,isv_ali_prod_start_store\\.qualification_attribute
# table black regex
canal.instance.filter.black.regex=
# table field filter (format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.field=test1.t_product:id/subject/keywords,test2.t_company:id/name/contact/ch
# table field black filter (format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.black.field=test1.t_product:subject/product_image,test2.t_company:id/name/contact/ch

# mq config: the topic that watched change data is written to
canal.mq.topic=dev_canal_merchant_binlog_test
# dynamic topic route by schema or table regex
#canal.mq.dynamicTopic=mytest1.user,mytest2\\..*,.*\\..*
#canal.mq.partition=0
# hash partition config
canal.mq.partitionsNum=48
canal.mq.partitionHash=.*\\..*:$pk$
#################################################
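Both canal.instance.filter.regex and canal.mq.partitionHash above operate on schema.table names. The sketch below uses Python's standard re module, not canal's actual matcher, to illustrate how a comma-separated regex list selects tables, and uses a CRC32 hash (an assumption; canal's exact hash function may differ) to show how keying on the primary key spreads events across the 48 partitions while keeping each row's events in one partition.

```python
import re
import zlib

# The filter.regex value from above (single backslash after .properties unescaping);
# "$" is added here so each pattern matches the whole name, not just a prefix.
filter_regex = r"isv_ali_prod_start_store\.qualification,isv_ali_prod_start_store\.qualification_attribute"
patterns = [re.compile(p + "$") for p in filter_regex.split(",")]

def table_matches(schema_table):
    """True if any pattern in the comma-separated list matches schema.table."""
    return any(p.match(schema_table) for p in patterns)

def partition_for(pk, partitions_num=48):
    """Pick a partition from the primary key (canal.mq.partitionHash = .*\\..*:$pk$):
    events sharing a primary key always land in the same partition, preserving order."""
    return zlib.crc32(str(pk).encode()) % partitions_num

table_matches("isv_ali_prod_start_store.qualification")  # True
table_matches("isv_ali_prod_start_store.other_table")    # False
```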
Reference blog: https://blog.csdn.net/sinat_27818621/article/details/121357499