Background

Canal is a widely used real-time data synchronization tool for Mysql database. I am developing a multi-tenant database system, in which each tenant owns one database. Canal can be used to aggregate data from tenant's database into a full database.

Install

The latest version of canal can be downloaded from the release page. This post introduce the details to build the system. However, it it not that easy to done by following steps. Some traps are on the way that needs to be noticed, and I would explain them below.

Install canal develop

in conf/example/instance.properties file, you should configure canal.instance.filter.regex with posix grep grammar. Example:

# this must be configured otherwise the client won't update data
canal.instance.mysql.slaveId=1234
# this would match all the tables in the databases start with pentafleet_test
canal.instance.filter.regex=^pentafleet_test.*\\..*

Install canal adapter

application.yml

Some attributes should be configured carefully. See example below.

server:
# set the listen port
  port: 8070
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8
    default-property-inclusion: non_null

canal.conf:
  mode: tcp
  flatMessage: true
  zookeeperHosts:
  syncBatchSize: 1000
  retries: 0
  timeout:
  accessKey:
  secretKey:
  consumerProperties:
    canal.tcp.server.host: 0.0.0.0:11111
    canal.tcp.zookeeper.hosts:
    canal.tcp.batch.size: 500
    canal.tcp.username:
    canal.tcp.password:

# Here you could configure several data sources.
  srcDataSources:
    DS1:
      url: jdbc:mysql://127.0.0.1:3306/pentafleet_test?useUnicode=true&autoReconnect=true
      username: canal
      password: canal
    DS2:
      url: jdbc:mysql://127.0.0.1:3306/pentafleet_test2?useUnicode=true&autoReconnect=true
      username: canal
      password: canal

  canalAdapters:
  - instance: example # canal instance Name or mq topic name
# Here you could configure several data destination.
    groups:
    - groupId: g1
      outerAdapters:
      - name: logger
      - name: rdb
        key: mysql1
        properties:
          jdbc.driverClassName: com.mysql.jdbc.Driver
          jdbc.url: jdbc:mysql://127.0.0.1:3306/pentafleet_all?useUnicode=true&autoReconnect=true
          # make sure this account have writing permission for the destination database
          jdbc.username: canal
          jdbc.password: canal

# configure to avoid running error 
MODULES_REDIRECT_OUTPUT+x:

Using below command to grant database permission to user 'canal'.

GRANT ALL ON pentafleet_all.* TO canal@'%';

conf/rdb/mytest_user.yml

This file is used to configure table mapping. See example below.

# 'DS1' is datasource configured in application.yml
dataSourceKey: DS1
# 'example' is canal instance name
destination: example
# g1, mysql1 is destination configuration in application.yml
groupId: g1
outerAdapterKey: mysql1
concurrent: true
# can only mapping one table in one file
dbMapping:
#  mirrorDb: true
  database: pentafleet_test
  table: monitor_email
  targetTable: monitor_email
  targetPk:
    id: id
  mapAll: true

You could only map one table in one file. that means if you have 2 source database,each having 6 tables, and 1 destination database. You should create 12 files for mapping.

Full data update

We describe how to configure incremental update above, and here we show how to do full data update. we can only run command in linux.

curl http://127.0.0.1:8070/etl/rdb/mysql1/mytest_user.yml -X POST
  • 127.0.0.1:8070 - the address for canal adapter
  • mytest_user.yml - the file in conf/rdb/ folder.