Offline sync from TiDB (MySQL) to Elasticsearch with DataX
1. Download DataX
cd /data
wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
tar -zxvf datax.tar.gz
# Delete the hidden files (important)
rm -rf /data/datax/plugin/*/._*
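2. After extracting, check whether plugins exist for both your source and target data stores. If both are supported, the following directories will be present.
/data/datax/plugin/reader/mysqlreader
/data/datax/plugin/writer/elasticsearchwriter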
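The directory check above can be scripted. The following is a minimal sketch in Python 3; the helper name `missing_plugins` is hypothetical (not part of DataX), and the install path `/data/datax` is taken from step 1.

```python
import os


def missing_plugins(datax_home, plugins):
    """Return plugin paths (relative to datax_home) that are not directories.

    `plugins` is a list of (kind, name) pairs, e.g. ("reader", "mysqlreader").
    """
    missing = []
    for kind, name in plugins:
        rel = os.path.join("plugin", kind, name)
        if not os.path.isdir(os.path.join(datax_home, rel)):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # The two plugins this walkthrough relies on.
    required = [("reader", "mysqlreader"), ("writer", "elasticsearchwriter")]
    for rel in missing_plugins("/data/datax", required):
        print("missing:", rel)
```

If the script prints nothing, both plugin directories exist and the build from source is not needed.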
3. In my case the extracted package did not include the elasticsearchwriter plugin, so it has to be built from source.
4. Download the DataX source
https://gitee.com/mirrors/DataX.git
5. Remove the modules you do not need from the root pom.xml. I kept only the elasticsearchwriter I needed.
<modules>
    <module>common</module>
    <module>core</module>
    <module>transformer</module>
    <!-- reader -->
    <!-- writer -->
    <module>elasticsearchwriter</module>
    <!-- common support module -->
    <module>plugin-rdbms-util</module>
    <module>plugin-unstructured-storage-util</module>
</modules>
6. Build elasticsearchwriter by running the following in the DataX root directory
mvn clean install '-Dmaven.test.skip=true'
7. The built plugin ends up in DataX\elasticsearchwriter\target\datax\plugin\writer. Copy it to /data/datax/plugin/writer.
8. Write tidb-es.json. The reader can use querySql instead of column, table, and where, but the reader columns must correspond to the writer columns, ideally in the same order.
{
  "job": {
    "setting": {
      "speed": {
        "channel": 8
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "root",
            "column": [
              "id as pk",
              "id",
              "indicators_name",
              "indicators_code",
              "indicators_region_name",
              "create_time"
            ],
            "where": "",
            "splitPk": "id",
            "connection": [
              {
                "jdbcUrl": [
                  "jdbc:mysql://IP:PORT/schema?characterEncoding=utf-8"
                ],
                "table": [
                  "table_name"
                ]
              }
            ]
          }
        },
        "writer": {
          "name": "elasticsearchwriter",
          "parameter": {
            "endpoint": "http://IP:PORT",
            "index": "datax_table_name",
            "type": "_doc",
            "settings": {
              "index": {
                "number_of_shards": 5,
                "number_of_replicas": 1
              }
            },
            "writeMode": "insert",
            "cleanup": false,
            "discovery": false,
            "batchSize": 10000,
            "splitter": ",",
            "column": [
              { "name": "pk", "type": "id" },
              { "name": "id", "type": "keyword" },
              { "name": "indicators_name", "type": "text" },
              { "name": "indicators_code", "type": "text" },
              { "name": "indicators_region_name", "type": "text" },
              { "name": "create_time", "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
            ]
          }
        }
      }
    ]
  }
}
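Since the reader and writer columns have to line up, a quick pre-flight check on the job file can catch ordering mistakes before a run. The sketch below is in Python 3 and is not part of DataX; `reader_output_name` and `column_mismatches` are hypothetical helpers. It assumes the convention used in the job above, where each reader column (or its `as` alias) carries the same name as the writer column at the same position.

```python
import re


def reader_output_name(column_expr):
    """Return the name a reader column yields: the alias after AS, else the expression."""
    match = re.search(r"\s+as\s+`?(\w+)`?\s*$", column_expr, flags=re.IGNORECASE)
    return match.group(1) if match else column_expr.strip().strip("`")


def column_mismatches(job):
    """Compare reader and writer columns position by position.

    `job` is the parsed job JSON (a dict). Returns a list of problem
    descriptions; an empty list means the columns line up.
    """
    content = job["job"]["content"][0]
    reader_cols = [reader_output_name(c)
                   for c in content["reader"]["parameter"]["column"]]
    writer_cols = [c["name"] for c in content["writer"]["parameter"]["column"]]

    problems = []
    if len(reader_cols) != len(writer_cols):
        problems.append("reader has %d columns, writer has %d"
                        % (len(reader_cols), len(writer_cols)))
    for i, (r, w) in enumerate(zip(reader_cols, writer_cols)):
        if r != w:
            problems.append("position %d: reader '%s' vs writer '%s'" % (i, r, w))
    return problems
```

To use it, load the file with `json.load(open("/data/datax/job/tidb-es.json"))` and pass the result to `column_mismatches`. The check only compares names, so it will not notice a type mismatch such as a date column mapped to `keyword`.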
9. Run DataX
DataX depends on Python. If Python is not available, install it first. My CentOS ships with Python 2.7.5.
python /data/datax/bin/datax.py /data/datax/job/tidb-es.json
10. If the run completes without errors, the migrated data can be queried in Elasticsearch.
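One way to confirm the result is to ask Elasticsearch how many documents the target index holds and compare that with the row count of the source table. The sketch below uses Elasticsearch's `_count` API from Python 3 with only the standard library. The function names are hypothetical, and it assumes the cluster is reachable without authentication at the `endpoint` configured in the job.

```python
import json
import urllib.request


def count_url(endpoint, index):
    """Build the URL of the _count API for one index."""
    return "%s/%s/_count" % (endpoint.rstrip("/"), index)


def parse_count(body):
    """Extract the document count from a _count response body (JSON text)."""
    return json.loads(body)["count"]


def fetch_count(endpoint, index, timeout=10):
    """Query the _count API and return the number of documents in the index."""
    with urllib.request.urlopen(count_url(endpoint, index), timeout=timeout) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

For the job above, the call would be `fetch_count("http://IP:PORT", "datax_table_name")`, with IP and PORT replaced by the real address.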
## DataX is an offline data migration tool and is not well suited to real-time sync. For scheduled runs, pair it with crontab.
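For scheduled runs, a small wrapper can pass DataX's exit code back to cron so that failures show up in cron's log or mail. This is only a sketch: the wrapper file name `run_datax.py`, the log path, and the 02:00 schedule in the comment are hypothetical, while the DataX paths come from the steps above.

```python
import subprocess
import sys

# Example crontab entry (hypothetical schedule and paths), daily at 02:00:
# 0 2 * * * python /data/datax/job/run_datax.py /data/datax/job/tidb-es.json >> /data/datax/log/cron.log 2>&1


def build_command(job_path, datax_home="/data/datax", python_bin="python"):
    """Return the argument list used to launch DataX for one job file."""
    return [python_bin, "%s/bin/datax.py" % datax_home.rstrip("/"), job_path]


def run_job(job_path):
    """Run DataX for the given job file and return its exit code."""
    return subprocess.call(build_command(job_path))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Exit with DataX's own status so cron can detect failed runs.
    sys.exit(run_job(sys.argv[1]))
```

Calling `datax.py` directly from the crontab line works as well. The wrapper only becomes useful once you want to add steps around the run, such as alerting on a non-zero exit code.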