Using DataX for Offline Migration from TiDB (MySQL) to Elasticsearch

1. Download DataX

cd /data
wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
tar -zxvf datax.tar.gz
# Delete the hidden ._* files (important); otherwise DataX may try to load them as plugins and fail
rm -rf /data/datax/plugin/*/._*
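If you would rather review the `._*` files before deleting them, here is a minimal Python 3 sketch that is not part of DataX. The function name `clean_appledouble` and its `dry_run` flag are hypothetical. Unlike the `rm` glob above, which only reaches one level below `plugin/`, it searches the whole plugin tree recursively.

```python
import shutil
from pathlib import Path
from typing import List


def clean_appledouble(plugin_root: str, dry_run: bool = True) -> List[Path]:
    """List (and, unless dry_run, delete) every `._*` entry below plugin_root."""
    # Collect and sort first: the tree is never modified while rglob walks it,
    # and a parent `._dir` always comes before anything nested inside it.
    matches = sorted(Path(plugin_root).rglob("._*"))
    if not dry_run:
        for p in matches:
            if p.is_dir() and not p.is_symlink():
                shutil.rmtree(p)
            elif p.exists() or p.is_symlink():
                # Entries inside an already-removed `._dir` no longer exist and are skipped.
                p.unlink()
    return matches
```

For example, call `clean_appledouble("/data/datax/plugin")` to see what would be removed, then run it again with `dry_run=False`.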

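2. After extracting, check whether plugins exist for both your source and your target data source. If both are supported, you will see the following directories: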

/data/datax/plugin/reader/mysqlreader
/data/datax/plugin/writer/elasticsearchwriter
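The check can also be scripted. The following is a minimal Python 3 sketch. It assumes that each installed plugin directory contains a `plugin.json` descriptor, as the plugins in the DataX package do. The name `missing_plugins` is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def missing_plugins(datax_home: str, plugins: Iterable[str]) -> List[str]:
    """Return the plugins (e.g. "reader/mysqlreader") with no plugin.json under datax_home/plugin."""
    base = Path(datax_home) / "plugin"
    return [p for p in plugins if not (base / p / "plugin.json").is_file()]
```

For this setup you would call `missing_plugins("/data/datax", ["reader/mysqlreader", "writer/elasticsearchwriter"])`. In the situation described in step 3, it should report `writer/elasticsearchwriter`.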

3. The extracted package did not include an elasticsearchwriter plugin, so I had to build it from source.

4. Download the DataX source code

https://gitee.com/mirrors/DataX.git

5. In the root pom.xml, remove the modules you don't need. I kept only elasticsearchwriter, along with the core and shared support modules:

    <modules>
        <module>common</module>
        <module>core</module>
        <module>transformer</module>

        <!-- reader -->

        <!-- writer -->
        <module>elasticsearchwriter</module>

        <!-- common support module -->
        <module>plugin-rdbms-util</module>
        <module>plugin-unstructured-storage-util</module>

    </modules>
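If you rebuild often, you can trim the module list programmatically. The following is a minimal Python 3 sketch that uses the standard-library `xml.etree.ElementTree`. It assumes that the root pom declares the usual Maven namespace `http://maven.apache.org/POM/4.0.0`, and `trim_modules` is a hypothetical helper. ElementTree discards XML comments and the XML declaration when it parses and reserializes the file, so diff the output before you overwrite pom.xml.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Union

POM_NS = "http://maven.apache.org/POM/4.0.0"


def trim_modules(pom_xml: Union[str, bytes], keep: Iterable[str]) -> str:
    """Return pom_xml with every top-level <module> not listed in `keep` removed."""
    keep = set(keep)
    # Write the POM namespace back as the default namespace instead of "ns0:".
    ET.register_namespace("", POM_NS)
    root = ET.fromstring(pom_xml)
    modules = root.find(f"{{{POM_NS}}}modules")
    if modules is None:
        raise ValueError("pom.xml has no top-level <modules> section")
    for module in list(modules):  # copy the list so removal during iteration is safe
        if (module.text or "").strip() not in keep:
            modules.remove(module)
    return ET.tostring(root, encoding="unicode")
```

Usage might look like `trim_modules(Path("pom.xml").read_bytes(), ["common", "core", "transformer", "elasticsearchwriter", "plugin-rdbms-util", "plugin-unstructured-storage-util"])`, with `Path` imported from `pathlib`.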

6. Build elasticsearchwriter by running the following in the DataX root directory:

mvn clean install '-Dmaven.test.skip=true'

7. The built plugin is placed under DataX\elasticsearchwriter\target\datax\plugin\writer. Copy it to /data/datax/plugin/writer.
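The following minimal Python 3 sketch performs the copy, assuming the build output and the DataX installation are on the same machine (if you built on another machine, transfer the folder first). The helper name `install_plugin` is hypothetical. It treats the presence of `plugin.json` as a sanity check that the folder really is a built plugin.

```python
import shutil
from pathlib import Path


def install_plugin(build_plugin_dir: str, datax_home: str,
                   name: str = "elasticsearchwriter", kind: str = "writer") -> Path:
    """Copy a built plugin folder into datax_home/plugin/<kind>/<name>, replacing any old copy."""
    src = Path(build_plugin_dir) / name
    # DataX plugin folders ship a plugin.json descriptor; refuse to copy anything else.
    if not (src / "plugin.json").is_file():
        raise FileNotFoundError(f"{src} does not look like a built DataX plugin")
    dst = Path(datax_home) / "plugin" / kind / name
    # Remove the previous copy first so stale jars from an older build cannot linger.
    if dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(src, dst)  # also creates missing parent directories
    return dst
```

Replacing the whole folder, instead of merging into it, avoids leaving old jar versions next to new ones in the plugin's library folder.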

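8. Write tidb-es.json. On the reader side, you can use querySql instead of column. querySql is set inside connection, and when it is present, table, column and where are ignored. In either case, the reader's columns must correspond one-to-one with the writer's columns, in the same order, because DataX passes each record's fields by position. Here, "id as pk" feeds the writer column of type id, which elasticsearchwriter uses as the document _id. "splitPk": "id" splits the table by id so the 8 channels can read it in parallel.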

{
    "job": {
        "setting": {
            "speed": {
                "channel": 8
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "root",
                        "column": [
                            "id as pk",
                            "id",
                            "indicators_name",
                            "indicators_code",
                            "indicators_region_name",
                            "create_time"
                        ],
                        "where": "",
                        "splitPk": "id",
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://IP:PORT/schema?characterEncoding=utf-8"
                                ],
                                "table": [
                                    "table_name"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "elasticsearchwriter",
                    "parameter": {
                        "endpoint": "http://IP:PORT",
                        "index": "datax_table_name",
                        "type": "_doc",
                        "settings": {
                            "index": {
                                "number_of_shards": 5,
                                "number_of_replicas": 1
                            }
                        },
                        "writeMode": "insert",
                        "cleanup": false,
                        "discovery": false,
                        "batchSize": 10000,
                        "splitter": ",",
                        "column": [
                            {
                                "name": "pk",
                                "type": "id"
                            },
                            {
                                "name": "id",
                                "type": "keyword"
                            },
                            {
                                "name": "indicators_name",
                                "type": "text"
                            },
                            {
                                "name": "indicators_code",
                                "type": "text"
                            },
                            {
                                "name": "indicators_region_name",
                                "type": "text"
                            },
                            {
                                "name": "create_time",
                                "type": "date",
                                "format": "yyyy-MM-dd HH:mm:ss"
                            }
                        ]
                    }
                }
            }
        ]
    }
}
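Because the mapping is positional, it can help to check the job file before running it. The following minimal Python 3 sketch compares the reader's `column` list with the writer's `column` names. Both `check_columns` and `output_name` are hypothetical helpers. The alias detection is a heuristic that only splits on a whitespace-delimited `as`, so an expression such as `cast(x as char)` without an alias would be misread. Entries that use querySql are skipped because they have no `column` list. A name mismatch does not stop DataX from running; it simply flags a likely mistake.

```python
import re
from typing import List

# Splits "expr AS alias" (case-insensitive); a heuristic, not a SQL parser.
_AS = re.compile(r"\s+as\s+", re.IGNORECASE)


def output_name(expr: str) -> str:
    """Best-effort output column name of a reader column expression."""
    return _AS.split(expr.strip())[-1].strip().strip("`")


def check_columns(job: dict) -> List[str]:
    """Compare reader and writer columns of every content entry by position."""
    problems = []
    for n, content in enumerate(job["job"]["content"]):
        reader_param = content["reader"]["parameter"]
        if "column" not in reader_param:  # querySql form: nothing to compare
            continue
        reader_cols = reader_param["column"]
        writer_cols = [c["name"] for c in content["writer"]["parameter"]["column"]]
        if len(reader_cols) != len(writer_cols):
            problems.append(f"content[{n}]: {len(reader_cols)} reader vs "
                            f"{len(writer_cols)} writer columns")
        for i, (r, w) in enumerate(zip(reader_cols, writer_cols)):
            if output_name(r) != w:
                problems.append(f"content[{n}] position {i}: reader '{r}' -> writer '{w}'")
    return problems
```

You can load the job with `json.load(open("/data/datax/job/tidb-es.json", encoding="utf-8"))` and pass it to `check_columns`. An empty list means the two column lists line up.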

9. Run DataX

DataX requires Python. Install it first if it isn't already available. My CentOS machine came with Python 2.7.5.

python /data/datax/bin/datax.py /data/datax/job/tidb-es.json

10. The job finished without errors, and the migrated data showed up in Elasticsearch.
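One quick way to confirm the migration is to compare the index's document count with `SELECT COUNT(*)` on the source table. The following minimal Python 3 sketch calls the Elasticsearch `_count` API with the standard library. It assumes that the cluster needs no authentication, and `count_url` and `es_doc_count` are hypothetical helper names. Because `pk` becomes the document `_id`, the two counts should match once the index has refreshed.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def count_url(endpoint: str, index: str) -> str:
    """URL of the Elasticsearch _count API for one index."""
    return f"{endpoint.rstrip('/')}/{quote(index)}/_count"


def es_doc_count(endpoint: str, index: str, timeout: float = 10.0) -> int:
    """Number of documents currently searchable in `index` (no auth handled)."""
    with urlopen(count_url(endpoint, index), timeout=timeout) as resp:
        return json.load(resp)["count"]
```

For example, `es_doc_count("http://IP:PORT", "datax_table_name")` uses the same endpoint and index that are configured in the writer.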

## DataX is an offline (batch) migration tool and is not well suited to real-time sync. For scheduled runs, pair it with crontab.

posted @ 2024-02-28 16:42  蓝色土耳其