DataX Hands-on Example -- Importing HDFS Data into a MySQL Table with DataX
Goal: import the HDFS data file `user.txt` into the `user` table of a MySQL database.
1. Create the job configuration file (JSON format)

To view the configuration template, run:

```shell
[hadoop@hadoop03 ~]$ cd /bigdata/install/datax
[hadoop@hadoop03 datax]$ python bin/datax.py -r hdfsreader -w mysqlwriter
```

which prints:

```
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

Please refer to the hdfsreader document:
    https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md

Please refer to the mysqlwriter document:
    https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md

Please save the following configuration as a json file and use
    python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "hdfsreader",
                    "parameter": {
                        "column": [],
                        "defaultFS": "",
                        "encoding": "UTF-8",
                        "fieldDelimiter": ",",
                        "fileType": "orc",
                        "path": ""
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": [],
                        "connection": [
                            {
                                "jdbcUrl": "",
                                "table": []
                            }
                        ],
                        "password": "",
                        "preSql": [],
                        "session": [],
                        "username": "",
                        "writeMode": ""
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
```
- The links to the `hdfsreader` and `mysqlwriter` plugin documentation are included in the output above.
2. Write the job configuration file from the template

Go to the `/bigdata/install/datax/job` directory and create the configuration file `hdfs2mysql.json` with the following content:

```json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "hdfsreader",
                    "parameter": {
                        "defaultFS": "hdfs://hadoop01:8020",
                        "path": "/user.txt",
                        "fileType": "text",
                        "encoding": "UTF-8",
                        "fieldDelimiter": "\t",
                        "column": [
                            { "index": 0, "type": "long" },
                            { "index": 1, "type": "string" },
                            { "index": 2, "type": "long" }
                        ]
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "writeMode": "insert",
                        "username": "root",
                        "password": "123456",
                        "column": [ "id", "name", "age" ],
                        "preSql": [ "delete from user" ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://hadoop02:3306/datax?useUnicode=true&characterEncoding=utf-8",
                                "table": [ "user" ]
                            }
                        ]
                    }
                }
            }
        ]
    }
}
```
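Before running the job, it is worth sanity-checking the file. A minimal Python sketch of such a pre-flight check follows; the config is embedded as a string here so the snippet is self-contained, but in practice you would `json.load()` the `hdfs2mysql.json` file itself:

```python
import json

# The same config as above, embedded so the snippet runs standalone.
config = json.loads("""
{
  "job": {
    "setting": {"speed": {"channel": 1}},
    "content": [{
      "reader": {
        "name": "hdfsreader",
        "parameter": {
          "defaultFS": "hdfs://hadoop01:8020",
          "path": "/user.txt",
          "fileType": "text",
          "encoding": "UTF-8",
          "fieldDelimiter": "\\t",
          "column": [
            {"index": 0, "type": "long"},
            {"index": 1, "type": "string"},
            {"index": 2, "type": "long"}
          ]
        }
      },
      "writer": {
        "name": "mysqlwriter",
        "parameter": {
          "writeMode": "insert",
          "username": "root",
          "password": "123456",
          "column": ["id", "name", "age"],
          "preSql": ["delete from user"],
          "connection": [{
            "jdbcUrl": "jdbc:mysql://hadoop02:3306/datax?useUnicode=true&characterEncoding=utf-8",
            "table": ["user"]
          }]
        }
      }
    }]
  }
}
""")

content = config["job"]["content"][0]
reader_cols = content["reader"]["parameter"]["column"]
writer_cols = content["writer"]["parameter"]["column"]

# DataX aborts the job when reader and writer column counts differ,
# so it pays to check before launching.
assert len(reader_cols) == len(writer_cols), "reader/writer column count mismatch"
print(len(reader_cols))  # 3
```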
3. Prepare the test data file `user.txt` on HDFS

The contents of `user.txt` are as follows:

```
1	zhangsan	20
2	lisi	29
3	wangwu	25
4	zhaoliu	35
5	kobe	40
```

The fields in each line are separated by the `\t` tab character. Upload the file to HDFS:

```shell
[hadoop@hadoop03 ~]$ hdfs dfs -put user.txt /
```
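If you prefer to generate the file rather than type it, a small Python sketch that writes the same five rows to a local `user.txt` is shown below. Generating the file in code avoids the classic mistake of pasting spaces where the job's `"fieldDelimiter": "\t"` setting expects real tabs:

```python
# The five sample rows from the tutorial.
rows = [
    (1, "zhangsan", 20),
    (2, "lisi", 29),
    (3, "wangwu", 25),
    (4, "zhaoliu", 35),
    (5, "kobe", 40),
]

# Join each row with a literal tab, one record per line.
with open("user.txt", "w", encoding="utf-8") as f:
    for row in rows:
        f.write("\t".join(str(v) for v in row) + "\n")

# Every line must split into exactly 3 fields on \t, matching the
# three column entries in the reader config.
for line in open("user.txt", encoding="utf-8"):
    assert len(line.rstrip("\n").split("\t")) == 3
```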
4. Create the target table

```sql
mysql> create table datax.user(id int, name varchar(20), age int);
```
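The MySQL column types need to line up with the hdfsreader column types declared in the job file. A small sketch of the mapping used in this particular example follows; note that `long` could equally map to `bigint` for larger ids, and hdfsreader also supports `double`, `boolean`, and `date`:

```python
# Illustrative mapping between the hdfsreader column types used in
# hdfs2mysql.json and the MySQL column types in the DDL above.
# This covers only this example, not the full type systems.
hdfs_to_mysql = {
    "long": "int",
    "string": "varchar(20)",
}

reader_types = ["long", "string", "long"]   # from hdfs2mysql.json
ddl_types = ["int", "varchar(20)", "int"]   # from the create table above

# The mapped reader types should reproduce the DDL column types.
assert [hdfs_to_mysql[t] for t in reader_types] == ddl_types
```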
5. Launch DataX

```shell
[hadoop@hadoop03 ~]$ cd /bigdata/install/datax
[hadoop@hadoop03 datax]$ python bin/datax.py job/hdfs2mysql.json
```
6. Check the console output

When the synchronization finishes, the log shows:

```
Job start time          : 2021-06-18 12:02:47
Job end time            : 2021-06-18 12:02:58
Total elapsed time      : 11s
Average traffic         : 4B/s
Record write speed      : 0rec/s
Total records read      : 5
Total read/write errors : 0
```
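The summary block has a fixed `label : value` layout, so it can be scraped after a run. A Python sketch follows; the labels here are the English translations used above, while a stock DataX build prints them in Chinese, so substitute the original labels when parsing real logs:

```python
# Parse the end-of-run summary into a dict for scripted checks.
# Labels are English translations; stock DataX prints Chinese ones.
summary = """\
Job start time          : 2021-06-18 12:02:47
Job end time            : 2021-06-18 12:02:58
Total elapsed time      : 11s
Average traffic         : 4B/s
Record write speed      : 0rec/s
Total records read      : 5
Total read/write errors : 0
"""

stats = {}
for line in summary.splitlines():
    # partition splits on the first ':' only, so timestamps
    # containing ':' stay intact in the value.
    key, _, value = line.partition(":")
    stats[key.strip()] = value.strip()

# A clean run: every record was read, nothing failed.
assert stats["Total records read"] == "5"
assert stats["Total read/write errors"] == "0"
```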
7. Check the data in the `user` table

In the MySQL client, run `select * from datax.user;` and confirm that the five rows from `user.txt` are present.