DataX in Practice -- Using DataX to Import MySQL Data into HDFS
Requirement: import the data in the MySQL table `student` into HDFS under the path `/datax/mysql2hdfs/`.
1. Create the MySQL database and table, and load the sample data for this example
```
[hadoop@hadoop02 ~]$ mysql -uroot -p123456
mysql> create database datax;
mysql> use datax;
mysql> create table student(id int, name varchar(20), age int, createtime timestamp);
mysql> insert into `student` (`id`, `name`, `age`, `createtime`) values ('1', 'zhangsan', '18', '2021-05-10 18:10:00');
mysql> insert into `student` (`id`, `name`, `age`, `createtime`) values ('2', 'lisi', '28', '2021-05-10 19:10:00');
mysql> insert into `student` (`id`, `name`, `age`, `createtime`) values ('3', 'wangwu', '38', '2021-05-10 20:10:00');
```
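As a quick sanity check (not part of the original steps), you can confirm the three rows are in place before running the job; the output should look roughly like this:

```
mysql> select * from student;
+------+----------+------+---------------------+
| id   | name     | age  | createtime          |
+------+----------+------+---------------------+
|    1 | zhangsan |   18 | 2021-05-10 18:10:00 |
|    2 | lisi     |   28 | 2021-05-10 19:10:00 |
|    3 | wangwu   |   38 | 2021-05-10 20:10:00 |
+------+----------+------+---------------------+
3 rows in set (0.00 sec)
```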
2. Create the job configuration file (JSON format)
- To view the configuration template, run:
```
[hadoop@hadoop03 datax]$ cd /bigdata/install/datax
[hadoop@hadoop03 datax]$ python bin/datax.py -r mysqlreader -w hdfswriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

Please refer to the mysqlreader document:
     https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md

Please refer to the hdfswriter document:
     https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md

Please save the following configuration as a json file and use
     python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [],
                        "connection": [
                            {
                                "jdbcUrl": [],
                                "table": []
                            }
                        ],
                        "password": "",
                        "username": "",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "column": [],
                        "compress": "",
                        "defaultFS": "",
                        "fieldDelimiter": "",
                        "fileName": "",
                        "fileType": "",
                        "path": "",
                        "writeMode": ""
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
```
- The `hdfswriter` plugin documentation is linked in the output above: https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md
3. Write the configuration file based on the template
- Go to the `/bigdata/install/datax/job` directory and create a configuration file named `mysql2hdfs.json` with the following content:

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "123456",
                        "connection": [
                            {
                                "querySql": [
                                    "select id,name,age,createtime from student where age < 30;"
                                ],
                                "jdbcUrl": [
                                    "jdbc:mysql://hadoop02:3306/datax"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "defaultFS": "hdfs://hadoop01:8020",
                        "fileType": "text",
                        "path": "/datax/mysql2hdfs/",
                        "fileName": "student.txt",
                        "column": [
                            {
                                "name": "id",
                                "type": "INT"
                            },
                            {
                                "name": "name",
                                "type": "STRING"
                            },
                            {
                                "name": "age",
                                "type": "INT"
                            },
                            {
                                "name": "createtime",
                                "type": "TIMESTAMP"
                            }
                        ],
                        "writeMode": "append",
                        "fieldDelimiter": "\t",
                        "compress": "gzip"
                    }
                }
            }
        ]
    }
}
```
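Note that this job uses mysqlreader's `querySql` mode rather than the `table`/`column`/`where` fields shown in the template; when `querySql` is present, mysqlreader runs the SQL as-is and ignores those fields. For comparison, here is a rough sketch of an equivalent table-mode reader `parameter` block (same data, field names as in the template above):

```
"parameter": {
    "username": "root",
    "password": "123456",
    "column": ["id", "name", "age", "createtime"],
    "where": "age < 30",
    "connection": [
        {
            "table": ["student"],
            "jdbcUrl": ["jdbc:mysql://hadoop02:3306/datax"]
        }
    ]
}
```

Table mode also lets mysqlreader split the read across multiple channels via `splitPk`, which `querySql` mode does not support.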
4. Start HDFS and create the target path
```
[hadoop@hadoop01 ~]$ start-dfs.sh
[hadoop@hadoop01 ~]$ hdfs dfs -mkdir -p /datax/mysql2hdfs
```
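Optionally, list the parent directory to confirm the target path exists (a standard HDFS check, not in the original steps):

```
# The target directory should appear under /datax
[hadoop@hadoop01 ~]$ hdfs dfs -ls /datax
```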
5. Run DataX
```
[hadoop@hadoop03 bin]$ cd /bigdata/install/datax
[hadoop@hadoop03 datax]$ python bin/datax.py job/mysql2hdfs.json
```
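If the job fails at the reader stage, one common cause is that the DataX host (hadoop03) cannot reach MySQL on hadoop02. A quick check with the standard mysql client, assuming the `root` account is permitted to connect from hadoop03 (which the setup above does not guarantee):

```
# Verify connectivity and credentials from the DataX host
[hadoop@hadoop03 datax]$ mysql -h hadoop02 -uroot -p123456 -e "select count(*) from datax.student;"
```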
6. Check the console output
When the synchronization finishes, the log ends with a summary like this:

```
2021-06-18 01:41:26.452 [job-0] INFO  JobContainer -
任务启动时刻                    : 2021-06-18 01:41:14
任务结束时刻                    : 2021-06-18 01:41:26
任务总计耗时                    :                 11s
任务平均流量                    :                3B/s
记录写入速度                    :              0rec/s
读出记录总数                    :                   2
读写失败总数                    :                   0
```

(The lines read, in order: job start time, job end time, total elapsed time, average throughput, record write speed, total records read, total read/write failures.) Note that only 2 of the 3 rows are read: the `age < 30` filter in `querySql` excludes `wangwu` (age 38).
7. Check the file generated on HDFS and verify the results
Because the writer sets `"compress": "gzip"`, the generated file is gzip-compressed. Download it from HDFS, decompress it, open it, and compare its contents against the `student` table in MySQL; the rows should match the query result.
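Alternatively, the file can be inspected directly from the command line (a sketch; the exact file name varies, since hdfswriter appends a random suffix to `fileName`):

```
# List the generated file; the actual name looks like student.txt__<random-suffix>.gz
[hadoop@hadoop01 ~]$ hdfs dfs -ls /datax/mysql2hdfs

# hdfs dfs -text decompresses known codecs such as gzip; expect the two
# tab-separated rows with age < 30 (zhangsan and lisi), matching MySQL
[hadoop@hadoop01 ~]$ hdfs dfs -text '/datax/mysql2hdfs/student.txt__*'
```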