DataX 实战案例 -- 使用datax实现将mysql数据导入到hdfs

  • 需求: 将mysql表student的数据导入到hdfs的 /datax/mysql2hdfs/ 路径下面去。

  • 1、创建mysql数据库和需要用到的表结构,并导入实战案例需要用到的数据

      [hadoop@hadoop02 ~] mysql -uroot -p123456
      mysql> create database datax;
      mysql> use datax;
      mysql> create table student(id int,name varchar(20),age int,createtime timestamp );
      mysql> insert into `student` (`id`, `name`, `age`, `createtime`) values('1','zhangsan','18','2021-05-10 18:10:00');
      mysql> insert into `student` (`id`, `name`, `age`, `createtime`) values('2','lisi','28','2021-05-10 19:10:00');
      mysql> insert into `student` (`id`, `name`, `age`, `createtime`) values('3','wangwu','38','2021-05-10 20:10:00');
    
  • 2、创建作业的配置文件(json格式)

    • 查看配置模板,执行脚本命令
      [hadoop@hadoop03 datax]$ cd /bigdata/install/datax
      [hadoop@hadoop03 datax]$ python bin/datax.py -r mysqlreader -w hdfswriter
      
      DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
      Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
      
      Please refer to the mysqlreader document:
           https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md 
      
      Please refer to the hdfswriter document:
           https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md 
       
      Please save the following configuration as a json file and  use
           python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
      to run the job.
      
      {
          "job": {
              "content": [
                  {
                      "reader": {
                          "name": "mysqlreader", 
                          "parameter": {
                              "column": [], 
                              "connection": [
                                  {
                                      "jdbcUrl": [], 
                                      "table": []
                                  }
                              ], 
                              "password": "", 
                              "username": "", 
                              "where": ""
                          }
                      }, 
                      "writer": {
                          "name": "hdfswriter", 
                          "parameter": {
                              "column": [], 
                              "compress": "", 
                              "defaultFS": "", 
                              "fieldDelimiter": "", 
                              "fileName": "", 
                              "fileType": "", 
                              "path": "", 
                              "writeMode": ""
                          }
                      }
                  }
              ], 
              "setting": {
                  "speed": {
                      "channel": ""
                  }
              }
          }
      }
      
    • 其中hdfswriter插件文档
  • 3、根据模板写配置文件

    • 进入到 /bigdata/install/datax/job 目录,然后创建配置文件 mysql2hdfs.json, 文件内容如下:
      {
          "job": {
              "setting": {
                  "speed": {
                       "channel":1
                  }
              },
              "content": [
                  {
                      "reader": {
                          "name": "mysqlreader",
                          "parameter": {
                              "username": "root",
                              "password": "123456",
                              "connection": [
                                  {
                                      "querySql": [
                                          "select id,name,age,createtime from student where age < 30;"
                                      ],
                                      "jdbcUrl": [
                                          "jdbc:mysql://hadoop02:3306/datax"
                                      ]
                                  }
                              ]
                          }
                      },
                        "writer": {
                          "name": "hdfswriter",
                          "parameter": {
                              "defaultFS": "hdfs://hadoop01:8020",
                              "fileType": "text",
                              "path": "/datax/mysql2hdfs/",
                              "fileName": "student.txt",
                              "column": [
                                  {
                                      "name": "id",
                                      "type": "INT"
                                  },
                                  {
                                      "name": "name",
                                      "type": "STRING"
                                  },
                                  {
                                      "name": "age",
                                      "type": "INT"
                                  },
                                  {
                                      "name": "createtime",
                                      "type": "TIMESTAMP"
                                  }
                              ],
                              "writeMode": "append",
                              "fieldDelimiter": "\t",
                              "compress":"gzip"
                          }
                      }
                  }
              ]
          }
      }
      
  • 4、启HDFS, 创建目标路径

    [hadoop@hadoop01 ~]$ start-dfs.sh 
    [hadoop@hadoop01 ~]$ hdfs dfs -mkdir -p /datax/mysql2hdfs
    
  • 5、启动DataX

    [hadoop@hadoop03 bin]$ cd /bigdata/install/datax
    [hadoop@hadoop03 bin]$ python bin/datax.py job/mysql2hdfs.json 
    
  • 6、观察控制台输出结果

    同步结束,显示日志如下:
    
    2021-06-18 01:41:26.452 [job-0] INFO  JobContainer - 
    任务启动时刻                    : 2021-06-18 01:41:14
    任务结束时刻                    : 2021-06-18 01:41:26
    任务总计耗时                    :                 11s
    任务平均流量                    :                3B/s
    记录写入速度                    :              0rec/s
    读出记录总数                    :                   2
    读写失败总数                    :                   0
    
  • 7、查看HDFS上文件生成,并验证结果

    将上边结果下载解压后打开,可以看到里面的结果和mysql中结果对比

posted @ 2021-06-19 00:03  Tenic  阅读(1662)  评论(1编辑  收藏  举报