|NO.Z.00040|——————————|BigDataEnd|——|Hadoop&PB级数仓.V07|——|PB数仓.v08|ADS层数据导出|

一、ADS层数据导出(DataX)
### --- 步骤:

~~~     在MySQL创建对应的表
~~~     创建配置文件(json)
~~~     执行命令,使用json配置文件;测试
~~~     编写执行脚本(shell)
~~~     shell脚本的测试
二、在mysql建表
### --- 语法:MySQL 建表

drop table if exists dwads.ads_ad_show_place;

create table dwads.ads_ad_show_place(
ad_action tinyint,
hour varchar(2),
place varchar(20),
product_id int,
cnt int,
dt varchar(10)
);
### --- 实例操作:mysql建表

mysql> drop table if exists dwads.ads_ad_show_place;
mysql> create table dwads.ads_ad_show_place(
    -> ad_action tinyint,
    -> hour varchar(2),
    -> place varchar(20),
    -> product_id int,
    -> cnt int,
    -> dt varchar(10)
    -> );
三、创建json配置文件
### --- 准备资源地址
~~~     # 准备表ads_ad_show_place存储在什么位置

hive (ads)> desc formatted ads.ads_ad_show_place;
Location:               hdfs://hadoop01:9000/user/hive/warehouse/ads.db/ads_ad_show_place
### --- 创建配置文件

[root@hadoop02 ~]# vim /data/yanqidw/script/advertisement/ads_ad_show_place.json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "hdfsreader",
          "parameter": {
            "path": "/user/hive/warehouse/ads.db/ads_ad_show_place/dt=$do_date/*",
            "defaultFS": "hdfs://hadoop01:9000",
            "column": [
              {
                "index": 0,
                "type": "string"
              },
              {
                "index": 1,
                "type": "string"
              },
              {
                "index": 2,
                "type": "string"
              },
              {
                "index": 3,
                "type": "string"
              },
              {
                "index": 4,
                "type": "string"
              },
              {
                "type": "string",
                "value": "$do_date"
              }
            ],
            "fileType": "text",
            "encoding": "UTF-8",
            "fieldDelimiter": ","
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "writeMode": "insert",
            "username": "hive",
            "password": "12345678",
            "column": [
              "ad_action",
              "hour",
              "place",
              "product_id",
              "cnt",
              "dt"
            ],
            "preSql": [
              "delete from ads_ad_show_place where dt='$do_date'"
            ],
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://hadoop05:3306/dwads?useUnicode=true&characterEncoding=utf-8",
                "table": [
                  "ads_ad_show_place"
                ]
              }
            ]
          }
        }
      }
    ]
  }
}
### --- 执行命令(测试)

[root@hadoop02 ~]#  python /opt/yanqi/servers/datax/bin/datax.py  \
-p "-Ddo_date=2020-07-21" \
/data/yanqidw/script/advertisement/ads_ad_show_place.json
~~~输出参数
任务启动时刻                    : 2021-10-04 21:46:51
任务结束时刻                    : 2021-10-04 21:47:10
任务总计耗时                    :                 19s
任务平均流量                    :            4.73KB/s
记录写入速度                    :            141rec/s
读出记录总数                    :                1414
读写失败总数                    :                   0
### --- 在mysql下查看表中是否有数据

mysql> select * from dwads.ads_ad_show_place limit 5;
+-----------+------+----------------------+------------+------+------------+
| ad_action | hour | place                | product_id | cnt  | dt         |
+-----------+------+----------------------+------------+------+------------+
|         0 | 00   | placecampaign1_index |         13 |    1 | 2020-07-21 |
|         0 | 00   | placecampaign1_index |         23 |    1 | 2020-07-21 |
|         0 | 00   | placecampaign1_left  |         12 |    1 | 2020-07-21 |
|         0 | 00   | placecampaign1_left  |         38 |    1 | 2020-07-21 |
|         0 | 00   | placecampaign1_left  |         43 |    1 | 2020-07-21 |
+-----------+------+----------------------+------------+------+------------+
mysql> select count(*) from dwads.ads_ad_show_place limit 5;
+----------+
| count(*) |
+----------+
|     1414 |
+----------+
四、编写脚本
### --- 创建导出脚本

[root@hadoop02 ~]# vim /data/yanqidw/script/advertisement/ads_ad_show_place.sh
#!/bin/bash

source /etc/profile

JSON=/data/yanqidw/script

if [ -n "$1" ] ;
then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" $JSON/advertisement/ads_ad_show_place.json
### --- 执行脚本

[root@hadoop02 ~]#sh /data/yanqidw/script/advertisement/ads_ad_show_place.sh 2020-07-21
~~~输出参数
任务启动时刻                    : 2021-10-04 21:56:51
任务结束时刻                    : 2021-10-04 21:57:05
任务总计耗时                    :                 13s
任务平均流量                    :            4.73KB/s
记录写入速度                    :            141rec/s
读出记录总数                    :                1414
读写失败总数                    :                   0
### --- 在mysql下查看数据是否导出成功

mysql> select count(*) from dwads.ads_ad_show_place limit 5;
+----------+
| count(*) |
+----------+
|     1414 |
+----------+

 
 
 
 
 
 
 
 
 

Walter Savage Landor:strove with none,for none was worth my strife.Nature I loved and, next to Nature, Art:I warm'd both hands before the fire of life.It sinks, and I am ready to depart
                                                                                                                                                   ——W.S.Landor

 

 

posted on   yanqi_vip  阅读(14)  评论(0编辑  收藏  举报

相关博文:
阅读排行:
· 无需6万激活码!GitHub神秘组织3小时极速复刻Manus,手把手教你使用OpenManus搭建本
· Manus爆火,是硬核还是营销?
· 终于写完轮子一部分:tcp代理 了,记录一下
· 别再用vector<bool>了!Google高级工程师:这可能是STL最大的设计失误
· 单元测试从入门到精通
< 2025年3月 >
23 24 25 26 27 28 1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31 1 2 3 4 5

导航

统计

点击右上角即可分享
微信分享提示