|NO.Z.00040|——————————|BigDataEnd|——|Hadoop&PB级数仓.V07|——|PB数仓.v08|ADS层数据导出|
一、ADS层数据导出(DataX)
### --- 步骤:
~~~ 在MySQL创建对应的表
~~~ 创建配置文件(json)
~~~ 执行命令,使用json配置文件;测试
~~~ 编写执行脚本(shell)
~~~ shell脚本的测试
二、在mysql建表
### --- 语法:MySQL 建表
drop table if exists dwads.ads_ad_show_place;
create table dwads.ads_ad_show_place(
ad_action tinyint,
hour varchar(2),
place varchar(20),
product_id int,
cnt int,
dt varchar(10)
);
### --- 实例操作:mysql建表
mysql> drop table if exists dwads.ads_ad_show_place;
mysql> create table dwads.ads_ad_show_place(
-> ad_action tinyint,
-> hour varchar(2),
-> place varchar(20),
-> product_id int,
-> cnt int,
-> dt varchar(10)
-> );
三、创建json配置文件
### --- 准备资源地址
~~~ # 准备表ads_ad_show_place存储在什么位置
hive (ads)> desc formatted ads.ads_ad_show_place;
Location: hdfs://hadoop01:9000/user/hive/warehouse/ads.db/ads_ad_show_place
### --- 创建配置文件
[root@hadoop02 ~]# vim /data/yanqidw/script/advertisement/ads_ad_show_place.json
{
"job": {
"setting": {
"speed": {
"channel": 1
}
},
"content": [
{
"reader": {
"name": "hdfsreader",
"parameter": {
"path": "/user/hive/warehouse/ads.db/ads_ad_show_place/dt=$do_date/*",
"defaultFS": "hdfs://hadoop01:9000",
"column": [
{
"index": 0,
"type": "string"
},
{
"index": 1,
"type": "string"
},
{
"index": 2,
"type": "string"
},
{
"index": 3,
"type": "string"
},
{
"index": 4,
"type": "string"
},
{
"type": "string",
"value": "$do_date"
}
],
"fileType": "text",
"encoding": "UTF-8",
"fieldDelimiter": ","
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"writeMode": "insert",
"username": "hive",
"password": "12345678",
"column": [
"ad_action",
"hour",
"place",
"product_id",
"cnt",
"dt"
],
"preSql": [
"delete from ads_ad_show_place where dt='$do_date'"
],
"connection": [
{
"jdbcUrl": "jdbc:mysql://hadoop05:3306/dwads?useUnicode=true&characterEncoding=utf-8",
"table": [
"ads_ad_show_place"
]
}
]
}
}
}
]
}
}
### --- 执行命令(测试)
[root@hadoop02 ~]# python /opt/yanqi/servers/datax/bin/datax.py \
-p "-Ddo_date=2020-07-21" \
/data/yanqidw/script/advertisement/ads_ad_show_place.json
~~~输出参数
任务启动时刻 : 2021-10-04 21:46:51
任务结束时刻 : 2021-10-04 21:47:10
任务总计耗时 : 19s
任务平均流量 : 4.73KB/s
记录写入速度 : 141rec/s
读出记录总数 : 1414
读写失败总数 : 0
### --- 在mysql下查看表中是否有数据
mysql> select * from dwads.ads_ad_show_place limit 5;
+-----------+------+----------------------+------------+------+------------+
| ad_action | hour | place | product_id | cnt | dt |
+-----------+------+----------------------+------------+------+------------+
| 0 | 00 | placecampaign1_index | 13 | 1 | 2020-07-21 |
| 0 | 00 | placecampaign1_index | 23 | 1 | 2020-07-21 |
| 0 | 00 | placecampaign1_left | 12 | 1 | 2020-07-21 |
| 0 | 00 | placecampaign1_left | 38 | 1 | 2020-07-21 |
| 0 | 00 | placecampaign1_left | 43 | 1 | 2020-07-21 |
+-----------+------+----------------------+------------+------+------------+
mysql> select count(*) from dwads.ads_ad_show_place limit 5;
+----------+
| count(*) |
+----------+
| 1414 |
+----------+
四、编写脚本
### --- 创建导出脚本
[root@hadoop02 ~]# vim /data/yanqidw/script/advertisement/ads_ad_show_place.sh
#!/bin/bash
source /etc/profile
JSON=/data/yanqidw/script
if [ -n "$1" ] ;
then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" $JSON/advertisement/ads_ad_show_place.json
### --- 执行脚本
[root@hadoop02 ~]#sh /data/yanqidw/script/advertisement/ads_ad_show_place.sh 2020-07-21
~~~输出参数
任务启动时刻 : 2021-10-04 21:56:51
任务结束时刻 : 2021-10-04 21:57:05
任务总计耗时 : 13s
任务平均流量 : 4.73KB/s
记录写入速度 : 141rec/s
读出记录总数 : 1414
读写失败总数 : 0
### --- 在mysql下查看数据是否导出成功
mysql> select count(*) from dwads.ads_ad_show_place limit 5;
+----------+
| count(*) |
+----------+
| 1414 |
+----------+
Walter Savage Landor:strove with none,for none was worth my strife.Nature I loved and, next to Nature, Art:I warm'd both hands before the fire of life.It sinks, and I am ready to depart
——W.S.Landor
分类:
bdv014-PB离线数仓
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 无需6万激活码!GitHub神秘组织3小时极速复刻Manus,手把手教你使用OpenManus搭建本
· Manus爆火,是硬核还是营销?
· 终于写完轮子一部分:tcp代理 了,记录一下
· 别再用vector<bool>了!Google高级工程师:这可能是STL最大的设计失误
· 单元测试从入门到精通