|NO.Z.00021|——————————|BigDataEnd|——|Hadoop&PB级数仓.V05|——|PB数仓.v05|会员活跃度分析|DWD建表&加载数据|
一、DWD层建表和数据加载
### --- DWD层建表和数据加载
~~~ 主要任务:ODS(包含json串) => DWD
~~~ json数据解析,丢弃无用数据(数据清洗),保留有效信息,并将数据展开,形成每日启动明细表。
2021-09-28 18:19:32.966 [main] INFO com.yanqi.ecommerce.AppStart -
{"app_active":{"name":"app_active","json":
{"entry":"1","action":"1","error_code":"0"},"time":1596309585861},"attr":
{"area":"绍兴","uid":"2F10092A10","app_v":"1.1.16","event_type":"common","device_id":"1FB872-9A10010","os_type":"3.0","channel":"ML",
"language":"chinese","brand":"Huawei-2"}}
2020-07-21
二、DWD层表创建和加载数据
### --- 创建DWD层表:表的格式:parquet、分区表
~~~ # 语法
use dwd;
drop table if exists dwd.dwd_start_log;
CREATE TABLE dwd.dwd_start_log(
`device_id` string,
`area` string,
`uid` string,
`app_v` string,
`event_type` string,
`os_type` string,
`channel` string,
`language` string,
`brand` string,
`entry` string,
`action` string,
`error_code` string
)
PARTITIONED BY (dt string)
STORED AS parquet;
hive (default)> use dwd;
hive (dwd)> CREATE TABLE dwd.dwd_start_log(
> `device_id` string,
> `area` string,
> `uid` string,
> `app_v` string,
> `event_type` string,
> `os_type` string,
> `channel` string,
> `language` string,
> `brand` string,
> `entry` string,
> `action` string,
> `error_code` string
> )
> PARTITIONED BY (dt string)
> STORED AS parquet;
hive (dwd)> show tables;
dwd_start_log
二、加载DWD层数据
### --- 加载DWD层数据
[root@hadoop02 ~]# vim /data/yanqidw/script/member_active/dwd_load_start.sh
#!/bin/bash
source /etc/profile
# 可以输入日期;如果未输入日期取昨天的时间
if [ -n "$1" ]
then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
# 定义要执行的SQL
sql="with tmp as
(
select split(str, ' ')[7] line
from ods.ods_start_log where dt='$do_date'
)
insert overwrite table dwd.dwd_start_log partition(dt='$do_date')
select get_json_object(line, '$.attr.device_id'),
get_json_object(line, '$.attr.area'),
get_json_object(line, '$.attr.uid'),
get_json_object(line, '$.attr.app_v'),
get_json_object(line, '$.attr.event_type'),
get_json_object(line, '$.attr.os_type'),
get_json_object(line, '$.attr.channel'),
get_json_object(line, '$.attr.language'),
get_json_object(line, '$.attr.brand'),
get_json_object(line, '$.app_active.json.entry'),
get_json_object(line, '$.app_active.json.action'),
get_json_object(line, '$.app_active.json.error_code')
from tmp;
"
hive -e "$sql"
### --- 为脚本授予执行权限
[root@hadoop02 ~]# chmod +x /data/yanqidw/script/member_active/dwd_load_start.sh
三、执行该脚本,验证结果
### --- 执行该脚本,验证结果
~~~ # 加载数据到ODS分区下
~~~ # 确保ods分区下是有数据存在
hive (ods)> show partitions ods.ods_start_log;
OK
partition
dt=2021-09-28
hive (ods)> select * from ods.ods_start_log limit 10;
~~~ # 执行该脚本
[root@hadoop02 ~]# /data/yanqidw/script/member_active/dwd_load_start.sh
~~~输出参数
Total MapReduce CPU Time Spent: 0 msec
OK
_c0 _c1 _c2 _c3 _c4 _c5 _c6 _c7 _c8 _c9 _c10 _c11
~~~ # 查看数据是否通过ODS加载到DWD下
hive (dwd)> show partitions dwd_start_log;
OK
partition
dt=2021-09-28
hive (dwd)> select * from dwd.dwd_start_log limit 5;
OK
dwd_start_log.device_id dwd_start_log.area dwd_start_log.uid dwd_start_log.app_v dwd_start_log.event_type dwd_start_log.os_type dwd_start_log.channel dwd_start_log.language dwd_start_log.brand dwd_start_log.entry dwd_start_log.action dwd_start_log.error_code dwd_start_log.dt
1FB872-9A1001 连云港 2F10092A1 1.1.8 common 0.43 PN chinese iphone-7 1 0 0 2021-09-28
1FB872-9A1002 金昌 2F10092A2 1.1.5 common 5.8.7 OF chinese xiaomi-0 1 1 0 2021-09-28
1FB872-9A1003 句容 2F10092A3 1.1.16 common 0.99 YI chinese iphone-3 1 0 0 2021-09-28
1FB872-9A1004 肇庆 2F10092A4 1.1.6 common 6.6.2 CD chinese xiaomi-6 1 1 0 2021-09-28
1FB872-9A1005 武汉 2F10092A5 1.1.2 common 6.9 WG chinese xiaomi-1 1 1 0 2021-09-28
### --- 日志文件 =》 Flume =》 HDFS =》 ODS =》 DWD ODS =》 DWD;
~~~ json数据的解析;数据清洗
~~~ 下一步任务:DWD(会员的每日启动信息明细) => DWS(如何建表,如何加载数据)
~~~ 活跃会员 ===> 新增会员 ===> 会员留存
Walter Savage Landor:strove with none,for none was worth my strife.Nature I loved and, next to Nature, Art:I warm'd both hands before the fire of life.It sinks, and I am ready to depart
——W.S.Landor
分类:
bdv014-PB离线数仓
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 无需6万激活码!GitHub神秘组织3小时极速复刻Manus,手把手教你使用OpenManus搭建本
· Manus爆火,是硬核还是营销?
· 终于写完轮子一部分:tcp代理 了,记录一下
· 别再用vector<bool>了!Google高级工程师:这可能是STL最大的设计失误
· 单元测试从入门到精通