数据开发(三)
一、Hive概述
Hive是基于Hadoop的一个数据仓库工具,可以将结构化的数据文件映射为一张数据表,并提供类似于SQL(HiveSQL)的操作功能。在Hive中,本质上是将SQL转换成为MapReduce程序。
二、
1、建表语句
CREATE TABLE `app.app_test`( `data_dt` string COMMENT '时间', `start_city_id` string COMMENT '城市ID', `start_city_name` string COMMENT '城市名称', `predict_list` string COMMENT '序列', `order_num` string COMMENT '单量') COMMENT '分拣中心预测表--预处理' PARTITIONED BY ( `dt` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://ns14/user/mart_coo/app.db/app_test' TBLPROPERTIES ( 'transient_lastDdlTime'='1527746321')
(1)STORED AS TEXTFILE:默认的是文本格式,当然也可以通过设置hive.default.fileformat修改默认设置;使用DELIMITED语句解析分割文件,通过ESCAPED BY语句设置分隔符的转义符号,当数据中含有分隔符时,就必须转义;通过NULL DEFINED AS语句可以设置一个自定义的空处理格式,默认情况下NULL='\N'
②STORED AS SEQUENCEFILE:压缩序列文件格式
③STORED AS ORC:ORC文件格式
④STORED AS PARQUET
⑤STORED AS AVRO
⑥STORED AS RCFILE:Record Columnar File
⑦STORED BY:non-native table format
⑧INPUTFORMAT and OUTPUTFORMAT
(2)PARTITIONED BY
在创建表时,使用PARTITIONED BY语句,则所创建表就是分区表;分区表可以有一个或多个分区列,给唯一的分区列值创建一个目录。表或分区可以继续通过CLUSTERED BY细分桶,使用SORT BY语句使桶内的数据按升序或降序排序,在某些情况下,可以提升某种查询的性能。
注意点:①分区列式虚拟的,不属于数据本身 ②分区列不能和表结构字段重名,如果需要直接用表结构字段中的名字作为分区列,那必须将表结构中字段重新命名,否则会报错
(3)外部表
创建外部表时,通过指定location语句的路径来覆盖配置文件中指定的文件夹,外部表的特点:当外部表被删除时,其对应的数据并非也从系统中删除。
2、查询表
三、注意编码格式
1、报错
出现这种情况时是格式问题。
重新建个文本,复制粘贴下试试。
四、
1、建表
drop table app.app_basic_dashboard_goods_sale_predict_daily; CREATE TABLE app.app_basic_dashboard_goods_sale_predict_daily ( seller_id bigint COMMENT '商家id', seller_name string COMMENT '商家名称', dept_id bigint COMMENT '事业部门id', dept_no string COMMENT '事业部门编号', dept_name string COMMENT '事业部名字', goods_id bigint COMMENT '商品id', goods_no string COMMENT '商品序号', goods_name string COMMENT '商品名称', warehouse_id bigint COMMENT '入仓id', warehouse_no string COMMENT '入仓编号', warehouse_name string COMMENT '入仓名称', in_warehouse_city string COMMENT '入仓城市', satisfy_alpha string COMMENT '入仓服务水平C', safe_stock_days string COMMENT '安全库存天数', alt string COMMENT '出仓ALT', vlt string COMMENT '出仓VLT', in_stock_safety_num int COMMENT '安全库存', target_stock_days string COMMENT '目标库存天数', bp string COMMENT 'BP', out_warehouse_no string COMMENT '出仓编号', out_warehouse_name string COMMENT '出仓名称', out_warehouse_city string COMMENT '出仓城市', in_stock_max_num int COMMENT '目标库存', bef_sales_1d string COMMENT 'T-1日销量', bef_sales_2d string COMMENT 'T-2日销量', bef_sales_3d string COMMENT 'T-3日销量', bef_sales_4d string COMMENT 'T-4日销量', bef_sales_5d string COMMENT 'T-5日销量', bef_sales_6d string COMMENT 'T-6日销量', bef_sales_7d string COMMENT 'T-7日销量', in_stock_sales_14d string COMMENT '历史14日销量和', in_stock_sales_28d string COMMENT '历史28日销量和', predict_sales_1d string COMMENT 'T+1日预测销量', predict_sales_2d string COMMENT 'T+2日预测销量', predict_sales_3d string COMMENT 'T+3日预测销量', predict_sales_4d string COMMENT 'T+4日预测销量', predict_sales_5d string COMMENT 'T+5日预测销量', predict_sales_6d string COMMENT 'T+6日预测销量', predict_sales_7d string COMMENT 'T+7日预测销量', in_stock_predict_sales_14d string COMMENT '14日预测销量和', in_stock_predict_sales_28d string COMMENT '28日预测销量和', yn int COMMENT '删除标识 1为没删,0为删除', create_pin string COMMENT '创建人', update_pin string COMMENT '更新人', create_time timestamp COMMENT '创建时间', update_time timestamp COMMENT '更新时间', ts timestamp COMMENT '时间戳') COMMENT 'KA商品销量预测接口表' PARTITIONED BY ( dt string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
2、
2.1只删除表中数据,不删除表
truncate table app.app_basic_dashboard_goods_sale_predict_daily_tem
2.2删除分区
ALTER TABLE app_basic_dashboard_goods_sale_predict_daily_tem DROP partition(dt = '2018-08-16')
2.3 按条件删除
2.3删除表及其数据
drop table app.app_basic_dashboard_goods_sale_predict_daily_tem
显示建表语句
show create app.app_XXXXXXX
3、配置任务
#!/usr/bin/env python3 ################################################################ # AUTHOR: wn # CREATED TIME: 2018-08-09 # MODIFIED BY: # MODIFTED TIME: # REVIEWED BY: # REVIEWED TIME: # COMMENTS: goods ################################################################ #=============================================================================== # FILE: exe_app_basic_goods_stock_num_daily_da_d.py # USAGE: ./exe_app_basic_goods_stock_num_daily_da_d.py # SRC_TABLE: # TGT_TABLE: app_basic_goods_stock_num_daily_da #=============================================================================== import sys import os import time import datetime import logging import calendar sys.path.append(os.getenv('HIVE_TASK')) from HiveTask import HiveTask ht = HiveTask() today = ht.oneday(1)[0:10] yesterday = ht.oneday(0)[0:10] sql1 = """ use app; insert overwrite table app.app_basic_dashboard_goods_sale_predict_daily_tem partition ( dt = '"""+yesterday+"""' ) select c.seller_id, --商家编号 c.seller_name, --商家名称 c.dept_id, --事业部id c.dept_no, --事业部编号 c.dept_name, --事业部名称 --c.seller_no, --供应商编号 --c.seller_name, --供应商名称 c.goods_id, --商品id c.goods_no, --商品编号 c.goods_name, --商品名称 c.warehouse_id, --入仓id c.warehouse_no, --入仓编号 c.warehouse_name, --入仓名称 h.loc_city_name as in_warehouse_city, --入仓城市 case when d.satisfy_alpha is null then '0.8' else d.satisfy_alpha end as satisfy_alpha, -- 出仓服务水平c (basis) case when d.safe_stock_days is null then '5' else d.safe_stock_days end as safe_stock_days, -- 安全库存天数(basis) case when d.alt is null then '10' else d.alt end as alt, -- 出仓alt(小时)(basis) case when j.totaltime is null then '36' else j.totaltime end as vlt, --运输时间 f.in_stock_safety_num as in_stock_safety_num, --安全库存(计划调拨表取数) case when d.target_stock_days is null then '10' else d.target_stock_days end as target_stock_days, -- 目标库存天数(basis) case when d.bp is null then '20' else d.bp end as BP, -- BP f.out_warehouse_no as out_warehouse_no, --配出仓编码 f.out_warehouse_name as out_warehouse_name, --配出仓名称 i.loc_city_name as out_warehouse_city, --入仓城市 f.in_stock_max_num as in_stock_max_num, --目标库存(计划调拨表取数) case when split(g.sales_week_detail, ',') [6] is NULL then '0' else split(g.sales_week_detail, ',') [6] end as bef_sales_1d, --T-1日销量(逆向) case when split(g.sales_week_detail, ',') [5] is NULL then '0' else split(g.sales_week_detail, ',') [5] end as bef_sales_2d, --T-2日销量 case when split(g.sales_week_detail, ',') [4] is NULL then '0' else split(g.sales_week_detail, ',') [4] end as bef_sales_3d, --T-3日销量 case when split(g.sales_week_detail, ',') [3] is NULL then '0' else split(g.sales_week_detail, ',') [3] end as bef_sales_4d, --T-4日销量 case when split(g.sales_week_detail, ',') [2] is NULL then '0' else split(g.sales_week_detail, ',') [2] end as bef_sales_5d, --T-5日销量 case when split(g.sales_week_detail, ',') [1] is NULL then '0' else split(g.sales_week_detail, ',') [1] end as bef_sales_6d, --T-6日销量 case when split(g.sales_week_detail, ',') [0] is NULL then '0' else split(g.sales_week_detail, ',') [0] end as bef_sales_7d, --T-7日销量 case when g.sales_14d is NULL then '0' else g.sales_14d end as in_stock_sales_14d, --14日销量 case when g.sales_28d is NULL then '0' else g.sales_28d end as in_stock_sales_28d, --28日销量 case when split(g.predict_sales_week_detail, ',')[0] is NULL and m.avg_year_sellnum is NULL --都空为0 then '0' when split(g.predict_sales_week_detail, ',')[0] is NULL and m.avg_year_sellnum is not NULL --没有预测值,按销量平均值计 then round(m.avg_year_sellnum) else split(g.predict_sales_week_detail, ',') [0] --预测值计 end as predict_sales_1d, --T+1日预测销量(正向) case when split(g.predict_sales_week_detail, ',')[1] is NULL and m.avg_year_sellnum is NULL then '0' when split(g.predict_sales_week_detail, ',')[1] is NULL and m.avg_year_sellnum is not NULL then round(m.avg_year_sellnum) else split(g.predict_sales_week_detail, ',') [1] end as predict_sales_2d, --T+2日预测销量 case when split(g.predict_sales_week_detail, ',')[2] is NULL and m.avg_year_sellnum is NULL then '0' when split(g.predict_sales_week_detail, ',')[2] is NULL and m.avg_year_sellnum is not NULL then round(m.avg_year_sellnum) else split(g.predict_sales_week_detail, ',') [2] end as predict_sales_3d, --T+3日预测销量 case when split(g.predict_sales_week_detail, ',')[3] is NULL and m.avg_year_sellnum is NULL then '0' when split(g.predict_sales_week_detail, ',')[3] is NULL and m.avg_year_sellnum is not NULL then round(m.avg_year_sellnum) else split(g.predict_sales_week_detail, ',') [3] end as predict_sales_4d, --T+4日预测销量 case when split(g.predict_sales_week_detail, ',')[4] is NULL and m.avg_year_sellnum is NULL then '0' when split(g.predict_sales_week_detail, ',')[4] is NULL and m.avg_year_sellnum is not NULL then round(m.avg_year_sellnum) else split(g.predict_sales_week_detail, ',') [4] end as predict_sales_5d, --T+5日预测销量 case when split(g.predict_sales_week_detail, ',')[5] is NULL and m.avg_year_sellnum is NULL then '0' when split(g.predict_sales_week_detail, ',')[5] is NULL and m.avg_year_sellnum is not NULL then round(m.avg_year_sellnum) else split(g.predict_sales_week_detail, ',') [5] end as predict_sales_6d, --T+6日预测销量 case when split(g.predict_sales_week_detail, ',')[6] is NULL and m.avg_year_sellnum is NULL then '0' when split(g.predict_sales_week_detail, ',')[6] is NULL and m.avg_year_sellnum is not NULL then round(m.avg_year_sellnum) else split(g.predict_sales_week_detail, ',') [6] end as predict_sales_7d, --T+7日预测销量 case when g.predict_sales_14d is NULL and m.avg_year_sellnum is NULL then '0' when g.predict_sales_14d is NULL and m.avg_year_sellnum is not NULL then round(m.avg_year_sellnum*14) else g.predict_sales_14d end as in_stock_predict_sales_14d, --14日预测销量 case when g.predict_sales_28d is NULL and m.avg_year_sellnum is NULL then '0' when g.predict_sales_28d is NULL and m.avg_year_sellnum is not NULL then round(m.avg_year_sellnum*28) else g.predict_sales_28d end as in_stock_predict_sales_28d, --28日预测销量 1 AS yn, --删除标识 1为没删,0为删除 'plumber' AS create_pin, --创建人 'plumber' AS update_pin , --更新人 current_timestamp AS create_time, --创建时间 current_timestamp AS update_time, --更新时间 current_timestamp AS ts --时间戳 from fdm.fdm_eclp_stock1_saleable_warehouse_stock_chain c --可销售库存表 join ( select a.dept_no, a.goods_id, a.warehouse_no, max(a.start_date) as start_date from fdm.fdm_eclp_stock1_saleable_warehouse_stock_chain a where a.dept_no in ( select dept_no from app.app_basic_dashboard_goods_seller --商户名称表 ) and a.dp = 'ACTIVE' and a.yn = 1 group by a.dept_no, a.warehouse_no, a.goods_id ) b on c.dept_no = b.dept_no and c.warehouse_no = b.warehouse_no and c.goods_id = b.goods_id and c.start_date = b.start_date left join fdm.fdm_log_scm_ka_allot_sys_pre_allot d --参数表 on c.dept_no = d.dept_no and c.warehouse_no = d.warehouse_no and c.goods_no = d.goods_id and d.yn = 1 and d.dt = sysdate( - 1) left join ( select * from fdm.fdm_log_scm_ka_allot_allot_plan --调拨计划表 where id in ( select max(e.id) from fdm.fdm_log_scm_ka_allot_allot_plan e where e.dt = sysdate( - 1) group by e.dept_name, e.in_warehouse_no, e.goods_no ) and dt = sysdate( - 1) ) f on f.dept_no = c.dept_no and f.in_warehouse_no = c.warehouse_no and f.goods_no = c.goods_no left join app.app_ka_predict_sales_da g ---线下门店销量预测表 on c.dept_no = g.dept_no and c.goods_no = g.goods_no and c.warehouse_no = g.warehouse_no and g.dt = sysdate( - 1) left join dim.dim_wms_store h --库房维表,取出入仓所属城市 on c.warehouse_name = h.dim_store_name left join dim.dim_wms_store i --库房维表,取出仓所属城市 on f.out_warehouse_name = i.dim_store_name left join ( select startcityid, startcityname, endcityid, endcityname, totalaging, totaltime, yn from ( select case when ( startprovinceid in(1, 2, 3, 4) ) then startprovinceid else startcityid end as startcityid, case when ( startprovinceid in(1, 2, 3, 4) ) then concat(startprovincename, '市') else startcityname end as startcityname, case when ( endprovinceid in(1, 2, 3, 4) ) then endprovinceid else endcityid end as endcityid, case when ( endprovinceid in(1, 2, 3, 4) ) then concat(endprovincename, '市') else endcityname end as endcityname, ceil(avg(totalaging)) as totalaging, ceil(avg(totaltime)) as totaltime, max(1) as yn from fdm.fdm_staticroutebatchgenerate_staticroutebatchgenerate_chain where dp = 'ACTIVE' and yn = 1 group by case when ( startprovinceid in(1, 2, 3, 4) ) then startprovinceid else startcityid end, case when ( startprovinceid in(1, 2, 3, 4) ) then concat(startprovincename, '市') else startcityname end, case when ( endprovinceid in(1, 2, 3, 4) ) then endprovinceid else endcityid end, case when ( endprovinceid in(1, 2, 3, 4) ) then concat(endprovincename, '市') else endcityname end ) route ) j --vlt on h.loc_city_name = j.startcityname and i.loc_city_name = j.endcityname left join app.app_basic_goods_stock_num_daily_da m on c.seller_id=m.seller_id and c.warehouse_id=m.warehouse_id and c.goods_id=m.goods_id and m.dt = sysdate( - 1) """ ht.exec_sql(schema_name = 'app', table_name = 'app_basic_dashboard_goods_sale_predict_daily_tem', sql = sql1, merge_flag = True)
4、
#!/usr/bin/env python3 ################################################################ # AUTHOR: wn # CREATED TIME: 2018-08-09 # MODIFIED BY: # MODIFTED TIME: # REVIEWED BY: # REVIEWED TIME: # COMMENTS: goods ################################################################ #=============================================================================== # FILE: exe_app_basic_dashboard_goods_sale_predict_daily.py # USAGE: ./exe_app_basic_dashboard_goods_sale_predict_daily.py # SRC_TABLE: # TGT_TABLE: app.app_basic_dashboard_goods_sale_predict_daily #=============================================================================== import sys import os import time import datetime import logging import calendar sys.path.append(os.getenv('HIVE_TASK')) from HiveTask import HiveTask ht = HiveTask() today = ht.oneday(1)[0:10] yesterday = ht.oneday(0)[0:10] sql1 = """ use app; insert overwrite table app.app_basic_dashboard_goods_sale_predict_daily partition ( dt = '"""+yesterday+"""' ) select k.seller_id, --商家编号 k.seller_name, --商家名称 k.dept_id, --事业部id k.dept_no, --事业部编号 k.dept_name, --事业部名称 k.goods_id, --商品id k.goods_no, --商品编号 k.goods_name, --商品名称 k.warehouse_id, --入仓id k.warehouse_no, --入仓编号 k.warehouse_name, --入仓名称 k.in_warehouse_city, --入仓城市 k.satisfy_alpha, -- 出仓服务水平c (basis) k.safe_stock_days, -- 安全库存天数(basis) k.alt, -- 出仓alt(小时)(basis) k.vlt, --运输时间 case when k.in_stock_safety_num is null and ( m.safe_stock_days + m.alt / 24 + m.vlt / 24 ) >= 0 and ( m.safe_stock_days + m.alt / 24 + m.vlt / 24 ) <= 1 then round(m.satisfy_alpha *(m.safe_stock_days + m.alt / 24 + m.vlt / 24) * m.predict_sales_1d) when k.in_stock_safety_num is null and ( m.safe_stock_days + m.alt / 24 + m.vlt / 24 ) > 1 and ( m.safe_stock_days + m.alt / 24 + m.vlt / 24 ) <= 2 then round(m.satisfy_alpha *(m.safe_stock_days + m.alt / 24 + m.vlt / 24) *(m.predict_sales_1d + m.predict_sales_2d) / 2) when k.in_stock_safety_num is null and ( m.safe_stock_days + m.alt / 24 + m.vlt / 24 ) > 2 and ( m.safe_stock_days + m.alt / 24 + m.vlt / 24 ) <= 3 then round(m.satisfy_alpha *(m.safe_stock_days + m.alt / 24 + m.vlt / 24) *(m.predict_sales_1d + m.predict_sales_2d + m.predict_sales_3d) / 3) when k.in_stock_safety_num is null and ( m.safe_stock_days + m.alt / 24 + m.vlt / 24 ) > 3 and ( m.safe_stock_days + m.alt / 24 + m.vlt / 24 ) <= 4 then round(m.satisfy_alpha *(m.safe_stock_days + m.alt / 24 + m.vlt / 24) *(m.predict_sales_1d + m.predict_sales_2d + m.predict_sales_3d + m.predict_sales_4d) / 4) when k.in_stock_safety_num is null and ( m.safe_stock_days + m.alt / 24 + m.vlt / 24 ) > 4 and ( m.safe_stock_days + m.alt / 24 + m.vlt / 24 ) <= 5 then round(m.satisfy_alpha *(m.safe_stock_days + m.alt / 24 + m.vlt / 24) *(m.predict_sales_1d + m.predict_sales_2d + m.predict_sales_3d + m.predict_sales_4d + m.predict_sales_5d) / 5) when k.in_stock_safety_num is null and ( m.safe_stock_days + m.alt / 24 + m.vlt / 24 ) > 5 and ( m.safe_stock_days + m.alt / 24 + m.vlt / 24 ) <= 6 then round(m.satisfy_alpha *(m.safe_stock_days + m.alt / 24 + m.vlt / 24) *(m.predict_sales_1d + m.predict_sales_2d + m.predict_sales_3d + m.predict_sales_4d + m.predict_sales_5d + m.predict_sales_6d) / 6) when k.in_stock_safety_num is null and ( m.safe_stock_days + m.alt / 24 + m.vlt / 24 ) > 6 and ( m.safe_stock_days + m.alt / 24 + m.vlt / 24 ) <= 7 then round(m.satisfy_alpha *(m.safe_stock_days + m.alt / 24 + m.vlt / 24) *(m.predict_sales_1d + m.predict_sales_2d + m.predict_sales_3d + m.predict_sales_4d + m.predict_sales_5d + m.predict_sales_6d + m.predict_sales_7d) / 7) when k.in_stock_safety_num is null and ( m.safe_stock_days + m.alt / 24 + m.vlt / 24 ) > 7 and ( m.safe_stock_days + m.alt / 24 + m.vlt / 24 ) <= 14 then round(m.satisfy_alpha *(m.safe_stock_days + m.alt / 24 + m.vlt / 24) *(m.in_stock_predict_sales_14d) / 14) when k.in_stock_safety_num is null and ( m.safe_stock_days + m.alt / 24 + m.vlt / 24 ) > 14 then round(m.satisfy_alpha *(m.safe_stock_days + m.alt / 24 + m.vlt / 24) *(m.in_stock_predict_sales_28d) / 28) else round(k.in_stock_safety_num) end as in_stock_safety_num, --安全库存 k.target_stock_days, -- 目标库存天数(basis) k.BP, -- BP k.out_warehouse_no, --配出仓编码 k.out_warehouse_name, --配出仓名称 k.out_warehouse_city, --出仓城市 case when k.in_stock_max_num is null then round(m.satisfy_alpha * m.target_stock_days * m.in_stock_predict_sales_14d / 14 + m.BP) else round(k.in_stock_max_num) end as in_stock_max_num, --目标库存 k.bef_sales_1d, --T-1日销量(逆向) k.bef_sales_2d, --T-2日销量 k.bef_sales_3d, --T-3日销量 k.bef_sales_4d, --T-4日销量 k.bef_sales_5d, --T-5日销量 k.bef_sales_6d, --T-6日销量 k.bef_sales_7d, --T-7日销量 k.in_stock_sales_14d, --14日销量 k.in_stock_sales_28d, --28日销量 k.predict_sales_1d, --T+1日预测销量(正向) k.predict_sales_2d, --T+2日预测销量 k.predict_sales_3d, --T+3日预测销量 k.predict_sales_4d, --T+4日预测销量 k.predict_sales_5d, --T+5日预测销量 k.predict_sales_6d, --T+6日预测销量 k.predict_sales_7d, --T+7日预测销量 k.in_stock_predict_sales_14d, --14日预测销量 k.in_stock_predict_sales_28d, --28日预测销量 1 AS yn, --删除标识 1为没删,0为删除 'plumber' AS create_pin, --创建人 'plumber' AS update_pin, --更新人 current_timestamp AS create_time, --创建时间 current_timestamp AS update_time, --更新时间 current_timestamp AS ts --时间戳 from ( select * from app.app_basic_dashboard_goods_sale_predict_daily_tem where dt = sysdate( - 1) ) k join ( select * from app.app_basic_dashboard_goods_sale_predict_daily_tem where dt = sysdate( - 1) ) m on k.dept_no = m.dept_no and k.goods_no = m.goods_no and k.warehouse_no = m.warehouse_no """ ht.exec_sql(schema_name = 'app', table_name = 'app_basic_dashboard_goods_sale_predict_daily', sql = sql1, merge_flag = True)