Hive 练习 简单任务处理
1、2018年4月份的用户数、订单量、销量、GMV (不局限与这些统计量,你也可以自己想一些)
-- -- -- 2018年4月份的用户数量 select count(a.user_id) as user_nums from ( select user_id from app_jypt_m04_ord_det_di where dt >= '2018-04-01' and sale_ord_dt <= '2018-04-30' and sale_ord_dt >= '2018-04-01' group by user_id ) a; -- 2018年4月份的订单量 select count(a.sale_ord_id) as sale_nums from ( select sale_ord_id from app_jypt_m04_ord_det_di where dt >= '2018-04-01' and sale_ord_dt <= '2018-04-30' and sale_ord_dt >= '2018-04-01' group by sale_ord_id ) a; -- -- 2018年4月份的销量 select sum(COALESCE(sale_qtty, 0)) as xiaoliang from app_jypt_m04_ord_det_di where dt >= '2018-04-01' and sale_ord_dt <= '2018-04-30' and sale_ord_dt >= '2018-04-01'; -- -- -- 2018年4月份的销售额GMV -- user_payable_pay_amount 用户应付金额 select sum(user_payable_pay_amount) as xiaoshoujine from app_jypt_m04_ord_det_di where dt >= '2018-04-01' and sale_ord_dt <= '2018-04-30' and sale_ord_dt >= '2018-04-01';
PS:
- 订单数就是卖了几单 ;
- 销量就是卖了多少件,一个订单中可能卖出一件或多件;
- GMV: Gross Merchandise Volume,是成交总额(一定时间段内)的意思。
- 在电商网站定义里面是网站成交金额。这个实际指的是拍下订单金额, 包含付款和未付款的部分。
2、上述这些变化量相对3月份的变化
3、计算2018年4月1号的新用户数量(之前半年未购买的用户为新用户)
-- 计算2018年4月1号的新用户数量(之前半年未购买的用户为新用户) -- 首先找出4月1号的用户的xxx,然后统计半年内有过购买记录的用户yyy。 -- select distinct user_id as xxx from gdm_m04_ord_det_sum where dt>='2018-04-01' and sale_ord_dt='2018-04-01'; -- select distinct user_id as yyy from gdm_m04_ord_det_sum where dt>='2017-10-01' and sale_ord_dt<='2018-03-31' and sale_ord_dt>='2017-10-01'; -- 用xxx-yyy,然后count()计算数量; -- 两种方法,一种用not in ,一种用not exists -- not in 方法 select distinct user_id from gdm_m04_ord_det_sum where user_id not in (select distinct user_id from gdm_m04_ord_det_sum where dt>='2017-10-01' and sale_ord_dt<='2018-03-31' and sale_ord_dt>='2017-10-01'); -- not exists 方法 select distinct user_id from gdm_m04_ord_det_sum where dt>='2018-04-01' and sale_ord_dt='2018-04-01' where not exists (select distinct user_id from gdm_m04_ord_det_sum where dt>='2017-10-01' and sale_ord_dt<='2018-03-31' and sale_ord_dt>='2017-10-01' where gdm_m04_ord_det_sum.user_id=gdm_m04_ord_det_sum.user_id); -- 另一种 left outer join 这样效率更高 语法有问题?? select distinct user_id from gdm_m04_ord_det_sum where dt>='2018-04-01' and sale_ord_dt='2018-04-01' a left outer join (select distinct user_id from gdm_m04_ord_det_sum where dt>='2017-10-01' and sale_ord_dt<='2018-03-31' and sale_ord_dt>='2017-10-01' b) on a.user_id=b.user_id where b.user_id is null;
正确方法:
select count(a.id1) as user_new_nums from ( select distinct user_id as id1 from app_jypt_m04_ord_det_di where dt >= '2018-04-01' and sale_ord_dt = '2018-04-01' ) a left outer join ( select distinct user_id as id2 from app_jypt_m04_ord_det_di where dt >= '2017-10-01' and sale_ord_dt <= '2018-03-31' and sale_ord_dt >= '2017-10-01' ) b on a.id1 = b.id2 where b.id2 is null;