Hive练习--蚂蚁森林习题一
问题:
假设2017年1月1日开始记录低碳数据(user_low_carbon),假设2017年10月1日之前满足申领条件的用户都申领了一颗p004-胡杨,剩余的能量全部用来领取“p002-沙柳” 。统计在10月1日累计申领“p002-沙柳” 排名前10的用户信息;以及他比后一名多领了几颗沙柳。
数据的下载链接:https://files.cnblogs.com/files/yxym2016/%E8%9A%82%E8%9A%81%E6%A3%AE%E6%9E%97%E6%95%B0%E6%8D%AE.zip
得到的统计结果如下表样式:
user_id plant_count less_count(比后一名多领了几颗沙柳) u_101 1000 100 u_088 900 400 u_103 500 …
一、准备工作
1、创建表
create table user_low_carbon(user_id String,data_dt String,low_carbon int) row format delimited fields terminated by '\t'; create table plant_carbon(plant_id string,plant_name String,low_carbon int) row format delimited fields terminated by '\t';
2、加载数据
load data local inpath "/opt/module/data/low_carbon.txt" into table user_low_carbon; load data local inpath "/opt/module/data/plant_carbon.txt" into table plant_carbon;
二、解决步骤
1、统计每个用户在2017年10月1日之前收集的总碳量,并按照碳量进行倒序排序
select user_id sum(low_carbon) as sum_low_carbon from user_low_carbon where data_dt < '2017/10/1' group by user_id order by sum_low_carbon desc limit 11; a1 # 优化操作,过滤掉无关的数据,减少数据量
输出结果:
user_id sum_low_carbon u_007 1470 u_013 1430 u_008 1240 u_005 1100 u_010 1080 u_014 1060 u_011 960 u_009 930 u_006 830 u_002 659 u_004 640 u_003 620 u_001 475 u_015 290 u_012 250
2、每个用户的总碳量减去一颗胡杨的碳量,然后用剩余的碳量除以沙柳的碳量
# 获取每颗胡杨所需要的碳量 select low_carbon from plant_carbon where plant_id = 'p004'; a2 215 # 获取每颗沙柳所需要的碳量 select low_carbon from plant_carbon where plant_id = 'p002'; a3 19 select user_id,(a1.sum_low_carbon - a2.low_carbon)/a3.low_carbon as plant_count from a1,a2,a3; a4
整合一下:
select user_id,(a1.sum_low_carbon - a2.low_carbon)/a3.low_carbon as plant_count from (select user_id,sum(low_carbon) as sum_low_carbon from user_low_carbon where data_dt < '2017/10/1' group by user_id order by sum_low_carbon desc) a1, (select low_carbon from plant_carbon where plant_id = 'p004') a2, (select low_carbon from plant_carbon where plant_id = 'p002') a3; a4
输出结果:
user_id plant_count u_007 66.05263157894737 u_013 63.94736842105263 u_008 53.94736842105263 u_005 46.578947368421055 u_010 45.526315789473685 u_014 44.473684210526315 u_011 39.21052631578947 u_009 37.63157894736842 u_006 32.36842105263158 u_002 23.36842105263158 u_004 22.36842105263158 u_003 21.31578947368421 u_001 13.68421052631579 u_015 3.9473684210526314 u_012 1.8421052631578947
3、按照每个人领取的沙柳棵数进行倒序排序,并获取当前记录的下一条记录所领取的沙柳的棵数
select user_id,plant_count,lead(plant_count,1,'') over(order by plant_count desc) next_count from a4;a5
将a4带入上面的SQL语句:
select user_id,plant_count,lead(plant_count,1,'0') over(order by plant_count desc) next_count from ( select user_id,(a1.sum_low_carbon - a2.low_carbon)/a3.low_carbon as plant_count from (select user_id,sum(low_carbon) as sum_low_carbon from user_low_carbon where data_dt < '2017/10/1' group by user_id order by sum_low_carbon desc) a1, (select low_carbon from plant_carbon where plant_id = 'p004') a2, (select low_carbon from plant_carbon where plant_id = 'p002') a3) a4 limit 10;
输出结果:
user_id plant_count next_count u_007 66.05263157894737 63.94736842105263 u_013 63.94736842105263 53.94736842105263 u_008 53.94736842105263 46.578947368421055 u_005 46.578947368421055 45.526315789473685 u_010 45.526315789473685 44.473684210526315 u_014 44.473684210526315 39.21052631578947 u_011 39.21052631578947 37.63157894736842 u_009 37.63157894736842 32.36842105263158 u_006 32.36842105263158 23.36842105263158 u_002 23.36842105263158 22.36842105263158 u_004 22.36842105263158 21.31578947368421 u_003 21.31578947368421 13.68421052631579 u_001 13.68421052631579 3.9473684210526314 u_015 3.9473684210526314 1.8421052631578947 u_012 1.8421052631578947 NULL
4、将每一条记录对应的用户领取的沙柳棵数和排在他下面的用户领取的棵数,进行相减,就是当前用户比他后一名所多的棵数
select user_id,plant_count,(plant_count-next_count) less_count from a5;
将a5表带入上面的sql语句:
select user_id,plant_count,(plant_count-next_count) less_count from ( select user_id,plant_count,lead(plant_count,1,'0') over(order by plant_count desc) next_count from ( select user_id,(a1.sum_low_carbon - a2.low_carbon)/a3.low_carbon as plant_count from (select user_id,sum(low_carbon) as sum_low_carbon from user_low_carbon where data_dt < '2017/10/1' group by user_id order by sum_low_carbon desc) a1, (select low_carbon from plant_carbon where plant_id = 'p004') a2, (select low_carbon from plant_carbon where plant_id = 'p002') a3) a4) a5;
输出结果:
# 输出结果 user_id plant_count less_count u_007 66.05263157894737 2.10526315789474 u_013 63.94736842105263 10.0 u_008 53.94736842105263 7.368421052631575 u_005 46.578947368421055 1.05263157894737 u_010 45.526315789473685 1.05263157894737 u_014 44.473684210526315 5.2631578947368425 u_011 39.21052631578947 1.5789473684210549 u_009 37.63157894736842 5.263157894736835 u_006 32.36842105263158 9.000000000000004 u_002 23.36842105263158 1.0 u_004 22.36842105263158 1.05263157894737 u_003 21.31578947368421 7.6315789473684195 u_001 13.68421052631579 9.736842105263158 u_015 3.9473684210526314 2.1052631578947367 u_012 1.8421052631578947 1.8421052631578947
5、使用floor函数,对进行取整操作,我们只需在获取棵数那里进行取整即可,因为那里涉及了除的操作,所以会产生小数,所以在这里进行取整即可。
select user_id,plant_count,(plant_count-next_count) less_count from ( select user_id,plant_count,lead(plant_count,1,'0') over(order by plant_count desc) next_count from ( select user_id,floor((a1.sum_low_carbon - a2.low_carbon)/a3.low_carbon) as plant_count from (select user_id,sum(low_carbon) as sum_low_carbon from user_low_carbon where data_dt < '2017/10/1' group by user_id order by sum_low_carbon desc) a1, (select low_carbon from plant_carbon where plant_id = 'p004') a2, (select low_carbon from plant_carbon where plant_id = 'p002') a3) a4) a5 limit 10;
输出结果:
user_id plant_count less_count u_007 66 3 u_013 63 10 u_008 53 7 u_005 46 1 u_010 45 1 u_014 44 5 u_011 39 2 u_009 37 5 u_006 32 9 u_002 23 1