hive 剔除极值后取均值问题
需求描述
有一张记录用户购物信息表,第一个字段 shop_id 订单ID,第二个字段 commodity_id 物品ID,第三个字段 sale 物品金额,
110 1 10 订单ID110 物品ID1 金额10
现在要求出每个订单中剔除物品金额最大最小后的平均值。
原始数据
shop_id
|
commodity_id
|
sale
|
110
|
1
|
10
|
110
|
2
|
20
|
110
|
3
|
30
|
110
|
4
|
50
|
110
|
5
|
60
|
110
|
6
|
20
|
110
|
7
|
80
|
111
|
1
|
90
|
111
|
2
|
80
|
111
|
3
|
50
|
111
|
4
|
70
|
111
|
5
|
20
|
111
|
6
|
10
|
解题思路
根据需求描述可以得知 需要按订单分组,并且每个订单中的最大值和最小值要剔除然后再取平均值
1、先找出订单中的极值
select *, dense_rank() over (partition by shop_id order by sale) rn, dense_rank() over (partition by shop_id order by sale desc) rn_ from temp_shop_info;
shop_id
|
commodity_id
|
sale
|
rn
|
rn_
|
110
|
7
|
80
|
6
|
1
|
110
|
5
|
60
|
5
|
2
|
110
|
4
|
50
|
4
|
3
|
110
|
3
|
30
|
3
|
4
|
110
|
2
|
20
|
2
|
5
|
110
|
6
|
20
|
2
|
5
|
110
|
1
|
10
|
1
|
6
|
111
|
1
|
90
|
6
|
1
|
111
|
2
|
80
|
5
|
2
|
111
|
4
|
70
|
4
|
3
|
111
|
3
|
50
|
3
|
4
|
111
|
5
|
20
|
2
|
5
|
111
|
6
|
10
|
1
|
6
|
2、剔除极值后的结果如下
select * from (select *, dense_rank() over (partition by shop_id order by sale) rn, dense_rank() over (partition by shop_id order by sale desc) rn_ from temp_shop_info) u where rn > 1 and rn_ > 1;
shop_id
|
commodity_id
|
sale
|
rn
|
rn_
|
110
|
5
|
60
|
5
|
2
|
110
|
4
|
50
|
4
|
3
|
110
|
3
|
30
|
3
|
4
|
110
|
2
|
20
|
2
|
5
|
110
|
6
|
20
|
2
|
5
|
111
|
2
|
80
|
5
|
2
|
111
|
4
|
70
|
4
|
3
|
111
|
3
|
50
|
3
|
4
|
111
|
5
|
20
|
2
|
5
|
3、数据范围确定后,根据要求取出均值
select shop_id, cast(avg(sale) as decimal(10, 2)) avg_sale from (select *, dense_rank() over (partition by shop_id order by sale) rn, dense_rank() over (partition by shop_id order by sale desc) rn_ from temp_shop_info) u where rn > 1 and rn_ > 1 group by shop_id;
shop_id
|
avg_sale
|
111
|
55
|
110
|
36
|