Hive 刷题——恶意取消订单用户统计
问题描述
给出了订单表数据,找出恶意退单用户(30 min 内两次取消订单用户明细)
示例数据
order_id user_id order_status operate_time 1101 a 已支付 2024-03-01 10:00:00 1102 a 已取消 2024-03-01 10:10:00 1103 a 待支付 2024-03-01 10:20:00 1104 b 已取消 2024-03-01 10:30:00 1105 a 待确认 2024-03-01 10:50:00 1106 a 已取消 2024-03-01 11:00:00 1107 b 已取消 2024-03-01 11:40:00 1108 b 已取消 2024-03-01 11:50:00 1109 b 已支付 2024-03-01 12:00:00 1110 b 已取消 2024-03-01 12:11:00 1111 c 已取消 2024-03-01 12:20:00 1112 c 已取消 2024-03-01 12:30:00 1113 c 已取消 2024-03-01 12:55:00 1114 c 已取消 2024-03-01 13:00:00
参考实现
with temp as (select 1101 as order_id, 'a' as user_id, "已支付" as order_status, "2023-01-01 10:00:00" as operate_time union all select 1102 as order_id, 'a' as user_id, "已取消" as order_status, "2023-01-01 10:10:00" as operate_time union all select 1103 as order_id, 'a' as user_id, "待支付" as order_status, "2023-01-01 10:20:00" as operate_time union all select 1104 as order_id, 'b' as user_id, "已取消" as order_status, "2023-01-01 10:30:00" as operate_time union all select 1105 as order_id, 'a' as user_id, "待确认" as order_status, "2023-01-01 10:50:00" as operate_time union all select 1106 as order_id, 'a' as user_id, "已取消" as order_status, "2023-01-01 11:00:00" as operate_time union all select 1107 as order_id, 'b' as user_id, "已取消" as order_status, "2023-01-01 11:40:00" as operate_time union all select 1108 as order_id, 'b' as user_id, "已取消" as order_status, "2023-01-01 11:50:00" as operate_time union all select 1109 as order_id, 'b' as user_id, "已支付" as order_status, "2023-01-01 12:00:00" as operate_time union all select 1110 as order_id, 'b' as user_id, "已取消" as order_status, "2023-01-01 12:11:00" as operate_time union all select 1111 as order_id, 'c' as user_id, "已取消" as order_status, "2023-01-01 12:20:00" as operate_time union all select 1112 as order_id, 'c' as user_id, "已取消" as order_status, "2023-01-01 12:30:00" as operate_time union all select 1113 as order_id, 'c' as user_id, "已取消" as order_status, "2023-01-01 12:55:00" as operate_time union all select 1114 as order_id, 'c' as user_id, "已取消" as order_status, "2023-01-01 13:00:00" as operate_time) select user_id from (select user_id, sum(1) over (partition by user_id order by unix_timestamp(operate_time) range between 1800 preceding and current row ) cnt from temp where order_status = '已取消') t where cnt >= 2 group by user_id;