Hive 刷题——恶意取消订单用户统计

问题描述

给出了订单表数据,找出恶意退单用户(30 min 内两次取消订单用户明细)

示例数据

order_id    user_id    order_status     operate_time
1101         a         已支付        2024-03-01 10:00:00
1102         a         已取消        2024-03-01 10:10:00
1103         a         待支付        2024-03-01 10:20:00
1104         b         已取消        2024-03-01 10:30:00
1105         a         待确认        2024-03-01 10:50:00
1106         a         已取消        2024-03-01 11:00:00
1107         b         已取消        2024-03-01 11:40:00
1108         b         已取消        2024-03-01 11:50:00
1109         b         已支付        2024-03-01 12:00:00
1110         b         已取消        2024-03-01 12:11:00
1111         c         已取消        2024-03-01 12:20:00
1112         c         已取消        2024-03-01 12:30:00
1113         c         已取消        2024-03-01 12:55:00
1114         c         已取消        2024-03-01 13:00:00

参考实现

with temp as (select 1101 as order_id, 'a' as user_id, "已支付" as order_status, "2023-01-01 10:00:00" as operate_time
              union all
              select 1102 as order_id, 'a' as user_id, "已取消" as order_status, "2023-01-01 10:10:00" as operate_time
              union all
              select 1103 as order_id, 'a' as user_id, "待支付" as order_status, "2023-01-01 10:20:00" as operate_time
              union all
              select 1104 as order_id, 'b' as user_id, "已取消" as order_status, "2023-01-01 10:30:00" as operate_time
              union all
              select 1105 as order_id, 'a' as user_id, "待确认" as order_status, "2023-01-01 10:50:00" as operate_time
              union all
              select 1106 as order_id, 'a' as user_id, "已取消" as order_status, "2023-01-01 11:00:00" as operate_time
              union all
              select 1107 as order_id, 'b' as user_id, "已取消" as order_status, "2023-01-01 11:40:00" as operate_time
              union all
              select 1108 as order_id, 'b' as user_id, "已取消" as order_status, "2023-01-01 11:50:00" as operate_time
              union all
              select 1109 as order_id, 'b' as user_id, "已支付" as order_status, "2023-01-01 12:00:00" as operate_time
              union all
              select 1110 as order_id, 'b' as user_id, "已取消" as order_status, "2023-01-01 12:11:00" as operate_time
              union all
              select 1111 as order_id, 'c' as user_id, "已取消" as order_status, "2023-01-01 12:20:00" as operate_time
              union all
              select 1112 as order_id, 'c' as user_id, "已取消" as order_status, "2023-01-01 12:30:00" as operate_time
              union all
              select 1113 as order_id, 'c' as user_id, "已取消" as order_status, "2023-01-01 12:55:00" as operate_time
              union all
              select 1114 as order_id, 'c' as user_id, "已取消" as order_status, "2023-01-01 13:00:00" as operate_time)
select user_id
from (select user_id,
             sum(1)
                 over (partition by user_id order by unix_timestamp(operate_time) range between 1800 preceding and current row ) cnt
      from temp
      where order_status = '已取消') t
where cnt >= 2
group by user_id;

 

posted @ 2024-03-26 06:41  晓枫的春天  阅读(7)  评论(0编辑  收藏  举报