hive 向用户推荐好友收藏的电影

需求

向用户 user_id = 1 推荐其关注的人喜欢的音乐

有如下三个表
1、用户关注表
user_id
follow_id
1
7
1
5
1
4
1
3

2、collect_movies
user_id
movice_id
1
1001
1
1004
1
1002
3
1001
3
1005
3
1006
4
1004
4
1007
5
1002
5
1006
7
1005
3、电影码表
movice_id
movice_name
1001
笑傲镜湖
1002
倚天屠龙记
1003
神雕侠侣
1004
孔雀翎
1005
流星蝴蝶剑
1006
萍踪侠影
1007
侠客行

数据准备

use default;
set mapreduce.map.memory.mb=81920;
set mapreduce.reduce.memory.mb=81920;
create table user_follow_0207
(
    user_id     int,
    follower_id int
) row format delimited
    fields terminated by '\t';

insert into  user_follow_0207 values (1, 7);
insert into  user_follow_0207 values (1, 4);
insert into  user_follow_0207 values (1, 5);
insert into  user_follow_0207 values (1, 3);

create table collect_movies_0207
(
    user_id  int,
    movice_id int
) row format delimited
    fields terminated by '\t';

insert into collect_movies_0207 values (1  ,1001);
insert into collect_movies_0207 values (1  ,1004);
insert into collect_movies_0207 values (1  ,1002);
insert into collect_movies_0207 values (3  ,1001);
insert into collect_movies_0207 values (3  ,1005);
insert into collect_movies_0207 values (3  ,1006);
insert into collect_movies_0207 values (4  ,1004);
insert into collect_movies_0207 values (4  ,1007);
insert into collect_movies_0207 values (5  ,1002);
insert into collect_movies_0207 values (5  ,1006);
insert into collect_movies_0207 values (7  ,1005);
 create table movies
(
    id         int,
    muvie_name varchar(50)
) row format delimited
    fields terminated by '\t';

insert into movies values (1001, "笑傲镜湖");
insert into movies values (1002    ,"倚天屠龙记");
insert into movies values (1003    ,"神雕侠侣");
insert into movies values (1004    ,"孔雀翎");
insert into movies values (1005,"流星蝴蝶剑");
insert into movies values (1006    ,"萍踪侠影");
insert into movies values (1007,"侠客行");

需求处理

 1、首先将收藏表和电影码表进行关联

select t.user_id,t.movice_id,t1.muvie_name 
from collect_movies_0207 t
         left join movies t1
                   on t.movice_id = t1.id;
user_id
movice_id
muvie_name
1
1001
笑傲镜湖
1
1004
孔雀翎
7
1005
流星蝴蝶剑
1
1002
倚天屠龙记
3
1001
笑傲镜湖
3
1005
流星蝴蝶剑
3
1006
萍踪侠影
4
1004
孔雀翎
4
1007
侠客行
5
1002
倚天屠龙记
5
1006
萍踪侠影

 

2、拿用户关注表和第一步处理的结果集进行关联 
select *
from user_follow_0207 t
         left join
     (select t.user_id, t.movice_id, t1.music_name
      from collect_movies_0207 t
               left join movies t1
                         on t.movice_id = t1.id) t1
     on t.follower_id = t1.user_id
user_id
follow_id
user_id
movice_id
muvie_name
1
7
7
1005
流星蝴蝶剑
1
4
4
1004
孔雀翎
1
4
4
1007
侠客行
1
5
5
1002
倚天屠龙记
1
5
5
1006
萍踪侠影
1
3
3
1001
笑傲镜湖
1
3
3
1006
萍踪侠影
1
3
3
1005
流星蝴蝶剑

 3、根据步骤2结果集,向用户ID为1推荐好友,采用collect_set()函数进行聚合,通过collect_set()函数也可以将重复的电影进行去重 

select t.user_id,
concat_ws("-", collect_set(t1.muvie_name)) muvie_names
from (select * from user_follow_0207 where user_id = 1) t
         left join
     (select t.user_id, t.movice_id, u.muvie_name
      from collect_movies_0207 t
               left join movies u on t.movice_id = u.id) t1
     on t.follower_id = t1.user_id
group by t.user_id;
user_id
muvie_names
1
笑傲镜湖,萍踪侠影,倚天屠龙记,侠客行,孔雀翎,流星蝴蝶剑

小结

 

这个需求主要是考察了 collect_set函数的用法。

posted @ 2022-02-08 14:57  晓枫的春天  阅读(84)  评论(0编辑  收藏  举报