hive 向用户推荐好友收藏的电影
需求
向用户 user_id = 1 推荐其关注的人喜欢的音乐
有如下三个表
1、用户关注表
user_id
|
follow_id
|
1
|
7
|
1
|
5
|
1
|
4
|
1
|
3
|
2、collect_movies
user_id
|
movice_id
|
1
|
1001
|
1
|
1004
|
1
|
1002
|
3
|
1001
|
3
|
1005
|
3
|
1006
|
4
|
1004
|
4
|
1007
|
5
|
1002
|
5
|
1006
|
7
|
1005
|
3、电影码表
movice_id
|
movice_name
|
1001
|
笑傲镜湖
|
1002
|
倚天屠龙记
|
1003
|
神雕侠侣
|
1004
|
孔雀翎
|
1005
|
流星蝴蝶剑
|
1006
|
萍踪侠影
|
1007
|
侠客行
|
数据准备
use default; set mapreduce.map.memory.mb=81920; set mapreduce.reduce.memory.mb=81920; create table user_follow_0207 ( user_id int, follower_id int ) row format delimited fields terminated by '\t'; insert into user_follow_0207 values (1, 7); insert into user_follow_0207 values (1, 4); insert into user_follow_0207 values (1, 5); insert into user_follow_0207 values (1, 3); create table collect_movies_0207 ( user_id int, movice_id int ) row format delimited fields terminated by '\t'; insert into collect_movies_0207 values (1 ,1001); insert into collect_movies_0207 values (1 ,1004); insert into collect_movies_0207 values (1 ,1002); insert into collect_movies_0207 values (3 ,1001); insert into collect_movies_0207 values (3 ,1005); insert into collect_movies_0207 values (3 ,1006); insert into collect_movies_0207 values (4 ,1004); insert into collect_movies_0207 values (4 ,1007); insert into collect_movies_0207 values (5 ,1002); insert into collect_movies_0207 values (5 ,1006); insert into collect_movies_0207 values (7 ,1005); create table movies ( id int, muvie_name varchar(50) ) row format delimited fields terminated by '\t'; insert into movies values (1001, "笑傲镜湖"); insert into movies values (1002 ,"倚天屠龙记"); insert into movies values (1003 ,"神雕侠侣"); insert into movies values (1004 ,"孔雀翎"); insert into movies values (1005,"流星蝴蝶剑"); insert into movies values (1006 ,"萍踪侠影"); insert into movies values (1007,"侠客行");
需求处理
1、首先将收藏表和电影码表进行关联
select t.user_id,t.movice_id,t1.muvie_name from collect_movies_0207 t left join movies t1 on t.movice_id = t1.id;
user_id
|
movice_id
|
muvie_name
|
1
|
1001
|
笑傲镜湖
|
1
|
1004
|
孔雀翎
|
7
|
1005
|
流星蝴蝶剑
|
1
|
1002
|
倚天屠龙记
|
3
|
1001
|
笑傲镜湖
|
3
|
1005
|
流星蝴蝶剑
|
3
|
1006
|
萍踪侠影
|
4
|
1004
|
孔雀翎
|
4
|
1007
|
侠客行
|
5
|
1002
|
倚天屠龙记
|
5
|
1006
|
萍踪侠影
|
2、拿用户关注表和第一步处理的结果集进行关联
select * from user_follow_0207 t left join (select t.user_id, t.movice_id, t1.music_name from collect_movies_0207 t left join movies t1 on t.movice_id = t1.id) t1 on t.follower_id = t1.user_id
user_id
|
follow_id
|
user_id
|
movice_id
|
muvie_name
|
1
|
7
|
7
|
1005
|
流星蝴蝶剑
|
1
|
4
|
4
|
1004
|
孔雀翎
|
1
|
4
|
4
|
1007
|
侠客行
|
1
|
5
|
5
|
1002
|
倚天屠龙记
|
1
|
5
|
5
|
1006
|
萍踪侠影
|
1
|
3
|
3
|
1001
|
笑傲镜湖
|
1
|
3
|
3
|
1006
|
萍踪侠影
|
1
|
3
|
3
|
1005
|
流星蝴蝶剑
|
3、根据步骤2结果集,向用户ID为1推荐好友,采用collect_set()函数进行聚合,通过collect_set()函数也可以将重复的电影进行去重
select t.user_id, concat_ws("-", collect_set(t1.muvie_name)) muvie_names from (select * from user_follow_0207 where user_id = 1) t left join (select t.user_id, t.movice_id, u.muvie_name from collect_movies_0207 t left join movies u on t.movice_id = u.id) t1 on t.follower_id = t1.user_id group by t.user_id;
user_id
|
muvie_names
|
1
|
笑傲镜湖,萍踪侠影,倚天屠龙记,侠客行,孔雀翎,流星蝴蝶剑
|
小结
这个需求主要是考察了 collect_set函数的用法。