MySQL实战总结
一、求每个分组内前五条记录
这是今天美团优选数据研发岗的其中一道面试题记录。
有一张student_subject表。记录了每个学生的课程的分数
name | subject | score |
---|---|---|
张三 | 语文 | 84 |
李四 | 数学 | 28 |
王五 | 英语 | 67 |
张三 | 数学 | 96 |
李四 | 语文 | 14 |
王五 | 语文 | 90 |
... | ... | ... |
(1) 求出每门课程的最高的分数
这个解决办法很简单, 直接分组取分组内最大值即可。
SELECT subject, max(score) AS max_score
FROM student_subject
GROUP BY subject
ORDER BY subject;
(2) 求出每门课程分数的前五名
子查询
额外增加一列表示分组内的排名。求分组内排名可以采用子查询的方式。查询条件为类型相同的, 找score大于当前值的记录总数, 在其基础上加一就是排名
SELECT t3.*
FROM (
SELECT t2.*
, (
-- 查询比当前score(t2.score)(子查询, 将此t2.score看做常量)更大的记录数,
-- 在此基础上加一就是其排名
SELECT count(*) + 1
FROM student_subject t1
WHERE t1.subject = t2.subject
AND t1.score > t2.score
) AS top
FROM student_subject t2
) t3
WHERE top <= 5
-- 按科目、分数顺序排序,
ORDER BY t3.subject, top;
还可以换一种写法如下, 但是依旧是子查询
SELECT t2.*
FROM student_subject t2
WHERE (
SELECT count(*) + 1
FROM student_subject t1
WHERE t1.subject = t2.subject
AND t1.score > t2.score
) <= 5
ORDER BY subject;
还可以修改为下一种写法
SELECT t2.*
FROM student_subject t2
WHERE EXISTS (
SELECT count(*) + 1
FROM student_subject t1
WHERE t1.subject = t2.subject
AND t1.score > t2.score
HAVING count(*) + 1 <= 5
)
ORDER BY subject;
上面这三种写法基本相同。
左连接
SELECT t3.subject, t3.name, t3.score
FROM (
SELECT t1.subject, t1.name, t1.score
FROM student_subject t1
LEFT JOIN student_subject t2
-- 左连接的条件是科目相同, 并且别人的分数不低于自己的分数(=可以让自己与自己左连接)
ON t1.subject = t2.subject
AND t1.score <= t2.score
) t3
-- 对每个科目的每个人做group by, count(*) 就是这个人的排名
GROUP BY t3.subject, t3.name, t3.score
-- 把排名前三的筛选出来
HAVING count(t3.name) <= 3
ORDER BY t3.subject, count(t3.name);
HIVE_SQL
Hive 里可以使用分析函数(窗口函数)
SELECT subject, name, score
FROM (
SELECT subject, name, score, ROW_NUMBER() OVER (PARTITION BY settop ORDER BY score DESC) AS rn
FROM student_subject
)
WHERE rn <= 5
ORDER BY subject;
二、多分组组内排序第一
这是面试字节抖音数仓团队的一道面试题。
有一张成绩表如下:
id | u_id | exam_id | course_id | score |
---|---|---|---|---|
1 | 20211013001 | 1 | 1 | 56 |
2 | 20211013001 | 2 | 1 | 64 |
3 | 20211013002 | 1 | 1 | 54 |
4 | 20211013001 | 1 | 2 | 87 |
5 | 20211013002 | 2 | 1 | 88 |
6 | 20211013001 | 2 | 2 | 55 |
7 | 20211013002 | 1 | 2 | 45 |
8 | 20211013002 | 2 | 2 | 99 |
求每个学生每一门课程的最高成绩。要求使用窗口函数实现。实现起来非常简单。如下:
SELECT t1.u_id, t1.course_id, t1.score
FROM (
SELECT u_id, course_id, score,
ROW_NUMBER() OVER (PARTITION BY u_id, course_id ORDER BY score DESC) AS rt
FROM `tb_score`
) t1
WHERE t1.rt = 1;
但是注意需要在外层再套一层SELECT, 不能直接一层SELECT ... WHERE rt = 1;