SQL Basic Notes
🥥 Table of Content
Basic SQL
- 01 - Order of SQL Execution
- 02 - JOIN
- 03 - WHERE & HAVING
- 04 - LIKE & REGEXP
- 05 - GROUP BY
- 06 - Aggregate Function(5)
- 07 - DISTINCT
Intermediate SQL
- 01 - Window Function
- 02 - Handling NULL value
- 03 - Handling DATETIME value
- 04 - Handling VARCHAR value
Advanced SQL
- Query Optimization
- Complex Function
- Recursive CTE
Interview Hot Questions
🥑 Get Started!
Basic SQL
01 - Order of SQL execution
FROM(JOIN) |
Choose and join tables to get base data |
---|---|
WHERE |
Filters the base data (by rows) |
GROUP BY |
Aggregates the base data |
HAVING |
Filters the aggregated data |
SELECT |
Returns the final data |
ORDER BY |
Sorts the final data |
LIMIT |
Limits the returned data to a row count |
02 - JOIN
Resource 1: Visual JOINs
Resource 2: SQL JOINs Cheatsheet | Datacamp
- <1> CROSS JOIN
- <2> 4 Types of JOIN, based on the CROSS JOIN
- <3> INNER JOIN vs. CROSS JOIN + WHERE
- <4> ON vs. WHERE
- <5> SELF JOIN
JOIN
的最终结果是生成临时表
<1> CROSS JOIN
CROSS JOIN
:单纯将table a和table b的每一条记录连接
<2> 4 Types of JOIN, based on the CROSS JOIN
Step1: 将table a和table b的每一条记录连接
Step2: 根据ON
条件进行筛选
INNER JOIN
:根据ON clause返回intersection area,不具有保留左右表的功能。可以理解为左表 右表LEFT JOIN
:返回左表的所有信息、右表中与ON
匹配的信息(不匹配的信息返回NULL)。可以理解为intersection 左表。RIGHT JOIN
:与上述相似FULL JOIN
:返回根据ON
匹配后的左右表共有信息 + 保留左右表的所有。可以理解为左表 右表。
<3> INNER JOIN vs. CROSS JOIN + WHERE
## Scenario 1
# CROSS JOIN + WHERE
SELECT *
FROM table1 t1, table2 t2
WHERE t1.price = t2.price
# INNER JOIN
SELECT *
FROM table1 t1 JOIN table2 t2 ON t1.price = t2.price
# LEFT JOIN
SELECT *
FROM table1 t1 LEFT JOIN table2 t2 ON t1.price = t2.price
Now, there are two tables, the goal for you is to find the product_id, price and its product_name.
If we use CROSS JOIN
+ WHERE
, the process and result is like...
If we use INNER JOIN
, the process and result is like...
If we compare with other JOIN
, the process and result is like...
<4> ON vs. WHERE
ON
是在生成临时表时使用的条件,会根据JOIN
类型进行保留WHERE
是生成临时表后使用的条件,整行进行筛选,不考虑保留any other info from 左右表
## Scenario 2
# CROSS JOIN + WHERE
SELECT *
FROM table1 t1, table2 t2
WHERE t1.price = t2.price AND t2.product_name='A'
# INNER JOIN
SELECT *
FROM table1 t1 JOIN table2 t2 ON t1.price = t2.price AND t2.product_name='A'
# INNER JOIN
SELECT *
FROM table1 t1 JOIN table2 t2 ON t1.price = t2.price
WHERE t2.product_name='A'
# LEFT JOIN
SELECT *
FROM table1 t1 LEFT JOIN table2 t2 ON t1.price = t2.price AND t2.product_name='A'
# LEFT JOIN
SELECT *
FROM table1 t1 LEFT JOIN table2 t2 ON t1.price = t2.price
WHERE t2.product_name='A'
Now, there are two tables, the goal for you is to find the product_id and price which product name is A.
If we use CROSS JOIN
+ WHERE
, the process and result is like...
If we use INNER JOIN
and put the WHERE
clause on the ON
clause, the process and result is like...
If we use INNER JOIN
+ WHERE
, the process and result is like...
If we use other JOIN
and put the WHERE
clause on the ON
clause, the process and result is like...
If we use other JOIN
+ WHERE
, the process and result is like...
<5> SELF JOIN
- 经理问题(谁归谁管,此时只有一个员工表)
Employees Earning More Than Their Managers - 时间错位(同列不同行的比较)-- 优化方法:LAG( ) / LEAD( )
03 - WHERE & HAVING
WHERE
是筛选分组前的数据,HAVING
是筛选分组后的数据WHERE
和HAVING
后必须是原始列名,不能是列的别名(AS
后的别名)WHERE
和HAVING
后跟的是condition(condition是=
,!=
,>
,>=
,<
,<=
,LIKE
,NOT LIKE
,IS NULL
,IS NOT NULL
与OR
,AND
的组合)WHERE
的condition不能包含Aggregate_function()
HAVING
的condition只能包含Aggregate_function()
或GROUP BY
后的字段
04 - LIKE & REGEXP
Resource: MySQL Regular Expression | 菜鸟教程
LIKE 'c%'
: 以c开头的LIKE '%c'
: 以c结尾的LIKE '%c%'
: 包含c的(不限位置)LIKE '%%'
: 包含''的
SELECT *
FROM Customers
- WHERE Name LIKE 'F%'
- WHERE Name REGEXP '^F'
05 - GROUP BY
GROUP BY
按column_name 或 (column_name_1, ... , column_name_N)对数据进行分组,假想压缩成一行数据GROUP BY
搭配SELECT
Aggregate_fuction(column_name)
,根据Aggregate_fuction()
输出一条记录GROUP BY
不搭配SELECT
Aggregate_fuction(column_name)
,默认取分组结果的第一条记录GROUP BY
筛选的不是完整的一行记录,WHERE
筛选的是完整的一行记录
06 - Five Aggregate Function(5)
<1>Common Aggregate Function
MIN(column_name)
: 可对文本、日期进行操作MAX(column_name)
: 同上SUM(column_name)
AVG(column_name)
COUNT(column_name)
,COUNT(*)
,COUNT(1)
<2>AVG(column_name)
vs. SUM(column_name)
/COUNT(column_name)
SELECT
meal_category,
AVG(no_of_views_lst_mth) AS average_views
FROM recipes
GROUP BY meal_category
Hence, for the real value, we need to ...
- Change the method to SUM(column_name)/COUNT(*)
or - Convert the NULL value to 0
But you need to double check the value if it equals to real value.
07 - DISTINCT
使用范围:SELECT clause
用法一:SELECT DISTINCT column_name -- 返回一列不同名称
用法二:SELECT COUNT(DISTINCT column_name) -- 统计该列不同的名称个数
用法三:SELECT COUNT(DISTINCT column_name1,column_name2) -- 统计该两列不同的组合名称个数
Intermediate SQL
01 - Window Function
- <1> Basic Syntax & Key Point
- <2> ROW_NUMBER() & DENSE_RANK() & RANK()
- <3> Aggeragate Function in Window Function
- <4> LAG() & LEAD()
<1> Basic Syntax & Key Point
<window_function> OVER(PARTITION BY <expr_clause> ORDER BY <order_clause> <frame_clause>)
Key Point: 窗口函数的核心思想是新增一列
Note 1:frame_clause不适用于窗口排名函数,适用于窗口聚合函数。
Note 2:PARTITION BY
在窗口排名函数和窗口聚合函数中都可省略。
<2> ROW_NUMBER() & DENSE_RANK() & RANK()
ROW_NUMBER() OVER(PARTITION BY <partition_clause> ORDER BY <order_clause>)
DENSE_RANK() OVER(PARTITION BY <partition_clause> ORDER BY <order_clause>)
RANK() OVER(PARTITION BY <partition_clause> ORDER BY <order_clause>)
There are three methods for creating the ranking column in Window Function.
value | ROW_NUMBER() | DENSE_RANK() | RANK() |
---|---|---|---|
5 | 1 | 1 | 1 |
4.9 | 2 | 2 | 2 |
4.9 | 3 | 2 | 2 |
4.8 | 4 | 3 | 4 |
If there is no ties, the three methods are same for rank.
ROW_NUMBER()
: It's like index.DENSE_RANK()
: e.g. return employee names which has third high salary(4.8).RANK()
💡 窗口排名函数中PARTITION BY
的细节
- 窗口排名函数,不可省略
ORDER BY
。 - 窗口排名函数不使用
PARTITION BY
进行不分组的总体排名,并打印排名。 - 窗口排名函数使用
PARTITION BY
进行分组的组内排名,并打印排名。 - 外部
ORDER BY
和窗口排名函数DENSE_RANK()
中不使用PARTITION BY
的ORDER BY
输出的总体数据排序相同,后者多了一列排名。 - 使用窗口函数后如需继续使用
WHERE
或HAVING
筛选行,必须外层嵌套,因为窗口函数的操作顺序在WHERE
和GROUP BY
之后。
<3> Aggeragate Function in Window Function
💡 窗口聚合函数中ORDER BY
的细节
- 无
ORDER BY
,无<frame_clause>:窗口聚合范围 默认组内所有行 - 有
ORDER BY
,无<frame_clause>:窗口聚合范围 默认从第一行到当前行 - 有
ORDER BY
,有<frame_clause>:窗口聚合范围 <frame_clause>
i.窗口聚合范围:从第一行到当前行
“累计”:累计求和、累计计数、累计平均数、累计最小值、累计最大值
<aggeragate_function> OVER(ORDER BY <order_clause>)
<aggeragate_function> OVER(ORDER BY <order_clause> ROWS UNBOUNDED PRECEDING AND CURRENT ROW)
ii.窗口聚合范围:组内所有行
<aggeragate_function> OVER()
<aggeragate_function> OVER(ORDER BY <order_clause> ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
iii.窗口聚合范围:frame_clause
<aggeragate_function> OVER(ORDER BY <order_clause> <frame_clause>)
<4> LAG() & LEAD()
LAG(column_name, offset) OVER(PARTITION BY <partition_clause> ORDER BY <order_clause>)
LEAD(column_name, offset) OVER(PARTITION BY <partition_clause> ORDER BY <order_clause>)
SELECT
year,
total_sale,
LAG(total_sale) OVER(ORDER BY year) AS previous_year_total_sale,
total_sale - LAG(total_sale) OVER(ORDER BY year) AS total_sale_difference
FROM sales
'''
time complexity: O(N)
space complexity: O(N)
'''
收益率 =


💡Compared with Self Join
SELECT
t1.year,
t1.total_sale - COALESCE(t2.total_sale,0) AS total_sale_diff
FROM sales t1 LEFT JOIN sales t2 ON t1.year=t2.year+INTERVAL 1 DAY
ORDER BY t1.year
'''
time complexity: O(NlogN)
space complexity: O(NlogN)
'''
02 - Handle NULL value
CASE WHEN column_name IS NULL THEN 0 ELSE column_name END
COALESCE(first_choice_value, second_choice_value)
,依次考虑是否为NULL
,非NULL
就返回value
,都是NULL
最终返回NULL
IFNULL(x,value)
# is null条件判断
SELECT
meal_category,
AVG(CASE WHEN no_of_views_lst_mth IS NULL
THEN 0
ELSE no_of_views_lst_mth END) AS average_views
FROM recipes
GROUP BY meal_category
# coalesce转换
SELECT
meal_category,
AVG(COALESCE(no_of_views_lst_mth, 0)) AS average_views
FROM recipes
GROUP BY meal_category
# ifnull转换
SELECT
meal_category,
AVG(IFNULL(no_of_views_lst_mth, 0)) AS average_views
FROM recipes
GROUP BY meal_category
03 - Handling DATETIME value
YEAR(column_name)
TIMESTAMPDIFF(unit,start,end)
DATEDIFF(column_name)
04 - Handling VARCHAR value
VARCHAR_LENGTH()
LENGTH()
SUM IF & COUNT IF
对条件进行计数:SUM(条件)等价于SUM(if(条件,1,0))
对条件进行计数:COUNT(IF(条件,1,null))
注:COUNT(条件)等价于COUNT(条件,true,null),条件满足返回true,否则返回null
DATEDIFF()
DATEDIFF(x1,x2) -- 返回x1-x2的时间差值,其中x1,x2为column_name或时间字符串
IFNULL()
IFNULL(x,value) -- 如果x为NULL,返回value
注:指标计算中,需要外部加IFNULL()。例如IFNULL(COUNT()/COUNT(),0)
ROUND()
ROUND(x,d):四舍五入保留x的d位小数
-
查询近30天活跃用户数
-
好友申请 I:总体通过率
Interview Hot Questions
Top N
最大值问题的三种思路
Solution1: Subquery(WHERE
is not required)
SELECT ...
FROM ...
(WHERE) ...
GROUP BY 关键字段
HAVING Aggregate_function(字段) >= ALL(SELECT Aggregate_function(字段)
FROM ...
(WHERE) ...
GROUP BY 关键字段)
)
Solution2: Windows_function overall rank(do not use PARTITION BY
)
SELECT ...
FROM (SELECT *,
<窗口排名函数> OVER(ORDER BY 字段 DESC) AS ranking
FROM ...
)cte
WHERE ranking = 1
Solution3: ORDER BY
(only for the simple question)
ORDER BY 字段 DESC,LIMIT 1
最小值问题的三种思路
Solution1: Subquery(WHERE
is not required)
SELECT ...
FROM ...
(WHERE) ...
GROUP BY 关键字段
HAVING Aggregate_function(字段) <= ALL(SELECT Aggregate_function(字段)
FROM ...
(WHERE) ...
GROUP BY 关键字段)
)
Solution2: Windows function overall rank(do not use PARTITION BY
)
SELECT ...
FROM (SELECT *,
<窗口排名函数> OVER(ORDER BY 字段 ASC) AS ranking
FROM ...
)cte
WHERE ranking = 1
Solution3: ORDER BY (only for the simple question)
ORDER BY 字段 ASC,LIMIT 1
Ratios
用户留存率Users Retention Rate
复购率
Consecutive
Types of Problems
- 用户最长连续登录的天数
- Consecutive N Days Login Users 连续登录N天的用户数
- 连续N天交易额超过X的店铺数
Solution
- 自连接解法
- ROW_NUMBER()常规解法
- LEAD()滑动窗口解法
Median
# Solution 1: 正反序
WITH cte AS(
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY company ORDER BY salary,id ASC) AS rnk1,
ROW_NUMBER() OVER(PARTITION BY company ORDER BY salary DESC,id DESC) AS rnk2,
COUNT(id) OVER(PARTITION BY company) AS cnt
FROM
Employee
)
SELECT
id,
company,
salary
FROM cte
WHERE
rnk1>=cnt/2 AND rnk2>=cnt/2
# Solution 2: 讨论排序和n的关系
SELECT
Id,Company,Salary
FROM
(
SELECT
a.*,
ROW_NUMBER() OVER(PARTITION BY Company ORDER BY Salary) AS rnk,
COUNT(Salary) OVER(PARTITION BY Company) AS cnt
FROM Employee a
)t
WHERE rnk IN (cnt/2,cnt/2+1,(cnt+1)/2)
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】凌霞软件回馈社区,博客园 & 1Panel & Halo 联合会员上线
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】博客园社区专享云产品让利特惠,阿里云新客6.5折上折
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· DeepSeek “源神”启动!「GitHub 热点速览」
· 微软正式发布.NET 10 Preview 1:开启下一代开发框架新篇章
· 我与微信审核的“相爱相杀”看个人小程序副业
· C# 集成 DeepSeek 模型实现 AI 私有化(本地部署与 API 调用教程)
· spring官宣接入deepseek,真的太香了~