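MySQL: querying the top three products by sales revenue for each shop
1. Introduction
I have recently been interviewing for data analyst positions, and two companies asked this question. One wanted a SQL query for each shop's top three products by revenue; the other wanted the same statistic computed in Python. In the LeetCode database problem set, "Department Top Three Salaries" is the same type of problem, and it ranks first by frequency among all the problems. This post reviews the SQL solutions first.
2. Problem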
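The sales table holds all orders. Each order has an order id orderid, shop id shopid, goods id goodsid, quantity sold salenum, unit price price, and order date orderdate;
The shop table holds shop information: shop id shopid and shop name shopname;
The goods table holds product information: goods id goodsid and goods name goodsname.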
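1) Basic version: write a SQL query that finds, for each shop, the three products with the highest revenue (unit price * quantity sold) in Q1 2020. For the tables below, the result should list the shop name, the goods name, the revenue sumprice and the rank sumprice_rank.
2) Advanced version: write a SQL query that finds, for each shop, the three products with the highest revenue (unit price * quantity sold) within the last three months. The result should be displayed in separate columns: one row per shop, with the names of its top three products in the columns goodsname1, goodsname2 and goodsname3.
Table creation and data insertion statements: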
-- Create table sales and insert data
CREATE TABLE `sales` (
  `orderid` int NOT NULL AUTO_INCREMENT,
  `shopid` int NOT NULL,
  `goodsid` int NOT NULL,
  `salenum` int NOT NULL,
  `price` int NOT NULL,
  `orderdate` date NOT NULL,
  PRIMARY KEY (`orderid`)
);
INSERT INTO `sales` (`shopid`, `goodsid`, `salenum`, `price`, `orderdate`) VALUES
(1, 10001, 1, 90, '2020-01-15'),
(1, 10002, 1, 50, '2020-02-23'),
(2, 10004, 2, 120, '2020-01-18'),
(1, 10003, 3, 60, '2020-01-19'),
(2, 10002, 1, 50, '2020-02-23'),
(1, 10002, 1, 40, '2020-03-01'),
(1, 10004, 3, 20, '2020-02-14'),
(1, 10003, 1, 10, '2020-03-01'),
(2, 10002, 1, 50, '2020-02-02'),
(2, 10001, 1, 40, '2020-02-09');
-- Create table shop and insert data
CREATE TABLE `shop` (
  `shopid` int NOT NULL,
  `shopname` varchar(10) NOT NULL
);
INSERT INTO `shop` VALUES (1, 'SexyBaby'), (2, 'AngelCity');
-- Create table goods and insert data
CREATE TABLE `goods` (
  `goodsid` int NOT NULL,
  `goodsname` varchar(10) NOT NULL
);
INSERT INTO `goods` VALUES (10001, 'dress'), (10002, 'shirt'), (10003, 'coat'), (10004, 'blouse');
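3. Solution using window functions
Note: MySQL supports window functions starting from version 8.0.
We need statistics per shop and per product, so first recall how the two grouping tools, GROUP BY and PARTITION BY, differ. GROUP BY summarizes: its output keeps only the grouping columns and the results of the aggregate functions. PARTITION BY keeps every row and only computes the grouped statistic alongside it; it is often combined with ranking functions. When an aggregate function is used over a partition whose OVER clause also contains ORDER BY, it is computed cumulatively row by row, i.e. as a running total. For details see https://www.cnblogs.com/hello-yz/p/9962356.html.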
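To make this difference concrete, here is a minimal sketch. It loads the sample sales rows into an in-memory SQLite database through Python's standard sqlite3 module, which is an assumption made so the example runs without a MySQL server; window functions need SQLite 3.25 or later. It then computes the same SUM three ways: with GROUP BY, with OVER (PARTITION BY shopid), and with OVER (PARTITION BY shopid ORDER BY orderid).

```python
import sqlite3

# Sample orders from the INSERT statement above:
# (shopid, goodsid, salenum, price, orderdate); orderid is assigned 1..10.
SALES = [
    (1, 10001, 1, 90, "2020-01-15"),
    (1, 10002, 1, 50, "2020-02-23"),
    (2, 10004, 2, 120, "2020-01-18"),
    (1, 10003, 3, 60, "2020-01-19"),
    (2, 10002, 1, 50, "2020-02-23"),
    (1, 10002, 1, 40, "2020-03-01"),
    (1, 10004, 3, 20, "2020-02-14"),
    (1, 10003, 1, 10, "2020-03-01"),
    (2, 10002, 1, 50, "2020-02-02"),
    (2, 10001, 1, 40, "2020-02-09"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (orderid INTEGER PRIMARY KEY, shopid INT, goodsid INT,"
    " salenum INT, price INT, orderdate TEXT)"
)
conn.executemany(
    "INSERT INTO sales (shopid, goodsid, salenum, price, orderdate)"
    " VALUES (?, ?, ?, ?, ?)",
    SALES,
)

# GROUP BY: one output row per shop; the individual orders are gone.
grouped = conn.execute(
    "SELECT shopid, SUM(salenum * price) FROM sales"
    " GROUP BY shopid ORDER BY shopid"
).fetchall()

# PARTITION BY without ORDER BY: every order row is kept and carries
# the total of its whole partition (shop).
partition_total = conn.execute(
    "SELECT orderid, shopid, SUM(salenum * price) OVER (PARTITION BY shopid)"
    " FROM sales ORDER BY orderid"
).fetchall()

# PARTITION BY with ORDER BY: the default frame ends at the current row,
# so SUM becomes a running total within each shop.
running = conn.execute(
    "SELECT orderid, shopid,"
    " SUM(salenum * price) OVER (PARTITION BY shopid ORDER BY orderid)"
    " FROM sales ORDER BY orderid"
).fetchall()

print(grouped)
for row in running:
    print(row)
```

GROUP BY returns two rows, one per shop. Both window versions return all ten order rows. Only the version with ORDER BY turns SUM into a running total: 90, 140, 320, 360, 420, 430 for shop 1.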
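Approach for the basic version:
1. Use WHERE to keep only the orders from Q1 2020;
2. The same product in a shop can appear in several orders, so aggregate with GROUP BY to get the revenue sumprice of each product in each shop;
3. Use ROW_NUMBER() OVER (PARTITION BY ……) to sort the products by revenue in descending order within each shop. This gives each product's revenue rank sumprice_rank within its shop;
4. Join the result with the shop and goods tables to get shopname and goodsname. Then filter with WHERE sumprice_rank <= 3 in the outer query to get each shop's top three products by revenue.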
SELECT shop.shopname, goods.goodsname, a.sumprice, a.sumprice_rank FROM
(SELECT shopid, goodsid, SUM(salenum * price) AS sumprice,
        ROW_NUMBER() OVER (PARTITION BY shopid ORDER BY SUM(salenum * price) DESC) AS sumprice_rank
 FROM sales
 WHERE orderdate >= '2020-01-01' AND orderdate < '2020-04-01'  -- all of Q1 2020, including 01-01 and 03-31
 GROUP BY shopid, goodsid) a
LEFT JOIN shop ON a.shopid = shop.shopid LEFT JOIN goods ON a.goodsid = goods.goodsid
WHERE a.sumprice_rank <= 3 ORDER BY shopname, sumprice_rank;
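The query above can be checked against the sample data with a small sketch. It uses Python's sqlite3 module and an in-memory SQLite database instead of MySQL, which is an assumption; ROW_NUMBER() needs SQLite 3.25 or later. The Q1 filter is the half-open range orderdate >= '2020-01-01' AND orderdate < '2020-04-01'. The sketch also adds goodsid as a tie-breaker in the window ORDER BY. This tie-breaker is not part of the article's query: in shop 1, dress and shirt both have a revenue of 90, and without it ROW_NUMBER() may order them either way.

```python
import sqlite3

# Sample data from the INSERT statements above.
SALES = [
    (1, 10001, 1, 90, "2020-01-15"),
    (1, 10002, 1, 50, "2020-02-23"),
    (2, 10004, 2, 120, "2020-01-18"),
    (1, 10003, 3, 60, "2020-01-19"),
    (2, 10002, 1, 50, "2020-02-23"),
    (1, 10002, 1, 40, "2020-03-01"),
    (1, 10004, 3, 20, "2020-02-14"),
    (1, 10003, 1, 10, "2020-03-01"),
    (2, 10002, 1, 50, "2020-02-02"),
    (2, 10001, 1, 40, "2020-02-09"),
]
SHOPS = [(1, "SexyBaby"), (2, "AngelCity")]
GOODS = [(10001, "dress"), (10002, "shirt"), (10003, "coat"), (10004, "blouse")]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (orderid INTEGER PRIMARY KEY, shopid INT, goodsid INT,
                        salenum INT, price INT, orderdate TEXT);
    CREATE TABLE shop (shopid INT, shopname TEXT);
    CREATE TABLE goods (goodsid INT, goodsname TEXT);
    """
)
conn.executemany(
    "INSERT INTO sales (shopid, goodsid, salenum, price, orderdate)"
    " VALUES (?, ?, ?, ?, ?)",
    SALES,
)
conn.executemany("INSERT INTO shop VALUES (?, ?)", SHOPS)
conn.executemany("INSERT INTO goods VALUES (?, ?)", GOODS)

# Basic version; goodsid breaks revenue ties so ROW_NUMBER() is deterministic.
TOP3_Q1 = """
SELECT shop.shopname, goods.goodsname, a.sumprice, a.sumprice_rank
FROM (
    SELECT shopid, goodsid,
           SUM(salenum * price) AS sumprice,
           ROW_NUMBER() OVER (PARTITION BY shopid
                              ORDER BY SUM(salenum * price) DESC, goodsid)
               AS sumprice_rank
    FROM sales
    WHERE orderdate >= '2020-01-01' AND orderdate < '2020-04-01'
    GROUP BY shopid, goodsid
) a
LEFT JOIN shop ON a.shopid = shop.shopid
LEFT JOIN goods ON a.goodsid = goods.goodsid
WHERE a.sumprice_rank <= 3
ORDER BY shopname, sumprice_rank
"""

rows = conn.execute(TOP3_Q1).fetchall()
for row in rows:
    print(row)
```

If tied products should share a rank, RANK() or DENSE_RANK() can replace ROW_NUMBER(). In shop 1 (revenues 190, 90, 90, 60), RANK() <= 3 keeps three products. DENSE_RANK() <= 3 keeps all four, because it counts distinct revenue values.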
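Approach for the advanced version:
1. Start from the basic version and change the date filter to the last three months. The filter is relative to CURDATE(), so the result depends on the day the query runs. Against the 2020 sample data, any run after early June 2020 returns no rows;
2. Pivot the rows into columns. Note that the pivoted values here are strings (goods names): the CASE expression falls back to an empty string, and MAX() picks the non-empty name.
(For filtering on the last N days/months/years, see: https://blog.csdn.net/weixin_33739523/article/details/85820328
For pivoting rows into columns, see: https://www.cnblogs.com/hiwuchong/p/10080215.html)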
SELECT shopname,
       MAX(CASE WHEN sumprice_rank = 1 THEN t.goodsname ELSE '' END) AS goodsname1,
       MAX(CASE WHEN sumprice_rank = 2 THEN t.goodsname ELSE '' END) AS goodsname2,
       MAX(CASE WHEN sumprice_rank = 3 THEN t.goodsname ELSE '' END) AS goodsname3
FROM
(SELECT shop.shopname, goods.goodsname, a.sumprice, a.sumprice_rank
 FROM (SELECT shopid, goodsid, SUM(salenum * price) AS sumprice,
              ROW_NUMBER() OVER (PARTITION BY shopid ORDER BY SUM(salenum * price) DESC) AS sumprice_rank
       FROM sales
       WHERE DATE_SUB(CURDATE(), INTERVAL 3 MONTH) <= date(orderdate)
       GROUP BY shopid, goodsid) a
 LEFT JOIN shop ON a.shopid = shop.shopid
 LEFT JOIN goods ON a.goodsid = goods.goodsid
 WHERE a.sumprice_rank <= 3
 ORDER BY shopname, sumprice_rank) t
GROUP BY shopname;
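The pivot can be reproduced on the sample data with Python's sqlite3 module and an in-memory SQLite database, which is an assumption in place of MySQL. SQLite has no DATE_SUB() or CURDATE(). This sketch therefore replaces the last-three-months filter with date(:today, '-3 months') and passes a fixed, hypothetical reference date as the :today parameter, so runs are reproducible. The goodsid tie-breaker in the window ORDER BY and the final ORDER BY shopname are further additions that are not in the article's query.

```python
import sqlite3

# Sample data from the INSERT statements above.
SALES = [
    (1, 10001, 1, 90, "2020-01-15"),
    (1, 10002, 1, 50, "2020-02-23"),
    (2, 10004, 2, 120, "2020-01-18"),
    (1, 10003, 3, 60, "2020-01-19"),
    (2, 10002, 1, 50, "2020-02-23"),
    (1, 10002, 1, 40, "2020-03-01"),
    (1, 10004, 3, 20, "2020-02-14"),
    (1, 10003, 1, 10, "2020-03-01"),
    (2, 10002, 1, 50, "2020-02-02"),
    (2, 10001, 1, 40, "2020-02-09"),
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (orderid INTEGER PRIMARY KEY, shopid INT, goodsid INT,
                        salenum INT, price INT, orderdate TEXT);
    CREATE TABLE shop (shopid INT, shopname TEXT);
    CREATE TABLE goods (goodsid INT, goodsname TEXT);
    """
)
conn.executemany(
    "INSERT INTO sales (shopid, goodsid, salenum, price, orderdate)"
    " VALUES (?, ?, ?, ?, ?)",
    SALES,
)
conn.executemany("INSERT INTO shop VALUES (?, ?)", [(1, "SexyBaby"), (2, "AngelCity")])
conn.executemany(
    "INSERT INTO goods VALUES (?, ?)",
    [(10001, "dress"), (10002, "shirt"), (10003, "coat"), (10004, "blouse")],
)

PIVOT = """
SELECT shopname,
       MAX(CASE WHEN sumprice_rank = 1 THEN goodsname ELSE '' END) AS goodsname1,
       MAX(CASE WHEN sumprice_rank = 2 THEN goodsname ELSE '' END) AS goodsname2,
       MAX(CASE WHEN sumprice_rank = 3 THEN goodsname ELSE '' END) AS goodsname3
FROM (
    SELECT shop.shopname, goods.goodsname, a.sumprice_rank
    FROM (
        SELECT shopid, goodsid,
               ROW_NUMBER() OVER (PARTITION BY shopid
                                  ORDER BY SUM(salenum * price) DESC, goodsid)
                   AS sumprice_rank
        FROM sales
        -- stands in for DATE_SUB(CURDATE(), INTERVAL 3 MONTH) <= orderdate
        WHERE orderdate >= date(:today, '-3 months')
        GROUP BY shopid, goodsid
    ) a
    LEFT JOIN shop ON a.shopid = shop.shopid
    LEFT JOIN goods ON a.goodsid = goods.goodsid
    WHERE a.sumprice_rank <= 3
) t
GROUP BY shopname
ORDER BY shopname
"""

# Hypothetical reference dates standing in for CURDATE().
march_end = conn.execute(PIVOT, {"today": "2020-03-31"}).fetchall()
may_first = conn.execute(PIVOT, {"today": "2020-05-01"}).fetchall()
print(march_end)
print(may_first)
```

With '2020-05-01' as the reference date, only orders from 2020-02-01 onward count. AngelCity then has just two products in the window, so the ELSE '' branch leaves its goodsname3 empty.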
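4. Solution using basic syntax
The expected result of the basic version ends with the column sumprice_rank. Without window functions, producing that column requires user-defined variables, which I won't cover here. The focus instead is on how to find the top values within each group using basic syntax.
Approach for the basic version:
1. As in steps 1 and 2 of the window-function approach, first filter and aggregate to get sumprice, the Q1 2020 revenue of each product in each shop. The following steps build on this table;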
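2. To find each shop's top three products, compare the table from the previous step with itself (a self-join; the query below expresses it as a correlated subquery), using the condition
t1.sumprice < t2.sumprice AND t1.shopid = t2.shopid
Then count the rows that satisfy it:
COUNT(t2.goodsid) < 3
If fewer than 3 products in the same shop have a strictly higher revenue, the product is among the shop's top three. Only strictly higher revenue is counted, so products tied at third place are all kept, and a shop can then return more than three rows;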
3. Join the result of the inner query with the shop and goods tables to get shopname and goodsname. Then select the required columns in the outer query.
SELECT shop.shopname, goods.goodsname, t1.sumprice FROM
(SELECT shopid, goodsid, SUM(salenum * price) AS sumprice
 FROM sales
 WHERE orderdate >= '2020-01-01' AND orderdate < '2020-04-01'
 GROUP BY shopid, goodsid) t1
LEFT JOIN shop ON t1.shopid = shop.shopid
LEFT JOIN goods ON t1.goodsid = goods.goodsid
WHERE
(SELECT COUNT(t2.goodsid)
 FROM (SELECT shopid, goodsid, SUM(salenum * price) AS sumprice
       FROM sales
       WHERE orderdate >= '2020-01-01' AND orderdate < '2020-04-01'
       GROUP BY shopid, goodsid) t2
 WHERE t1.sumprice < t2.sumprice AND t1.shopid = t2.shopid) < 3
ORDER BY t1.shopid, t1.sumprice DESC;
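The approach is described as a self-join, but the query above uses a correlated subquery. The sketch below runs both forms on the sample data, using Python's sqlite3 module with an in-memory database as an assumption; the joins to shop and goods are left out, so the output shows ids. The first form is the correlated subquery. The second is an explicit LEFT JOIN ... GROUP BY ... HAVING COUNT(t2.goodsid) < 3. Finally, the sketch inserts one hypothetical extra order to show what happens with a tie at third place.

```python
import sqlite3

# Sample orders from the INSERT statement above.
SALES = [
    (1, 10001, 1, 90, "2020-01-15"),
    (1, 10002, 1, 50, "2020-02-23"),
    (2, 10004, 2, 120, "2020-01-18"),
    (1, 10003, 3, 60, "2020-01-19"),
    (2, 10002, 1, 50, "2020-02-23"),
    (1, 10002, 1, 40, "2020-03-01"),
    (1, 10004, 3, 20, "2020-02-14"),
    (1, 10003, 1, 10, "2020-03-01"),
    (2, 10002, 1, 50, "2020-02-02"),
    (2, 10001, 1, 40, "2020-02-09"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (orderid INTEGER PRIMARY KEY, shopid INT, goodsid INT,"
    " salenum INT, price INT, orderdate TEXT)"
)
INSERT = (
    "INSERT INTO sales (shopid, goodsid, salenum, price, orderdate)"
    " VALUES (?, ?, ?, ?, ?)"
)
conn.executemany(INSERT, SALES)

# Step 1: Q1 2020 revenue per shop and product, used for both t1 and t2.
Q1_REVENUE = """
    SELECT shopid, goodsid, SUM(salenum * price) AS sumprice
    FROM sales
    WHERE orderdate >= '2020-01-01' AND orderdate < '2020-04-01'
    GROUP BY shopid, goodsid
"""

# Correlated subquery: keep t1 if fewer than 3 products in its shop earn more.
CORRELATED = f"""
SELECT t1.shopid, t1.goodsid, t1.sumprice
FROM ({Q1_REVENUE}) t1
WHERE (SELECT COUNT(t2.goodsid)
       FROM ({Q1_REVENUE}) t2
       WHERE t1.sumprice < t2.sumprice AND t1.shopid = t2.shopid) < 3
ORDER BY t1.shopid, t1.sumprice DESC, t1.goodsid
"""

# Explicit self-join; LEFT JOIN keeps each shop's best seller,
# which has no partner row with a strictly higher revenue (COUNT = 0).
SELF_JOIN = f"""
SELECT t1.shopid, t1.goodsid, t1.sumprice
FROM ({Q1_REVENUE}) t1
LEFT JOIN ({Q1_REVENUE}) t2
       ON t1.shopid = t2.shopid AND t1.sumprice < t2.sumprice
GROUP BY t1.shopid, t1.goodsid, t1.sumprice
HAVING COUNT(t2.goodsid) < 3
ORDER BY t1.shopid, t1.sumprice DESC, t1.goodsid
"""

correlated = conn.execute(CORRELATED).fetchall()
self_join = conn.execute(SELF_JOIN).fetchall()
print(correlated)

# Hypothetical extra Q1 order: lifts blouse (10004) in shop 1 from 60 to 90,
# so three products tie behind coat.
conn.execute(INSERT, (1, 10004, 1, 30, "2020-03-15"))
tied = conn.execute(CORRELATED).fetchall()
print([row for row in tied if row[0] == 1])
```

The LEFT JOIN matters in the second form. A shop's best seller has no row with strictly higher revenue, so an inner join would drop it. Counting strictly higher rows gives the same result as filtering on RANK() <= 3. After the hypothetical order, shop 1 therefore returns four rows.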
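I am a beginner in data analysis and machine learning. If you have any questions, feel free to discuss them in the comments; I look forward to making progress together.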