MySQL: Querying the Top Three Products by Sales in Each Shop

1. Preface

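I recently interviewed for data analyst positions, and two companies asked this question: one wanted a SQL query for the top three products by sales in each shop, the other wanted the same statistic computed in Python. In the LeetCode database problem set, "Department Top Three Salaries" is the same type of problem and ranks first by frequency among all problems. Today I'll start by reviewing the SQL solutions.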

 

2. Problem

The sales table holds all orders. Each order has an order id orderid, shop id shopid, product id goodsid, quantity sold salenum, unit price price, and order date orderdate.

The shop table holds shop information: shop id shopid and shop name shopname.

 

The goods table holds product information: product id goodsid and product name goodsname.

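1) Basic version: write a SQL query that finds, for each shop, the three products with the highest sales amount (unit price * quantity sold) in Q1 2020. For the tables above, the result should list each shop's top three products together with their sales amount and rank.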

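2) Advanced version: write a SQL query that finds, for each shop, the three products with the highest sales amount (unit price * quantity sold) in the last three months, shown in separate columns. For the tables above, the result should have one row per shop, with the top three products in three columns.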

 

 

The table creation and data insertion statements are attached below:

# Create the sales table and insert data
CREATE TABLE `sales` 
( 
`orderid` int NOT NULL AUTO_INCREMENT, 
`shopid` int NOT NULL, 
`goodsid` int NOT NULL, 
`salenum` int NOT NULL, 
`price` int NOT NULL, 
`orderdate` date NOT NULL, 
PRIMARY KEY(`orderid`) 
);
INSERT INTO `sales` 
(`shopid`, `goodsid`, `salenum`, `price`, `orderdate`)
VALUES
(1, 10001, 1, 90, '2020-01-15'),
(1, 10002, 1, 50, '2020-02-23'),
(2, 10004, 2, 120, '2020-01-18'),
(1, 10003, 3, 60, '2020-01-19'),
(2, 10002, 1, 50, '2020-02-23'),
(1, 10002, 1, 40, '2020-03-01'),
(1, 10004, 3, 20, '2020-02-14'),
(1, 10003, 1, 10, '2020-03-01'),
(2, 10002, 1, 50, '2020-02-02'),
(2, 10001, 1, 40, '2020-02-09');

# Create the shop table and insert data
CREATE TABLE `shop`
(
`shopid` int NOT NULL,
`shopname` varchar(10) NOT NULL
);
INSERT INTO `shop` VALUES
(1, 'SexyBaby'),
(2, 'AngelCity');

# Create the goods table and insert data
CREATE TABLE `goods`
(
`goodsid` int NOT NULL,
`goodsname` varchar(10) NOT NULL
);
INSERT INTO `goods` VALUES
(10001, 'dress'),
(10002, 'shirt'),
(10003, 'coat'),
(10004, 'blouse');

 

 

 

 

3. Solving with window functions

Note: MySQL supports window functions starting from version 8.0.

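      Since we need statistics per shop and per product, first recall the difference between GROUP BY and PARTITION BY, both of which compute grouped statistics. GROUP BY summarizes: it keeps only the grouping columns and the aggregate results. PARTITION BY keeps every row and computes the statistic over groups of rows, and is often combined with ranking functions. (Note that when an aggregate function is used over a window whose OVER clause contains ORDER BY, it is evaluated cumulatively, row by row; without ORDER BY, every row gets the value for the whole partition. For details see this blog post: https://www.cnblogs.com/hello-yz/p/9962356.html)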
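The difference is easier to see on the sample data. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL 8.0, assuming the bundled SQLite is 3.25 or newer (required for window functions). The column aliases amount, shop_total and running_total are illustrative names, not part of the original schema.

```python
import sqlite3

# (shopid, goodsid, salenum, price) copied from the article's INSERT statement.
ROWS = [
    (1, 10001, 1, 90), (1, 10002, 1, 50), (2, 10004, 2, 120), (1, 10003, 3, 60),
    (2, 10002, 1, 50), (1, 10002, 1, 40), (1, 10004, 3, 20), (1, 10003, 1, 10),
    (2, 10002, 1, 50), (2, 10001, 1, 40),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (orderid INTEGER PRIMARY KEY, shopid INT, "
    "goodsid INT, salenum INT, price INT)"
)
conn.executemany(
    "INSERT INTO sales (shopid, goodsid, salenum, price) VALUES (?, ?, ?, ?)", ROWS
)

# GROUP BY collapses each shop into a single summary row.
grouped = conn.execute(
    "SELECT shopid, SUM(salenum * price) FROM sales GROUP BY shopid ORDER BY shopid"
).fetchall()

# PARTITION BY keeps every order row and attaches the shop total to each one.
# Adding ORDER BY inside OVER turns the same SUM into a running total.
windowed = conn.execute("""
    SELECT orderid, shopid, salenum * price AS amount,
           SUM(salenum * price) OVER (PARTITION BY shopid) AS shop_total,
           SUM(salenum * price) OVER (PARTITION BY shopid ORDER BY orderid) AS running_total
    FROM sales
    WHERE shopid = 2
    ORDER BY orderid
""").fetchall()

print(grouped)   # one row per shop
for row in windowed:
    print(row)   # one row per order of shop 2
```

For shop 2, the GROUP BY query returns one row with the total 380, while the window query returns all four orders, each carrying the total 380 and a running total that grows from 240 to 380.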

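Approach for the basic version:

1. Use WHERE to keep only the orders from Q1 2020.

2. A product can have several order rows in the same shop, so use GROUP BY to aggregate them into the sales amount sumprice of each product in each shop.

3. Use row_number() over (partition by ...) to sort the product sales amounts within each shop in descending order, which gives each product's rank within its shop, sumprice_rank. Note that ROW_NUMBER() orders tied products arbitrarily; use RANK() or DENSE_RANK() if tied products should share a rank.

4. Join the result with the shop and goods tables to get shopname and goodsname, then filter in the outer query with where sumprice_rank <= 3 to get the top three products by sales in each shop.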

 

SELECT shop.shopname, goods.goodsname, a.sumprice, a.sumprice_rank
FROM (
    # Q1 2020 sales per shop and product, ranked within each shop.
    # The half-open date range includes both 2020-01-01 and 2020-03-31.
    SELECT shopid, goodsid,
           SUM(salenum * price) AS sumprice,
           ROW_NUMBER() OVER (PARTITION BY shopid ORDER BY SUM(salenum * price) DESC) AS sumprice_rank
    FROM sales
    WHERE orderdate >= '2020-01-01' AND orderdate < '2020-04-01'
    GROUP BY shopid, goodsid
) a
LEFT JOIN shop ON a.shopid = shop.shopid
LEFT JOIN goods ON a.goodsid = goods.goodsid
WHERE a.sumprice_rank <= 3
ORDER BY shopname, sumprice_rank;
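To check the query against the sample data, and to see how the choice of ranking function matters when two products tie, here is a rough sketch run on SQLite through Python's sqlite3 module (assuming SQLite 3.25 or newer) rather than on MySQL itself. It deviates from the query above in three labeled ways: the aggregation is moved into a CTE, goodsid is added as a tie-breaker so ROW_NUMBER() is deterministic, and a DENSE_RANK() column is computed next to it for comparison. The aliases row_num and dense_rnk are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (orderid INTEGER PRIMARY KEY, shopid INT, goodsid INT,
                    salenum INT, price INT, orderdate TEXT);
INSERT INTO sales (shopid, goodsid, salenum, price, orderdate) VALUES
 (1, 10001, 1, 90, '2020-01-15'), (1, 10002, 1, 50, '2020-02-23'),
 (2, 10004, 2, 120, '2020-01-18'), (1, 10003, 3, 60, '2020-01-19'),
 (2, 10002, 1, 50, '2020-02-23'), (1, 10002, 1, 40, '2020-03-01'),
 (1, 10004, 3, 20, '2020-02-14'), (1, 10003, 1, 10, '2020-03-01'),
 (2, 10002, 1, 50, '2020-02-02'), (2, 10001, 1, 40, '2020-02-09');
CREATE TABLE shop (shopid INT, shopname TEXT);
INSERT INTO shop VALUES (1, 'SexyBaby'), (2, 'AngelCity');
CREATE TABLE goods (goodsid INT, goodsname TEXT);
INSERT INTO goods VALUES (10001, 'dress'), (10002, 'shirt'),
                         (10003, 'coat'), (10004, 'blouse');
""")

QUERY = """
WITH totals AS (          -- steps 1 and 2: filter Q1 2020, then aggregate
    SELECT shopid, goodsid, SUM(salenum * price) AS sumprice
    FROM sales
    WHERE orderdate >= '2020-01-01' AND orderdate < '2020-04-01'
    GROUP BY shopid, goodsid
), ranked AS (            -- step 3: rank products within each shop
    SELECT shopid, goodsid, sumprice,
           ROW_NUMBER() OVER (PARTITION BY shopid
                              ORDER BY sumprice DESC, goodsid) AS row_num,
           DENSE_RANK() OVER (PARTITION BY shopid
                              ORDER BY sumprice DESC) AS dense_rnk
    FROM totals
)
SELECT shop.shopname, goods.goodsname, r.sumprice, r.row_num, r.dense_rnk
FROM ranked r
JOIN shop ON r.shopid = shop.shopid
JOIN goods ON r.goodsid = goods.goodsid
ORDER BY shop.shopname, r.row_num
"""
rows = conn.execute(QUERY).fetchall()

# Step 4, done in Python here so both rankings can be compared side by side.
top3_row_number = [r[:3] for r in rows if r[3] <= 3]
top3_dense_rank = [r[:3] for r in rows if r[4] <= 3]

print(top3_row_number)
print(top3_dense_rank)
```

In the sample data, dress and shirt both total 90 in SexyBaby. ROW_NUMBER() still returns exactly three products per shop, while DENSE_RANK() lets the two tied products share rank 2, so blouse (60) becomes rank 3 and SexyBaby returns four rows. Which behavior is right depends on how the question defines "top three".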

 

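Approach for the advanced version:

1. Starting from the basic version, change the date filter to the last three months.

2. Pivot rows into columns. Note that the values being pivoted here are strings, which is why the query uses MAX(CASE ... ELSE '' END) rather than a numeric aggregate such as SUM.

(For filtering by the last N days/months/years, see this blog post: https://blog.csdn.net/weixin_33739523/article/details/85820328

For the row-to-column method, see this blog post: https://www.cnblogs.com/hiwuchong/p/10080215.html)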

SELECT shopname,
       MAX(CASE WHEN sumprice_rank = 1 THEN t.goodsname ELSE '' END) AS goodsname1,
       MAX(CASE WHEN sumprice_rank = 2 THEN t.goodsname ELSE '' END) AS goodsname2,
       MAX(CASE WHEN sumprice_rank = 3 THEN t.goodsname ELSE '' END) AS goodsname3
FROM (
    SELECT shop.shopname, goods.goodsname, a.sumprice, a.sumprice_rank
    FROM (
        # orderdate is already a DATE column, so it is compared directly
        SELECT shopid, goodsid,
               SUM(salenum * price) AS sumprice,
               ROW_NUMBER() OVER (PARTITION BY shopid ORDER BY SUM(salenum * price) DESC) AS sumprice_rank
        FROM sales
        WHERE orderdate >= DATE_SUB(CURDATE(), INTERVAL 3 MONTH)
        GROUP BY shopid, goodsid
    ) a
    LEFT JOIN shop ON a.shopid = shop.shopid
    LEFT JOIN goods ON a.goodsid = goods.goodsid
    WHERE a.sumprice_rank <= 3
) t
GROUP BY shopname;
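Because the sample orders are all dated early 2020, a filter relative to today's date no longer matches any of them. The sketch below makes the pivot reproducible by passing in a fixed reference date. It runs on SQLite through Python's sqlite3 module (assuming SQLite 3.25 or newer), so SQLite's date(:today, '-3 months') stands in for MySQL's DATE_SUB(CURDATE(), INTERVAL 3 MONTH). The parameter name today, the value '2020-03-31', the alias rn and the goodsid tie-breaker are all assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (orderid INTEGER PRIMARY KEY, shopid INT, goodsid INT,
                    salenum INT, price INT, orderdate TEXT);
INSERT INTO sales (shopid, goodsid, salenum, price, orderdate) VALUES
 (1, 10001, 1, 90, '2020-01-15'), (1, 10002, 1, 50, '2020-02-23'),
 (2, 10004, 2, 120, '2020-01-18'), (1, 10003, 3, 60, '2020-01-19'),
 (2, 10002, 1, 50, '2020-02-23'), (1, 10002, 1, 40, '2020-03-01'),
 (1, 10004, 3, 20, '2020-02-14'), (1, 10003, 1, 10, '2020-03-01'),
 (2, 10002, 1, 50, '2020-02-02'), (2, 10001, 1, 40, '2020-02-09');
CREATE TABLE shop (shopid INT, shopname TEXT);
INSERT INTO shop VALUES (1, 'SexyBaby'), (2, 'AngelCity');
CREATE TABLE goods (goodsid INT, goodsname TEXT);
INSERT INTO goods VALUES (10001, 'dress'), (10002, 'shirt'),
                         (10003, 'coat'), (10004, 'blouse');
""")

PIVOT = """
WITH totals AS (
    SELECT shopid, goodsid, SUM(salenum * price) AS sumprice
    FROM sales
    WHERE orderdate >= date(:today, '-3 months')   -- SQLite date arithmetic
    GROUP BY shopid, goodsid
), ranked AS (
    SELECT shopid, goodsid,
           ROW_NUMBER() OVER (PARTITION BY shopid
                              ORDER BY sumprice DESC, goodsid) AS rn
    FROM totals
)
SELECT shop.shopname,
       -- Each CASE yields the product name on exactly one row per shop and ''
       -- elsewhere; MAX then picks the non-empty string.
       MAX(CASE WHEN rn = 1 THEN goods.goodsname ELSE '' END) AS goodsname1,
       MAX(CASE WHEN rn = 2 THEN goods.goodsname ELSE '' END) AS goodsname2,
       MAX(CASE WHEN rn = 3 THEN goods.goodsname ELSE '' END) AS goodsname3
FROM ranked
JOIN shop ON ranked.shopid = shop.shopid
JOIN goods ON ranked.goodsid = goods.goodsid
WHERE rn <= 3
GROUP BY shop.shopname
ORDER BY shop.shopname
"""
pivoted = conn.execute(PIVOT, {"today": "2020-03-31"}).fetchall()

for row in pivoted:
    print(row)
```

With the reference date set to 2020-03-31, all sample orders fall inside the window, so each shop gets one row with its first, second and third product in separate columns.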

 

 

4. Solving with basic syntax

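The last column of the expected result for the basic version, sumprice_rank, would require user-defined variables if window functions are not available. I won't go into that here; the focus is on how to query the top values within each group using basic syntax.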

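Approach for the basic version:

1. As in steps 1 and 2 of the window-function approach, filter and aggregate first to get the sales amount sumprice of each product in each shop for Q1 2020, then build on that table.

2. To find the top three products by sales in each shop, join the table from the previous step to itself on

t1.sumprice < t2.sumprice AND t1.shopid = t2.shopid

and count the rows that satisfy the condition:

COUNT(t2.goodsid) < 3

If fewer than three products in the same shop have a higher sales amount, the product is among the shop's top three. Tied products get the same count, so a tie for third place can return more than three rows for a shop.

3. Join the result of the inner query with the shop and goods tables to get shopname and goodsname, then select the required columns in the outer query.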
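The counting rule in step 2 can be traced by hand. The following sketch applies it in plain Python to the Q1 2020 totals worked out from the sample rows; the helper name higher_count is hypothetical and only illustrates what the correlated subquery computes.

```python
# Q1 2020 totals per (shopid, goodsid), summed by hand from the sample rows.
totals = {
    (1, 10001): 90, (1, 10002): 90, (1, 10003): 190, (1, 10004): 60,
    (2, 10001): 40, (2, 10002): 100, (2, 10004): 240,
}


def higher_count(shopid, goodsid):
    """Count products in the same shop with a strictly larger total."""
    mine = totals[(shopid, goodsid)]
    return sum(1 for (s, _), v in totals.items() if s == shopid and v > mine)


# A product is kept when fewer than three products in its shop outsell it.
top3 = sorted(key for key in totals if higher_count(*key) < 3)

for key in top3:
    print(key, totals[key], higher_count(*key))
```

In shop 1, product 10004 (total 60) is outsold by three products (190, 90 and 90), so it is dropped. Products 10001 and 10002 tie at 90, are each outsold only by 10003, and are both kept.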

SELECT shop.shopname, goods.goodsname, t1.sumprice
FROM (
    SELECT shopid, goodsid, SUM(salenum * price) AS sumprice
    FROM sales
    WHERE orderdate >= '2020-01-01' AND orderdate < '2020-04-01'
    GROUP BY shopid, goodsid
) t1
LEFT JOIN shop ON t1.shopid = shop.shopid
LEFT JOIN goods ON t1.goodsid = goods.goodsid
WHERE (
    # number of products in the same shop with a strictly higher sales amount
    SELECT COUNT(t2.goodsid)
    FROM (
        SELECT shopid, goodsid, SUM(salenum * price) AS sumprice
        FROM sales
        WHERE orderdate >= '2020-01-01' AND orderdate < '2020-04-01'
        GROUP BY shopid, goodsid
    ) t2
    WHERE t1.sumprice < t2.sumprice AND t1.shopid = t2.shopid
) < 3
ORDER BY t1.shopid, t1.sumprice DESC;
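The query above writes the aggregation subquery twice. One possible way to avoid the duplication without window functions or CTEs is to put the aggregation in a view. The sketch below tries this on SQLite through Python's sqlite3 module; the view name q1_totals is made up for the example, and the extra goodsid sort key is only there to make the output order stable. Plain views also exist in MySQL, but this sketch has only been reasoned through on SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (orderid INTEGER PRIMARY KEY, shopid INT, goodsid INT,
                    salenum INT, price INT, orderdate TEXT);
INSERT INTO sales (shopid, goodsid, salenum, price, orderdate) VALUES
 (1, 10001, 1, 90, '2020-01-15'), (1, 10002, 1, 50, '2020-02-23'),
 (2, 10004, 2, 120, '2020-01-18'), (1, 10003, 3, 60, '2020-01-19'),
 (2, 10002, 1, 50, '2020-02-23'), (1, 10002, 1, 40, '2020-03-01'),
 (1, 10004, 3, 20, '2020-02-14'), (1, 10003, 1, 10, '2020-03-01'),
 (2, 10002, 1, 50, '2020-02-02'), (2, 10001, 1, 40, '2020-02-09');

-- Hypothetical view holding the Q1 2020 total per shop and product.
CREATE VIEW q1_totals AS
    SELECT shopid, goodsid, SUM(salenum * price) AS sumprice
    FROM sales
    WHERE orderdate >= '2020-01-01' AND orderdate < '2020-04-01'
    GROUP BY shopid, goodsid;
""")

# Keep a product when fewer than three products in its shop have higher sales.
top3 = conn.execute("""
    SELECT t1.shopid, t1.goodsid, t1.sumprice
    FROM q1_totals t1
    WHERE (SELECT COUNT(*)
           FROM q1_totals t2
           WHERE t2.shopid = t1.shopid AND t2.sumprice > t1.sumprice) < 3
    ORDER BY t1.shopid, t1.sumprice DESC, t1.goodsid
""").fetchall()

for row in top3:
    print(row)
```

The join to shop and goods is left out to keep the sketch short; it would be added to the outer query in the same way as in the full query.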

 

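I'm a beginner in data analysis and machine learning. If you have any questions, feel free to discuss them in the comments; I look forward to improving together with everyone.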

 

posted @ 2020-03-11 17:21  sweet_landscape