CMU15-445/645 2021 HW1
Q1
要求从Category表中按照字母顺序列出所有CategoryName内容
sqlite> SELECT CategoryName FROM Category ORDER BY CategoryName;
Beverages
Condiments
Confections
Dairy Products
Grains/Cereals
Meat/Poultry
Produce
Seafood
Q2
要求从Order表中按照字母顺序列出所有包含-的内容,且不能有重复内容
- 要求必须包含-,那么我们需要like '%-%'来筛选
- 不能有重复内容我们可以用DISTINCT关键字
- 字母顺序使用ORDER BY关键字
坑点是查询的时候记得FROM "Order"
,因为Order是关键字,不加引号会出现语法错误
sqlite> SELECT DISTINCT ShipName FROM "Order" WHERE ShipName like '%-%' ORDER BY ShipName;
Bottom-Dollar Markets
Chop-suey Chinese
GROSELLA-Restaurante
HILARION-Abastos
Hungry Owl All-Night Grocers
LILA-Supermercado
LINO-Delicateses
QUICK-Stop
Save-a-lot Markets
Q3
要求从Order表中查询Id和ShipCountry按照id递增输出从15445开始的20条信息,并且如果ShipCountry是北美的三个国家('USA', 'Mexico', 'Canada')那么显示'NorthAmerica',否则是'OtherPlace'
- 查询Id和ShipCountry直接
SELECT Id, ShipCountry
- 对ShipCountry进行判断并添加为一列的方法是使用case when方法
SELECT WHEN ShipCountry in when ShipCountry in ("USA", "Canada", "Mexico") then 'NorthAmerica' else 'OtherPlace' end
- 从ID为15445开始输出是WHERE Id >= 15445
- 按照ID递增输出是ORDER BY ID ASC
- 只输出前20条信息是LIMIT 20
sqlite> SELECT Id, ShipCountry,
( case
when ShipCountry in ("USA", "Canada", "Mexico")
then 'NorthAmerica'
else 'OtherPlace'
end
)
FROM "Order"
WHERE Id >= 15445
ORDER BY Id ASC
LIMIT 20;
15445|France|OtherPlace
15446|Italy|OtherPlace
15447|Portugal|OtherPlace
15448|Argentina|OtherPlace
15449|Portugal|OtherPlace
15450|Venezuela|OtherPlace
15451|Brazil|OtherPlace
15452|France|OtherPlace
15453|France|OtherPlace
15454|Canada|NorthAmerica
15455|USA|NorthAmerica
15456|France|OtherPlace
15457|Mexico|NorthAmerica
15458|USA|NorthAmerica
15459|Germany|OtherPlace
15460|Argentina|OtherPlace
15461|Austria|OtherPlace
15462|Austria|OtherPlace
15463|Finland|OtherPlace
15464|Brazil|OtherPlace
Q4
要求查询每家公司的名字和不准时的订单的比率,并递减顺序输出
- 要求按照每家公司查询,所以我们需要GROUP BY分组
- 要求查询不准时的订单比率,只需要用每家公司的不准时订单数量除以每家公司的总订单数量即可,这样的话就是从两个表中统计。即SELECT * FROM (SELECT ...) INNER JOIN (SELECT ...) ON ...
- 要求对计算的百分率四舍五入保留两位小数,需要ROUND(x, n)函数即对x保留n位小数
sqlite> SELECT CompanyName, ROUND(OvertimeCnt.cnt * 100.0 / TotalCnt.cnt, 2) AS cnt_rate
FROM
(SELECT ShipVia, COUNT(*) AS cnt FROM "Order" WHERE ShippedDate > RequiredDate GROUP BY ShipVia) AS OvertimeCnt
INNER JOIN (SELECT ShipVia, COUNT(*) AS cnt FROM "Order" GROUP BY ShipVia) AS TotalCnt ON OvertimeCnt.ShipVia = TotalCnt.ShipVia
INNER JOIN Shipper ON Shipper.Id = OvertimeCnt.ShipVia
ORDER BY cnt_rate DESC;
Federal Shipping|23.61
Speedy Express|23.46
United Package|23.44
Q5
要求查询每种类别的产品数量,平均单位价格,最小单位价格,最大单位价格,在订单中的总数,并且要保证类别数量大于10
这个还是有蛮多知识点需要学习的:
- 首先是WHERE后面不能使用聚合函数,必须使用HAVING语句。
- 语句的执行先后顺序是 WHERE -> GROUP BY -> HAVING -> ORDER BY
- 注意COUNT和SUM聚合函数的区别,COUNT是统计表中某些列的个数之和,SUM是统计表中某些列的值之和
- COUNT(*), COUNT(SomeCol)两者的区别是前者把null的行也会统计进个数,而后者不会
SELECT CategoryName, COUNT(CategoryId) AS number, ROUND(AVG(UnitPrice), 2),MIN(UnitPrice),MAX(UnitPrice), SUM(UnitsOnOrder)
FROM Product INNER JOIN Category ON Product.CategoryId = Category.Id
GROUP BY CategoryId
HAVING number > 10
ORDER BY CategoryId;
Beverages|12|37.98|4.5|263.5|60
Condiments|12|23.06|10|43.9|170
Confections|13|25.16|9.2|81|180
Seafood|12|20.68|6|62.5|120
Q6
要求查询曾经购买过8款停售商品,且是最早购买的顾客公司信息和联系人信息。并按照商品名从小到大的字母序输出 商品名|公司信息|联系人信息
- 要求查询商品与联系人之间的关系,那么我们必须将Product,OrderDetail,Order,Customer表连接到一块才能查询
- 8款停售商品是在Product里面Discontinued值为1的商品
- 最后只需要按照Product分类,并用聚合函数MIN筛出日期最小的一次订单即可。
sqlite> SELECT ProductName, CompanyName, ContactName FROM Product INNER JOIN OrderDetail ON Product.Id = OrderDetail.ProductId INNER JOIN "Order" ON OrderDetail.OrderId = "Order".Id INNER JOIN Customer ON "Order".CustomerId = Customer.Id WHERE Product.Id IN (SELECT Id FROM Product WHERE Discontinued = 1) GROUP BY ProductName HAVING MIN(OrderDate);
Alice Mutton|Consolidated Holdings|Elizabeth Brown
Chef Anton's Gumbo Mix|Piccolo und mehr|Georg Pipps
Guaraná Fantástica|Piccolo und mehr|Georg Pipps
Mishi Kobe Niku|Old World Delicatessen|Rene Phillips
Perth Pasties|Piccolo und mehr|Georg Pipps
Rössle Sauerkraut|Piccolo und mehr|Georg Pipps
Singaporean Hokkien Fried Mee|Vins et alcools Chevalier|Paul Henriot
Thüringer Rostbratwurst|Piccolo und mehr|Georg Pipps
Q7
要求查询CustomerId为BLONP的用户,以OrderDate升序取前10个订单,每个订单与上一次订单的间隔天数(保留两位小数)。结果以OrderId
,OrderDate
,previousOrderDate
,CalcInterval
输出
- 要求取previous必须得使用lag函数进行处理,这里简单说明一下这个windows函数的参数。
LAG(expression [,offset[, default ]]) OVER (
PARTITION BY expression1, expression2,...
ORDER BY expression1 [ASC | DESC], expression2,...
)
expression是你的源数据,offset是偏移值(必须是非负值),default是如果这个位置为NULL的话应该填充什么。后面的OVER是课上讲过的windows函数就不赘述了。
2. 计算两个订单的间隔日期需要用JulianDay(NowTime) - JulianDay(LastTime)
的方法去计算,同时保留两位小数,没有什么难点。
sqlite> SELECT *, ROUND(JulianDay(NowTime) - JulianDay(LastTime), 2) FROM (SELECT Id, OrderDate AS NowTime, LAG(OrderDate, 1, OrderDate) OVER(ORDER BY OrderDate) AS LastTime FROM "Order" WHERE CustomerId = 'BLONP' ORDER BY OrderDate ASC LIMIT 10);
16766|2012-07-22 23:11:15|2012-07-22 23:11:15|0.0
10265|2012-07-25|2012-07-22 23:11:15|2.03
12594|2012-08-16 12:35:15|2012-07-25|22.52
20249|2012-08-16 16:52:23|2012-08-16 12:35:15|0.18
20882|2012-08-18 19:11:48|2012-08-16 16:52:23|2.1
18443|2012-08-28 05:34:03|2012-08-18 19:11:48|9.43
10297|2012-09-04|2012-08-28 05:34:03|6.77
11694|2012-09-17 00:27:14|2012-09-04|13.02
25613|2012-09-18 22:37:15|2012-09-17 00:27:14|1.92
17361|2012-09-19 12:13:21|2012-09-18 22:37:15|0.57
Q8
对于每个customer,需要你找出ta的公司所有的订购支出,并按照CompanyName|CustomerID|TotalExpenditures结构,先以TotalExpenditures递增顺序划分为4个bucket(利用NTILE函数),最后输出第一个bucket的内容.如果某一个Order中的Customer没有在对应的customer表中查找到,那么仍需要将其改为'MISSING_NAME'
- 首先肯定是以Order表中的CustomerId为出发点并分组,从OrderDetail表中计算该CustomerId所有的UnitPrice * Quantity之和
- 同时将Order中的CustomerId与Customer表中的Id进行关联,如果Order中存在但是Customer中不存在,那么需要将CustomerId对应的CompanyName置为MISSING_NAME,所以需要将Order表与Customer表进行LEFT JOIN
- LEFT JOIN过后,如果表的位置为NULL,需要使用IFNULL语句把NULL替换为MISSING_NAME, IFNULL的语法是
IFNULL(expression, set_content)
即若expression为NULL则将其替换为set_content,否则保持不变 - 同时将计算好的这张表按照total_cost递增的顺序分成4组
- 输出第一组即可
WITH total_cost AS (
SELECT
IFNULL(CompanyName, "MISSING_NAME") AS cn,
CustomerId AS ci,
ROUND(SUM(UnitPrice * Quantity), 2) AS cost
FROM 'Order' AS o
INNER JOIN OrderDetail AS od ON o.Id = od.OrderId
LEFT JOIN Customer AS c ON o.CustomerId = c.Id
GROUP BY o.CustomerId
),
divide_bucket AS (
SELECT *, NTILE(4) OVER(ORDER BY cost ASC) AS bucket_number
FROM total_cost
)
SELECT cn, ci, cost
FROM divide_bucket
WHERE bucket_number = 1;
MISSING_NAME|DUMO|1615.9
MISSING_NAME|OCEA|3460.2
MISSING_NAME|ANTO|7515.35
MISSING_NAME|QUEE|30226.1
Trail's Head Gourmet Provisioners|TRAIH|3874502.02
Blondesddsl père et fils|BLONP|3879728.69
Around the Horn|AROUT|4395636.28
Hungry Owl All-Night Grocers|HUNGO|4431457.1
Bon app|BONAP|4485708.49
Bólido Comidas preparadas|BOLID|4520121.88
Galería del gastrónomo|GALED|4533089.9
FISSA Fabrica Inter. Salchichas S.A.|FISSA|4554591.02
Maison Dewey|MAISD|4555931.37
Cactus Comidas para llevar|CACTU|4559046.87
Spécialités du monde|SPECD|4571764.89
Magazzini Alimentari Riuniti|MAGAA|4572382.35
Toms Spezialitäten|TOMSP|4628403.36
Split Rail Beer & Ale|SPLIR|4641383.53
Santé Gourmet|SANTG|4647668.15
Morgenstern Gesundkost|MORGK|4676234.2
White Clover Markets|WHITC|4681531.74
La corne d'abondance|LACOR|4724494.22
Victuailles en stock|VICTE|4726476.33
Lonesome Pine Restaurant|LONEP|4735780.66
吐槽一下这个题目的描述:"Output the bottom quartile of Customers, as measured by total expenditures."读完题目一直以为quartile是箱线图的那种四分位点的最下面那个部分,是不是要去求解总花费的这个四分位点。瞅了半天感觉和那个NTILE函数没啥关系啊。无奈打开solution研读了一番,发现是让你按照total expenditures递增顺序用NTILE函数把这个表分成4个bucket,输出第一个bucket包含的内容。
Q9
找出在每个区域服务过最年轻的员工,输出
Region Description
|First Name
|Last Name
|Birth Date
- 先把所有表连接起来,然后按照Reion分个组
- 再用HAVING过滤出每个组出生日期最大的员工即可
sqlite> SELECT RegionDescription, FirstName, LastName, BirthDate FROM Employee INNER JOIN EmployeeTerritory ON Employee.Id = EmployeeTerritory.EmployeeId INNER JOIN Territory ON EmployeeTerritory.TerritoryId = Territory.Id INNER JOIN Region ON Territory.RegionId = Region.Id GROUP BY Region.Id HAVING MAX(Employee.BirthDate);
Eastern|Steven|Buchanan|1987-03-04
Western|Michael|Suyama|1995-07-02
Northern|Anne|Dodsworth|1998-01-27
Southern|Janet|Leverling|1995-08-30
Q10
将'Queen Cozinha'公司在2014-12-25购买过的产品名拼接为规定的字符串格式
如果是将所有购买过的产品名全部输出还是比较简单的,直接
SELECT ProductName
FROM Product
WHERE Id IN
(SELECT DISTINCT(ProductId) FROM OrderDetail WHERE OrderId IN
(SELECT Id FROM 'Order' WHERE CustomerId IN
(SELECT Id FROM Customer WHERE CompanyName = "Queen Cozinha")
AND OrderDate like "2014-12-25%")
)
ORDER BY Id ASC;
但是题目要求用递归将所有语句拼接并输出,所以得整清楚这个递归的过程是啥样子,要怎么弄。
先看一下递归的语法要求
WITH name_cte AS (
SELECT statement /* non-recursive statement */
UNION [ALL]
SELECT statement /*recursive statement referencing the above select statement */
)
SELECT * FROM name_cte;
而递归的执行过程是:
- 执行非递归部分并创建临时表
- 执行递归部分并增加结果至临时表
- 重复第二步直到工作表为空
和数学归纳法有的一拼,先建立初始的信息,然后使用递归部分拼接结果
WITH start AS (
SELECT ProductName as Name, Product.Id AS id
FROM Product
WHERE Id IN
(SELECT DISTINCT(ProductId) FROM OrderDetail WHERE OrderId IN
(SELECT Id FROM 'Order' WHERE CustomerId IN
(SELECT Id FROM Customer WHERE CompanyName = "Queen Cozinha")
AND OrderDate like "2014-12-25%")
)
ORDER BY Id ASC
),
sort_list AS (
SELECT row_number() over (ORDER BY start.id ASC) AS seqnum, start.name AS name
from start
),
ans AS (
SELECT seqnum, name
FROM sort_list
WHERE seqnum = 1
UNION ALL
SELECT sort_list.seqnum, a.name || ', ' || sort_list.name
FROM sort_list JOIN
ans AS a ON sort_list.seqnum = a.seqnum + 1
)
SELECT name FROM ans ORDER BY seqnum desc LIMIT 1;