SQL学习之数据转--行列转换
翻译自https://mode.com/sql-tutorial/sql-pivot-table
所有的SQL语句都可以在此执行https://app.mode.com/editor/hanmin_cui/reports/019e59e2b506
SQL中的数据转换-行转列
这个课程将会教你如何从格式化数据中提取分析用的数据,并转换成展示用或图表用格式。我们将使用下面的数据库:
将它变成像下面这样的:
在这个例子中,我们将使用在CASE lesson中使用的同样的数据库--College Football。你可以直接在这里查看数据。像在inner join lesson中第一个例子,让我们先聚合展示在每个联盟中各个年级运动员的数量:
SELECT teams.conference AS conference, players.year, COUNT(1) AS players FROM benn.college_football_players players JOIN benn.college_football_teams teams ON teams.school_name = players.school_name GROUP BY 1,2 ORDER BY 1,2
为了转换数据,我们需要把上面的查询放入一个子查询中。在开始转换前,创建子查询并选取所有列是有帮助的。像这样在增加的步骤里重跑查询,如果查询不能运行,debug会更容易些。你可以在子查询中去除ORDER BY子句因为我们将在外部查询中重排序结果。
SELECT * FROM ( SELECT teams.conference AS conference, players.year, COUNT(1) AS players FROM benn.college_football_players players JOIN benn.college_football_teams teams ON teams.school_name = players.school_name GROUP BY 1,2 ) sub
假设上面查询像预想中那样运行(结果跟第一个查询记过一模一样),是时候将结果中不同年级值转换成列。在SELECT表达式中的每一项将创建一个列,所以你必须为每一个年级创建一个单独的列:
SELECT sub.conference, sum(case when sub.year='FR' then sub.players else null end) as fr, sum(case when sub.year='SO' then sub.players else null end) as so, sum(case when sub.year='JR' then sub.players else null end) as jr, sum(case when sub.year='SR' then sub.players else null end) as sr FROM ( SELECT teams.conference AS conference, players.year, COUNT(1) AS players FROM benn.college_football_players players JOIN benn.college_football_teams teams ON teams.school_name = players.school_name GROUP BY 1,2 ) sub group by 1
order by 1
技术上,你已经完成了教程上目标。但是可以变得更好一点。你注意到一点,上面的查询产生了以Conference列进行字母排序的列表。增加一列“total players”并以它排序(从大到小)也许会更加完整:
SELECT conference,
SUM(players) AS total_players,
SUM(CASE WHEN year = 'FR' THEN players ELSE NULL END) AS fr,
SUM(CASE WHEN year = 'SO' THEN players ELSE NULL END) AS so,
SUM(CASE WHEN year = 'JR' THEN players ELSE NULL END) AS jr,
SUM(CASE WHEN year = 'SR' THEN players ELSE NULL END) AS sr
FROM (
SELECT teams.conference AS conference,
players.year,
COUNT(1) AS players
FROM benn.college_football_players players
JOIN benn.college_football_teams teams
ON teams.school_name = players.school_name
GROUP BY 1,2
) sub
GROUP BY 1
ORDER BY 2 DESC
你完成了!在Mode中查看。
这里还有其他两种形式的行转列形式可查看:https://www.cnblogs.com/heisenburg/p/12116193.html
SQL中的数据转换-列转行
网上很多你发现的数据的格式都是为了观察用的,分析不适用。拿这个例子来说,这个表展示了2000-2012年间世界范围内发生地震的数量:
用这个格式的数据要回答“一个地震的平均震级是多少?”是困难的。如果数据被分成3列展示:“magnitude”,“year”,“number of earthquakes",将会容易的多。下面是怎样将数据转换成那样的格式:
首先,在Mode中观察原始数据:
SELECT *
FROM tutorial.worldwide_earthquakes
注:列名以“year_”开头,是因为Mode要求列名以字母开头。
需要做的第一件事就是创建一个表,列出原始表格的全部列,作为新表中的行。除非你有很多列需要转换,最简单的方式通常就是在一个子查询中列出所有的列:
SELECT year FROM (VALUES (2000),(2001),(2002),(2003),(2004),(2005),(2006), (2007),(2008),(2009),(2010),(2011),(2012)) v(year)
Once you've got this, you can cross join it with theworldwide_earthquakes
table to create an expanded view:
一旦你获得这些值,你可以通过交叉匹配worldwide_earthquakes表去创建一个扩展视图:
SELECT years.*,
earthquakes.*
FROM tutorial.worldwide_earthquakes earthquakes
CROSS JOIN (
SELECT year
FROM (VALUES (2000),(2001),(2002),(2003),(2004),(2005),(2006),
(2007),(2008),(2009),(2010),(2011),(2012)) v(year)
) years
观察,worldwide_earthquake表中的每一行被重复了13次。最后的要做的事情就是使用case表达式修正这个问题,使用case表达式根据year列的值从worldwide_earthquakes表中正确的列中拉取数据。
SELECT years.*,
earthquakes.magnitude,
CASE year
WHEN 2000 THEN year_2000
WHEN 2001 THEN year_2001
WHEN 2002 THEN year_2002
WHEN 2003 THEN year_2003
WHEN 2004 THEN year_2004
WHEN 2005 THEN year_2005
WHEN 2006 THEN year_2006
WHEN 2007 THEN year_2007
WHEN 2008 THEN year_2008
WHEN 2009 THEN year_2009
WHEN 2010 THEN year_2010
WHEN 2011 THEN year_2011
WHEN 2012 THEN year_2012
ELSE NULL END
AS number_of_earthquakes
FROM tutorial.worldwide_earthquakes earthquakes
CROSS JOIN (
SELECT year
FROM (VALUES (2000),(2001),(2002),(2003),(2004),(2005),(2006),
(2007),(2008),(2009),(2010),(2011),(2012)) v(year)
) years