SQL学习之数据转--行列转换

翻译自https://mode.com/sql-tutorial/sql-pivot-table

所有的SQL语句都可以在此执行https://app.mode.com/editor/hanmin_cui/reports/019e59e2b506

SQL中的数据转换-行转列

 

这个课程将会教你如何从格式化数据中提取分析用的数据,并转换成展示用或图表用格式。我们将使用下面的数据库:

将它变成像下面这样的:

在这个例子中,我们将使用在CASE lesson中使用的同样的数据库--College Football。你可以直接在这里查看数据。像在inner join lesson中第一个例子,让我们先聚合展示在每个联盟中各个年级运动员的数量:

SELECT teams.conference AS conference,
       players.year,
       COUNT(1) AS players
  FROM benn.college_football_players players
  JOIN benn.college_football_teams teams
    ON teams.school_name = players.school_name
 GROUP BY 1,2
 ORDER BY 1,2

在MODE中查看结果

为了转换数据,我们需要把上面的查询放入一个子查询中。在开始转换前,创建子查询并选取所有列是有帮助的。像这样在增加的步骤里重跑查询,如果查询不能运行,debug会更容易些。你可以在子查询中去除ORDER BY子句因为我们将在外部查询中重排序结果。

SELECT * 
  FROM  
    ( SELECT teams.conference AS conference,
             players.year,
             COUNT(1) AS players 
      FROM benn.college_football_players players 
      JOIN benn.college_football_teams teams 
      ON teams.school_name = players.school_name 
      GROUP BY 1,2 ) sub

假设上面查询像预想中那样运行(结果跟第一个查询记过一模一样),是时候将结果中不同年级值转换成列。在SELECT表达式中的每一项将创建一个列,所以你必须为每一个年级创建一个单独的列: 

SELECT sub.conference,
            sum(case when sub.year='FR' then sub.players else null end) as fr,
            sum(case when sub.year='SO' then sub.players else null end) as so,
            sum(case when sub.year='JR' then sub.players else null end) as jr,
            sum(case when sub.year='SR' then sub.players else null end) as sr
  FROM  
    ( SELECT teams.conference AS conference,
             players.year,
             COUNT(1) AS players 
      FROM benn.college_football_players players 
      JOIN benn.college_football_teams teams 
      ON teams.school_name = players.school_name 
      GROUP BY 1,2 ) sub
  group by 1
order by 1

技术上,你已经完成了教程上目标。但是可以变得更好一点。你注意到一点,上面的查询产生了以Conference列进行字母排序的列表。增加一列“total players”并以它排序(从大到小)也许会更加完整: 

SELECT conference,
       SUM(players) AS total_players,
       SUM(CASE WHEN year = 'FR' THEN players ELSE NULL END) AS fr,
       SUM(CASE WHEN year = 'SO' THEN players ELSE NULL END) AS so,
       SUM(CASE WHEN year = 'JR' THEN players ELSE NULL END) AS jr,
       SUM(CASE WHEN year = 'SR' THEN players ELSE NULL END) AS sr
  FROM (
        SELECT teams.conference AS conference,
               players.year,
               COUNT(1) AS players
          FROM benn.college_football_players players
          JOIN benn.college_football_teams teams
            ON teams.school_name = players.school_name
         GROUP BY 1,2
       ) sub
 GROUP BY 1
 ORDER BY 2 DESC

你完成了!在Mode中查看

这里还有其他两种形式的行转列形式可查看:https://www.cnblogs.com/heisenburg/p/12116193.html

SQL中的数据转换-列转行

 

网上很多你发现的数据的格式都是为了观察用的,分析不适用。拿这个例子来说,这个表展示了2000-2012年间世界范围内发生地震的数量:

用这个格式的数据要回答“一个地震的平均震级是多少?”是困难的。如果数据被分成3列展示:“magnitude”,“year”,“number of earthquakes",将会容易的多。下面是怎样将数据转换成那样的格式:

首先,在Mode中观察原始数据:

SELECT *
  FROM tutorial.worldwide_earthquakes

注:列名以“year_”开头,是因为Mode要求列名以字母开头。

需要做的第一件事就是创建一个表,列出原始表格的全部列,作为新表中的行。除非你有很多列需要转换,最简单的方式通常就是在一个子查询中列出所有的列:

SELECT year
  FROM (VALUES (2000),(2001),(2002),(2003),(2004),(2005),(2006),
               (2007),(2008),(2009),(2010),(2011),(2012)) v(year)Once you've got this, you can cross join it with the worldwide_earthquakes table to create an expanded view: 

一旦你获得这些值,你可以通过交叉匹配worldwide_earthquakes表去创建一个扩展视图:

SELECT years.*,
       earthquakes.*
  FROM tutorial.worldwide_earthquakes earthquakes
 CROSS JOIN (
       SELECT year
         FROM (VALUES (2000),(2001),(2002),(2003),(2004),(2005),(2006),
                      (2007),(2008),(2009),(2010),(2011),(2012)) v(year)
       ) years

观察,worldwide_earthquake表中的每一行被重复了13次。最后的要做的事情就是使用case表达式修正这个问题,使用case表达式根据year列的值从worldwide_earthquakes表中正确的列中拉取数据。

SELECT years.*,
       earthquakes.magnitude,
       CASE year
         WHEN 2000 THEN year_2000
         WHEN 2001 THEN year_2001
         WHEN 2002 THEN year_2002
         WHEN 2003 THEN year_2003
         WHEN 2004 THEN year_2004
         WHEN 2005 THEN year_2005
         WHEN 2006 THEN year_2006
         WHEN 2007 THEN year_2007
         WHEN 2008 THEN year_2008
         WHEN 2009 THEN year_2009
         WHEN 2010 THEN year_2010
         WHEN 2011 THEN year_2011
         WHEN 2012 THEN year_2012
         ELSE NULL END
         AS number_of_earthquakes
  FROM tutorial.worldwide_earthquakes earthquakes
 CROSS JOIN (
       SELECT year
         FROM (VALUES (2000),(2001),(2002),(2003),(2004),(2005),(2006),
                      (2007),(2008),(2009),(2010),(2011),(2012)) v(year)
       ) years

在Mode中查看最终的产品

posted @ 2019-10-05 20:19  Hansenburg  阅读(512)  评论(0编辑  收藏  举报