关于 SQL window function 的一点使用记录

 上一篇讲了导航函数的使用,这一部分我将记录一下使用 window function 的例子以供我自己后续查阅搜索。毕竟之前做 TP 任务比较多,对于 AP 各种复杂的 SQL 灵活的使用还有一些不习惯。。。话说最近数据分析和处理任务相对多起来了才发现 SQL 真的如此强大。一个支持 SQL API 的查询引擎仿佛有无限可能性一般。。。灵活得让人害怕- -

和 navigation function 不同的是,我们经常使用开窗函数来给我们本来的数据加上一列来表达其分组后的顺序,方便我们后续的处理。

 

下面先是一些待处理的数据

WITH Produce AS
 (SELECT 'kale' as item, 23 as purchases, 'vegetable' as category
  UNION ALL SELECT 'banana', 2, 'fruit'
  UNION ALL SELECT 'cabbage', 9, 'vegetable'
  UNION ALL SELECT 'apple', 8, 'fruit'
  UNION ALL SELECT 'leek', 2, 'vegetable'
  UNION ALL SELECT 'lettuce', 10, 'vegetable')
SELECT * FROM Produce
+-------------------------------------+
| item      | category   | purchases  |
+-------------------------------------+
| kale      | vegetable  | 23         |
| banana    | fruit      | 2          |
| cabbage   | vegetable  | 9          |
| apple     | fruit      | 8          |
| leek      | vegetable  | 2          |
| lettuce   | vegetable  | 10         |
+-------------------------------------+

WITH Employees AS
 (SELECT 'Isabella' as name, 2 as department, DATE(1997, 09, 28) as start_date
  UNION ALL SELECT 'Anthony', 1, DATE(1995, 11, 29)
  UNION ALL SELECT 'Daniel', 2, DATE(2004, 06, 24)
  UNION ALL SELECT 'Andrew', 1, DATE(1999, 01, 23)
  UNION ALL SELECT 'Jacob', 1, DATE(1990, 07, 11)
  UNION ALL SELECT 'Jose', 2, DATE(2013, 03, 17))
SELECT * FROM Employees
+-------------------------------------+
| name      | department | start_date |
+-------------------------------------+
| Isabella  | 2          | 1997-09-28 |
| Anthony   | 1          | 1995-11-29 |
| Daniel    | 2          | 2004-06-24 |
| Andrew    | 1          | 1999-01-23 |
| Jacob     | 1          | 1990-07-11 |
| Jose      | 2          | 2013-03-17 |
+-------------------------------------+

WITH Farm AS
 (SELECT 'cat' as animal, 23 as population, 'mammal' as category
  UNION ALL SELECT 'duck', 3, 'bird'
  UNION ALL SELECT 'dog', 2, 'mammal'
  UNION ALL SELECT 'goose', 1, 'bird'
  UNION ALL SELECT 'ox', 2, 'mammal'
  UNION ALL SELECT 'goat', 2, 'mammal')
SELECT * FROM Farm
+-------------------------------------+
| animal    | category   | population |
+-------------------------------------+
| cat       | mammal     | 23         |
| duck      | bird       | 3          |
| dog       | mammal     | 2          |
| goose     | bird       | 1          |
| ox        | mammal     | 2          |
| goat      | mammal     | 2          |
+-------------------------------------+

 

计算总计

蛮有意思在不分组的情况下,我们可以根据所有数据行开窗多生成一列总和

WITH Produce AS
 (SELECT 'kale' as item, 23 as purchases, 'vegetable' as category
  UNION ALL SELECT 'banana', 2, 'fruit'
  UNION ALL SELECT 'cabbage', 9, 'vegetable'
  UNION ALL SELECT 'apple', 8, 'fruit'
  UNION ALL SELECT 'leek', 2, 'vegetable'
  UNION ALL SELECT 'lettuce', 10, 'vegetable'
  ),
Employees AS
 (SELECT 'Isabella' as name, 2 as department, DATE(1997, 09, 28) as start_date
  UNION ALL SELECT 'Anthony', 1, DATE(1995, 11, 29)
  UNION ALL SELECT 'Daniel', 2, DATE(2004, 06, 24)
  UNION ALL SELECT 'Andrew', 1, DATE(1999, 01, 23)
  UNION ALL SELECT 'Jacob', 1, DATE(1990, 07, 11)
  UNION ALL SELECT 'Jose', 2, DATE(2013, 03, 17)),
Farm AS
 (SELECT 'cat' as animal, 23 as population, 'mammal' as category
  UNION ALL SELECT 'duck', 3, 'bird'
  UNION ALL SELECT 'dog', 2, 'mammal'
  UNION ALL SELECT 'goose', 1, 'bird'
  UNION ALL SELECT 'ox', 2, 'mammal'
  UNION ALL SELECT 'goat', 2, 'mammal')

SELECT item, purchases, category, SUM(purchases)
  OVER () AS total_purchases
FROM Produce
+-------------------------------------------------------+
| item      | purchases  | category   | total_purchases |
+-------------------------------------------------------+
| banana    | 2          | fruit      | 54              |
| leek      | 2          | vegetable  | 54              |
| apple     | 8          | fruit      | 54              |
| cabbage   | 9          | vegetable  | 54              |
| lettuce   | 10         | vegetable  | 54              |
| kale      | 23         | vegetable  | 54              |
+-------------------------------------------------------+

可以看到我们无需对整个表数据进行额外的 group by,通过对字段 purchases 开窗就可以得到聚合数据信息。

 

计算分类小计

 

WITH Produce AS
 (SELECT 'kale' as item, 23 as purchases, 'vegetable' as category
  UNION ALL SELECT 'banana', 2, 'fruit'
  UNION ALL SELECT 'cabbage', 9, 'vegetable'
  UNION ALL SELECT 'apple', 8, 'fruit'
  UNION ALL SELECT 'leek', 2, 'vegetable'
  UNION ALL SELECT 'lettuce', 10, 'vegetable'
  ),
Employees AS
 (SELECT 'Isabella' as name, 2 as department, DATE(1997, 09, 28) as start_date
  UNION ALL SELECT 'Anthony', 1, DATE(1995, 11, 29)
  UNION ALL SELECT 'Daniel', 2, DATE(2004, 06, 24)
  UNION ALL SELECT 'Andrew', 1, DATE(1999, 01, 23)
  UNION ALL SELECT 'Jacob', 1, DATE(1990, 07, 11)
  UNION ALL SELECT 'Jose', 2, DATE(2013, 03, 17)),
Farm AS
 (SELECT 'cat' as animal, 23 as population, 'mammal' as category
  UNION ALL SELECT 'duck', 3, 'bird'
  UNION ALL SELECT 'dog', 2, 'mammal'
  UNION ALL SELECT 'goose', 1, 'bird'
  UNION ALL SELECT 'ox', 2, 'mammal'
  UNION ALL SELECT 'goat', 2, 'mammal')

-- SELECT item, purchases, category, SUM(purchases)
--   OVER () AS total_purchases
-- FROM Produce

select 
  item, 
  purchases, 
  category,
  SUM(purchases) OVER(partition by category) as group_purchases
from Produce 

 

+-------------------------------------------------------+
| item      | purchases  | category   | total_purchases |
+-------------------------------------------------------+
| banana    | 2          | fruit      | 10              |
| apple     | 8          | fruit      | 10              |
| kale      | 23         | vegetable  | 44              |
| cabbage   | 9          | vegetable  | 44              |
| leek      | 2          | vegetable  | 44              |
| lettuce   | 10         | vegetable  | 44              |
+-------------------------------------------------------+

其实我参考的 BigQuery 文档上对这个小计的要求并没有要求有顺序,所以我没有使用 order by 子句。

 

计算累计总和

累计总和的意思是,到当前条目为止的时候,我们需要开窗的部分的值是多少。上面我们计算的范围都是 rows between unbounded preceding and unbounded following, 我们可以使用 unbounded preceding and current row 来实现这个要求。

下面贴下 SQL 不写 ORDER BY 后面的部分 range 部分是因为可以依靠默认值。

WITH Produce AS
 (SELECT 'kale' as item, 23 as purchases, 'vegetable' as category
  UNION ALL SELECT 'banana', 2, 'fruit'
  UNION ALL SELECT 'cabbage', 9, 'vegetable'
  UNION ALL SELECT 'apple', 8, 'fruit'
  UNION ALL SELECT 'leek', 2, 'vegetable'
  UNION ALL SELECT 'lettuce', 10, 'vegetable'
  ),
Employees AS
 (SELECT 'Isabella' as name, 2 as department, DATE(1997, 09, 28) as start_date
  UNION ALL SELECT 'Anthony', 1, DATE(1995, 11, 29)
  UNION ALL SELECT 'Daniel', 2, DATE(2004, 06, 24)
  UNION ALL SELECT 'Andrew', 1, DATE(1999, 01, 23)
  UNION ALL SELECT 'Jacob', 1, DATE(1990, 07, 11)
  UNION ALL SELECT 'Jose', 2, DATE(2013, 03, 17)),
Farm AS
 (SELECT 'cat' as animal, 23 as population, 'mammal' as category
  UNION ALL SELECT 'duck', 3, 'bird'
  UNION ALL SELECT 'dog', 2, 'mammal'
  UNION ALL SELECT 'goose', 1, 'bird'
  UNION ALL SELECT 'ox', 2, 'mammal'
  UNION ALL SELECT 'goat', 2, 'mammal')

-- SELECT item, purchases, category, SUM(purchases)
--   OVER () AS total_purchases
-- FROM Produce

select 
  item, 
  purchases, 
  category,
  SUM(purchases) OVER(partition by category order by purchases) as group_purchases
from Produce 

 

计算移动平均值

WITH Produce AS
 (SELECT 'kale' as item, 23 as purchases, 'vegetable' as category
  UNION ALL SELECT 'banana', 2, 'fruit'
  UNION ALL SELECT 'cabbage', 9, 'vegetable'
  UNION ALL SELECT 'apple', 8, 'fruit'
  UNION ALL SELECT 'leek', 2, 'vegetable'
  UNION ALL SELECT 'lettuce', 10, 'vegetable'
  ),
Employees AS
 (SELECT 'Isabella' as name, 2 as department, DATE(1997, 09, 28) as start_date
  UNION ALL SELECT 'Anthony', 1, DATE(1995, 11, 29)
  UNION ALL SELECT 'Daniel', 2, DATE(2004, 06, 24)
  UNION ALL SELECT 'Andrew', 1, DATE(1999, 01, 23)
  UNION ALL SELECT 'Jacob', 1, DATE(1990, 07, 11)
  UNION ALL SELECT 'Jose', 2, DATE(2013, 03, 17)),
Farm AS
 (SELECT 'cat' as animal, 23 as population, 'mammal' as category
  UNION ALL SELECT 'duck', 3, 'bird'
  UNION ALL SELECT 'dog', 2, 'mammal'
  UNION ALL SELECT 'goose', 1, 'bird'
  UNION ALL SELECT 'ox', 2, 'mammal'
  UNION ALL SELECT 'goat', 2, 'mammal')

-- SELECT item, purchases, category, SUM(purchases)
--   OVER () AS total_purchases
-- FROM Produce

select 
  item, 
  purchases, 
  category,
  avg(purchases) OVER( order by purchases rows between 1 preceding and 1 following) as group_purchases
from Produce 

根据前一行和后一行计算移动平均,注意这一次没有分组。

+-------------------------------------------------------+
| item      | purchases  | category   | avg_purchases   |
+-------------------------------------------------------+
| banana    | 2          | fruit      | 2               |
| leek      | 2          | vegetable  | 4               |
| apple     | 8          | fruit      | 6.33333         |
| cabbage   | 9          | vegetable  | 9               |
| lettuce   | 10         | vegetable  | 14              |
| kale      | 23         | vegetable  | 16.5            |
+-------------------------------------------------------+

 

计算某个范围内的项数

这次我看要求是用 Fame 表,大概是我们需要获取 Farm 表中总数相近的动物数量。这个还蛮有意思的。Fame 表里有三个字段  animal, population, category 这里的要求你是总数相近,我们可以根据数量排个序 也就是根据 population 排个序。然后取相近的定义,这里我们理解相近就是比较逻辑上前面一个和后面一个值就行。

WITH Produce AS
 (SELECT 'kale' as item, 23 as purchases, 'vegetable' as category
  UNION ALL SELECT 'banana', 2, 'fruit'
  UNION ALL SELECT 'cabbage', 9, 'vegetable'
  UNION ALL SELECT 'apple', 8, 'fruit'
  UNION ALL SELECT 'leek', 2, 'vegetable'
  UNION ALL SELECT 'lettuce', 10, 'vegetable'
  ),
Employees AS
 (SELECT 'Isabella' as name, 2 as department, DATE(1997, 09, 28) as start_date
  UNION ALL SELECT 'Anthony', 1, DATE(1995, 11, 29)
  UNION ALL SELECT 'Daniel', 2, DATE(2004, 06, 24)
  UNION ALL SELECT 'Andrew', 1, DATE(1999, 01, 23)
  UNION ALL SELECT 'Jacob', 1, DATE(1990, 07, 11)
  UNION ALL SELECT 'Jose', 2, DATE(2013, 03, 17)),
Farm AS
 (SELECT 'cat' as animal, 23 as population, 'mammal' as category
  UNION ALL SELECT 'duck', 3, 'bird'
  UNION ALL SELECT 'dog', 2, 'mammal'
  UNION ALL SELECT 'goose', 1, 'bird'
  UNION ALL SELECT 'ox', 2, 'mammal'
  UNION ALL SELECT 'goat', 2, 'mammal')

-- SELECT item, purchases, category, SUM(purchases)
--   OVER () AS total_purchases
-- FROM Produce

select 
  animal,
  population, 
  category,
  count(*) OVER(order by population range between 1 preceding and 1 following) as similar_population
from Farm 
+----------------------------------------------------------+
| animal    | population | category   | similar_population |
+----------------------------------------------------------+
| goose     | 1          | bird       | 4                  |
| dog       | 2          | mammal     | 5                  |
| ox        | 2          | mammal     | 5                  |
| goat      | 2          | mammal     | 5                  |
| duck      | 3          | bird       | 4                  |
| cat       | 23         | mammal     | 1                  |
+----------------------------------------------------------+

这里解释一下。因为我们用的 range,上一篇文章讲解了 range 是逻辑上的范围,跟 order by 字段相关。

我们取 population - 1 and population + 1 第一条也就是[0, 2] 的范围得到4条记录所以是4. 第二条记录是取 [1, 3] 的记录所以得到的是 5条。以此类推。

 

计算每个类别中最受欢迎的项

使用 Produce 计算每个类别中最受欢迎的项。

WITH Produce AS
 (SELECT 'kale' as item, 23 as purchases, 'vegetable' as category
  UNION ALL SELECT 'banana', 2, 'fruit'
  UNION ALL SELECT 'cabbage', 9, 'vegetable'
  UNION ALL SELECT 'apple', 8, 'fruit'
  UNION ALL SELECT 'leek', 2, 'vegetable'
  UNION ALL SELECT 'lettuce', 10, 'vegetable'
  ),
Employees AS
 (SELECT 'Isabella' as name, 2 as department, DATE(1997, 09, 28) as start_date
  UNION ALL SELECT 'Anthony', 1, DATE(1995, 11, 29)
  UNION ALL SELECT 'Daniel', 2, DATE(2004, 06, 24)
  UNION ALL SELECT 'Andrew', 1, DATE(1999, 01, 23)
  UNION ALL SELECT 'Jacob', 1, DATE(1990, 07, 11)
  UNION ALL SELECT 'Jose', 2, DATE(2013, 03, 17)),
Farm AS
 (SELECT 'cat' as animal, 23 as population, 'mammal' as category
  UNION ALL SELECT 'duck', 3, 'bird'
  UNION ALL SELECT 'dog', 2, 'mammal'
  UNION ALL SELECT 'goose', 1, 'bird'
  UNION ALL SELECT 'ox', 2, 'mammal'
  UNION ALL SELECT 'goat', 2, 'mammal')

select 
  item,
  purchases, 
  category,
  FIRST_VALUE(item) OVER(partition by category order by purchases desc rows between unbounded preceding and unbounded following) as most_populcation
from Produce 
+----------------------------------------------------+
| item      | purchases  | category   | most_popular |
+----------------------------------------------------+
| banana    | 2          | fruit      | apple        |
| apple     | 8          | fruit      | apple        |
| leek      | 2          | vegetable  | kale         |
| cabbage   | 9          | vegetable  | kale         |
| lettuce   | 10         | vegetable  | kale         |
| kale      | 23         | vegetable  | kale         |
+----------------------------------------------------+

 

排名函数的使用

总算要使用到 rank 函数了。这里我将会讲解几个开窗常用的 rank() 函数。

  • RANK
  • ROW_NUMBER
  • DENSE_RANK
  • PERCENT_RANK
  • CUME_DIST
  • NTILE

rank 给记录排序赋值在重复的值下会有重复的Rank, 但是排名会向后顺延,例如:

WITH Numbers AS
 (SELECT 1 as x
  UNION ALL SELECT 2
  UNION ALL SELECT 2
  UNION ALL SELECT 5
  UNION ALL SELECT 8
  UNION ALL SELECT 10
  UNION ALL SELECT 10
)
SELECT x,
  RANK() OVER (ORDER BY x ASC) AS rank
FROM Numbers

+-------------------------+
| x          | rank       |
+-------------------------+
| 1          | 1          |
| 2          | 2          |
| 2          | 2          |
| 5          | 4          |
| 8          | 5          |
| 10         | 6          |
| 10         | 6          |
+-------------------------+

 

row_number 的话就是最符合直觉的排序编号。遇到重复的值也会增加 rank 值

WITH Numbers AS
 (SELECT 1 as x
  UNION ALL SELECT 2
  UNION ALL SELECT 2
  UNION ALL SELECT 5
  UNION ALL SELECT 8
  UNION ALL SELECT 10
  UNION ALL SELECT 10
)
SELECT x,
  ROW_NUMBER() OVER (ORDER BY x) AS row_num
FROM Numbers

+-------------------------+
| x          | row_num    |
+-------------------------+
| 1          | 1          |
| 2          | 2          |
| 2          | 3          |
| 5          | 4          |
| 8          | 5          |
| 10         | 6          |
| 10         | 7          |
+-------------------------+

 

dense_rank 的话就是排序 rank 函数的变种,rank 函数遇到重复的值会是同样的值但是后面排序会顺延跳过重复值,但是 dense_rank 不会跳过编号例如

WITH Numbers AS
 (SELECT 1 as x
  UNION ALL SELECT 2
  UNION ALL SELECT 2
  UNION ALL SELECT 5
  UNION ALL SELECT 8
  UNION ALL SELECT 10
  UNION ALL SELECT 10
)
SELECT x,
  DENSE_RANK() OVER (ORDER BY x ASC) AS dense_rank
FROM Numbers

+-------------------------+
| x          | dense_rank |
+-------------------------+
| 1          | 1          |
| 2          | 2          |
| 2          | 2          |
| 5          | 3          |
| 8          | 4          |
| 10         | 5          |
| 10         | 5          |
+-------------------------+

 

percent_rank 这个函数我平时比较少见到,定义为 (RN-1)/(NR-1) ,其中 RK 是 RANK 值, NR 是分区内的行数。如果 NR=1 那么则返回 0 举个例子

WITH finishers AS
 (SELECT 'Sophia Liu' as name,
  TIMESTAMP '2016-10-18 2:51:45' as finish_time,
  'F30-34' as division
  UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
  UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
  UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
  UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
  UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
  UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
  UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34')
SELECT name,
  finish_time,
  division,
  PERCENT_RANK() OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank
FROM finishers;

+-----------------+------------------------+----------+---------------------+
| name            | finish_time            | division | finish_rank         |
+-----------------+------------------------+----------+---------------------+
| Sophia Liu      | 2016-10-18 09:51:45+00 | F30-34   | 0                   |
| Meghan Lederer  | 2016-10-18 09:59:01+00 | F30-34   | 0.33333333333333331 |
| Nikki Leith     | 2016-10-18 09:59:01+00 | F30-34   | 0.33333333333333331 |
| Jen Edwards     | 2016-10-18 10:06:36+00 | F30-34   | 1                   |
| Lisa Stelzner   | 2016-10-18 09:54:11+00 | F35-39   | 0                   |
| Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39   | 0.33333333333333331 |
| Desiree Berry   | 2016-10-18 10:05:42+00 | F35-39   | 0.66666666666666663 |
| Suzy Slane      | 2016-10-18 10:06:24+00 | F35-39   | 1                   |
+-----------------+------------------------+----------+---------------------+

 

cume_dist 平时同样感觉用得比较少,跟 percent_rank 相比计算公式不太一样,cume_dist 是 NP/NR NP 是闭区间的当前行以及当前行相等的行以及当前行前面所有的行。 NR 是分区内总行数。看例子

WITH finishers AS
 (SELECT 'Sophia Liu' as name,
  TIMESTAMP '2016-10-18 2:51:45' as finish_time,
  'F30-34' as division
  UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
  UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
  UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
  UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
  UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
  UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
  UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34')
SELECT name,
  finish_time,
  division,
  CUME_DIST() OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank
FROM finishers;

+-----------------+------------------------+----------+-------------+
| name            | finish_time            | division | finish_rank |
+-----------------+------------------------+----------+-------------+
| Sophia Liu      | 2016-10-18 09:51:45+00 | F30-34   | 0.25        |
| Meghan Lederer  | 2016-10-18 09:59:01+00 | F30-34   | 0.75        |
| Nikki Leith     | 2016-10-18 09:59:01+00 | F30-34   | 0.75        |
| Jen Edwards     | 2016-10-18 10:06:36+00 | F30-34   | 1           |
| Lisa Stelzner   | 2016-10-18 09:54:11+00 | F35-39   | 0.25        |
| Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39   | 0.5         |
| Desiree Berry   | 2016-10-18 10:05:42+00 | F35-39   | 0.75        |
| Suzy Slane      | 2016-10-18 10:06:24+00 | F35-39   | 1           |
+-----------------+------------------------+----------+-------------+

 

ntile 该函数的功能就是给数据进行分桶处理,我找了个场景使用到这个函数。

WITH finishers AS
 (SELECT 'Sophia Liu' as name,
  TIMESTAMP '2016-10-18 2:51:45' as finish_time,
  'F30-34' as division
  UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
  UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
  UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
  UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
  UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
  UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
  UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34')
SELECT name,
  finish_time,
  division,
  NTILE(3) OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank
FROM finishers;

+-----------------+------------------------+----------+-------------+
| name            | finish_time            | division | finish_rank |
+-----------------+------------------------+----------+-------------+
| Sophia Liu      | 2016-10-18 09:51:45+00 | F30-34   | 1           |
| Meghan Lederer  | 2016-10-18 09:59:01+00 | F30-34   | 1           |
| Nikki Leith     | 2016-10-18 09:59:01+00 | F30-34   | 2           |
| Jen Edwards     | 2016-10-18 10:06:36+00 | F30-34   | 3           |
| Lisa Stelzner   | 2016-10-18 09:54:11+00 | F35-39   | 1           |
| Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39   | 1           |
| Desiree Berry   | 2016-10-18 10:05:42+00 | F35-39   | 2           |
| Suzy Slane      | 2016-10-18 10:06:24+00 | F35-39   | 3           |
+-----------------+------------------------+----------+-------------+

我来解释一下这个代码, NTILE(3) 表示我们要在当前分组里面分3个桶。那么数据不平衡的部分我们会增加 1 这个值。所以看到 1 出现了两次。

 

 

Reference:

https://cloud.google.com/bigquery/docs/reference/standard-sql/window-function-calls#filter_window_results

 

posted @ 2022-12-09 20:49  piperck  阅读(31)  评论(0编辑  收藏  举报