【PostgreSQL】pg 窗口函数总结
窗口函数基于结果集进行计算,将计算出的结果合并到输出的结果集上,并返回多行。使用窗口函数能大幅度简化SQL代码。
gaussdb提供内置的窗口函数,例如row_num()、rank()、lag()等,除了内置的窗口函数外,聚合函数、自定义函数后接OVER属性也可以作为窗口函数。
1,创建测试表并插入数据。
postgres=# DROP TABLE IF EXISTS scores; NOTICE: table "scores" does not exist, skipping DROP TABLE postgres=# CREATE TABLE scores(id serial PRIMARY KEY,subject varchar(32),stu_name varchar(32),score numeric(3,0)); CREATE TABLE postgres=# INSERT INTO scores(subject,stu_name,score) VALUES('Chinese','user1',80),('Chinese','user2',90),('Chinese','user3',90),('math','user1',90),('math','user2',80),('math','user3',100),('English','user1',80),('English','user2',90),('English','user3',70); INSERT 0 9 postgres=# SELECT * FROM scores; id | subject | stu_name | score ----+---------+----------+------- 1 | Chinese | user1 | 80 2 | Chinese | user2 | 90 3 | Chinese | user3 | 90 4 | math | user1 | 90 5 | math | user2 | 80 6 | math | user3 | 100 7 | English | user1 | 80 8 | English | user2 | 90 9 | English | user3 | 70 (9 rows)
2,avg() OVER()计算分组后数据的平均值。
postgres=# SELECT subject,stu_name,score,avg(score) OVER(PARTITION BY subject) FROM scores; subject | stu_name | score | avg ---------+----------+-------+--------------------- Chinese | user1 | 80 | 86.6666666666666667 Chinese | user2 | 90 | 86.6666666666666667 Chinese | user3 | 90 | 86.6666666666666667 English | user3 | 70 | 80.0000000000000000 English | user1 | 80 | 80.0000000000000000 English | user2 | 90 | 80.0000000000000000 math | user1 | 90 | 90.0000000000000000 math | user2 | 80 | 90.0000000000000000 math | user3 | 100 | 90.0000000000000000 (9 rows)
3,row_number() OVER()对分组后的数据标注行号,从1开始。
postgres=# SELECT row_number() OVER(PARTITION BY subject ORDER BY score DESC),* FROM scores; row_number | id | subject | stu_name | score ------------+----+---------+----------+------- 1 | 2 | Chinese | user2 | 90 2 | 3 | Chinese | user3 | 90 3 | 1 | Chinese | user1 | 80 1 | 8 | English | user2 | 90 2 | 7 | English | user1 | 80 3 | 9 | English | user3 | 70 1 | 6 | math | user3 | 100 2 | 4 | math | user1 | 90 3 | 5 | math | user2 | 80 (9 rows)
4,rank() OVER()与row_number() OVER()类似主要区别是当组内某行字段值相同时,行号重复并且行号产生间隙。
postgres=# SELECT rank() OVER(PARTITION BY subject ORDER BY score DESC),* FROM scores; rank | id | subject | stu_name | score ------+----+---------+----------+------- 1 | 2 | Chinese | user2 | 90 1 | 3 | Chinese | user3 | 90 3 | 1 | Chinese | user1 | 80 1 | 8 | English | user2 | 90 2 | 7 | English | user1 | 80 3 | 9 | English | user3 | 70 1 | 6 | math | user3 | 100 2 | 4 | math | user1 | 90 3 | 5 | math | user2 | 80 (9 rows)
5,dense_rank() OVER()与rank() 类似,主要区别为当组内某行字段值相同时,虽然重复行号,但行号不产生间隙。
postgres=# SELECT dense_rank() OVER(PARTITION BY subject ORDER BY score DESC),* FROM scores; dense_rank | id | subject | stu_name | score ------------+----+---------+----------+------- 1 | 2 | Chinese | user2 | 90 1 | 3 | Chinese | user3 | 90 2 | 1 | Chinese | user1 | 80 1 | 8 | English | user2 | 90 2 | 7 | English | user1 | 80 3 | 9 | English | user3 | 70 1 | 6 | math | user3 | 100 2 | 4 | math | user1 | 90 3 | 5 | math | user2 | 80 (9 rows)
6,lag() OVER()可以获取行偏移offset那行字段的数据。
postgres=# SELECT LAG(id,-1) OVER(),* FROM scores; lag | id | subject | stu_name | score -----+----+---------+----------+------- 2 | 1 | Chinese | user1 | 80 3 | 2 | Chinese | user2 | 90 4 | 3 | Chinese | user3 | 90 5 | 4 | math | user1 | 90 6 | 5 | math | user2 | 80 7 | 6 | math | user3 | 100 8 | 7 | English | user1 | 80 9 | 8 | English | user2 | 90 | 9 | English | user3 | 70 (9 rows) postgres=# SELECT LAG(id,1,100) OVER(),* FROM scores;--不存在时指定默认值 lag | id | subject | stu_name | score -----+----+---------+----------+------- 100 | 1 | Chinese | user1 | 80 1 | 2 | Chinese | user2 | 90 2 | 3 | Chinese | user3 | 90 3 | 4 | math | user1 | 90 4 | 5 | math | user2 | 80 5 | 6 | math | user3 | 100 6 | 7 | English | user1 | 80 7 | 8 | English | user2 | 90 8 | 9 | English | user3 | 70 (9 rows)
7,first_value() OVER()用来取结果集每一个分组的第一行数据的字段值。
postgres=# SELECT first_value(score) OVER(PARTITION BY subject ORDER BY score DESC),* FROM scores; first_value | id | subject | stu_name | score -------------+----+---------+----------+------- 90 | 2 | Chinese | user2 | 90 90 | 3 | Chinese | user3 | 90 90 | 1 | Chinese | user1 | 80 90 | 8 | English | user2 | 90 90 | 7 | English | user1 | 80 90 | 9 | English | user3 | 70 100 | 6 | math | user3 | 100 100 | 4 | math | user1 | 90 100 | 5 | math | user2 | 80 (9 rows)
8,last_value() OVER()用来取结果集每一个分组的最后一行数据的字段值。
postgres=# SELECT last_value(score) OVER(PARTITION BY subject),* FROM scores; last_value | id | subject | stu_name | score ------------+----+---------+----------+------- 90 | 1 | Chinese | user1 | 80 90 | 2 | Chinese | user2 | 90 90 | 3 | Chinese | user3 | 90 90 | 9 | English | user3 | 70 90 | 7 | English | user1 | 80 90 | 8 | English | user2 | 90 100 | 4 | math | user1 | 90 100 | 5 | math | user2 | 80 100 | 6 | math | user3 | 100 (9 rows)
9,nth_value() OVER()用来取结果集每一个分组的指定行数据的字段值。
postgres=# SELECT nth_value(score,2) OVER(PARTITION BY subject),* FROM scores; nth_value | id | subject | stu_name | score -----------+----+---------+----------+------- 90 | 1 | Chinese | user1 | 80 90 | 2 | Chinese | user2 | 90 90 | 3 | Chinese | user3 | 90 80 | 9 | English | user3 | 70 80 | 7 | English | user1 | 80 80 | 8 | English | user2 | 90 80 | 4 | math | user1 | 90 80 | 5 | math | user2 | 80 80 | 6 | math | user3 | 100 (9 rows)
10,如果窗口函数需要多次使用,可以使用窗口函数别名。
postgres=# SELECT avg(score) OVER(r),sum(score) OVER(r),* FROM scores WINDOW r AS (PARTITION BY subject); avg | sum | id | subject | stu_name | score ---------------------+-----+----+---------+----------+------- 86.6666666666666667 | 260 | 1 | Chinese | user1 | 80 86.6666666666666667 | 260 | 2 | Chinese | user2 | 90 86.6666666666666667 | 260 | 3 | Chinese | user3 | 90 80.0000000000000000 | 240 | 9 | English | user3 | 70 80.0000000000000000 | 240 | 7 | English | user1 | 80 80.0000000000000000 | 240 | 8 | English | user2 | 90 90.0000000000000000 | 270 | 4 | math | user1 | 90 90.0000000000000000 | 270 | 5 | math | user2 | 80 90.0000000000000000 | 270 | 6 | math | user3 | 100 (9 rows)
郭慕荣博客园