About the PostgreSQL GROUP BY error
An example:
Table name: makerar
 cname  | wmname |          avg
--------+--------+------------------------
 canada | zoro   |     2.0000000000000000
 spain  | luffy  | 1.00000000000000000000
 spain  | usopp  |     5.0000000000000000
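To reproduce the example locally, the table can be created with something like the following. This is only a sketch: the original post does not show the table definition, so the column types (text, text, numeric) are assumptions.

```sql
-- Assumed schema: the column types are a guess, not taken from the original post.
CREATE TABLE makerar (
    cname  text,
    wmname text,
    avg    numeric
);

-- Sample rows copied from the table above.
INSERT INTO makerar (cname, wmname, avg) VALUES
    ('canada', 'zoro',  2.0000000000000000),
    ('spain',  'luffy', 1.00000000000000000000),
    ('spain',  'usopp', 5.0000000000000000);
```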
Statement executed:
SELECT cname, wmname, MAX(avg) FROM makerar GROUP BY cname;
Error:
ERROR: column "makerar.wmname" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT cname, wmname, MAX(avg) FROM makerar GROUP BY cname;
The cause of this error is as follows:
Before SQL3 (1999), every selected field had to appear in the GROUP BY clause. Interestingly enough, even though the newer spec does allow selecting non-grouped fields in some cases, the major engines do not really like it. Oracle and SQL Server do not allow it at all. MySQL used to allow it by default, but since 5.7.5 the ONLY_FULL_GROUP_BY SQL mode is enabled by default, so the administrator has to disable that mode in the server configuration for such queries to be accepted.
In short: a selected field that is neither grouped nor aggregated can hold different values within the same group, and the database engine then has no way of knowing which one to show. In the example above, the spain group contains both luffy and usopp, so there is no single wmname to return for it.
How this behaves on MySQL: it doesn't work "well" in MySQL. In fact, the docs warn you that if you do it and the "hidden" columns (those not in the GROUP BY) aren't 1-to-1 with the GROUP BY columns, the results are unpredictable. In every other database you just plain can't do it, so I wouldn't call what MySQL does "doing it well".
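As a rough illustration of the rule, the two variants below are accepted by PostgreSQL because every selected column is either grouped or aggregated. Neither one returns "the wmname that has the highest avg in each cname", which is why the workarounds that follow are needed. The alias mx is only illustrative.

```sql
-- Variant 1: drop the non-grouped column; one row per cname.
SELECT cname, MAX(avg) AS mx
FROM makerar
GROUP BY cname;

-- Variant 2: group by both columns; one row per (cname, wmname) pair,
-- so with the sample data MAX(avg) is simply each row's own avg.
SELECT cname, wmname, MAX(avg) AS mx
FROM makerar
GROUP BY cname, wmname;
```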
Solutions:
1. Run the aggregate (sum, max, etc.) in a subquery first, then join the result back to the table
SELECT m.cname, m.wmname, t.mx
FROM (
    SELECT cname, MAX(avg) AS mx
    FROM makerar
    GROUP BY cname
) t
JOIN makerar m ON m.cname = t.cname AND t.mx = m.avg;
 cname  | wmname |           mx
--------+--------+------------------------
 canada | zoro   |     2.0000000000000000
 spain  | usopp  |     5.0000000000000000
2. Use window functions
SELECT cname, wmname, MAX(avg) OVER (PARTITION BY cname) AS mx FROM makerar;
 cname  | wmname |           mx
--------+--------+------------------------
 canada | zoro   |     5.0000000000000000
 spain  | luffy  |     5.0000000000000000
 spain  | usopp  |     5.0000000000000000
To keep only one row per cname instead of repeating mx:
SELECT DISTINCT /* DISTINCT matters only if makerar holds duplicate (cname, wmname) rows, which the join would otherwise multiply */
    m.cname, m.wmname, t.avg AS mx
FROM (
    SELECT cname, wmname, avg,
           ROW_NUMBER() OVER (PARTITION BY cname ORDER BY avg DESC) AS rn
    FROM makerar
) t
JOIN makerar m ON m.cname = t.cname AND m.wmname = t.wmname AND t.rn = 1;
 cname  | wmname |           mx
--------+--------+------------------------
 canada | zoro   |     2.0000000000000000
 spain  | usopp  |     5.0000000000000000
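A possible simplification, offered as a sketch and not as the original author's method: filter on the window function result directly, without joining back to makerar. This version uses RANK() instead of ROW_NUMBER(), so if several rows tie for the highest avg within a cname, all of them are returned. The alias rnk is illustrative.

```sql
-- Rank rows within each cname by avg, highest first, and keep the top rank.
-- RANK() gives tied rows the same rank, so ties are all kept.
SELECT cname, wmname, avg AS mx
FROM (
    SELECT cname, wmname, avg,
           RANK() OVER (PARTITION BY cname ORDER BY avg DESC) AS rnk
    FROM makerar
) t
WHERE rnk = 1;
```

If exactly one row per cname is required even when there are ties, ROW_NUMBER() with an extra ORDER BY key would be the choice to make instead.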
3. Use DISTINCT ON
SELECT DISTINCT ON (cname) cname, wmname, avg FROM makerar ORDER BY cname, avg DESC;
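DISTINCT ON is a PostgreSQL extension that keeps the first row of each cname group in ORDER BY order. When two rows in the same group share the highest avg, which one comes first is not defined by the query above. The sketch below adds wmname as a tie-breaker to make the result deterministic. Both the choice of wmname and the mx alias are assumptions made for illustration.

```sql
-- The leading ORDER BY expression must match the DISTINCT ON expression.
-- avg DESC puts the highest avg first within each cname;
-- wmname is an assumed tie-breaker so the chosen row is deterministic.
SELECT DISTINCT ON (cname) cname, wmname, avg AS mx
FROM makerar
ORDER BY cname, avg DESC, wmname;
```

With the sample data this returns one row for canada (zoro) and one for spain (usopp), the same rows as the join-based solution.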