mysql 中 order by 对 group by 的影响

order by 对 group by 的影响

order by 通常意义上来说,如果不是特别看过相关资料,一般认为order by 对goup by 无影响即可,也就是排序之后分组,和不排序分组 是一样,``
即一般来说在内部分组,之后对分组进行聚合函数,在之后进行排序。
一般形如:

select id , juheId ,  habitId , noteId , noteCreator ,  userTagType ,
    existState ,status, noteCreateTime ,  totalLike ,  totalComment ,
    totalLikeComment ,  createTime 
   from ( 
   select id , juhe_id as juheId , habit_id as habitId , note_id as noteId , note_creator as noteCreator , user_tag_type as userTagType ,
   exist_state as  existState ,status, note_create_time as noteCreateTime , max(total_like) as  totalLike , total_comment as totalComment ,
   total_like_comment as totalLikeComment , create_time as createTime 
   from note_pool  where juhe_id  =  109909
   group by noteCreator
   ) a 
   order by totalLike desc ,totalComment desc, noteCreateTime desc
   limit 101

在内部进行 分组,同时对分组 应用聚合函数 max(), 最后在外层进行排序,同时取 topN

但是如果考虑到 order by 对 group by 的影响的情况下,可以使用一种比较取巧的方式,先排序,之后分组,同时不应用聚合函数,默认取分组中考前的,同样可以达到按某个顺序来取分组中的行
即形如:

select id , juheId ,  habitId , noteId , noteCreator ,  userTagType ,
    existState ,status, noteCreateTime ,  totalLike ,  totalComment ,
    totalLikeComment ,  createTime 
   from ( 
   select id , juhe_id as juheId , habit_id as habitId , note_id as noteId , note_creator as noteCreator , user_tag_type as userTagType ,
   exist_state as  existState ,status, note_create_time as noteCreateTime , total_like as  totalLike , total_comment as totalComment ,
   total_like_comment as totalLikeComment , create_time as createTime 
   from note_pool  where juhe_id  =  109909 
   order by total_like desc ,total_comment desc, note_create_time desc 
   ) a 
   group by noteCreator 
   order by totalLike desc ,totalComment desc, noteCreateTime desc
   limit 101

这个sql 可以达到和上面的sql同一个效果,但是却不太好理解。

当然这个是有条件的,mysql 5.6 的时候是可以的,但是到 mysql5.7 之后,mysql 进行了相关优化,内层的 order by 如果不加 limit 的话,那么内层排序,将对外层不可见,也就是说mysql 在5.7之后,内层有序
不保证外层用数据的时候同样有序(在内层无 limit 的情况)。

一般来说,在业务库中不要进行 group by 的操作,因为这种操作一般会进行 文件排序,会给msql 带来较大的压力,
将该业务交由 数据组进行支持,具体的计算由数据组在其他集群资源进行计算,将聚合结果同步至相关业务库,使用方通过 相关rpc 或者微服务进行调用即可。

参考资料:https://segmentfault.com/q/1010000011303563

posted @ 2020-08-25 20:50  jzczer  阅读(270)  评论(0编辑  收藏  举报