Using the agg function in Spark

When this function was covered in class I had dozed off, haha, and was not paying attention, so here is a walkthrough of how agg is used.

First, you need to know where it is used; once you know the use cases, you can apply it flexibly.

We mostly use it together with groupBy to aggregate within each group; it can also be called on its own to aggregate over the whole DataFrame.
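To make the two use cases concrete, here is a minimal sketch. The object name, the local SparkSession setup, and the three sample rows are all assumptions made up for illustration; none of them come from the code quoted in this post. It also shows that calling agg directly on a DataFrame behaves like a groupBy() with no grouping columns.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, count, max}

object AggModesSketch {
  // Local session for experimentation only (assumed setup).
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("agg-modes-sketch")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical sample data, not taken from the post.
  val df = Seq(("a", 1), ("a", 3), ("b", 5)).toDF("key", "value")

  // Use case 1: grouped aggregation, one output row per distinct key.
  val perKey = df.groupBy("key").agg(max("value"), count("value"))

  // Use case 2: aggregation over the whole DataFrame, a single output row.
  val overall = df.agg(max("value"), avg("value"))

  // The same whole-DataFrame aggregation, spelled with an empty groupBy().
  val overallViaEmptyGroupBy = df.groupBy().agg(max("value"), avg("value"))

  def main(args: Array[String]): Unit = {
    perKey.show()
    overall.show()
    overallViaEmptyGroupBy.show()
    spark.stop()
  }
}
```

The grouped form returns as many rows as there are distinct keys, while the ungrouped form always collapses everything into one row.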

Here is a very nice piece of code I found online:

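// The aggregate functions come from org.apache.spark.sql.functions
import org.apache.spark.sql.functions.{avg, count, max, min}

// 1) Aggregate within each gender group
stuDF.groupBy("gender").agg(max("age"),min("age"),avg("age"),count("classId")).show()
// The same thing can also be written with (column -> function name) pairs:
// stuDF.groupBy("gender").agg("age"->"max","age"->"min","age"->"avg","classId"->"count").show()

// 2) Aggregate over the whole DataFrame, no grouping
stuDF.agg(max("age"),min("age"),avg("age"),count("classId")).show()

// 3) Aggregate within each (classId, gender) group, sorted by classId
stuDF.groupBy("classId","gender").agg(max("age"),min("age"),avg("age"),count("classId")).orderBy("classId").show()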

Here is how the results of the three calls differ:

+------+--------+--------+------------------+--------------+
|gender|max(age)|min(age)|          avg(age)|count(classId)|
+------+--------+--------+------------------+--------------+
|     F|      23|      20|21.333333333333332|             3|
|     M|      22|      16|              19.5|             4|
+------+--------+--------+------------------+--------------+
 
+--------+--------+------------------+--------------+
|max(age)|min(age)|          avg(age)|count(classId)|
+--------+--------+------------------+--------------+
|      23|      16|20.285714285714285|             7|
+--------+--------+------------------+--------------+
 
+-------+------+--------+--------+--------+--------------+
|classId|gender|max(age)|min(age)|avg(age)|count(classId)|
+-------+------+--------+--------+--------+--------------+
|   1001|     F|      20|      20|    20.0|             1|
|   1001|     M|      19|      19|    19.0|             1|
|   1002|     M|      16|      16|    16.0|             1|
|   1003|     M|      21|      21|    21.0|             1|
|   1003|     F|      23|      23|    23.0|             1|
|   1004|     F|      21|      21|    21.0|             1|
|   1004|     M|      22|      22|    22.0|             1|
+-------+------+--------+--------+--------+--------------+
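The post never shows how stuDF is built, so the snippets cannot be run as they stand. Below is a rough, self-contained sketch that rebuilds a stand-in stuDF from the seven rows visible in the third result table. That reconstruction is an assumption, as are the Int column types; the real DataFrame may well have more columns, such as a name or id. The alias names (max_age, students, and so on) are also made up here; they just replace the default headers like max(age) with something easier to reference later.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, count, max, min}

object StuAggSketch {
  // Local session for experimentation only (assumed setup).
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("stu-agg-sketch")
    .getOrCreate()
  import spark.implicits._

  // Rows inferred from the (classId, gender) result table above;
  // the original stuDF may contain additional columns.
  val stuDF = Seq(
    (1001, "F", 20), (1001, "M", 19), (1002, "M", 16), (1003, "M", 21),
    (1003, "F", 23), (1004, "F", 21), (1004, "M", 22)
  ).toDF("classId", "gender", "age")

  // Same aggregation as the first snippet, with hypothetical
  // alias names instead of the default column headers.
  val byGender = stuDF.groupBy("gender").agg(
    max("age").alias("max_age"),
    min("age").alias("min_age"),
    avg("age").alias("avg_age"),
    count("classId").alias("students")
  ).orderBy("gender")

  def main(args: Array[String]): Unit = {
    byGender.show()
    spark.stop()
  }
}
```

With these seven rows the per-gender numbers match the first result table: F has max 23, min 20 and 3 students, M has an average age of 19.5 and 4 students.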

posted @ 2021-11-09 20:11 小阿政
