Action operators

1. Convert a key-value RDD into a Map: rdd.collectAsMap()

val rdd = sc.parallelize(Array[(String, Int)](
  ("zhangsan", 18),
  ("lisi", 19),
  ("wangwu", 20),
  ("maliu", 21)
))
// foreach runs on the executors; the print order across partitions is not guaranteed
rdd.foreach(println)
/**
 * (zhangsan,18)
 * (lisi,19)
 * (wangwu,20)
 * (maliu,21)
 */
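// collectAsMap brings all pairs back to the Driver; the returned Map has no defined iteration order
val map: collection.Map[String, Int] = rdd.collectAsMap()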
map.foreach(println)
/**
 * (lisi,19)
 * (maliu,21)
 * (zhangsan,18)
 * (wangwu,20)
 */
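
One point the example above does not show: collectAsMap returns an ordinary map, not a multimap, so when a key occurs more than once only one value per key is kept. The following is a minimal sketch of that behaviour, assuming a local master and made-up sample data; `SparkContext.getOrCreate` is used here so that an already running context (for example in spark-shell) is reused.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local master with 2 threads; an existing context is reused if there is one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("collectAsMap-sketch"))

// "lisi" appears twice with different values (illustrative data).
val pairs = sc.parallelize(Seq(("zhangsan", 18), ("lisi", 19), ("lisi", 30)))

val asMap: collection.Map[String, Int] = pairs.collectAsMap()

// Three pairs went in, but the map holds one entry per distinct key.
println(asMap.size)          // 2
println(asMap("zhangsan"))   // 18

// To keep every value of a key, group first and then collect.
val allValues: collection.Map[String, List[Int]] =
  pairs.groupByKey().mapValues(_.toList).collectAsMap()
println(allValues("lisi").sorted)   // List(19, 30)
```

If duplicate keys are possible and every value matters, grouping before collecting (as in the last step) is probably the safer choice.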

2. Count the number of elements for each key: rdd.countByKey()

import org.apache.spark.rdd.RDD

val rdd: RDD[(String, Int)] = sc.parallelize(Array[(String, Int)](
  ("zhangsan", 1),
  ("lisi", 2),
  ("wangwu", 3),
  ("lisi", 4),
  ("zhangsan", 5)
))
val map: collection.Map[String, Long] = rdd.countByKey()
map.foreach(println)
/**
 * (zhangsan,2)
 * (wangwu,1)
 * (lisi,2)
 */
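
Because countByKey is an action, the complete result is loaded into the Driver as a local Map, which is only appropriate when the number of distinct keys is small. For a large key space, the same counts can be computed as a transformation with mapValues and reduceByKey, so the result stays distributed. A sketch under the assumption of a local master (the variable names are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Assumption: local master; an existing context is reused if there is one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("countByKey-sketch"))

val scores: RDD[(String, Int)] = sc.parallelize(Seq(
  ("zhangsan", 1), ("lisi", 2), ("wangwu", 3), ("lisi", 4), ("zhangsan", 5)))

// Action: the whole result is materialised as a local Map on the Driver.
val local: collection.Map[String, Long] = scores.countByKey()

// Transformation: the same counts, but the result is still an RDD.
val distributed: RDD[(String, Long)] =
  scores.mapValues(_ => 1L).reduceByKey(_ + _)

println(local("lisi"))                                // 2
println(distributed.collect().toMap == local.toMap)   // true
```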

3. Count the occurrences of each distinct element: rdd.countByValue()

val rdd = sc.parallelize(Array[String]("a", "b", "c", "d", "e", "a", "f", "c", "a"))
val map: collection.Map[String, Long] = rdd.countByValue()
map.foreach(println)
/**
 * (e,1)
 * (f,1)
 * (a,3)
 * (b,1)
 * (c,2)
 * (d,1)
 */

// On a key-value RDD, countByValue treats each whole (key, value) tuple as one element
val rdd: RDD[(String, Int)] = sc.parallelize(Array[(String, Int)](
  ("zhangsan", 1),
  ("lisi", 2),
  ("wangwu", 3),
  ("lisi", 1),
  ("zhangsan", 1)
))
val map: collection.Map[(String, Int), Long] = rdd.countByValue()
map.foreach(println)
/**
 * ((lisi,1),1)
 * ((lisi,2),1)
 * ((wangwu,3),1)
 * ((zhangsan,1),2)
 */
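
The difference between countByKey and countByValue is easiest to see when both are applied to the same key-value RDD. The sketch below reuses the sample data from above and assumes a local master; `pairs.values` is used to count only the Int part of each tuple.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local master; an existing context is reused if there is one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("countByValue-sketch"))

val pairs = sc.parallelize(Seq(
  ("zhangsan", 1), ("lisi", 2), ("wangwu", 3), ("lisi", 1), ("zhangsan", 1)))

// countByKey looks only at the key of each tuple.
val byKey: collection.Map[String, Long] = pairs.countByKey()
println(byKey("zhangsan"))          // 2

// countByValue counts identical elements, and here an element is the whole tuple.
val byElement: collection.Map[(String, Int), Long] = pairs.countByValue()
println(byElement(("zhangsan", 1))) // 2
println(byElement(("lisi", 1)))     // 1

// To count only the Int values, drop the keys first.
val byInt: collection.Map[Int, Long] = pairs.values.countByValue()
println(byInt(1))                   // 3
```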

4. foreach: iterate over every element of the RDD
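
A minimal sketch of foreach, assuming a local master and illustrative data. The function passed to foreach runs on the executors, so on a cluster its println output goes to the executor logs, and updating an ordinary Driver-side variable inside it has no effect on the Driver. An accumulator is one way to get a result back.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local master; an existing context is reused if there is one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("foreach-sketch"))

val nums = sc.parallelize(Seq(1, 2, 3, 4), 2)

// Runs on the executors; the order across partitions is not guaranteed.
nums.foreach(n => println(s"value: $n"))

// An accumulator carries a side-effect result back to the Driver.
val total = sc.longAccumulator("total")
nums.foreach(n => total.add(n.toLong))
println(total.value)   // 10

// To print on the Driver in a stable order, collect first (small data only).
nums.collect().foreach(println)
```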

5. count: count the elements of the RDD (for an RDD created with textFile, this is the number of lines)

val lines = sc.textFile("./data/words")
val totalCount: Long = lines.count()

6. take: take(num) returns the first num elements of the RDD and brings them to the Driver

val lines = sc.textFile("./data/words")
val strings: Array[String] = lines.take(5)
strings.foreach(println)

7. first: return the first element of the RDD

val lines = sc.textFile("./data/words")
val str: String = lines.first()

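8. collect: bring all elements of the RDD back to the Driver as an array (use it only when the data fits in Driver memory)

val lines = sc.textFile("./data/words")
val result: Array[String] = lines.collect()
result.foreach(println)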

9. Take a random sample of the RDD's elements: rdd.takeSample(withReplacement, num, seed)

 

val rdd1 = sc.parallelize(Array[String]("a","b","c","d","e","f","g"))
// Sample 3 elements without replacement, using a fixed seed of 100
val result = rdd1.takeSample(false,3,100L)
result.foreach(println)
/**
 * f
 * b
 * a
 */
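
The three parameters of takeSample are withReplacement, num and seed. The sketch below illustrates how withReplacement affects the size of the result; it assumes a local master and only checks result sizes, since which elements are drawn depends on the seed and on how the data is partitioned.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local master; an existing context is reused if there is one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("takeSample-sketch"))

val letters = sc.parallelize(Seq("a", "b", "c", "d", "e", "f", "g"), 2)

// Without replacement: every element is picked at most once.
val noRepl = letters.takeSample(withReplacement = false, num = 3, seed = 100L)
println(noRepl.length)            // 3
println(noRepl.distinct.length)   // 3

// Without replacement the result can never exceed the size of the RDD.
val capped = letters.takeSample(withReplacement = false, num = 10, seed = 100L)
println(capped.length)            // 7

// With replacement an element may be drawn several times,
// so asking for more elements than the RDD holds is fine.
val withRepl = letters.takeSample(withReplacement = true, num = 10, seed = 100L)
println(withRepl.length)          // 10
```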

 

10. rdd.takeOrdered(num): return the num smallest elements, in ascending order

val rdd = sc.parallelize(Array[Int](1,2,3,4,5,6,7,8,9))
val result : Array[Int] = rdd.takeOrdered(4)
result.foreach(println)
/**
 * 1
 * 2
 * 3
 * 4
 */
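
takeOrdered also accepts an explicit Ordering in a second parameter list. A small sketch, assuming a local master and unsorted illustrative input, showing that a reversed ordering yields the largest elements instead:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local master; an existing context is reused if there is one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("takeOrdered-sketch"))

val nums = sc.parallelize(Seq(5, 3, 9, 1, 7, 2, 8, 4, 6))

// Default: natural ascending order, smallest first.
val smallest = nums.takeOrdered(4)
println(smallest.mkString(","))   // 1,2,3,4

// Reversed ordering: the 4 largest elements, largest first.
val largest = nums.takeOrdered(4)(Ordering[Int].reverse)
println(largest.mkString(","))    // 9,8,7,6
```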

11. rdd.top(num): return the num largest elements, in descending order

val rdd1 = sc.parallelize(Array[Int](1,2,3,4,5,6,7,8,9))
val result: Array[Int] = rdd1.top(4)
result.foreach(println)
/**
 * 9
 * 8
 * 7
 * 6
 */
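
For key-value data, top uses the default tuple ordering (first field, then second) unless an Ordering is supplied. The sketch below assumes a local master and reuses the name/age pairs from section 1 to rank by age instead; `Ordering.by` is standard Scala, and the variable names are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local master; an existing context is reused if there is one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("top-sketch"))

val people = sc.parallelize(Seq(
  ("zhangsan", 18), ("lisi", 19), ("wangwu", 20), ("maliu", 21)))

// Default tuple ordering compares the name first, so "zhangsan" ranks highest.
val byName = people.top(1)
println(byName.mkString(","))     // (zhangsan,18)

// Rank by age instead: the two oldest, oldest first.
val byAge = Ordering.by[(String, Int), Int](_._2)
val oldest = people.top(2)(byAge)
println(oldest.mkString(","))     // (maliu,21),(wangwu,20)

// takeOrdered with the same ordering gives the two youngest.
val youngest = people.takeOrdered(2)(byAge)
println(youngest.mkString(","))   // (zhangsan,18),(lisi,19)
```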

 

posted @ 2021-04-19 11:13  大数据程序员