Action operators
1. Converting a key-value RDD to a Map: rdd.collectAsMap()
val rdd = sc.parallelize(Array[(String, Int)](
("zhangsan", 18),
("lisi", 19),
("wangwu", 20),
("maliu", 21)
))
rdd.foreach(println)
/**
* (zhangsan,18)
* (lisi,19)
* (wangwu,20)
* (maliu,21)
*/
val map: collection.Map[String, Int] = rdd.collectAsMap()
map.foreach(println)
/**
* (lisi,19)
* (maliu,21)
* (zhangsan,18)
* (wangwu,20)
*/
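One caveat is worth illustrating: collectAsMap does not return a multimap, so when a key occurs more than once only one of its values survives. The sketch below assumes a local-mode SparkContext created purely for illustration; the app name and the sample data are hypothetical. Which value is kept for a duplicated key should not be relied on.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: a local-mode context, created here only so the snippet is self-contained.
val conf = new SparkConf().setMaster("local[2]").setAppName("collectAsMap-duplicates")
val sc = new SparkContext(conf)

// The key "a" appears twice.
val pairs = sc.parallelize(Array[(String, Int)](("a", 1), ("b", 2), ("a", 3)))

// Three elements go in, but the resulting Map holds one entry per key.
val asMap: collection.Map[String, Int] = pairs.collectAsMap()
println(asMap.size)   // 2

sc.stop()
```

If every value per key is needed, grouping on the RDD first (for example with groupByKey) and collecting afterwards avoids the loss.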
2. Counting the elements for each key: rdd.countByKey
import org.apache.spark.rdd.RDD
val rdd: RDD[(String, Int)] = sc.parallelize(Array[(String, Int)](
("zhangsan", 1),
("lisi", 2),
("wangwu", 3),
("lisi", 4),
("zhangsan", 5)
))
val map: collection.Map[String, Long] = rdd.countByKey()
map.foreach(println)
/**
* (zhangsan,2)
* (wangwu,1)
* (lisi,2)
*/
3. Counting the occurrences of each distinct element: rdd.countByValue
val rdd = sc.parallelize(Array[String]("a", "b", "c", "d", "e", "a", "f", "c", "a"))
val map: collection.Map[String, Long] = rdd.countByValue()
map.foreach(println)
/**
* (e,1)
* (f,1)
* (a,3)
* (b,1)
* (c,2)
* (d,1)
*/
Or, on a key-value RDD, where each whole (key, value) tuple is counted as one element:
val rdd :RDD[(String,Int)]= sc.parallelize(Array[(String, Int)](
("zhangsan", 1),
("lisi", 2),
("wangwu", 3),
("lisi", 1),
("zhangsan", 1)
))
val map: collection.Map[(String, Int), Long] = rdd.countByValue()
map.foreach(println)
/**
* ((lisi,1),1)
* ((lisi,2),1)
* ((wangwu,3),1)
* ((zhangsan,1),2)
*/
4. foreach: iterates over every element of the RDD
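A minimal sketch of foreach, assuming a local-mode SparkContext created only for illustration (the app name and data are hypothetical). The function passed to foreach runs on the executors and returns nothing, so on a cluster the println output lands in the executor logs rather than on the Driver; an accumulator is one way to bring a result back.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: a local-mode context, created here only so the snippet is self-contained.
val conf = new SparkConf().setMaster("local[2]").setAppName("foreach-example")
val sc = new SparkContext(conf)

val rdd = sc.parallelize(Array[Int](1, 2, 3, 4, 5, 6, 7, 8, 9))

// Runs once per element on the executors; the print order is not guaranteed.
rdd.foreach(println)

// An accumulator carries a side effect of foreach back to the Driver.
val sum = sc.longAccumulator("sum")
rdd.foreach(x => sum.add(x))
println(sum.value)   // 45

sc.stop()
```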
5. count: counts the elements of the RDD (for a text file, the number of lines)
val lines = sc.textFile("./data/words")
val totalCount: Long = lines.count()
6. take: take(num) returns the first num elements of the RDD and brings them to the Driver
val lines = sc.textFile("./data/words")
val strings: Array[String] = lines.take(5)
strings.foreach(println)
7. first: returns the first element of the RDD
val lines = sc.textFile("./data/words")
val str: String = lines.first()
8. collect: brings all the data back to the Driver as an array
val lines = sc.textFile("./data/words")
val result: Array[String] = lines.collect()
result.foreach(println)
9. Sampling data from an RDD: rdd.takeSample(withReplacement, num, seed)
val rdd1 = sc.parallelize(Array[String]("a","b","c","d","e","f","g"))
val result = rdd1.takeSample(false,3,100L)
result.foreach(println)
/**
* f
* b
* a
*/
10. rdd.takeOrdered(num): returns the num smallest elements in ascending order
val rdd = sc.parallelize(Array[Int](1,2,3,4,5,6,7,8,9))
val result : Array[Int] = rdd.takeOrdered(4)
result.foreach(println)
/**
* 1
* 2
* 3
* 4
*/
11. rdd.top(num): returns the num largest elements in descending order
val rdd1 = sc.parallelize(Array[Int](1,2,3,4,5,6,7,8,9))
val result: Array[Int] = rdd1.top(4)
result.foreach(println)
/**
* 9
* 8
* 7
* 6
*/
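takeOrdered and top are two views of the same operation: takeOrdered takes an implicit Ordering in a second parameter list, and passing the reversed ordering gives the same result as top. The sketch below assumes a local-mode SparkContext created only for illustration; the app name and variable names are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: a local-mode context, created here only so the snippet is self-contained.
val conf = new SparkConf().setMaster("local[2]").setAppName("ordering-example")
val sc = new SparkContext(conf)

val rdd = sc.parallelize(Array[Int](1, 2, 3, 4, 5, 6, 7, 8, 9))

// Reversing the Int ordering makes takeOrdered return the largest elements first.
val largest: Array[Int] = rdd.takeOrdered(4)(Ordering[Int].reverse)
val viaTop: Array[Int] = rdd.top(4)

println(largest.mkString(","))   // 9,8,7,6
println(viaTop.mkString(","))    // 9,8,7,6

sc.stop()
```

Both calls gather their result on the Driver, so num should stay small.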