|NO.Z.00034|——————————|BigDataEnd|——|Hadoop&Spark.V08|——|Spark.v08|sparkcore|Advanced RDD Programming & RDD Accumulators|
1. RDD accumulators
### --- Accumulators
~~~ Purpose: an accumulator lets a single variable accumulate state across different Executors;
~~~ An accumulator is defined and read on the Driver; the actual accumulation happens on the Executors;
~~~ Accumulator updates inside transformations are lazy and need an Action to fire; one Action means one execution, and multiple Actions mean multiple executions (so the value can be over-counted);
~~~ A classic use case is counting certain events in a Spark Streaming application;
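The define-on-Driver / update-on-Executor / read-after-Action lifecycle described above can be sketched as follows. This is a minimal, illustrative example (the input data and the name `badRecords` are assumptions, not from the original text); it assumes a running spark-shell with `sc` in scope:

```scala
// Hypothetical sketch: count malformed records while parsing integers.
val badRecords = sc.longAccumulator("badRecords")  // defined on the Driver
val parsed = sc.makeRDD(Seq("1", "2", "x", "4")).flatMap { s =>
  try Some(s.toInt)
  catch { case _: NumberFormatException =>
    badRecords.add(1)                              // updated on the Executors
    None
  }
}
parsed.count()              // the Action triggers the accumulation
println(badRecords.value)   // read back on the Driver: 1 (only "x" failed)
```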
2. Accumulator usage example
### --- Accumulator usage example
val data = sc.makeRDD(Seq("hadoop map reduce", "spark mllib"))
~~~ # Approach 1
scala> val count1 = data.flatMap(line => line.split("\\s+")).map(word => 1).reduce(_ + _)
count1: Int = 5
scala> println(count1)
5
~~~ # Approach 2: the wrong way
~~~ # A variable defined in the Driver is copied into each running Task as a fresh closure copy;
~~~ updating those copies inside the Tasks does not affect the original variable on the Driver
scala> var acc = 0
acc: Int = 0
scala> data.flatMap(line => line.split("\\s+")).foreach(word => acc += 1)
scala> println(acc)
0
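The fix for the broken approach above is to use a Spark accumulator instead of a plain `var` captured in the closure. A minimal sketch, reusing the `data` RDD defined earlier (the name `wordCount` is illustrative):

```scala
// The same count done correctly with a built-in accumulator.
val wordCount = sc.longAccumulator("wordCount")
data.flatMap(line => line.split("\\s+")).foreach(_ => wordCount.add(1))
println(wordCount.value)   // 5 — foreach is an Action, so the updates actually ran
```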
3. Spark's three built-in accumulator types
### --- Spark ships with three built-in accumulator types:
~~~ LongAccumulator for summing integers
~~~ DoubleAccumulator for summing floating-point numbers
~~~ CollectionAccumulator for collecting elements into a collection
scala> val data = sc.makeRDD("hadoop spark hive hbase java scala hello world spark scala java hive".split("\\s+"))
data: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[4] at makeRDD at <console>:24
scala> val acc1 = sc.longAccumulator("totalNum1")
acc1: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 50, name: Some(totalNum1), value: 0)
scala> val acc2 = sc.doubleAccumulator("totalNum2")
acc2: org.apache.spark.util.DoubleAccumulator = DoubleAccumulator(id: 51, name: Some(totalNum2), value: 0.0)
scala> val acc3 = sc.collectionAccumulator[String]("allWords")
acc3: org.apache.spark.util.CollectionAccumulator[String] = CollectionAccumulator(id: 52, name: Some(allWords), value: [])
scala> val rdd = data.map { word =>
| acc1.add(word.length)
| acc2.add(word.length)
| acc3.add(word)
| word
| }
rdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[5] at map at <console>:31
scala> rdd.count
res3: Long = 12
scala> rdd.collect
res4: Array[String] = Array(hadoop, spark, hive, hbase, java, scala, hello, world, spark, scala, java, hive)
scala> println(acc1.value)
114
scala> println(acc2.value)
114.0
scala> println(acc3.value)
[spark, scala, java, hive, hadoop, spark, hive, hbase, java, scala, hello, world, java, scala, hello, world, hadoop, spark, hive, hbase, spark, scala, java, hive]
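Note the over-counting in the output above: the word lengths actually sum to 57, but `acc1.value` is 114 (and `acc3` holds every word twice) because two Actions, `count` and `collect`, each re-executed the `map` and re-applied the accumulator updates. One common mitigation is to cache the RDD so the transformation runs only once across subsequent Actions. A sketch (caching is best-effort, so this reduces rather than strictly guarantees single execution if partitions are evicted):

```scala
// Reset, cache, and re-run: the map now executes only for the first Action.
acc1.reset(); acc2.reset(); acc3.reset()
val cached = rdd.cache()
cached.count()        // computes and caches; accumulators update once
cached.collect()      // served from cache; no second round of updates
println(acc1.value)   // 57
```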