Values read from a configuration file are not visible inside an RDD -- solved with a broadcast variable

Problem description

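Spark reads the configuration file successfully, but the values read from it are not available inside RDD operations (the executor does not get the values, although the driver has them).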

Solution

Send the required object to every executor through a broadcast variable.
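As a rough illustration of the configuration-file case, the sketch below loads a properties file on the driver, broadcasts it as an immutable Map, and reads it inside the task. The object name ConfigBroadcastSketch, the second argument args(1) (a path the driver can read), and the "factor" key are assumptions made for this example and do not come from the original post.

```scala
import java.io.FileInputStream
import java.util.Properties

import scala.collection.JavaConverters._

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; not part of the original demo.
object ConfigBroadcastSketch {

  def main(args: Array[String]): Unit = {
    // Assumption: args(0) is the master URL, args(1) is a .properties path readable by the driver.
    val conf = new SparkConf().setAppName("ConfigBroadcastSketch").setMaster(args(0))
    val sc = new SparkContext(conf)

    // Load the file on the driver.
    val props = new Properties()
    val in = new FileInputStream(args(1))
    try props.load(in) finally in.close()

    // Copy the entries into an immutable, serializable Map.
    val settings: Map[String, String] =
      props.stringPropertyNames().asScala.map(k => k -> props.getProperty(k)).toMap

    // Keep the Broadcast handle; do not unwrap it on the driver.
    val settingsBc = sc.broadcast(settings)

    val result = sc.parallelize(1 to 10, 1).mapPartitions { iter =>
      // value is called inside the task, so it is resolved on the executor.
      // "factor" is a made-up key; it falls back to 1 when the file does not define it.
      val factor = settingsBc.value.getOrElse("factor", "1").toInt
      iter.map(_ * factor)
    }.collect()

    println(result.mkString(","))
    sc.stop()
  }
}
```

In cluster deploy mode the driver itself runs on a cluster node, so the file has to be reachable from there, for example by shipping it with the job.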

code:

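import org.apache.spark.{SparkConf, SparkContext}

object BroadcastDemo {

  // c1 and c2 are fields of the singleton object
  var c1 = 0
  var c2 = 0

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("BroadcastDemo").setMaster(args(0))
    val sc = new SparkContext(conf)

    // c1: plain assignment on the driver
    c1 = 10

    val rdd1 = sc.parallelize(1 to 10, 1)
    val c2_init = 10
    // c2: broadcast, then unwrapped on the driver into the object field
    c2 = sc.broadcast(c2_init).value
    val c3_init = 10
    // c3: broadcast, then unwrapped on the driver into a local variable
    val c3 = sc.broadcast(c3_init).value

    rdd1.mapPartitions(t => {
      // runs on the executor, so the lines are written to the executor's stdout
      System.out.println("get c1:" + c1)
      System.out.println("get c2:" + c2)
      System.out.println("get c3:" + c3)
      t
    }).collect()
  }

}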

Launch parameters

spark-submit --class com.blue.spark.demo.BroadcastDemo  \
--master yarn-cluster  --num-executors 1  \
--driver-memory 1g --executor-memory 1g   --executor-cores 1  \
/tmp/broadcast-demo.jar yarn-cluster

Output

get c1:0
get c2:0
get c3:10

Analysis

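  1. c1 is a field of the BroadcastDemo object and is assigned on the driver, while mapPartitions runs on the executor. The driver and the executor are separate JVMs (usually on different machines), and the executor initializes its own copy of the object, so it sees the initial value 0 instead of the updated value.
  2. c2 is also a field of the object. sc.broadcast(c2_init).value is evaluated on the driver and the result is stored in the driver's copy of the field, so even with broadcast the change does not reach the executor, which still sees 0.
  3. c3 is a local variable in main that is bound to the broadcast value through value. The closure passed to mapPartitions captures c3 and is serialized to the executor, so the executor gets 10. Because value is called on the driver here, the Int travels with the closure; the usual pattern is to keep the Broadcast handle and call value inside the closure.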
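
A variant of the demo that follows the usual broadcast pattern might look like the sketch below: the local val holds the Broadcast handle itself, and value is only called inside the closure, on the executor. The names BroadcastHandleSketch and c4 are made up for this illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical variant of BroadcastDemo; the names are illustrative only.
object BroadcastHandleSketch {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("BroadcastHandleSketch").setMaster(args(0))
    val sc = new SparkContext(conf)

    // Local val holding the Broadcast[Int] handle, not the unwrapped Int.
    val c4 = sc.broadcast(10)

    sc.parallelize(1 to 10, 1).mapPartitions { t =>
      // The handle is captured by the closure; value is resolved on the executor.
      println("get c4:" + c4.value)
      t
    }.collect()

    sc.stop()
  }
}
```

Submitted the same way as the demo above, this would be expected to write get c4:10 to the executor's stdout. Keeping the handle matters most for large objects, since the broadcast data is then fetched once per executor instead of being serialized into every task closure.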
posted @ 2018-10-10 19:12  shenjie2017