Running a Spark WordCount Job on a YARN Cluster

  1. Prepare the test data file hello.txt
hello scala
hello world
nihao hello
i am scala
this is spark demo
gan jiu wan le
  2. Upload the file to HDFS
# Create the test directory on HDFS (-p creates missing parent directories)
hdfs dfs -mkdir -p /user/spark/input/
# Upload the local file hello.txt to HDFS
hdfs dfs -put ./hello.txt /user/spark/input/
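
To confirm the upload worked, the directory can be listed and the file printed back (a quick sanity check, not part of the original post):

# List the test directory and print the uploaded file
hdfs dfs -ls /user/spark/input/
hdfs dfs -cat /user/spark/input/hello.txt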
  3. Code (changed to read the input from HDFS and write the output back to HDFS)
package org.example

import org.apache.spark.{SparkConf, SparkContext}

/**
 * spark-submit --master yarn --class org.example.SparkWordCountYarn /tmp/test/sparkwordcount2-1.0-SNAPSHOT.jar hdfs://hadoop1:8020/user/spark/input/hello.txt hdfs://hadoop1:8020/user/spark/output/helloOutput
 */
object SparkWordCountYarn {

  def main(args: Array[String]): Unit = {
    // "yarn" is also passed via --master on spark-submit; setting it here
    // keeps the app runnable without that flag
    val conf = new SparkConf()
      .setAppName("WordCount")
      .setMaster("yarn")

    val srcFile = args(0)    // input path on HDFS
    val outputFile = args(1) // output directory on HDFS (must not exist yet)
    val sc = new SparkContext(conf)
    val data = sc.textFile(srcFile)
    data.flatMap(_.split(" "))   // split each line into words
      .map((_, 1))               // pair each word with a count of 1
      .reduceByKey(_ + _)        // sum the counts per word
      .saveAsTextFile(outputFile)
    sc.stop()
  }
}
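
The jar path in the submit command below suggests a Maven build. Assuming a standard Maven Scala project (the project layout is an assumption, not from the original post; the artifact name is taken from the command below), packaging could look like this:

# Build the jar (artifactId sparkwordcount2 matches the jar name used below)
mvn clean package
# Copy it to the path referenced by spark-submit
cp target/sparkwordcount2-1.0-SNAPSHOT.jar /tmp/test/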

  4. Run the spark-submit command to submit the job
spark-submit --master yarn --class org.example.SparkWordCountYarn /tmp/test/sparkwordcount2-1.0-SNAPSHOT.jar hdfs://hadoop1:8020/user/spark/input/hello.txt hdfs://hadoop1:8020/user/spark/output/helloOutput
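
Note that saveAsTextFile fails with a FileAlreadyExistsException if the output directory already exists, so clear it before resubmitting the job:

# Remove the previous output before re-running
hdfs dfs -rm -r /user/spark/output/helloOutput

By default spark-submit runs in client deploy mode; adding --deploy-mode cluster would run the driver inside the YARN cluster instead.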
  5. Execution result
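The original post showed a screenshot of the result here. The output can also be inspected directly on HDFS; given the input above, the expected counts are hello=3, scala=2, and 1 for each remaining word (the order of the output lines is not deterministic):

# Print the job output; saveAsTextFile writes one part-* file per partition
hdfs dfs -cat /user/spark/output/helloOutput/part-*
# Sample lines:
# (hello,3)
# (scala,2)
# (spark,1)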