Running a Spark WordCount job on a YARN cluster

Prepare the test data file hello.txt

hello scala
hello world
nihao hello
i am scala
this is spark demo
gan jiu wan le

Upload the file to HDFS

# Create the test directory on HDFS (-p also creates any missing parent directories)
hdfs dfs -mkdir -p /user/spark/input/
# Upload the local file hello.txt to HDFS
hdfs dfs -put ./hello.txt /user/spark/input/

Code (changed to read the input from HDFS and write the result back to HDFS)

package org.example

import org.apache.spark.{SparkConf, SparkContext}

/**
 * spark-submit --master yarn --class org.example.SparkWordCountYarn /tmp/test/sparkwordcount2-1.0-SNAPSHOT.jar hdfs://hadoop1:8020/user/spark/input/hello.txt hdfs://hadoop1:8020/user/spark/output/helloOutput
 */
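object SparkWordCountYarn {

  def main(args: Array[String]): Unit = {
    // Two arguments are expected: the input path and the output directory
    require(args.length >= 2, "usage: SparkWordCountYarn <inputPath> <outputDir>")

    val conf = new SparkConf()
      .setAppName("WordCount")
      .setMaster("yarn")

    val srcFile = args(0)
    val outPutFile = args(1)
    val sc = new SparkContext(conf)
    try {
      val data = sc.textFile(srcFile)
      data.flatMap(_.split(" "))    // split each line into words
        .map((_, 1))                // pair each word with a count of 1
        .reduceByKey(_ + _)         // sum the counts for each word
        .saveAsTextFile(outPutFile) // the output directory must not exist yet
    } finally {
      sc.stop()                     // release the resources held on the cluster
    }
  }
}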
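To make the three transformation steps easier to follow, here is a minimal sketch that applies the same logic to the sample lines of hello.txt using plain Scala collections, with no Spark or HDFS involved. The object name LocalWordCountSketch and the helper wordCount are made up for illustration, and groupBy plus a sum stands in for reduceByKey, so this only approximates what the job does on the cluster.

```scala
object LocalWordCountSketch {
  // Sample lines copied from hello.txt above.
  val lines: Seq[String] = Seq(
    "hello scala",
    "hello world",
    "nihao hello",
    "i am scala",
    "this is spark demo",
    "gan jiu wan le"
  )

  // Same steps as the Spark job, applied to an in-memory Seq.
  def wordCount(input: Seq[String]): Map[String, Int] =
    input
      .flatMap(_.split(" "))   // split each line into words
      .map((_, 1))             // pair each word with a count of 1
      .groupBy(_._1)           // group the pairs by word (stand-in for reduceByKey)
      .map { case (word, pairs) => word -> pairs.map(_._2).sum }

  def main(args: Array[String]): Unit =
    // Print one (word,count) tuple per line, sorted by word.
    wordCount(lines).toSeq.sortBy(_._1).foreach(println)
}
```

For the sample data this counts "hello" three times and "scala" twice; every other word appears once. It can serve as a quick local cross-check of what the YARN job should write to the output directory.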

Run the command that submits the Spark job

spark-submit --master yarn --class org.example.SparkWordCountYarn /tmp/test/sparkwordcount2-1.0-SNAPSHOT.jar hdfs://hadoop1:8020/user/spark/input/hello.txt hdfs://hadoop1:8020/user/spark/output/helloOutput

Execution result

Posted 2022-01-05 16:56 by 明月照江江
