Spark: Getting Started with User Applications

 1 /* 
2
 Spark Standalone模式下的Application  3 Application是Spark中类似于Hadoop的Job的用户提交的应用。sc是Spark集群初始化时创建的SparkContext,Spark中包含Action算子和Transferer(lazy)算子。有宽依赖和窄依赖。默认情况下Spark的调度器(DAGScheduler)是FIFO方式。
4
*/ 5 //默认排序输出到磁盘文件 6 scala> val r1 = sc.textFile("/root/rdd1.txt").flatMap(_.split(" ")).map(x=>(x,1)).reduceByKey(_+_).saveAsTextFile("/root/rddOut/noSort") 7 FileOutputCommitter: Saved output of task 'attempt_201507140546_0014_m_000000_14' to file:/root/rddOut/noSort/_temporary/0/task_201507140546_0014_m_000000 8 9 10 //字典序正序排序输出到磁盘文件 11 val r1 = sc.textFile("/root/rdd1.txt").flatMap(_.split(" ")).map(x=>(x,1)).reduceByKey(_+_).sortByKey(true).saveAsTextFile("/root/rddOut/zSort") 12 FileOutputCommitter: Saved output of task 'attempt_201507140546_0017_m_000000_17' to file:/root/rddOut/zSort/_temporary/0/task_201507140546_0017_m_000000 13 14 15 //字典序倒序排序输出到磁盘文件 16 val r1 = sc.textFile("/root/rdd1.txt").flatMap(_.split(" ")).map(x=>(x,1)).reduceByKey(_+_).sortByKey(false).saveAsTextFile("/root/rddOut/fSort") 17 FileOutputCommitter: Saved output of task 'attempt_201507140547_0020_m_000000_20' to file:/root/rddOut/fSort/_temporary/0/task_201507140547_0020_m_000000 18 19 21 //spark-shell.sh中的的wordcount 22 val word = sc.textFile("hdfs://soy1:9000/mapreduces/word.txt") 23 word.flatMap(_.split("\t")).map(x=>(x,1)).reduceByKey(_+_).sortByKey(false).collect
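
A note on the operator classes above: transformations are lazy and only record RDD lineage; nothing executes until an action is called. The lineage, including the shuffle boundary that a wide dependency introduces, can be inspected in spark-shell. A small sketch, assuming the same /root/rdd1.txt input as above:

// Transformations are lazy; they only record lineage
val counts = sc.textFile("/root/rdd1.txt")
  .flatMap(_.split(" "))   // narrow dependency
  .map(x => (x, 1))        // narrow dependency
  .reduceByKey(_ + _)      // wide dependency: introduces a shuffle

// toDebugString prints the lineage, showing the stage boundary created by the shuffle
println(counts.toDebugString)

// collect is an action; the job actually runs here
counts.collect().foreach(println)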

Testing is usually done in spark-shell, but real production work is developed in an IDE. Next I demonstrate writing a Scala version of WordCount in IDEA (Scala is the recommended language for developing Spark applications):

1. Create a SparkTest project in IDEA, add the dependency spark-assembly-1.4.1-hadoop2.6.0.jar, create WordCountApp.scala, and finally build the project as SparkTest_jar.jar.
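
The post does not reproduce WordCountApp.scala itself. A minimal sketch consistent with the spark-submit command and the output shown below (input and output paths taken from args, tab-delimited words, keys sorted in descending order) might look like this:

package com.mengyao.spark.app

import org.apache.spark.{SparkConf, SparkContext}

/**
 * Cluster-mode WordCount (sketch; the original source is not shown in the post)
 * args(0): input path, args(1): output path
 */
object WordCountApp {
  def main(args: Array[String]): Unit = {
    // No setMaster here: the master is supplied by spark-submit (--master yarn-cluster)
    val conf = new SparkConf().setAppName("WordCountApp")
    val sc = new SparkContext(conf)
    sc.textFile(args(0))
      .flatMap(_.split("\t"))   // split tab-delimited lines into words
      .map(word => (word, 1))   // pair each word with a count of 1
      .reduceByKey(_ + _)       // sum the counts per word
      .sortByKey(false)         // descending lexicographic order, matching the result below
      .saveAsTextFile(args(1))
    sc.stop()
  }
}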


2. Upload SparkTest_jar.jar with WinSCP to /usr/local/soft/ on node soy2, in preparation for running the Spark WordCount program.

3. Change into the Spark installation directory and submit the Spark job; here I use Spark on YARN in yarn-cluster mode:
[root@soy2 spark-1.4.1-bin-hadoop2.6]# cd /usr/local/installs/spark-1.4.1-bin-hadoop2.6
[root@soy2 spark-1.4.1-bin-hadoop2.6]# bin/spark-submit --class com.mengyao.spark.app.WordCountApp --master yarn-cluster --num-executors 1 --driver-memory 512m --executor-cores 1 /usr/local/soft/SparkTest_jar.jar hdfs://soy1:9000/mapreduces/word.txt hdfs://soy1:9000/spark1
15/11/01 09:30:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/01 09:30:17 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers
15/11/01 09:30:17 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/11/01 09:30:17 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/11/01 09:30:17 INFO yarn.Client: Setting up container launch context for our AM
15/11/01 09:30:17 INFO yarn.Client: Preparing resources for our AM container
15/11/01 09:30:18 INFO yarn.Client: Uploading resource file:/usr/local/installs/spark-1.4.1-bin-hadoop2.6/lib/spark-assembly-1.4.1-hadoop2.6.0.jar -> hdfs://ns1/user/root/.sparkStaging/application_1446335733827_0001/spark-assembly-1.4.1-hadoop2.6.0.jar
15/11/01 09:30:27 INFO yarn.Client: Uploading resource file:/usr/local/soft/SparkTest_jar.jar -> hdfs://ns1/user/root/.sparkStaging/application_1446335733827_0001/SparkTest_jar.jar
15/11/01 09:30:37 INFO yarn.Client: Uploading resource file:/tmp/spark-d3853661-1371-4c4f-86aa-18a432b6045b/__hadoop_conf__694912348735860853.zip -> hdfs://ns1/user/root/.sparkStaging/application_1446335733827_0001/__hadoop_conf__694912348735860853.zip
15/11/01 09:30:37 INFO yarn.Client: Setting up the launch environment for our AM container
15/11/01 09:30:37 INFO spark.SecurityManager: Changing view acls to: root
15/11/01 09:30:37 INFO spark.SecurityManager: Changing modify acls to: root
15/11/01 09:30:37 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/11/01 09:30:38 INFO yarn.Client: Submitting application 1 to ResourceManager
15/11/01 09:30:38 INFO impl.YarnClientImpl: Submitted application application_1446335733827_0001
15/11/01 09:30:39 INFO yarn.Client: Application report for application_1446335733827_0001 (state: ACCEPTED)
15/11/01 09:30:39 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1446341438409
     final status: UNDEFINED
     tracking URL: http://soy3:8088/proxy/application_1446335733827_0001/
     user: root
15/11/01 09:30:40 INFO yarn.Client: Application report for application_1446335733827_0001 (state: ACCEPTED)
... (identical report lines, polled once per second, omitted through 09:31:00) ...
15/11/01 09:31:01 INFO yarn.Client: Application report for application_1446335733827_0001 (state: RUNNING)
15/11/01 09:31:01 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 192.168.1.107
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1446341438409
     final status: UNDEFINED
     tracking URL: http://soy3:8088/proxy/application_1446335733827_0001/
     user: root
15/11/01 09:31:02 INFO yarn.Client: Application report for application_1446335733827_0001 (state: RUNNING)
... (identical report lines, polled once per second, omitted through 09:31:34) ...
15/11/01 09:31:35 INFO yarn.Client: Application report for application_1446335733827_0001 (state: FINISHED)
15/11/01 09:31:35 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 192.168.1.107
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1446341438409
     final status: SUCCEEDED
     tracking URL: http://soy3:8088/proxy/application_1446335733827_0001/
     user: root
15/11/01 09:31:35 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1446335733827_0001
15/11/01 09:31:35 INFO util.Utils: Shutdown hook called
15/11/01 09:31:35 INFO util.Utils: Deleting directory /tmp/spark-d3853661-1371-4c4f-86aa-18a432b6045b

4. After the job finishes successfully, inspect the result files Spark wrote to HDFS, as follows:
4.1. Check HDFS
    [root@soy2 spark-1.4.1-bin-hadoop2.6]# hdfs dfs -ls /spark1
    Found 3 items
    -rw-r--r--   3 root supergroup          0 2015-11-01 09:31 /spark1/_SUCCESS
    -rw-r--r--   3 root supergroup         62 2015-11-01 09:31 /spark1/part-00000
    -rw-r--r--   3 root supergroup         71 2015-11-01 09:31 /spark1/part-00001
    [root@soy2 spark-1.4.1-bin-hadoop2.6]# hdfs dfs -cat /spark1/*
    (zookeeper,3)
    (storm,1)
    (sqoop,1)
    (spark,1)
    (redis,1)
    (pig,1)
    (mllib,1)
    (mahout,1)
    (kafka,1)
    (hive,2)
    (hbase,1)
    (hadoop,3)
    (flume,1)
    [root@soy2 spark-1.4.1-bin-hadoop2.6]#
4.2. View the job's run record in YARN

At this point, the introductory Spark WordCount application has run successfully.

Next, we use Spark's Java API to develop a local WordCount program and a cluster-mode WordCount program.

The local WordCount program is as follows:

package com.mengyao.javaspark.application;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;

import scala.Tuple2;

/**
 * Local WordCount Application
 * @author mengyao
 *
 */
public class WordCountApp {

    public static void main(String[] args) {
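        // "local" master runs Spark inside this JVM, so no cluster is needed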
        SparkConf conf = new SparkConf().setAppName("wordcount").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
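        // Read the local input file as an RDD of lines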
        JavaRDD<String> lines = sc.textFile("D:/word.txt");
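        // Split each tab-delimited line into words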
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            private static final long serialVersionUID = 921776559012714233L;
            @Override
            public Iterable<String> call(String line) throws Exception {
                return Arrays.asList(line.split("\t"));
            }
        });
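        // Map each word to a (word, 1) pair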
        JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
            private static final long serialVersionUID = -1823807293840503627L;
            @Override
            public Tuple2<String, Integer> call(String word) throws Exception {
                return new Tuple2<String, Integer>(word, 1);
            }
        });
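        // Sum the counts for each word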
        JavaPairRDD<String, Integer> wordcounts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
            private static final long serialVersionUID = 6902960074353889312L;
            @Override
            public Integer call(Integer v1, Integer v2) throws Exception {
                return v1+v2;
            }
        });
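        // Print each (word, count) pair; foreach is an action, so the job executes here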
        wordcounts.foreach(new VoidFunction<Tuple2<String,Integer>>() {
            private static final long serialVersionUID = 2076598418130686106L;
            @Override
            public void call(Tuple2<String, Integer> wordCount) throws Exception {
                System.out.println(wordCount._1+"\t"+wordCount._2);
            }
        });
        sc.close();
    }
    
}

 
