Spark Cluster Installation and Configuration
I. Scala Installation
1. Download Scala from https://www.scala-lang.org/download/2.11.12.html, copy the package to /home/jun, and extract it there. (Note: the version actually used throughout this walkthrough is Scala 2.12.6.)
[jun@master ~]$ cd scala-2.12.6/
[jun@master scala-2.12.6]$ ls -l
total 0
drwxrwxr-x. 2 jun jun 162 Apr 27 16:49 bin
drwxrwxr-x. 4 jun jun  86 Apr 27 16:49 doc
drwxrwxr-x. 2 jun jun 244 Apr 27 16:49 lib
drwxrwxr-x. 3 jun jun  18 Apr 27 16:49 man
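A sketch of the download-and-extract step, assuming the tarball is fetched with wget into /home/jun (the exact mirror URL is an assumption; any Scala 2.12.6 tarball link works):

cd /home/jun
# Assumed mirror URL for the Scala 2.12.6 tarball
wget https://downloads.lightbend.com/scala/2.12.6/scala-2.12.6.tgz
# Unpacks to /home/jun/scala-2.12.6
tar -zxvf scala-2.12.6.tgz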
2. Start Scala and use the Scala shell
[jun@master scala-2.12.6]$ bin/scala
Welcome to Scala 2.12.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171).
Type in expressions for evaluation. Or try :help.

scala> println("hello,world")
hello,world

scala> 5*9
res1: Int = 45

scala> 2*res1
res2: Int = 90

scala> :help
All commands can be abbreviated, e.g., :he instead of :help.
:completions <string>    output completions for the given string
:edit <id>|<line>        edit history
:help [command]          print this summary or command-specific help
:history [num]           show the history (optional num is commands to show)
:h? <string>             search the history
:imports [name name ...] show import history, identifying sources of names
:implicits [-v]          show the implicits in scope
:javap <path|class>      disassemble a file or class name
:line <id>|<line>        place line(s) at the end of history
:load <path>             interpret lines in a file
:paste [-raw] [path]     enter paste mode or paste a file
:power                   enable power user mode
:quit                    exit the interpreter
:replay [options]        reset the repl and replay all previous commands
:require <path>          add a jar to the classpath
:reset [options]         reset the repl to its initial state, forgetting all session entries
:save <path>             save replayable session to a file
:sh <command line>       run a shell command (result is implicitly => List[String])
:settings <options>      update compiler options, if possible; see reset
:silent                  disable/enable automatic printing of results
:type [-v] <expr>        display the type of an expression without evaluating it
:kind [-v] <type>        display the kind of a type. see also :help kind
:warnings                show the suppressed warnings from the most recent line which had any

scala> :quit
3. Copy the Scala installation to the slave nodes.
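A minimal sketch of the copy, assuming passwordless SSH from master to the slaves and the same /home/jun home directory on every node:

# Copy the extracted Scala directory to both slave nodes
scp -r /home/jun/scala-2.12.6 jun@slave0:/home/jun/
scp -r /home/jun/scala-2.12.6 jun@slave1:/home/jun/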
II. Spark Cluster Installation and Configuration
Spark is installed in Hadoop YARN mode.
1. Download spark-2.3.1-bin-hadoop2.7.tgz from http://spark.apache.org/downloads.html, copy it to /home/jun, and extract it there.
[jun@master ~]$ cd spark-2.3.1-bin-hadoop2.7/
[jun@master spark-2.3.1-bin-hadoop2.7]$ ls -l
total 84
drwxrwxr-x. 2 jun jun  4096 Jun 2 04:49 bin
drwxrwxr-x. 2 jun jun   230 Jun 2 04:49 conf
drwxrwxr-x. 5 jun jun    50 Jun 2 04:49 data
drwxrwxr-x. 4 jun jun    29 Jun 2 04:49 examples
drwxrwxr-x. 2 jun jun 12288 Jun 2 04:49 jars
drwxrwxr-x. 3 jun jun    25 Jun 2 04:49 kubernetes
-rw-rw-r--. 1 jun jun 18045 Jun 2 04:49 LICENSE
drwxrwxr-x. 2 jun jun  4096 Jun 2 04:49 licenses
-rw-rw-r--. 1 jun jun 24913 Jun 2 04:49 NOTICE
drwxrwxr-x. 8 jun jun   240 Jun 2 04:49 python
drwxrwxr-x. 3 jun jun    17 Jun 2 04:49 R
-rw-rw-r--. 1 jun jun  3809 Jun 2 04:49 README.md
-rw-rw-r--. 1 jun jun   161 Jun 2 04:49 RELEASE
drwxrwxr-x. 2 jun jun  4096 Jun 2 04:49 sbin
drwxrwxr-x. 2 jun jun    42 Jun 2 04:49 yarn
2. Configure the Linux environment variables; add the following (a sketch of applying them follows the block below):
#spark
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
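A minimal sketch of applying these variables, assuming they are appended to the jun user's ~/.bash_profile on each node (the exact profile file is an assumption; use whichever file already exports HADOOP_HOME):

# Append the Spark-related variables to the shell profile
cat >> ~/.bash_profile <<'EOF'
#spark
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
EOF
# Reload the profile in the current shell and sanity-check one variable
source ~/.bash_profile
echo $HADOOP_CONF_DIR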
3. Configure spark-env.sh. Note that this must be done on all three machines.
Copy the default configuration template and open it with gedit:
[jun@master conf]$ cp ~/spark-2.3.1-bin-hadoop2.7/conf/spark-env.sh.template ~/spark-2.3.1-bin-hadoop2.7/conf/spark-env.sh
[jun@master conf]$ gedit ~/spark-2.3.1-bin-hadoop2.7/conf/spark-env.sh
Add the following settings:
export SPARK_MASTER_IP=192.168.1.100
export JAVA_HOME=/usr/java/jdk1.8.0_171/
export SCALA_HOME=/home/jun/scala-2.12.6/
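Note: the Spark 2.x standalone scripts treat SPARK_MASTER_IP as deprecated in favor of SPARK_MASTER_HOST; the old name still works but logs a warning. A hedged equivalent of the block above:

export SPARK_MASTER_HOST=192.168.1.100   # preferred name in Spark 2.x (SPARK_MASTER_IP is still accepted)
export JAVA_HOME=/usr/java/jdk1.8.0_171/
export SCALA_HOME=/home/jun/scala-2.12.6/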
4. Edit Spark's slaves file
Copy the template and open the file with gedit:
[jun@master conf]$ cp ~/spark-2.3.1-bin-hadoop2.7/conf/slaves.template slaves
[jun@master conf]$ gedit ~/spark-2.3.1-bin-hadoop2.7/conf/slaves
Delete the default localhost entry and add the following:
# A Spark Worker will be started on each of the machines listed below.
slave0
slave1
5. Copy Spark to the slave nodes.
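A minimal sketch of the copy, again assuming passwordless SSH and identical paths on every node:

# Copy the whole Spark directory (including the conf/ changes above) to both slaves
scp -r /home/jun/spark-2.3.1-bin-hadoop2.7 jun@slave0:/home/jun/
scp -r /home/jun/spark-2.3.1-bin-hadoop2.7 jun@slave1:/home/jun/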
III. Starting and Verifying the Spark Cluster
1. Start the Spark cluster
First make sure the Hadoop cluster is running, then run Spark's own start-all.sh script:
[jun@master conf]$ /home/jun/spark-2.3.1-bin-hadoop2.7/sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /home/jun/spark-2.3.1-bin-hadoop2.7/logs/spark-jun-org.apache.spark.deploy.master.Master-1-master.out
slave0: starting org.apache.spark.deploy.worker.Worker, logging to /home/jun/spark-2.3.1-bin-hadoop2.7/logs/spark-jun-org.apache.spark.deploy.worker.Worker-1-slave0.out
slave1: starting org.apache.spark.deploy.worker.Worker, logging to /home/jun/spark-2.3.1-bin-hadoop2.7/logs/spark-jun-org.apache.spark.deploy.worker.Worker-1-slave1.out
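If the daemons ever need to be started individually, Spark's sbin directory also ships per-daemon scripts; a sketch using the same paths as above:

# Start only the Master on this machine
/home/jun/spark-2.3.1-bin-hadoop2.7/sbin/start-master.sh
# Start a Worker on every host listed in conf/slaves
/home/jun/spark-2.3.1-bin-hadoop2.7/sbin/start-slaves.sh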
2. Verify the startup status
(1) Check the processes with jps. The master node now shows an additional Master process, and each slave node shows an additional Worker process:
[jun@master conf]$ jps
8480 ResourceManager
8273 SecondaryNameNode
8805 Master
8038 NameNode
8889 Jps

[jun@slave0 ~]$ jps
7794 DataNode
8166 Worker
7961 NodeManager
8252 Jps

[jun@slave1 ~]$ jps
5539 DataNode
5917 Worker
5711 NodeManager
5999 Jps
(2) Check the cluster status through the web UI by opening http://master:8080 in a browser.
(3) Submit a job to the Spark cluster from the command line
So that the jar can be loaded directly from the working directory, first copy the example jar into the top-level Spark installation directory:
[jun@master conf]$ cp /home/jun/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar /home/jun/spark-2.3.1-bin-hadoop2.7/
Run the SparkPi program:
[jun@master spark-2.3.1-bin-hadoop2.7]$ bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --num-executors 3 \
  --driver-memory 512m \
  --executor-memory 512m \
  --executor-cores 1 \
  spark-examples_2.11-2.3.1.jar 10
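Note: --master yarn-cluster is a deprecated spelling in Spark 2.x; it still works, but spark-submit warns that --master yarn with an explicit deploy mode should be used instead. An equivalent, more current invocation would be:

bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 3 \
  --driver-memory 512m \
  --executor-memory 512m \
  --executor-cores 1 \
  spark-examples_2.11-2.3.1.jar 10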
At this point an error is reported: the container needs about 2.3 GB of virtual memory, but only 2.1 GB is available. YARN's default virtual-to-physical memory ratio is 2.1, so with 1 GB of physical memory the virtual memory limit is 2.1 GB, which is less than the roughly 2.3 GB required. The fix is to raise the virtual-to-physical memory ratio with a setting in yarn-site.xml. The reported error was:
diagnostics: Application application_1532350446978_0001 failed 2 times due to AM Container for appattempt_1532350446978_0001_000002 exited with exitCode: -103
Failing this attempt. Diagnostics: Container [pid=6514,containerID=container_1532350446978_0001_02_000001] is running beyond virtual memory limits. Current usage: 289.7 MB of 1 GB physical memory used; 2.3 GB of 2.1 GB virtual memory used. Killing container.
Stop YARN, add the following property to yarn-site.xml, and then restart YARN (a sketch of those commands follows the property block below):
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.5</value>
</property>
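A sketch of the restart, assuming the standard Hadoop sbin scripts and that yarn-site.xml lives under $HADOOP_HOME/etc/hadoop on every node (yarn.nodemanager.vmem-pmem-ratio is read by the NodeManagers, so the slaves need the updated file too):

# Stop YARN, push the updated yarn-site.xml to the slaves, then start YARN again
$HADOOP_HOME/sbin/stop-yarn.sh
scp $HADOOP_HOME/etc/hadoop/yarn-site.xml jun@slave0:$HADOOP_HOME/etc/hadoop/
scp $HADOOP_HOME/etc/hadoop/yarn-site.xml jun@slave1:$HADOOP_HOME/etc/hadoop/
$HADOOP_HOME/sbin/start-yarn.sh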
Run the SparkPi program again; this time the final status shows that it succeeded:
2018-07-23 21:26:53 INFO Client:54 -
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 192.168.1.101
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1532352353581
     final status: SUCCEEDED
     tracking URL: http://master:18088/proxy/application_1532352327714_0001/
     user: jun
2018-07-23 21:26:53 INFO ShutdownHookManager:54 - Shutdown hook called
2018-07-23 21:26:53 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-1ed5bee9-1aa7-4888-b3ec-80ff2b153192
2018-07-23 21:26:53 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-7349a4e3-9483-4d09-91ff-e1e48cb59b46
Right-click the tracking URL and choose "Open Link" to view the run status in a browser.
Click logs and then stdout to see the result: Pi is roughly 3.143951143951144.
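The result can also be pulled from the terminal after the application finishes, assuming YARN log aggregation is enabled (an assumption; otherwise the container logs stay on the NodeManager hosts). A sketch using the application id from the output above:

yarn logs -applicationId application_1532352327714_0001 | grep "Pi is roughly"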