[b0006] Hands-On: Spark 2.0.1 Pseudo-Distributed Setup

Environment:

Already installed:

Hadoop 2.6.4 with YARN

See: [b0001] Pseudo-distributed Hadoop 2.6.4

Preparation:

spark-2.0.1-bin-hadoop2.6.tgz, available from http://spark.apache.org/downloads.html

Notes

  • According to the official docs, Spark 2.0+ bundles its own Scala, so no separate Scala installation is needed.
  • Installing Spark does not require Hadoop; it can also run standalone on a plain Linux system.
  • All steps below are performed as the hadoop install user; prefix commands with sudo where permissions fall short.

1. Get the package

Download the tarball (here via Thunder/Xunlei), upload it to the Linux machine, and extract it:

hadoop@ssmaster:~$ tar zxvf spark-2.0.1-bin-hadoop2.6.tgz
hadoop@ssmaster:~$ sudo mv spark-2.0.1-bin-hadoop2.6 /opt/
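A quick sanity check before going further: list the extracted directory. The layout below is roughly what the spark-2.0.1-bin-hadoop2.6 package ships with (abbreviated):

hadoop@ssmaster:/opt$ ls /opt/spark-2.0.1-bin-hadoop2.6
bin  conf  data  examples  jars  licenses  python  R  sbin  yarn  README.md  RELEASE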

2. Configure Spark

2.1 The SPARK_HOME environment variable

hadoop@ssmaster:/opt$ sudo vi /etc/profile

Add:

export SPARK_HOME=/opt/spark-2.0.1-bin-hadoop2.6
export HADOOP_HOME=/opt/hadoop-2.6.4
export JAVA_HOME=/home/szb/hadoop/jdk1.7.0_80
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin
export CLASSPATH=./:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

Save, apply, and test:

hadoop@ssmaster:/opt$ source /etc/profile
hadoop@ssmaster:/opt$ echo $SPARK_HOME
/opt/spark-2.0.1-bin-hadoop2.6
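Since $SPARK_HOME/bin was also appended to PATH, the Spark launch scripts should now resolve from any directory:

hadoop@ssmaster:/opt$ which spark-shell
/opt/spark-2.0.1-bin-hadoop2.6/bin/spark-shell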


2.2 Parameter settings

Edit spark-env.sh:

hadoop@ssmaster:/opt/spark-2.0.1-bin-hadoop2.6/conf$ pwd
/opt/spark-2.0.1-bin-hadoop2.6/conf
hadoop@ssmaster:/opt/spark-2.0.1-bin-hadoop2.6/conf$ cp spark-env.sh.template spark-env.sh
hadoop@ssmaster:/opt/spark-2.0.1-bin-hadoop2.6/conf$ vi spark-env.sh

Add the following settings to spark-env.sh; each is commented with its purpose:

### JDK directory
export JAVA_HOME=/home/szb/hadoop/jdk1.7.0_80

### IP of the Spark master node
export SPARK_MASTER_IP=192.168.249.144

### maximum amount of memory each worker may use
export SPARK_WORKER_MEMORY=1024m

### Hadoop configuration directory
export HADOOP_CONF_DIR=/opt/hadoop-2.6.4/etc/hadoop/
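A few more knobs from spark-env.sh.template are worth knowing about for later tuning; they are not required for this walkthrough, and the values below are only illustrative defaults:

### number of cores a worker may use (defaults to all available)
# export SPARK_WORKER_CORES=2

### master RPC port and master web UI port (defaults: 7077 and 8080)
# export SPARK_MASTER_PORT=7077
# export SPARK_MASTER_WEBUI_PORT=8080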

2.3 Designate the Spark slave nodes

hadoop@ssmaster:/opt/spark-2.0.1-bin-hadoop2.6/conf$ cp slaves.template slaves
hadoop@ssmaster:/opt/spark-2.0.1-bin-hadoop2.6/conf$ vi slaves

Replace the file's contents (the template lists only localhost) with the single hostname ssmaster:
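hadoop@ssmaster:/opt/spark-2.0.1-bin-hadoop2.6/conf$ cat slaves
ssmaster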

With that, the configuration is complete.

3. Start Spark

3.1 Start Hadoop first: run start-dfs.sh, then start-yarn.sh, and verify with jps that the HDFS and YARN daemons are all up.

3.2 Start all Spark daemons

hadoop@ssmaster:/opt/spark-2.0.1-bin-hadoop2.6$ sbin/start-all.sh

hadoop@ssmaster:/opt/spark-2.0.1-bin-hadoop2.6$ jps
5859 ResourceManager
5979 NodeManager
5690 SecondaryNameNode
5361 NameNode
7014 Jps
5479 DataNode
6866 Master
6955 Worker
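With Master and Worker both running, the standalone cluster can already accept jobs. As a smoke test (assuming the default master port 7077 and the examples jar bundled with this package), submit the SparkPi example:

hadoop@ssmaster:/opt/spark-2.0.1-bin-hadoop2.6$ bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://ssmaster:7077 \
  examples/jars/spark-examples_2.11-2.0.1.jar 10

A line like "Pi is roughly 3.14..." near the end of the output confirms the cluster executed the job.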

3.3 Launch the Scala shell

hadoop@ssmaster:/opt/spark-2.0.1-bin-hadoop2.6$ bin/spark-shell 
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/10/23 04:27:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/10/23 04:27:05 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://192.168.249.144:4040
Spark context available as 'sc' (master = local[*], app id = local-1477222025276).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
      /_/
         
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.

scala> print("hello world")
hello world
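Note the master = local[*] in the banner: launched with no arguments, spark-shell runs against an in-process local master, not the standalone Master started in 3.2. To attach the shell to the cluster instead (again assuming the default port 7077), pass the master URL explicitly; a small RDD computation then runs on the Worker:

hadoop@ssmaster:/opt/spark-2.0.1-bin-hadoop2.6$ bin/spark-shell --master spark://ssmaster:7077

scala> sc.parallelize(1 to 100).sum()
res0: Double = 5050.0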


Remarks:

 Stop Spark:           sbin/stop-all.sh

 Exit the Scala shell: Ctrl-C (or :quit)

 Python entry point:   bin/pyspark
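For a full shutdown of the stack, the reverse of the startup order works: stop Spark first, then YARN, then HDFS:

hadoop@ssmaster:/opt/spark-2.0.1-bin-hadoop2.6$ sbin/stop-all.sh
hadoop@ssmaster:/opt/spark-2.0.1-bin-hadoop2.6$ stop-yarn.sh
hadoop@ssmaster:/opt/spark-2.0.1-bin-hadoop2.6$ stop-dfs.sh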


Spark is installed successfully if the following pages load:

http://ssmaster:8080/ (standalone Master web UI)

http://ssmaster:4040/ (application web UI; only served while an application such as spark-shell is running)
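Without a browser, the same check can be scripted with curl (a 200 status code means the Master UI is serving):

hadoop@ssmaster:~$ curl -s -o /dev/null -w "%{http_code}\n" http://ssmaster:8080/
200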


Z Summary:

   Pseudo-distributed Spark on Hadoop 2.6.4 is up and running.

   Next steps:

  • Focus on using it for some real work
  • Look into a fully distributed Spark setup when time permits
  • Study what each Spark parameter does
  • Spark currently runs as the same user as Hadoop; how to install it under a separate user [left to investigate]

C References:

Spark 1.5.2 cluster setup on Hadoop 2.6.0

http://www.open-open.com/lib/view/open1453950039980.html


posted @ 2016-10-23 19:56  sunzebo