Step by step: Spark Standalone mode with ZooKeeper-based HA
Background reading: http://www.cnblogs.com/hseagle/p/3673147.html
The setup below uses three hosts. Only the operational steps are given here; look up the underlying theory elsewhere.
1. Add IP-to-hostname mappings; without them, workers on other hosts cannot connect to the master
$ cat /etc/hosts
192.168.1.6 node6
192.168.1.7 node7
192.168.1.8 node8
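These entries must be identical on all three hosts. One way to distribute the edited file from node6 is shown below (a sketch, assuming root SSH login to the other hosts is allowed):
$ scp /etc/hosts root@node7:/etc/hosts
$ scp /etc/hosts root@node8:/etc/hosts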
2. Create a spark user on each host and set up passwordless SSH trust between the hosts (start-all.sh relies on SSH to reach the workers); see the sketch below
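A minimal sketch of the passwordless trust, run as the spark user on node6 and assuming the same user already exists on node7 and node8:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ ssh-copy-id spark@node6
$ ssh-copy-id spark@node7
$ ssh-copy-id spark@node8
# repeat from any other host that will also run the start scripts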
3. Download the required packages and extract them
$ ll
total 426288
-rw-rw-r-- 1 spark spark 181435897 Sep 22 09:40 jdk-8u102-linux-x64.tar.gz
-rw-rw-r-- 1 spark spark 29086055 Sep 22 09:36 scala-2.11.11.tgz
-rw-rw-r-- 1 spark spark 203728858 Sep 22 09:41 spark-2.2.0-bin-hadoop2.7.tgz
-rw-rw-r-- 1 spark spark 22261552 Sep 22 09:40 zookeeper-3.4.8.tar.gz
export SPARK_HOME=~/soft/spark-2.2.0-bin-hadoop2.7
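A sketch of unpacking the archives into ~/soft and making the environment variables permanent for the spark user (the ~/soft location and the use of ~/.bashrc are just the conventions of this walkthrough):
$ mkdir -p ~/soft
$ tar zxf jdk-8u102-linux-x64.tar.gz -C ~/soft
$ tar zxf scala-2.11.11.tgz -C ~/soft
$ tar zxf spark-2.2.0-bin-hadoop2.7.tgz -C ~/soft
$ tar zxf zookeeper-3.4.8.tar.gz -C ~/soft
$ cat >> ~/.bashrc <<'EOF'
export JAVA_HOME=~/soft/jdk1.8.0_102
export SCALA_HOME=~/soft/scala-2.11.11
export SPARK_HOME=~/soft/spark-2.2.0-bin-hadoop2.7
export PATH=$JAVA_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH
EOF
$ source ~/.bashrc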
4. Add the Spark configuration
cd $SPARK_HOME/conf
cp spark-env.sh.template spark-env.sh
cp slaves.template slaves
$ cat slaves
#localhost
192.168.1.6
192.168.1.7
192.168.1.8
$ cat spark-env.sh
#spark
export JAVA_HOME=~/soft/jdk1.8.0_102
export SCALA_HOME=~/soft/scala-2.11.11
#export SPARK_MASTER_IP=127.0.0.1
export SPARK_WORKER_CORES=12
export SPARK_WORKER_MEMORY=32g
export SPARK_HOME=~/soft/spark-2.2.0-bin-hadoop2.7
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=192.168.1.7:2181 -Dspark.deploy.zookeeper.dir=/spark"
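spark.deploy.zookeeper.url accepts a comma-separated list, so if ZooKeeper runs as an ensemble rather than a single instance, list every member there. A variant of the line above, assuming a ZooKeeper server on each of the three hosts:
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=192.168.1.6:2181,192.168.1.7:2181,192.168.1.8:2181 -Dspark.deploy.zookeeper.dir=/spark"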
5. Make sure steps 1-4 above have been carried out on every host; one shortcut is sketched below
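Instead of repeating the steps by hand, the already-prepared directories can be pushed from node6 to the other hosts (a sketch; assumes rsync is installed everywhere and relies on the SSH trust from step 2):
$ rsync -a ~/soft/ spark@node7:~/soft/
$ rsync -a ~/soft/ spark@node8:~/soft/
$ rsync -a ~/.bashrc spark@node7:~/
$ rsync -a ~/.bashrc spark@node8:~/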
6. Start ZooKeeper, either a single instance or a cluster (details omitted; a single-instance sketch follows)
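For the single instance on 192.168.1.7 referenced in spark-env.sh, the start sequence is roughly this (for anything beyond a test, change dataDir in zoo.cfg away from the /tmp default):
$ cd ~/soft/zookeeper-3.4.8
$ cp conf/zoo_sample.cfg conf/zoo.cfg    # clientPort defaults to 2181
$ ./bin/zkServer.sh start
$ ./bin/zkServer.sh status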
7. Start Spark
cd $SPARK_HOME/sbin;
./start-all.sh (on the primary master node)
./start-master.sh (on each standby master node)
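Whether the daemons came up can be checked with jps on each host; the Spark master appears as a Master process and every node listed in slaves as a Worker process (the host roles below are assumptions matching this walkthrough):
$ jps
# node6 (active master): Master, Worker
# node7 (standby master, assuming it ran start-master.sh): Master, Worker, QuorumPeerMain
# node8: Worker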
8. Check the web UI
http://192.168.1.6:8080
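The active master's UI shows Status: ALIVE, while a standby's UI shows Status: STANDBY (e.g. http://192.168.1.7:8080, assuming node7 runs the standby master). Failover can be exercised by stopping the active master; once its ZooKeeper session expires, the standby takes over, which typically takes one to two minutes:
$ $SPARK_HOME/sbin/stop-master.sh    # run on the currently active master
# then refresh the standby's UI and wait for Status to change to ALIVE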
9. Test Spark
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://192.168.1.6:7077,192.168.1.7:7077,192.168.1.8:7077 ./examples/jars/spark-examples_2.11-2.2.0.jar
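A successful run prints a line like "Pi is roughly 3.14..." in the driver output. The same HA master list also works for an interactive check with spark-shell:
$ ./bin/spark-shell --master spark://192.168.1.6:7077,192.168.1.7:7077,192.168.1.8:7077
scala> sc.parallelize(1 to 1000).sum()    // expect 500500.0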