Hadoop 3.1.1 + Spark 2.4 Installation and Deployment Manual
Spark Installation
Setting up a fully distributed Spark 2.4 environment:
MASTER node:
1. Download the release archive:
wget -O "spark.tgz" "http://d3kbcqa49mib13.cloudfront.net/spark.tgz"
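This CloudFront mirror served older Spark downloads and may no longer be available. A fallback is the Apache release archive; the exact release/build name below is an assumption, so pick the build matching your cluster:
wget -O "spark.tgz" "https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz"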
2. Extract it and move it into the target directory:
tar -xvf spark.tgz
mv spark /opt
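Note that the official tarballs extract to a versioned directory rather than plain spark/, so the mv above may need the real directory name. Assuming the 2.4.0 / Hadoop 2.7 build suggested earlier:
tar -xvf spark.tgz
mv spark-2.4.0-bin-hadoop2.7 /opt/spark    # rename so the SPARK_HOME set below stays /opt/spark/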
3. Edit the relevant configuration files:
(1) vi /etc/profile
#Spark environment
export SPARK_HOME=/opt/spark/
export PATH="$SPARK_HOME/bin:$PATH"
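After saving, reload the profile so the current shell picks up the new variables (standard commands, nothing assumed here):
source /etc/profile
echo $SPARK_HOME        # should print /opt/spark/
spark-submit --version  # prints the Spark version banner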
(2) $SPARK_HOME/conf/spark-env.sh
cp spark-env.sh.template spark-env.sh
vi spark-env.sh
#Reference settings (from a Spark 2.1 / Hadoop 2.7.3 walkthrough; the values actually used follow below):
export SCALA_HOME=/usr/share/scala
export JAVA_HOME=/usr/java/jdk1.8.0_112/
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop
#The actual configuration used in this deployment:
export JAVA_HOME=/opt/jdk-10.0.2
export PYSPARK_PYTHON=/usr/bin/python    # Python interpreter for PySpark; PYTHONPATH expects module directories, not a binary
export HADOOP_HOME=/opt/hadoop-3.1.1
export HADOOP_CONF_DIR=/opt/hadoop-3.1.1/etc/hadoop
export SPARK_MASTER_HOST=SparkMaster    # SPARK_MASTER_IP is deprecated in Spark 2.x
export SPARK_WORKER_MEMORY=4g
export SPARK_WORKER_CORES=2
export SPARK_WORKER_INSTANCES=1
#Other configurable options (kept from spark-env.sh.template):
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_DAEMON_MEMORY, to allocate to the master, worker and history server themselves (default: 1g).
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_SHUFFLE_OPTS, to set config properties only for the external shuffle service (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_DAEMON_CLASSPATH, to set the classpath for all daemons
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers
# Generic options for the daemons used in the standalone deploy mode
# - SPARK_CONF_DIR Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - SPARK_LOG_DIR Where log files are stored. (Default: ${SPARK_HOME}/logs)
# - SPARK_PID_DIR Where the pid file is stored. (Default: /tmp)
# - SPARK_IDENT_STRING A string representing this instance of spark. (Default: $USER)
# - SPARK_NICENESS The scheduling priority for daemons. (Default: 0)
# - SPARK_NO_DAEMONIZE Run the proposed command in the foreground. It will not output a PID file.
# Options for native BLAS, like Intel MKL, OpenBLAS, and so on.
# You might get better performance to enable these options if using native BLAS (see SPARK-21305).
# - MKL_NUM_THREADS=1 Disable multi-threading of Intel MKL
# - OPENBLAS_NUM_THREADS=1 Disable multi-threading of OpenBLAS
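Before moving on, it is worth checking that the paths written into spark-env.sh actually exist and that the file parses cleanly. A minimal sanity check, using the paths from this deployment:
ls /opt/jdk-10.0.2/bin/java /opt/hadoop-3.1.1/bin/hadoop   # both should resolve
bash -n /opt/spark/conf/spark-env.sh                       # reports shell syntax errors without executing anything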
(3) $SPARK_HOME/conf/slaves
cp slaves.template slaves
Contents, one worker hostname per line (listing master here also starts a Worker on the master node):
master
slave1
slave2
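The start scripts log in to every host listed in slaves over SSH, so the master needs passwordless SSH to each of them, including itself since master is listed too. A typical setup, assuming root is the deploy user as in the scp step of the WorkerN section below:
ssh-keygen -t rsa        # accept the defaults, empty passphrase
ssh-copy-id root@master
ssh-copy-id root@slave1
ssh-copy-id root@slave2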
WorkerN nodes:
Copy the configured Spark directory to every workerN node (note that scp needs -r to copy a directory):
scp -r /opt/spark root@workerN:/opt
Edit /etc/profile on each worker and add the same Spark entries as on the MASTER node.
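With the configuration distributed, the standalone cluster can be started from the master and verified. These are the stock Spark scripts; SparkMaster is the master hostname configured in spark-env.sh above:
$SPARK_HOME/sbin/start-all.sh
jps   # expect a Master process on this node and a Worker on every host listed in slaves
The master web UI should then be reachable at http://SparkMaster:8080/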