Installing Spark on a Mac

1. Basics

Spark version: spark-3.1.3-bin-hadoop3.2

Hadoop version: hadoop-3.2.1

Scala version: scala-2.11.12 (Spark 3.1.x is built against Scala 2.12, so Scala 2.12 is recommended)

Download: https://spark.apache.org/downloads.html

2. Configuration changes

2.1 Copy the Hive config into Spark's conf directory so Spark uses the Hive metastore

cp $HIVE_HOME/conf/hive-site.xml  $SPARK_HOME/conf
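The copy above silently does nothing useful if `hive-site.xml` is missing. A minimal sketch of a guarded copy (`copy_conf` is a hypothetical helper, not part of Spark or Hive):

```shell
# Copy a config file only if the source actually exists; fail loudly otherwise.
copy_conf() {
  src="$1"; dst="$2"
  if [ -f "$src" ]; then
    cp "$src" "$dst"
  else
    echo "missing config: $src" >&2
    return 1
  fi
}

# usage (assumes HIVE_HOME and SPARK_HOME are set):
# copy_conf "$HIVE_HOME/conf/hive-site.xml" "$SPARK_HOME/conf/"
```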

2.2 Set SPARK_HOME and PATH (note: these are shell `export` lines, so they belong in your shell profile; spark-defaults.conf takes `key value` property pairs, not exports)

export SPARK_HOME=/Users/Robots2/softWare/spark-3.1.3

PATH=$SPARK_HOME/bin:$SCALA_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH
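A profile that blindly prepends to PATH grows it every time you `source` it. A hedged sketch of an idempotent prepend (`path_prepend` is a hypothetical helper, not a standard shell builtin):

```shell
# Prepend a directory to PATH only if it is not already present,
# so repeated `source ~/.bash_profile` calls don't grow PATH.
path_prepend() {
  case ":$PATH:" in
    *:"$1":*) ;;              # already on PATH, nothing to do
    *) PATH="$1:$PATH" ;;
  esac
}

# usage in ~/.bash_profile (assumes SPARK_HOME is set):
# path_prepend "$SPARK_HOME/bin"; export PATH
```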

2.3 Edit spark-env.sh

export SPARK_MASTER_HOST=localhost   # SPARK_MASTER_IP is deprecated since Spark 2.x
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_311.jdk/Contents/Home
export SCALA_HOME=/Users/Robots2/softWare/scala-2.11.12

export SPARK_CONF_DIR=/Users/Robots2/softWare/spark-3.1.3/conf
# HADOOP_CONF_DIR must point at the directory containing core-site.xml etc.,
# not at the Hadoop install root
export HADOOP_CONF_DIR=/Users/Robots2/softWare/hadoop-3.2.1/etc/hadoop
export YARN_CONF_DIR=/Users/Robots2/softWare/hadoop-3.2.1/etc/hadoop

export SPARK_LOCAL_IP=localhost

2.4 Environment variables

vim ~/.bash_profile 

#Spark3
SPARK_HOME=/Users/Robots2/softWare/spark-3.1.3
export PATH="${SPARK_HOME}/bin:${PATH}"

source ~/.bash_profile

3. Operations

3.1 Start Spark (or skip the standalone daemons entirely and submit jobs straight to YARN)

cd $SPARK_HOME/sbin
./start-all.sh
jps

49452 Master
49495 Worker
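Once Master and Worker show up in `jps`, a quick smoke test is to submit the bundled SparkPi example to the standalone master. A sketch, assuming the default master port 7077 and the stock examples jar that ships with the 3.1.3 / Scala 2.12 build:

```shell
# Submit the bundled SparkPi example to the local standalone master.
# The jar name below matches the spark-3.1.3-bin-hadoop3.2 distribution.
$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://localhost:7077 \
  "$SPARK_HOME"/examples/jars/spark-examples_2.12-3.1.3.jar 100
```

If the job finishes, look for a line like `Pi is roughly 3.14...` in the driver output.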


4. Spark on YARN configuration

4.1 Edit spark-env.sh

cp spark-env.sh.template spark-env.sh

vim $SPARK_HOME/conf/spark-env.sh

Add the following:

export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop

4.2 In Hadoop's capacity-scheduler.xml, switch the resource calculator so YARN schedules on CPU + memory rather than memory alone

<property> 
    <name>yarn.scheduler.capacity.resource-calculator</name> 
    <!-- <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value> --> 
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value> 
</property>

4.3 Enable log aggregation in Hadoop's yarn-site.xml

<property>
    <description>Whether to enable log aggregation</description>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<property>
    <name>yarn.log.server.url</name>
    <value>http://master:19888/jobhistory/logs</value>
</property>

4.4 Point MapReduce at the job history server in Hadoop's mapred-site.xml

<property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
</property>
 
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
</property>

4.5 Edit Spark's spark-defaults.conf

spark.eventLog.dir=hdfs:///user/spark/applicationHistory
spark.eventLog.enabled=true
# note: no http:// scheme here; Spark expects host:port
spark.yarn.historyServer.address=master:18018

4.6 Add the history server options to Spark's spark-env.sh

export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18018 -Dspark.history.fs.logDirectory=hdfs:///user/spark/applicationHistory"
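The event-log directory must exist in HDFS before the first application tries to write to it, and the history server is started separately from the standalone daemons. A sketch, assuming the paths configured above:

```shell
# Create the HDFS event-log directory referenced in spark-defaults.conf,
# then start the Spark history server (it picks up SPARK_HISTORY_OPTS
# from spark-env.sh, so it listens on port 18018 here).
hdfs dfs -mkdir -p /user/spark/applicationHistory
$SPARK_HOME/sbin/start-history-server.sh
```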

4.7 Viewing logs

Fetch a finished application's aggregated logs from YARN (the id below is an example):

yarn logs -applicationId application_1590546538590_0017
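Typing application ids by hand is error-prone; they can be scraped out of `yarn application -list` output instead. A hedged sketch (`extract_app_id` is a hypothetical helper, not a yarn subcommand):

```shell
# Pull the first YARN application id (application_<cluster-ts>_<seq>)
# out of text such as `yarn application -list` output.
extract_app_id() {
  grep -oE 'application_[0-9]+_[0-9]+' | head -n 1
}

# usage against a live cluster:
# yarn application -list 2>/dev/null | extract_app_id
```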


5. Startup problems

5.1 SSH misconfiguration (start-all.sh needs passwordless ssh to localhost)

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

ssh localhost

~/.ssh must have mode 700 (chmod 700 ~/.ssh), and ~/.ssh/authorized_keys should be 600 (chmod 600 ~/.ssh/authorized_keys), or sshd will refuse key-based login.


posted @ 2022-04-02 18:14 黑水滴