Notes on configuring Hadoop 2.7.3 and Spark 2.4.5
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-2.7.3/tmp</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop-2.7.3/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop-2.7.3/dfs/data</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>node1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>node1:19888</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>node1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>node1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>node1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>node1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>node1:8888</value>
</property>
</configuration>
node1:192.168.31.100
node2:192.168.31.101
node3:192.168.31.102
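Every node must be able to resolve the hostnames above. A minimal sketch of the required mappings (assuming these three are the only nodes; the snippet writes a local file which you would then append to /etc/hosts on each node as root):

```shell
# Hostname mappings used throughout these notes; generate them locally,
# then append to /etc/hosts on every node (e.g. via ssh/scp as root).
cat > hosts.append <<'EOF'
192.168.31.100 node1
192.168.31.101 node2
192.168.31.102 node3
EOF
```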
Disable iptables
Disable SELinux
Configure static networking: IPADDR, ONBOOT=yes, GATEWAY
Configure the JAVA and Hadoop environment variables
Set up passwordless SSH
In hadoop-env.sh and yarn-env.sh, uncomment the JAVA_HOME line and set it
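The last two steps can be sketched as shell commands. The SSH part is shown as comments (it needs the real nodes); the JAVA_HOME edit is parameterized via HADOOP_CONF so it can be tried on scratch copies first, and the paths are the ones used elsewhere in these notes:

```shell
# Passwordless SSH from node1 to every node (including itself);
# ssh-copy-id prompts for each node's password once.
# ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
# for host in node1 node2 node3; do ssh-copy-id "$host"; done

# Uncomment/override JAVA_HOME in hadoop-env.sh and yarn-env.sh.
# Point HADOOP_CONF at /opt/hadoop-2.7.3/etc/hadoop to apply it for real;
# by default this demo works on scratch stand-in files.
HADOOP_CONF="${HADOOP_CONF:-$(mktemp -d)}"
JAVA_HOME_DIR=/opt/jdk1.8.0_221
[ -f "$HADOOP_CONF/hadoop-env.sh" ] || echo 'export JAVA_HOME=${JAVA_HOME}' > "$HADOOP_CONF/hadoop-env.sh"
[ -f "$HADOOP_CONF/yarn-env.sh" ]   || echo '# export JAVA_HOME=/some/old/jdk' > "$HADOOP_CONF/yarn-env.sh"
for f in hadoop-env.sh yarn-env.sh; do
  # Matches both a commented-out and an already-set JAVA_HOME line (GNU sed).
  sed -i "s|^#\{0,1\} *export JAVA_HOME=.*|export JAVA_HOME=$JAVA_HOME_DIR|" "$HADOOP_CONF/$f"
done
```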
Spark:
Go into the conf directory, copy spark-env.sh.template to spark-env.sh, and add:
export JAVA_HOME=/opt/jdk1.8.0_221
export SCALA_HOME=/opt/scala-2.12.11
export HADOOP_HOME=/opt/hadoop-2.7.3
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/spark-2.4.5-bin-hadoop2.7
export SPARK_MASTER_IP=192.168.31.100
In the conf directory, copy slaves.template to slaves and list the worker nodes:
node1
node2
node3
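As a sketch, the two conf steps can be scripted. SPARK_CONF is parameterized so the snippet can be tried in a scratch directory; point it at $SPARK_HOME/conf on the real cluster:

```shell
# Create spark-env.sh and slaves from the templates.
SPARK_CONF="${SPARK_CONF:-$(mktemp -d)}"
# Copy the template if present; fall back to an empty file in a scratch dir.
[ -f "$SPARK_CONF/spark-env.sh" ] || cp "$SPARK_CONF/spark-env.sh.template" "$SPARK_CONF/spark-env.sh" 2>/dev/null || touch "$SPARK_CONF/spark-env.sh"
# List the worker nodes, one per line.
cat > "$SPARK_CONF/slaves" <<'EOF'
node1
node2
node3
EOF
```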
BUG:
When running Spark on YARN:
## Package the jars
jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .
## Create the HDFS directory
hdfs dfs -mkdir -p /spark/jar
## Upload the jar to HDFS
hdfs dfs -put spark-libs.jar /spark/jar
## Add the config
vim spark-defaults.conf (copy it from the template first)
spark.yarn.archive=hdfs://node1:9000/spark/jar/spark-libs.jar
The full hdfs://node1:9000 authority must be written out; otherwise you get the following error:
java.lang.IllegalArgumentException: java.net.UnknownHostException: spark
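The steps above can be sketched as one script. The jar and hdfs commands are left as comments since they need a running cluster; SPARK_DEFAULTS is parameterized so the config append can be tried on a scratch file:

```shell
# Package all Spark jars into one archive, publish it on HDFS, and point
# spark.yarn.archive at it so YARN jobs don't re-upload the jars each run.
# jar cv0f spark-libs.jar -C "$SPARK_HOME/jars/" .
# hdfs dfs -mkdir -p /spark/jar
# hdfs dfs -put spark-libs.jar /spark/jar

SPARK_DEFAULTS="${SPARK_DEFAULTS:-$(mktemp)}"
# The authority (node1:9000) must match fs.defaultFS in core-site.xml.
echo 'spark.yarn.archive=hdfs://node1:9000/spark/jar/spark-libs.jar' >> "$SPARK_DEFAULTS"
```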
Run modes for a Spark cluster:
a. local mode
b. Spark's built-in standalone cluster mode
c. YARN cluster mode
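For reference, the master URLs for the three modes, with a hypothetical spark-submit of the bundled SparkPi example (7077 is Spark's default standalone master port; the submit command itself is commented out since it needs a live cluster):

```shell
# Master URLs for the three run modes.
MASTER_LOCAL='local[*]'                 # a. local: threads in one JVM
MASTER_STANDALONE='spark://node1:7077'  # b. built-in standalone cluster
MASTER_YARN='yarn'                      # c. YARN (locates the RM via HADOOP_CONF_DIR)

# Hypothetical submission of the bundled SparkPi example:
# spark-submit --master "$MASTER_STANDALONE" \
#   --class org.apache.spark.examples.SparkPi \
#   "$SPARK_HOME"/examples/jars/spark-examples_*.jar 100
```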