Hadoop Cluster Setup
Preface
Differences between the deployment modes
1. Local (standalone) mode
Standalone mode is Hadoop's default. Everything runs on a single machine, and instead of a distributed file system Hadoop reads and writes the local OS file system directly. When the Hadoop package is first unpacked, Hadoop knows nothing about the hardware it is running on, so it conservatively picks the minimal configuration: in this default mode all three XML configuration files are empty. With empty configuration files Hadoop runs entirely locally; since there is no need to talk to other nodes, standalone mode uses no HDFS and starts no Hadoop daemons. This mode is mainly used to develop and debug the application logic of MapReduce programs.
2. Pseudo-distributed mode
Pseudo-distributed mode also runs on a single machine, but uses separate Java processes to imitate the different node types of a distributed deployment.
It runs Hadoop as a "single-node cluster" with all daemons on the same machine. On top of standalone mode it adds real debugging targets: you can inspect memory usage, HDFS input and output, and the interaction between the daemons.
3. Fully distributed mode
The Hadoop daemons run on an actual cluster of machines.
4. Why is pseudo-distributed mode slower than standalone?
MapReduce is a disk-based engine: intermediate results are written to disk and read back for the reduce phase. In standalone mode that disk is simply the local Linux file system, but in (pseudo-)distributed mode it is HDFS, which adds an extra layer of indirection for every read and write, hence the slowdown. So why have a distributed file system at all? Because in the big-data era a single machine's disks cannot hold the data; distributed storage is the only option.
Reference repositories:
| https://github.com/xiaoguangbiao-github/bigdata_spark_env |
| https://github.com/xiaoguangbiao-github/bigdata_hadoop_env/blob/main/README.md |
1. Hadoop
1.1 Configure hosts
| cat >>/etc/hosts <<'EOF' |
| 172.16.3.20 node001 |
| 172.16.3.21 node002 |
| 172.16.3.22 node003 |
| EOF |
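This append only changes the local file; node002 and node003 need the same entries. A quick sanity check after adding them everywhere (plain ping, nothing Hadoop-specific):
| # verify the hostnames resolve from each node |
| ping -c 1 node001 |
| ping -c 1 node002 |
| ping -c 1 node003 |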
1.2 Configure SSH key trust
| |
| |
| ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa |
| cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys |
| |
| |
| |
| ssh localhost   # if this logs in without a password prompt, the key setup works |
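The steps above only establish trust with localhost; start-all.sh later needs node001 to reach node002/node003 without a password as well. A minimal sketch, assuming root logins and the hostnames from 1.1:
| # push node001's public key to the other nodes (asks for their passwords once) |
| ssh-copy-id root@node002 |
| ssh-copy-id root@node003 |
| |
| # verify |
| ssh node002 hostname |
| ssh node003 hostname |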
1.3 JDK environment (deploy on all 3 nodes)
| |
| |
| mkdir -p /app/tools |
| tar xf jdk-8u333-linux-x64.tar.gz -C /app/tools |
| ln -s /app/tools/jdk1.8.0_333/ /usr/local/java |
| |
| |
| cat >> /etc/profile <<'EOF' |
| export JAVA_HOME=/usr/local/java |
| export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH |
| export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$JAVA_HOME/lib/tools.jar |
| EOF |
| source /etc/profile |
| |
| |
| java -version |
1.4 Create directories
| mkdir -p /export/server/ |
| mkdir -p /export/data/ |
| mkdir -p /export/software/ |
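node002 and node003 need the same directory layout before the scp in 1.8 (and the hosts/JDK steps above have to be repeated there too). One way, assuming the SSH trust from 1.2 is in place:
| # create the same layout on the other nodes |
| for h in node002 node003; do |
|     ssh $h "mkdir -p /export/server /export/data /export/software" |
| done |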
1.5 Download the tarball and extract it to the target directory
| wget https://mirror.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz |
| tar xf hadoop-3.3.4.tar.gz -C /export/server/ |
1.6 Edit the configuration files
| cd /export/server/hadoop-3.3.4/etc/hadoop |
| |
| # edit each of the files shown below |
| vim ...... |
| |
| <!------------------------hadoop-env.sh-------------------------------> |
| |
| export JAVA_HOME=/usr/local/java |
| |
| |
| # allow running the daemons as root |
| export HDFS_NAMENODE_USER=root |
| export HDFS_DATANODE_USER=root |
| export HDFS_SECONDARYNAMENODE_USER=root |
| export YARN_RESOURCEMANAGER_USER=root |
| export YARN_NODEMANAGER_USER=root |
| |
| <!------------------------core-site.xml-------------------------------> |
| |
| <!-- Name of the default file system; the URI scheme selects the file system implementation. --> |
| <!-- file:/// local file system, hdfs:// Hadoop distributed file system, gfs:// etc. --> |
| <!-- HDFS access address: hdfs://nn_host:8020 --> |
| <property> |
| <name>fs.defaultFS</name> |
| <value>hdfs://node001:8020</value> |
| </property> |
| <!-- Local data directory for Hadoop; created automatically by the format step --> |
| <property> |
| <name>hadoop.tmp.dir</name> |
| <value>/export/data/hadoop-3.3.4</value> |
| </property> |
| <!-- Username used when browsing HDFS from the web UI --> |
| <property> |
| <name>hadoop.http.staticuser.user</name> |
| <value>root</value> |
| </property> |
| |
| <!------------------------hdfs-site.xml-------------------------------> |
| |
| <!-- Host and port of the SecondaryNameNode (SNN) --> |
| <property> |
| <name>dfs.namenode.secondary.http-address</name> |
| <value>node002:9868</value> |
| </property> |
| |
| <!------------------------mapred-site.xml-------------------------------> |
| |
| <!-- Default execution framework for MR jobs: yarn (cluster) or local --> |
| <property> |
| <name>mapreduce.framework.name</name> |
| <value>yarn</value> |
| </property> |
| <!-- Environment for the MR ApplicationMaster --> |
| <property> |
| <name>yarn.app.mapreduce.am.env</name> |
| <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value> |
| </property> |
| <!-- Environment for MR map tasks --> |
| <property> |
| <name>mapreduce.map.env</name> |
| <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value> |
| </property> |
| <!-- Environment for MR reduce tasks --> |
| <property> |
| <name>mapreduce.reduce.env</name> |
| <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value> |
| </property> |
| |
| <!------------------------yarn-site.xml-------------------------------> |
| |
| <!-- Host running the YARN ResourceManager (the cluster's master role) --> |
| <property> |
| <name>yarn.resourcemanager.hostname</name> |
| <value>node001</value> |
| </property> |
| <!-- Auxiliary service on each NodeManager; must be mapreduce_shuffle for MR jobs to run --> |
| <property> |
| <name>yarn.nodemanager.aux-services</name> |
| <value>mapreduce_shuffle</value> |
| </property> |
| <!-- Minimum memory per container request (MB) --> |
| <property> |
| <name>yarn.scheduler.minimum-allocation-mb</name> |
| <value>512</value> |
| </property> |
| <!-- Maximum memory per container request (MB) --> |
| <property> |
| <name>yarn.scheduler.maximum-allocation-mb</name> |
| <value>2048</value> |
| </property> |
| <!-- Ratio of virtual to physical memory allowed per container --> |
| <property> |
| <name>yarn.nodemanager.vmem-pmem-ratio</name> |
| <value>4</value> |
| </property> |
1.7 Configure the workers file
| cd /export/server/hadoop-3.3.4/etc/hadoop/ |
| vim workers |
| |
| node001 |
| node002 |
| node003 |
1.8 Distribute the Hadoop directory to the other nodes
| cd /export/server/ |
| scp -rp hadoop-3.3.4 node002:`pwd` |
| scp -rp hadoop-3.3.4 node003:`pwd` |
1.9 Configure Hadoop environment variables
| cat >>/etc/profile <<'EOF' |
| export HADOOP_HOME=/export/server/hadoop-3.3.4 |
| export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin |
| EOF |
| |
| source /etc/profile |
| |
| # verify the hadoop command is now on PATH |
| hadoop |
1.10 Format HDFS (first start only)
Before HDFS is started for the first time it must be formatted. The format step is essentially initialization: it cleans and prepares the directories HDFS needs.
Formatting is done once, before the first start, and never again. Running format repeatedly not only loses data, it also leaves the HDFS master and worker roles unable to recognize each other; the fix is to delete the hadoop.tmp.dir directory on every machine and format again.
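The command itself is the same one the container entrypoint in section 5 runs; execute it once on node001 after the environment variables from 1.9 are loaded:
| # run once on node001, before the first start |
| hdfs namenode -format |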
1.11 Start the cluster
| cd /export/server/hadoop-3.3.4/sbin |
| ./start-all.sh |
| |
| # the first start may print warnings like these; Hadoop creates the logs directory itself |
| node002: WARNING: /export/server/hadoop-3.3.4/logs does not exist. Creating. |
| node003: WARNING: /export/server/hadoop-3.3.4/logs does not exist. Creating. |
| |
| # optionally pre-create the directory on each node to silence the warning |
| mkdir -p /export/server/hadoop-3.3.4/logs |
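A quick way to confirm the daemons came up is jps on each node. With the configuration above (NameNode and ResourceManager on node001, SecondaryNameNode on node002, all three hosts in workers) the expected layout is roughly:
| # run on every node |
| jps |
| # node001: NameNode, DataNode, ResourceManager, NodeManager |
| # node002: SecondaryNameNode, DataNode, NodeManager |
| # node003: DataNode, NodeManager |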
1.12 Web UI check
Hadoop (YARN ResourceManager) page, default http://node001:8088

HDFS (NameNode) page, default http://node001:9870

2. Scala setup (deploy on all 3 nodes)
2.1 Download and extract
| wget https://downloads.lightbend.com/scala/2.11.12/scala-2.11.12.tgz |
| tar xf scala-2.11.12.tgz -C /export/server/ |
2.2 Configure environment variables
| cat >> /etc/profile <<'EOF' |
| export SCALA_HOME=/export/server/scala-2.11.12 |
| export PATH=$SCALA_HOME/bin:$PATH |
| EOF |
| |
| source /etc/profile |
2.3 Test
| |
| [root@node1 ~]# scala |
| Welcome to Scala 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201). |
| Type in expressions for evaluation. Or try :help. |
| |
| scala> |
3. Spark setup
3.1 Download and extract
| wget --no-check-certificate https://dlcdn.apache.org/spark/spark-3.2.2/spark-3.2.2-bin-hadoop3.2.tgz |
| tar xf spark-3.2.2-bin-hadoop3.2.tgz -C /export/server |
| mv /export/server/spark-3.2.2-bin-hadoop3.2 /export/server/spark |
3.2 Configure the worker list (slaves)
| cd /export/server/spark/conf |
| mv slaves.template slaves |
| |
| vim slaves |
| |
| node002 |
| node003 |
3.3 Configure the master (spark-env.sh)
| |
| cd /export/server/spark/conf |
| |
| |
| mv spark-env.sh.template spark-env.sh |
| |
| |
| vim spark-env.sh |
| |
| # add the following: |
| |
| # JDK location |
| JAVA_HOME=/usr/local/java |
| |
| # Hadoop/YARN configuration directories (see the note after this block) |
| HADOOP_CONF_DIR=/export/server/hadoop/etc/hadoop |
| YARN_CONF_DIR=/export/server/hadoop/etc/hadoop |
| |
| # standalone master: host, RPC port, web UI port |
| SPARK_MASTER_HOST=node001 |
| SPARK_MASTER_PORT=7077 |
| SPARK_MASTER_WEBUI_PORT=8080 |
| |
| # resources offered by each worker |
| SPARK_WORKER_CORES=1 |
| SPARK_WORKER_MEMORY=1g |
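Note: spark-env.sh (and section 4 below) refer to /export/server/hadoop, while section 1 installed into /export/server/hadoop-3.3.4. The Dockerfile in section 5 bridges this with a symlink, and the same trick works on the bare-metal nodes:
| # run on all three nodes so the /export/server/hadoop paths resolve |
| ln -s /export/server/hadoop-3.3.4 /export/server/hadoop |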
3.4 Distribute to the other nodes
| cd /export/server/ |
| scp -rp spark root@node002:`pwd` |
| scp -rp spark root@node003:`pwd` |
3.5 Start the services
| |
| # start the whole standalone cluster (run on node001) |
| /export/server/spark/sbin/start-all.sh |
| |
| # stop the whole cluster |
| /export/server/spark/sbin/stop-all.sh |
| |
| # start/stop only the master |
| start-master.sh |
| stop-master.sh |
| |
| # start/stop only the workers |
| start-slaves.sh |
| stop-slaves.sh |
3.6 Verify
| |
| # expected roles: |
| # node001: master |
| # node002/node003: worker |
| |
| # Spark master web UI: |
| http://node001:8080/ |
Web UI (cluster-internal communication goes through port 7077)

Interactive access (spark-shell)
| [root@node1 export]# /export/server/spark/bin/spark-shell |
| 22/10/04 17:01:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable |
| Spark context Web UI available at http://node1:4040 |
| Spark context available as 'sc' (master = local[*], app id = local-1664874092205). |
| Spark session available as 'spark'. |
| Welcome to |
| ____ __ |
| / __/__ ___ _____/ /__ |
| _\ \/ _ \/ _ `/ __/ '_/ |
| /___/ .__/\_,_/_/ /_/\_\ version 3.2.2 |
| /_/ |
| |
| Using Scala version 2.12.15 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201) |
| Type in expressions to have them evaluated. |
| Type :help for more information. |
| |
| scala> |
| |

4. Spark on YARN deployment
4.1 Stop the previous Spark standalone cluster
| /export/server/spark/sbin/stop-all.sh |
4.2 Configure YARN log aggregation / history settings and disable the memory checks
| vim /export/server/hadoop/etc/hadoop/yarn-site.xml |
| |
| <configuration> |
| <!-- Host of the YARN ResourceManager --> |
| <property> |
| <name>yarn.resourcemanager.hostname</name> |
| <value>node001</value> |
| </property> |
| <property> |
| <name>yarn.nodemanager.aux-services</name> |
| <value>mapreduce_shuffle</value> |
| </property> |
| <!-- Memory allocation for the YARN cluster --> |
| <property> |
| <name>yarn.nodemanager.resource.memory-mb</name> |
| <value>20480</value> |
| </property> |
| <property> |
| <name>yarn.scheduler.minimum-allocation-mb</name> |
| <value>2048</value> |
| </property> |
| <property> |
| <name>yarn.nodemanager.vmem-pmem-ratio</name> |
| <value>2.1</value> |
| </property> |
| <!-- Enable log aggregation --> |
| <property> |
| <name>yarn.log-aggregation-enable</name> |
| <value>true</value> |
| </property> |
| <!-- How long aggregated logs are kept on HDFS (seconds) --> |
| <property> |
| <name>yarn.log-aggregation.retain-seconds</name> |
| <value>604800</value> |
| </property> |
| <!-- URL of the YARN log/history server --> |
| <property> |
| <name>yarn.log.server.url</name> |
| <value>http://node001:19888/jobhistory/logs</value> |
| </property> |
| <!-- Disable the YARN physical/virtual memory checks --> |
| <property> |
| <name>yarn.nodemanager.pmem-check-enabled</name> |
| <value>false</value> |
| </property> |
| <property> |
| <name>yarn.nodemanager.vmem-check-enabled</name> |
| <value>false</value> |
| </property> |
| </configuration> |
4.3 Distribute the file and restart YARN
| cd /export/server/hadoop/etc/hadoop |
| scp -r yarn-site.xml root@node002:`pwd` |
| scp -r yarn-site.xml root@node003:`pwd` |
| |
| /export/server/hadoop/sbin/stop-yarn.sh |
| /export/server/hadoop/sbin/start-yarn.sh |
4.4 Configure the Spark history server and its YARN integration
| cd /export/server/spark/conf |
| mv spark-defaults.conf.template spark-defaults.conf |
| |
| vim spark-defaults.conf |
| |
| spark.eventLog.enabled true |
| spark.eventLog.dir hdfs://node001:8020/sparklog/ |
| spark.eventLog.compress true |
| spark.yarn.historyServer.address node001:18080 |
4.5 Edit spark-env.sh
| vim /export/server/spark/conf/spark-env.sh |
| |
| |
| SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://node001:8020/sparklog/ -Dspark.history.fs.cleaner.enabled=true" |
4.6 Create the sparklog directory on HDFS
| hadoop fs -mkdir -p /sparklog |
4.7 Adjust the log level and distribute the Spark config
| cd /export/server/spark/conf |
| |
| mv log4j.properties.template log4j.properties |
| |
| vim log4j.properties |
| # typically the change is log4j.rootCategory from INFO to WARN to quiet the console (assumption; adjust to taste) |
| |
| # distribute the Spark config files to the other nodes |
| cd /export/server/spark/conf |
| scp -rp spark-env.sh root@node002:$PWD |
| scp -rp spark-env.sh root@node003:$PWD |
| scp -rp spark-defaults.conf root@node002:$PWD |
| scp -rp spark-defaults.conf root@node003:$PWD |
| scp -rp log4j.properties root@node002:$PWD |
| scp -rp log4j.properties root@node003:$PWD |
4.8 Stage the Spark jars on HDFS
- Create a directory on HDFS for the Spark jars
| hadoop fs -mkdir -p /spark/jars/ |
- Upload all jars under $SPARK_HOME/jars to HDFS
| hadoop fs -put /export/server/spark/jars/* /spark/jars/ |
- On node001, edit spark-defaults.conf
| vim /export/server/spark/conf/spark-defaults.conf |
| |
| spark.yarn.jars hdfs://node001:8020/spark/jars/* |
| |
| # distribute the updated file again |
| cd /export/server/spark/conf |
| scp -r spark-defaults.conf root@node002:$PWD |
| scp -r spark-defaults.conf root@node003:$PWD |
4.9 Start the services
Start HDFS and YARN; run on node001:
| start-dfs.sh |
| start-yarn.sh |
| # or |
| start-all.sh |
Start the MapReduce JobHistory server; run on node001:
| mr-jobhistory-daemon.sh start historyserver |
Start the Spark history server; run on node001:
| /export/server/spark/sbin/start-history-server.sh |
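If everything started, jps on node001 should now also list the two history daemons (standard process names, shown here as a quick check):
| jps |
| # ...in addition to the earlier daemons: |
| # JobHistoryServer   (MapReduce history, web UI on 19888) |
| # HistoryServer      (Spark history, web UI on 18080) |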
4.10 All web UIs
Hadoop (YARN ResourceManager), default http://node001:8088

HDFS (NameNode), default http://node001:9870

Spark master, http://node001:8080

Spark history server, http://node001:18080

Hadoop (MapReduce) history server, http://node001:19888

5. Containerized reference: a Hadoop YARN cluster with Spark on YARN
| FROM centos:7.9.2009 |
| LABEL author=QuYi hadoop=3.3.4 jdk=1.8 scala=2.11.12 spark=3.2.2 |
| |
| |
| # base directory layout (same as the bare-metal setup) |
| RUN mkdir -p /export/server/ \ |
| && mkdir -p /export/data/ \ |
| && mkdir -p /export/software/ |
| |
| |
| WORKDIR /export |
| |
| |
| COPY hadoop-3.3.4.tar.gz /export/software/ |
| COPY scala-2.11.12.tgz /export/software/ |
| COPY spark-3.2.2-bin-hadoop3.2.tgz /export/software/ |
| COPY jdk-8u201-linux-x64.tar.gz /export/software/ |
| |
| |
| RUN cd /export/software/ \ |
| && tar xf hadoop-3.3.4.tar.gz -C /export/server/ \ |
| && tar xf jdk-8u201-linux-x64.tar.gz -C /root/ \ |
| && tar xf scala-2.11.12.tgz -C /export/server/ \ |
| && tar xf spark-3.2.2-bin-hadoop3.2.tgz -C /export/server/ \ |
| && ln -s /export/server/spark-3.2.2-bin-hadoop3.2/ /export/server/spark \ |
| && ln -s /export/server/hadoop-3.3.4/ /export/server/hadoop |
| |
| # Hadoop and Spark configuration files prepared in the sections above |
| COPY core-site.xml /export/server/hadoop/etc/hadoop/ |
| COPY hadoop-env.sh /export/server/hadoop/etc/hadoop/ |
| COPY hdfs-site.xml /export/server/hadoop/etc/hadoop/ |
| COPY mapred-site.xml /export/server/hadoop/etc/hadoop/ |
| COPY yarn-site.xml /export/server/hadoop/etc/hadoop/ |
| COPY workers /export/server/hadoop/etc/hadoop/ |
| |
| COPY slaves /export/server/spark/conf/ |
| COPY spark-env.sh /export/server/spark/conf/ |
| COPY spark-defaults.conf /export/server/spark/conf/ |
| COPY log4j.properties /export/server/spark/conf/ |
| |
| |
| # sshd setup, timezone, and build-context cleanup |
| RUN yum install -y openssh openssh-clients openssh-server iproute initscripts \ |
| && /usr/sbin/sshd-keygen -A \ |
| && /usr/sbin/sshd \ |
| && sed -i '4iStrictHostKeyChecking no' /etc/ssh/ssh_config \ |
| && sed -i '3iPort 22' /etc/ssh/sshd_config \ |
| && sed -i '8iPubkeyAuthentication yes' /etc/ssh/sshd_config \ |
| && mkdir -p /root/soft/hadoop/logs \ |
| && rm -rf /root/.ssh/* \ |
| && /bin/cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo 'Asia/Shanghai' >/etc/timezone \ |
| && rm -rf /export/software/* |
| |
| COPY id_rsa /root/.ssh/ |
| COPY authorized_keys /root/.ssh/ |
| |
| RUN chmod 700 /root/.ssh \ |
| && chmod 600 /root/.ssh/id_rsa \ |
| && chmod 600 /root/.ssh/authorized_keys \ |
| && chmod og-wx /root/.ssh/authorized_keys |
| |
| |
| |
| |
| ENV JAVA_HOME="/root/jdk1.8.0_201" |
| ENV PATH="$PATH:${JAVA_HOME}/bin" |
| |
| ENV HADOOP_HOME="/export/server/hadoop-3.3.4" |
| ENV PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin" |
| |
| ENV SCALA_HOME="/export/server/scala-2.11.12" |
| ENV PATH="$SCALA_HOME/bin:$PATH" |
| |
| ENV SPARK_HOME="/export/server/spark" |
| ENV PATH="$PATH:$SPARK_HOME/bin" |
| |
| |
| COPY entrypoint.sh / |
| |
| |
| RUN chmod 777 /entrypoint.sh |
| |
| |
| |
| EXPOSE 8040 9864 9000 8042 9866 9867 9868 33389 50070 8088 8030 36638 8031 8032 8033 7077 41904 8081 8082 4044 |
| |
| CMD ["/entrypoint.sh"] |
| |
| # ----- entrypoint.sh (the script COPYed into the image above) ----- |
| #!/bin/bash |
| |
| # start sshd in the background (-D keeps it in the foreground and would block the rest of this script) |
| /usr/sbin/sshd |
| |
| # note: this formats the NameNode on every container start, so all HDFS data is recreated each time |
| hdfs namenode -format |
| |
| /export/server/hadoop-3.3.4/sbin/start-dfs.sh |
| |
| hadoop fs -mkdir -p /wordcount/input |
| hadoop fs -mkdir -p /sparklog |
| hadoop fs -mkdir -p /spark/jars/ |
| hadoop fs -put /export/server/spark/jars/* /spark/jars/ |
| |
| /export/server/hadoop-3.3.4/sbin/start-yarn.sh |
| /export/server/hadoop-3.3.4/sbin/start-all.sh |
| /export/server/spark/sbin/start-all.sh |
| /export/server/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver |
| /export/server/spark/sbin/start-history-server.sh |
| |
| # keep the container's main process in the foreground |
| tail -f /etc/hosts |
6. Commands for running the containers (for reference)
| #!/bin/bash |
| |
| # create a user-defined bridge network for the three nodes |
| docker network create sparkonyarn |
| |
| |
| # master container: publishes the web UI ports to the host |
| docker run -d --name node1 --hostname node1 --network sparkonyarn --add-host node1:172.18.0.2 --add-host node2:172.18.0.3 --add-host node3:172.18.0.4 --restart=always -p 8088:8088 -p 8080:8080 -p 9870:9870 -p 19888:19888 -p 18080:18080 -p 4040:4040 master:latest |
| |
| docker run -d --name node2 --hostname node2 --network sparkonyarn --add-host node1:172.18.0.2 --add-host node2:172.18.0.3 --add-host node3:172.18.0.4 --restart=always master:latest |
| |
| docker run -d --name node3 --hostname node3 --network sparkonyarn --add-host node1:172.18.0.2 --add-host node2:172.18.0.3 --add-host node3:172.18.0.4 --restart=always master:latest |
| |
| # enter the master container (named node1 above) |
| docker exec -it node1 bash |
| |
| # interactive Spark shell inside the container |
| /export/server/spark/bin/spark-shell |
| |
| |
| |
| # restart the Hadoop cluster inside the master container if needed |
| docker exec -it node1 bash |
| |
| /export/server/hadoop/sbin/start-all.sh |
| |
| |
| |
| # Scala REPL (scala is on PATH via SCALA_HOME in the image) |
| scala |
| |
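The --add-host entries above assume the sparkonyarn bridge hands out 172.18.0.2-4 in the order the containers are started, which is worth double-checking:
| # confirm the container IPs actually match the --add-host mappings |
| docker network inspect sparkonyarn | grep -E '"Name"|"IPv4Address"' |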
7. Directory layout of the build context
| [root@docker container-spark]# ls -l |
| total 1189076 |
| -rw-r--r-- 1 root root 392 Oct 2 22:53 authorized_keys |
| -rw-r--r-- 1 root root 1394 Oct 2 22:53 core-site.xml |
| -rw-r--r-- 1 root root 2972 Oct 3 10:30 Dockerfile |
| -rw-r--r-- 1 root root 945 Oct 2 22:53 docker-hadoopspark.sh |
| -rw-r--r-- 1 root root 739 Oct 2 22:53 entrypoint.sh |
| -rw-r--r-- 1 root root 695457782 Jul 30 02:11 hadoop-3.3.4.tar.gz |
| -rw-r--r-- 1 root root 17007 Oct 2 22:53 hadoop-env.sh |
| -rw-r--r-- 1 root root 923 Oct 2 22:53 hdfs-site.xml |
| -rw-r--r-- 1 root root 1679 Oct 2 22:53 id_rsa |
| -rw-r--r-- 1 root root 191817140 Sep 30 16:50 jdk-8u201-linux-x64.tar.gz |
| -rw-r--r-- 1 root root 2471 Oct 2 22:53 log4j.properties |
| -rw-r--r-- 1 root root 1356 Oct 2 22:53 mapred-site.xml |
| -rw-r--r-- 1 root root 29114457 Nov 10 2017 scala-2.11.12.tgz |
| -rw-r--r-- 1 root root 13 Oct 2 22:53 slaves |
| -rw-r--r-- 1 root root 301112604 Jul 12 00:18 spark-3.2.2-bin-hadoop3.2.tgz |
| -rw-r--r-- 1 root root 1549 Oct 2 22:53 spark-defaults.conf |
| -rw-r--r-- 1 root root 5036 Oct 2 22:53 spark-env.sh |
| -rw-r--r-- 1 root root 8895 Oct 2 23:13 wget-log |
| -rw-r--r-- 1 root root 19 Oct 2 22:53 workers |
| -rw-r--r-- 1 root root 2074 Oct 2 22:53 yarn-site.xml |
8. Submit a Spark job to test
| |
| # client mode |
| /export/server/spark/bin/spark-submit \ |
| --master yarn \ |
| --deploy-mode client \ |
| --driver-memory 512m \ |
| --executor-memory 512m \ |
| --num-executors 1 \ |
| --class org.apache.spark.examples.SparkPi \ |
| /export/server/spark/examples/jars/spark-examples_2.12-3.2.2.jar \ |
| 10 |
| |
| # cluster mode |
| |
| /export/server/spark/bin/spark-submit \ |
| --master yarn \ |
| --deploy-mode cluster \ |
| --driver-memory 512m \ |
| --executor-memory 512m \ |
| --num-executors 1 \ |
| --class org.apache.spark.examples.SparkPi \ |
| /export/server/spark/examples/jars/spark-examples_2.12-3.2.2.jar \ |
| 10 |
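After a submission finishes, the result can be cross-checked through YARN; the application id below is a placeholder taken from the -list output (the log aggregation enabled in 4.2 is what makes yarn logs work):
| # list finished applications, then pull the aggregated logs of one of them |
| yarn application -list -appStates FINISHED |
| yarn logs -applicationId <application_id> | grep "Pi is roughly" |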
9. What a successful submission looks like

