Installing a Fully Highly Available Hadoop Cluster
Architecture (an HA deployment has no SNN node)
Eight VMs were planned but only seven are used — the node reserved for the SNN sits idle, since HA makes a SecondaryNameNode unnecessary.
Role placement; a `*` marks the daemons each host runs (NN = NameNode, DN = DataNode, SN = SecondaryNameNode, ZKFC = ZKFailoverController, ZK = ZooKeeper, JNN = JournalNode, RM = ResourceManager, NM = NodeManager):

| Host  | NN | DN | SN | ZKFC | ZK | JNN | RM | NM |
|-------|----|----|----|------|----|-----|----|----|
| node1 | *  |    |    | *    |    |     |    |    |
| node2 | *  |    |    | *    |    |     |    |    |
| node3 |    |    |    |      |    |     |    |    |
| node4 |    |    |    | *    |    |     | *  |    |
| node5 |    |    |    | *    |    |     | *  |    |
| node6 |    | *  |    |      | *  | *   |    | *  |
| node7 |    | *  |    |      | *  | *   |    | *  |
| node8 |    | *  |    |      | *  | *   |    | *  |
Preparation before building the cluster:
* SELinux and the firewall must be disabled on every server before you start.
1. Set the hostname on every server and keep the /etc/hosts mappings consistent across all of them
```
[root@localhost ~]# hostnamectl set-hostname node1
[root@localhost ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.159.129 node1
192.168.159.130 node2
192.168.159.132 node3
192.168.159.133 node4
192.168.159.136 node5
192.168.159.137 node6
192.168.159.138 node7
192.168.159.139 node8
```
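The remaining seven machines need the same treatment. One way to do it (a sketch — SSH will prompt for passwords until the key distribution in step 2 is done) is to set each hostname remotely and push the finished hosts file out from node1:

```
[root@node1 ~]# for i in `seq 2 8`; do ssh root@node$i "hostnamectl set-hostname node$i"; scp /etc/hosts root@node$i:/etc/hosts; done
```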
2. Give both NameNode nodes passwordless SSH to every host, including themselves; give the two resourcemanager nodes passwordless SSH to each other, including themselves
```
[root@localhost ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:lIvGygyJHycNTZJ0KeuE/BM0BWGGq/UTgMUQNo7Qm2M root@node1
The key's randomart image is:
+---[RSA 2048]----+
|+@=**o           |
|*.XB. .          |
|oo+*o o          |
|.+E=.. o .       |
|o=*o+.+ S        |
|...Xoo           |
| . =.            |
|                 |
|                 |
+----[SHA256]-----+
[root@localhost ~]# for i in `seq 1 8`; do ssh-copy-id root@node$i; done
```
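The transcript above only covers node1. Per the requirement in step 2, node2 needs keys on all eight hosts, while node4 and node5 only need each other and themselves. A sketch of the remaining distribution (the ssh-keygen flags are chosen here for non-interactive use):

```
[root@node2 ~]# ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa
[root@node2 ~]# for i in `seq 1 8`; do ssh-copy-id root@node$i; done
[root@node4 ~]# ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa
[root@node4 ~]# for i in 4 5; do ssh-copy-id root@node$i; done
[root@node5 ~]# ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa
[root@node5 ~]# for i in 4 5; do ssh-copy-id root@node$i; done
```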
3. Synchronize the time on all servers
```
[root@node1 ~]# ansible all -m shell -o -a 'ntpdate ntp1.aliyun.com'
node4 | CHANGED | rc=0 | (stdout) 20 Feb 16:08:37 ntpdate[2477]: adjust time server 120.25.115.20 offset 0.001546 sec
node6 | CHANGED | rc=0 | (stdout) 20 Feb 16:08:37 ntpdate[2470]: adjust time server 120.25.115.20 offset 0.000220 sec
node2 | CHANGED | rc=0 | (stdout) 20 Feb 16:08:37 ntpdate[2406]: adjust time server 120.25.115.20 offset -0.002414 sec
node3 | CHANGED | rc=0 | (stdout) 20 Feb 16:08:37 ntpdate[2465]: adjust time server 120.25.115.20 offset -0.001185 sec
node5 | CHANGED | rc=0 | (stdout) 20 Feb 16:08:37 ntpdate[2466]: adjust time server 120.25.115.20 offset 0.005768 sec
node7 | CHANGED | rc=0 | (stdout) 20 Feb 16:08:43 ntpdate[2503]: adjust time server 120.25.115.20 offset 0.000703 sec
node8 | CHANGED | rc=0 | (stdout) 20 Feb 16:08:43 ntpdate[2426]: adjust time server 120.25.115.20 offset -0.001338 sec
```
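ntpdate only makes a one-shot correction, so the clocks will drift again over time. One optional way to keep them aligned (a sketch using ansible's cron module, assuming the same inventory as above) is a periodic resync:

```
[root@node1 ~]# ansible all -m cron -a "name=ntpsync minute=*/30 job='/usr/sbin/ntpdate ntp1.aliyun.com &>/dev/null'"
```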
4. Install the JDK on all servers and configure the environment variables
```
[root@node1 ~]# tar -xf jdk-8u144-linux-x64.gz -C /usr/
[root@node1 ~]# ln -sv /usr/jdk1.8.0_144/ /usr/java
"/usr/java" -> "/usr/jdk1.8.0_144/"
[root@node1 ~]# cat /etc/profile.d/java.sh
export JAVA_HOME=/usr/java
export PATH=$PATH:$JAVA_HOME/bin
[root@node1 ~]# source /etc/profile.d/java.sh
[root@node1 ~]# java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
```
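This step applies to every server, not just node1. A sketch for replicating node1's JDK and profile script to the rest (assuming the passwordless SSH set up in step 2):

```
[root@node1 ~]# for i in `seq 2 8`; do scp -r /usr/jdk1.8.0_144 root@node$i:/usr/; ssh root@node$i 'ln -sv /usr/jdk1.8.0_144 /usr/java'; scp /etc/profile.d/java.sh root@node$i:/etc/profile.d/; done
```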
Building the zookeeper cluster
Install zookeeper on the planned nodes 6, 7, and 8 (the JDK must already be in place)
```
# Unpack the zookeeper distribution into /usr
[root@node6 ~]# tar xf zookeeper-3.4.6.tar.gz -C /usr/
# Create the zookeeper data directory (-p because /usr/data does not exist yet)
[root@node6 ~]# mkdir -p /usr/data/zookeeper
# Copy the sample config in zookeeper's conf directory to zoo.cfg
[root@node6 ~]# cp /usr/zookeeper-3.4.6/conf/zoo_sample.cfg /usr/zookeeper-3.4.6/conf/zoo.cfg
# Edit the config: change the data directory and add the cluster members
[root@node6 ~]# vim /usr/zookeeper-3.4.6/conf/zoo.cfg
dataDir=/usr/data/zookeeper        # changed
server.1=node6:2888:3888           # added
server.2=node7:2888:3888           # added
server.3=node8:2888:3888           # added
# Push the configured zookeeper tree to the other two nodes
[root@node6 ~]# scp -r /usr/zookeeper-3.4.6/ node7:/usr/zookeeper-3.4.6/
[root@node6 ~]# scp -r /usr/zookeeper-3.4.6/ node8:/usr/zookeeper-3.4.6/
# Write this node's id into the directory created above; it must be a number
# and must differ on each of the three nodes
[root@node6 ~]# echo 1 > /usr/data/zookeeper/myid
# The other two nodes also need the data directory and their own myid file
[root@node7 ~]# mkdir -p /usr/data/zookeeper
[root@node7 ~]# echo 2 > /usr/data/zookeeper/myid
[root@node8 ~]# mkdir -p /usr/data/zookeeper
[root@node8 ~]# echo 3 > /usr/data/zookeeper/myid
# Start the zookeeper cluster once everything is configured
[root@node6 ~]# /usr/zookeeper-3.4.6/bin/zkServer.sh start
[root@node7 ~]# /usr/zookeeper-3.4.6/bin/zkServer.sh start
[root@node8 ~]# /usr/zookeeper-3.4.6/bin/zkServer.sh start
# Check the cluster state (the leader depends on startup order; nodes started
# together elect the one with the largest myid)
[root@node6 ~]# /usr/zookeeper-3.4.6/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
[root@node7 ~]# /usr/zookeeper-3.4.6/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
[root@node8 ~]# /usr/zookeeper-3.4.6/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader
[root@node8 ~]# netstat -tnlp | grep java   # only the leader listens on 2888
tcp6   0   0 :::2181                  :::*   LISTEN   33766/java
tcp6   0   0 192.168.159.139:2888     :::*   LISTEN   33766/java
tcp6   0   0 192.168.159.139:3888     :::*   LISTEN   33766/java
tcp6   0   0 :::43793                 :::*   LISTEN   33766/java
```
Building the Hadoop cluster
1. First add the hadoop environment variables
```
[root@node1 ~]# cat /etc/profile.d/hadoop.sh
export HADOOP_HOME=/usr/hadoop-2.9.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```
2. Unpack the hadoop distribution into /usr
```
[root@node1 ~]# tar xf hadoop-2.9.2.tar.gz -C /usr
[root@node1 ~]# ln -sv /usr/hadoop-2.9.2/ /usr/hadoop
"/usr/hadoop" -> "/usr/hadoop-2.9.2/"
```
3. Set the JAVA_HOME variable in hadoop-env.sh, mapred-env.sh, and yarn-env.sh inside the hadoop distribution
```
[root@node1 ~]# grep 'export JAVA_HOME' /usr/hadoop/etc/hadoop/{hadoop-env.sh,mapred-env.sh,yarn-env.sh}
/usr/hadoop/etc/hadoop/hadoop-env.sh:export JAVA_HOME=/usr/java
/usr/hadoop/etc/hadoop/mapred-env.sh:export JAVA_HOME=/usr/java
/usr/hadoop/etc/hadoop/yarn-env.sh:export JAVA_HOME=/usr/java
```
4. Edit core-site.xml (the NameNode entry-point configuration)
```
[root@node1 ~]# vim /usr/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop</value>
    <!-- In an HA deployment, clients address HDFS by the dfs.nameservices
         value from hdfs-site.xml rather than by a single NameNode -->
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/data/hadoop</value>
    <!-- Hadoop's local data directory -->
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>node6:2181,node7:2181,node8:2181</value>
    <!-- zookeeper cluster address -->
  </property>
</configuration>
```
5. Create the /usr/data/hadoop directory on every hadoop node, for example as shown below
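The original shows no command for this step. Since ansible is already in use for time sync, one possible one-liner (a sketch assuming the same inventory) is:

```
[root@node1 ~]# ansible all -m file -a 'path=/usr/data/hadoop state=directory'
```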
6. Edit hdfs-site.xml
```
[root@node1 ~]# vim /usr/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <!-- number of block replicas -->
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
    <!-- block size: files larger than this are split into 128 MB blocks -->
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
    <!-- disable HDFS permission checking -->
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>hadoop</value>
    <!-- this is the HDFS cluster entry point referenced in core-site.xml -->
  </property>
  <property>
    <name>dfs.ha.namenodes.hadoop</name>
    <value>nn1,nn2</value>
    <!-- the cluster has two namenodes in total -->
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hadoop.nn1</name>
    <value>node1:9000</value>
    <!-- RPC address of nn1 -->
  </property>
  <property>
    <name>dfs.namenode.http-address.hadoop.nn1</name>
    <value>node1:50070</value>
    <!-- HTTP address of nn1 -->
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hadoop.nn2</name>
    <value>node2:9000</value>
    <!-- RPC address of nn2 -->
  </property>
  <property>
    <name>dfs.namenode.http-address.hadoop.nn2</name>
    <value>node2:50070</value>
    <!-- HTTP address of nn2 -->
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://node6:8485;node7:8485;node8:8485/hadoop</value>
    <!-- where the NameNode metadata lives on the JournalNode quorum
         (usually co-located with zookeeper); the path suffix is arbitrary,
         it just has to differ between clusters sharing the journal nodes -->
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
    <!-- enable automatic failover -->
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/data/journalnode</value>
    <!-- local directory where each JournalNode stores its data;
         set it explicitly, the default is under the tmp directory -->
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.hadoop</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    <!-- proxy class clients use to find the active namenode; note the
         nameservice suffix in the property name must match dfs.nameservices -->
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
    <!-- fencing method used during automatic failover; ssh is chosen here -->
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
    <!-- sshfence needs a key: the two namenodes must trust each other's keys -->
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
    <!-- ssh connect timeout -->
  </property>
</configuration>
```
7. Create the /usr/data/journalnode directory on the journalnode machines, for example as shown below
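Again no command is given in the original; a sketch over plain SSH, since only node6 through node8 need it:

```
[root@node1 ~]# for i in 6 7 8; do ssh root@node$i 'mkdir -p /usr/data/journalnode'; done
```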
8. Edit mapred-site.xml (first rename mapred-site.xml.template to mapred-site.xml)
```
[root@node1 ~]# vim /usr/hadoop/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node3:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node3:19888</value>
  </property>
</configuration>
```
9. Edit yarn-site.xml
```
[root@node1 ~]# vim /usr/hadoop/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
    <!-- enable resourcemanager high availability -->
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>rmcluster</value>
    <!-- unique id for the rm cluster; the property name itself stays as-is -->
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
    <!-- the two members of the rm cluster -->
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>node4</value>
    <!-- host of rm1 -->
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>node4:8088</value>
    <!-- web UI address of rm1 -->
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>node5</value>
    <!-- host of rm2 -->
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>node5:8088</value>
    <!-- web UI address of rm2 -->
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>node6:2181,node7:2181,node8:2181</value>
    <!-- zookeeper cluster address -->
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
    <!-- enable automatic recovery; the default is false -->
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    <!-- keep resourcemanager state in the zookeeper cluster;
         the default stores it in the FileSystem -->
  </property>
</configuration>
```
10. Edit the datanode host list (this is also where the nodemanagers are started)
```
[root@node1 ~]# vim /usr/hadoop/etc/hadoop/slaves
node6
node7
node8
```
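All of the configuration so far happened on node1; before anything is started, the finished tree has to exist on every node. A sketch for syncing it out (paths as above, passwordless SSH assumed):

```
[root@node1 ~]# for i in `seq 2 8`; do scp -r /usr/hadoop-2.9.2 root@node$i:/usr/; ssh root@node$i 'ln -sv /usr/hadoop-2.9.2 /usr/hadoop'; scp /etc/profile.d/hadoop.sh root@node$i:/etc/profile.d/; done
```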
The following steps are required only for the first-time initialization:
1. Start the three journalnode daemons first
```
[root@node6 ~]# hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/hadoop-2.9.2/logs/hadoop-root-journalnode-node6.out
[root@node6 ~]# jps
2965 Jps
2904 JournalNode
2779 QuorumPeerMain

[root@node7 ~]# hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/hadoop-2.9.2/logs/hadoop-root-journalnode-node7.out
[root@node7 ~]# jps
2119 QuorumPeerMain
2220 JournalNode
2318 Jps

[root@node8 ~]# hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/hadoop-2.9.2/logs/hadoop-root-journalnode-node8.out
[root@node8 ~]# jps
2229 Jps
2025 QuorumPeerMain
2153 JournalNode
```
2. Format the primary NameNode
```
[root@node1 ~]# hadoop namenode -format
```
3. Start the primary NameNode
```
[root@node1 ~]# hadoop-daemon.sh start namenode
starting namenode, logging to /usr/hadoop-2.9.2/logs/hadoop-root-namenode-node1.out
[root@node1 ~]# jps
7302 Jps
7225 NameNode
```
4. Bootstrap the standby NameNode
```
[root@node2 ~]# hadoop namenode -bootstrapStandby
```
5. From the primary NameNode, register the cluster's initial HA state in zookeeper
```
[root@node1 ~]# hdfs zkfc -formatZK
```
5.1 You can inspect the hdfs entries from a zookeeper node with zkCli.sh
```
[root@node6 ~]# /usr/zookeeper-3.4.6/bin/zkCli.sh
Connecting to localhost:2181
......
......
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]                # before the namenode registered anything
[zk: localhost:2181(CONNECTED) 1] ls /
[zookeeper, hadoop-ha]     # after running the formatZK command above
[zk: localhost:2181(CONNECTED) 2] ls /hadoop-ha/hadoop
[]
```
6. Start the HDFS cluster
```
[root@node1 ~]# start-dfs.sh
Starting namenodes on [node1 node2]
node2: starting namenode, logging to /usr/hadoop-2.9.2/logs/hadoop-root-namenode-node2.out
node1: namenode running as process 7225. Stop it first.
node8: starting datanode, logging to /usr/hadoop-2.9.2/logs/hadoop-root-datanode-node8.out
node6: starting datanode, logging to /usr/hadoop-2.9.2/logs/hadoop-root-datanode-node6.out
node7: starting datanode, logging to /usr/hadoop-2.9.2/logs/hadoop-root-datanode-node7.out
Starting journal nodes [node6 node7 node8]
node6: journalnode running as process 2904. Stop it first.
node7: journalnode running as process 2220. Stop it first.
node8: journalnode running as process 2153. Stop it first.
Starting ZK Failover Controllers on NN hosts [node1 node2]
node2: starting zkfc, logging to /usr/hadoop-2.9.2/logs/hadoop-root-zkfc-node2.out
node1: starting zkfc, logging to /usr/hadoop-2.9.2/logs/hadoop-root-zkfc-node1.out
[root@node1 ~]# jps
7857 DFSZKFailoverController
7924 Jps
7225 NameNode

[root@node2 ~]# jps
2788 Jps
2633 NameNode
2732 DFSZKFailoverController

[root@node6 ~]# jps
3235 Jps
3125 DataNode
2904 JournalNode
2779 QuorumPeerMain

[root@node7 ~]# jps
2119 QuorumPeerMain
2220 JournalNode
2572 Jps
2462 DataNode

[root@node8 ~]# jps
2483 Jps
2373 DataNode
2025 QuorumPeerMain
2153 JournalNode
```
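To confirm which NameNode won the election, hdfs haadmin can query each one by the ids defined in hdfs-site.xml (which of nn1/nn2 comes up active can vary):

```
[root@node1 ~]# hdfs haadmin -getServiceState nn1
active
[root@node1 ~]# hdfs haadmin -getServiceState nn2
standby
```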
7. At this point zookeeper holds the namenode election data; only the active node's information is stored there
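A sketch of what that looks like from zkCli.sh — the failover controllers maintain the ActiveBreadCrumb and ActiveStandbyElectorLock znodes under the nameservice path, and their data names the current active node:

```
[zk: localhost:2181(CONNECTED) 0] ls /hadoop-ha/hadoop
[ActiveBreadCrumb, ActiveStandbyElectorLock]
[zk: localhost:2181(CONNECTED) 1] get /hadoop-ha/hadoop/ActiveBreadCrumb
```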
That completes the initialization of the highly available HDFS cluster; next, bring up the yarn cluster
1. Start yarn from the primary namenode. Here start-yarn.sh only brings up the nodemanagers; each resourcemanager has to be started manually on its own node
```
[root@node1 ~]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/hadoop-2.9.2/logs/yarn-root-resourcemanager-node1.out
node7: starting nodemanager, logging to /usr/hadoop-2.9.2/logs/yarn-root-nodemanager-node7.out
node8: starting nodemanager, logging to /usr/hadoop-2.9.2/logs/yarn-root-nodemanager-node8.out
node6: starting nodemanager, logging to /usr/hadoop-2.9.2/logs/yarn-root-nodemanager-node6.out

[root@node6 ~]# jps
3125 DataNode
3397 NodeManager
3509 Jps
2904 JournalNode
2779 QuorumPeerMain

[root@node7 ~]# jps
2736 NodeManager
2848 Jps
2119 QuorumPeerMain
2220 JournalNode
2462 DataNode

[root@node8 ~]# jps
2373 DataNode
2646 NodeManager
2758 Jps
2025 QuorumPeerMain
2153 JournalNode
```
2. Start the rm manually on each resourcemanager node
```
[root@node4 ~]# yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /usr/hadoop-2.9.2/logs/yarn-root-resourcemanager-node4.out
[root@node4 ~]# jps
2840 ResourceManager
3103 Jps

[root@node5 ~]# yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /usr/hadoop-2.9.2/logs/yarn-root-resourcemanager-node5.out
[root@node5 ~]# jps
2994 Jps
2955 ResourceManager
```
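The same kind of state check exists for yarn; rm1 and rm2 are the ids from yarn-site.xml (which one ends up active depends on startup order):

```
[root@node4 ~]# yarn rmadmin -getServiceState rm1
active
[root@node4 ~]# yarn rmadmin -getServiceState rm2
standby
```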
After the one-time initialization above, a routine cluster start only takes:

```
start-dfs.sh
start-yarn.sh
```

and, on the resourcemanager nodes:

```
yarn-daemon.sh start resourcemanager
```
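As a final check, an optional failover drill (a sketch, reusing the namenode pid from the earlier jps output): kill the active NameNode, watch the standby get promoted, then bring the killed daemon back as the new standby.

```
[root@node1 ~]# jps | grep NameNode
7225 NameNode
[root@node1 ~]# kill -9 7225
[root@node1 ~]# hdfs haadmin -getServiceState nn2    # should report "active" after a moment
[root@node1 ~]# hadoop-daemon.sh start namenode      # nn1 rejoins as the standby
```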