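Detailed Configuration of a hadoop+yarn+hbase+storm+kafka+spark+zookeeper High-Availability Cluster
This article configures a hadoop+yarn+hbase+storm+kafka+spark+zookeeper high-availability cluster and also installs the related components: JDK, MySQL, Hive, and Flume.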
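Environment Overview
Node Overview
- Number of virtual machines: 8
- Operating system: CentOS-7-x86_64-Minimal-1611.iso
Each virtual machine is configured as follows:
VM name | CPU cores | Memory (GB) | Disk (GB) | NICs |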
---|---|---|---|---|
hadoop1 | 2 | 8 | 100 | 2 |
hadoop2 | 2 | 8 | 100 | 2 |
hadoop3 | 2 | 8 | 100 | 2 |
hadoop4 | 2 | 8 | 100 | 2 |
hadoop5 | 2 | 8 | 100 | 2 |
hadoop6 | 2 | 8 | 100 | 2 |
hadoop7 | 2 | 8 | 100 | 2 |
hadoop8 | 2 | 8 | 100 | 2 |
Cluster Overview
An 8-node Hadoop+Yarn+Spark+Hbase+Kafka+Storm+ZooKeeper high-availability cluster:
Cluster | VM nodes |
---|---|
Hadoop HA cluster | hadoop1,hadoop2,hadoop3,hadoop4,hadoop5,hadoop6,hadoop7,hadoop8 |
Yarn HA cluster | hadoop1,hadoop2,hadoop3,hadoop4,hadoop5,hadoop6,hadoop7,hadoop8 |
ZooKeeper cluster | hadoop3,hadoop4,hadoop5 |
Hbase cluster | hadoop3,hadoop4,hadoop5,hadoop6,hadoop7 |
Kafka cluster | hadoop6,hadoop7,hadoop8 |
Storm cluster | hadoop3,hadoop4,hadoop5,hadoop6,hadoop7 |
Spark HA cluster | hadoop1,hadoop2,hadoop3,hadoop4,hadoop5,hadoop6,hadoop7,hadoop8 |
Detailed cluster plan:
VM name | IP | Installed software | Processes | Role |
---|---|---|---|---|
hadoop1 | 59.68.29.79 | jdk,hadoop,mysql | NameNode,ResourceManager,DFSZKFailoverController(zkfc),master(spark) | NameNode for hadoop, master for spark, ResourceManager for yarn |
hadoop2 | 10.230.203.11 | jdk,hadoop,spark | NameNode,ResourceManager,DFSZKFailoverController(zkfc),worker(spark) | failover node for hadoop (yarn) and for spark |
hadoop3 | 10.230.203.12 | jdk,hadoop,zookeeper,hbase,storm,spark | DataNode,NodeManager,journalnode,QuorumPeerMain(zk),HMaster,…(storm),worker(spark) | primary node for storm, hbase, and zookeeper |
hadoop4 | 10.230.203.13 | jdk,hadoop,zookeeper,hbase,storm,spark | DataNode,NodeManager,journalnode,QuorumPeerMain(zk),HRegionServer,…(storm),worker(spark) | |
hadoop5 | 10.230.203.14 | jdk,hadoop,zookeeper,hbase,storm,spark | DataNode,NodeManager,journalnode,QuorumPeerMain(zk),HRegionServer,…(storm),worker(spark) | |
hadoop6 | 10.230.203.15 | jdk,hadoop,hbase,storm,kafka,spark | DataNode,NodeManager,journalnode,kafka,HRegionServer,…(storm),worker(spark) | primary node for kafka |
hadoop7 | 10.230.203.16 | jdk,hadoop,hbase,storm,kafka,spark | DataNode,NodeManager,journalnode,kafka,HRegionServer,…(storm),worker(spark) | |
hadoop8 | 10.230.203.17 | jdk,hadoop,kafka,spark | DataNode,NodeManager,journalnode,kafka,worker(spark) | |
Software Versions
- JDK: jdk-8u65-linux-x64.tar.gz
- hadoop: hadoop-2.7.6.tar.gz
- zookeeper: zookeeper-3.4.12.tar.gz
- hbase: hbase-1.2.6-bin.tar.gz
- Storm: apache-storm-1.1.3.tar.gz
- kafka: kafka_2.11-2.0.0.tgz
- MySQL: mysql-5.6.41-linux-glibc2.12-x86_64.tar.gz
- hive: apache-hive-2.3.3-bin.tar.gz
- Flume: apache-flume-1.8.0-bin.tar.gz
- Spark: spark-2.3.1-bin-hadoop2.7.tgz
Preparation
Basic Configuration
Apply the same settings on every host node.
Create the user centos
Important: do not configure the cluster as root.
- Create the centos group
$> groupadd centos
- Create the user centos and add it to the centos group
$> useradd centos -g centos
- Set a password for the centos user
$> passwd centos
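Because the same user has to exist on all eight nodes, the group and user steps above can be wrapped in a helper that is safe to run twice. The following is only a sketch, not part of the original procedure: the function names `run` and `ensure_user` and the `DRY_RUN` switch are assumptions made for illustration, and like the commands above it is meant to be run as root. Setting the password with `passwd` stays a manual step.

```shell
#!/bin/bash
# Hypothetical helper: create a group and a user only if they do not exist yet.
# With DRY_RUN=1 (the default in this sketch) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

ensure_user() {
    local name=$1
    # getent exits non-zero when the group or user entry is missing
    getent group "$name" > /dev/null 2>&1 || run groupadd "$name"
    getent passwd "$name" > /dev/null 2>&1 || run useradd "$name" -g "$name"
}

ensure_user centos
```

Running it with `DRY_RUN=0` as root performs the changes; with the default it only shows what would be executed.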
Grant sudo privileges
- Switch to the root user and edit the /etc/sudoers file
$> nano /etc/sudoers
Add the centos line below the existing root entry:
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
centos ALL=(ALL) ALL
Change the hostname
- Open /etc/hostname, delete the existing content, and enter the new hostname
$> sudo nano /etc/hostname
Hostname: hadoop1, hadoop2, ...
Hostname-to-IP mapping
- Open /etc/hosts, delete the existing content, and add the mappings for all host nodes
$> sudo nano /etc/hosts
Add the following content:
127.0.0.1 localhost
59.68.29.79 hadoop1
10.230.203.11 hadoop2
10.230.203.12 hadoop3
10.230.203.13 hadoop4
10.230.203.14 hadoop5
10.230.203.15 hadoop6
10.230.203.16 hadoop7
10.230.203.17 hadoop8
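Since every node needs the same eight mappings, a quick check for missing entries can save debugging time later. The sketch below is an illustration only: the function name `missing_hosts` is an assumption, and the node list is copied from the mappings above.

```shell
#!/bin/bash
# "IP hostname" pairs copied from the /etc/hosts content shown above.
NODES="59.68.29.79 hadoop1
10.230.203.11 hadoop2
10.230.203.12 hadoop3
10.230.203.13 hadoop4
10.230.203.14 hadoop5
10.230.203.15 hadoop6
10.230.203.16 hadoop7
10.230.203.17 hadoop8"

# Print every hostname whose exact "IP hostname" pair is absent from the given file.
missing_hosts() {
    local file=$1 ip name
    while read -r ip name; do
        # awk exits 0 only if a line with this IP in field 1 and this name in field 2 exists
        awk -v ip="$ip" -v n="$name" '$1 == ip && $2 == n { f = 1 } END { exit !f }' "$file" \
            || echo "$name"
    done <<< "$NODES"
}

missing_hosts /etc/hosts
```

An empty result means the file contains all eight mappings.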
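Show the absolute path of the current directory in the prompt
The path comes from the pwd command: ~ is displayed as /home/centos, which makes it easy to see the current location.
- Edit /etc/profile
[centos@hadoop1 ~]$ sudo nano /etc/profile
Append at the end:
export PS1='[\u@\h `pwd`]\$'
// run source /etc/profile to apply the change immediately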
[centos@hadoop1 /home/centos]$
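Passwordless SSH login
hadoop1 and hadoop2 are the failover pair (they remove the single point of failure), so besides reaching each other they must also be able to log in to every other host node without a password.
- Check that the ssh packages are installed (openssh-server + openssh-clients + openssh)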
[centos@hadoop1 /home/centos]$ yum list installed | grep ssh
- Check that the sshd process is running
[centos@hadoop1 /home/centos]$ ps -Af | grep sshd
- On each of hadoop1~hadoop8, create the .ssh directory under ~ (/home/centos) and set its permissions
[centos@hadoop1 /home/centos]$ mkdir .ssh
[centos@hadoop1 /home/centos]$ chmod 700 ~/.ssh
- On hadoop1, generate a key pair, append the public key to ~/.ssh/authorized_keys, and set the permissions of authorized_keys to 644 (CentOS)
// generate the key pair
[centos@hadoop1 /home/centos]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
// change to the ~/.ssh directory
[centos@hadoop1 /home/centos]$ cd ~/.ssh
// append the public key to ~/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ cat id_rsa.pub >> authorized_keys
// set the permissions of authorized_keys to 644
[centos@hadoop1 /home/centos/.ssh]$ chmod 644 authorized_keys
- Copy hadoop1's public key file id_rsa.pub to the other 7 host nodes, placing it at /home/centos/.ssh/authorized_keys
// rename
[centos@hadoop1 /home/centos/.ssh]$ mv id_rsa.pub id_rsa_hadoop1.pub
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop2:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop3:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop4:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop5:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop6:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop7:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop8:/home/centos/.ssh/authorized_keys
- On hadoop2, generate a key pair. To distinguish it from hadoop1's public key, rename it to id_rsa_hadoop2.pub. Append the public key to ~/.ssh/authorized_keys and distribute that file to the other 7 host nodes
// generate the key pair
[centos@hadoop2 /home/centos]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
// rename
[centos@hadoop2 /home/centos/.ssh]$ mv id_rsa.pub id_rsa_hadoop2.pub
// append the public key to ~/.ssh/authorized_keys
[centos@hadoop2 /home/centos/.ssh]$ cat id_rsa_hadoop2.pub >> authorized_keys
// distribute authorized_keys to the other nodes (N is the number of the target node)
[centos@hadoop2 /home/centos/.ssh]$ scp authorized_keys centos@hadoopN:/home/centos/.ssh/
... repeat for each of the other nodes
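As an alternative to copying authorized_keys by hand, the `ssh-copy-id` tool from openssh-clients appends a public key to the remote authorized_keys file instead of overwriting it. The loop below is a sketch under assumptions, not the procedure used in this article: the function name `distribute_key` and the `DRY_RUN` switch are made up for illustration, and the public key path is passed in explicitly because the key file is renamed in the steps above.

```shell
#!/bin/bash
# Print (DRY_RUN=1, the default here) or run ssh-copy-id for every node except the local one.
DRY_RUN=${DRY_RUN:-1}

distribute_key() {
    local self=$1 user=$2 pubkey=$3 i host
    for (( i = 1 ; i <= 8 ; i++ )) ; do
        host="hadoop$i"
        # Skip the node that owns the key
        if [ "$host" = "$self" ]; then
            continue
        fi
        if [ "$DRY_RUN" = "1" ]; then
            echo "ssh-copy-id -i $pubkey $user@$host"
        else
            ssh-copy-id -i "$pubkey" "$user@$host"
        fi
    done
}

distribute_key hadoop1 centos "$HOME/.ssh/id_rsa_hadoop1.pub"
```

Each real run prompts once for the remote password; afterwards `ssh hadoopN` from that node should no longer ask for one.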
Disable the firewall
To make sure the cluster starts normally, first stop the firewall on every host. Some useful commands:
[CentOS versions before 6.5]
$>sudo service firewalld stop // stop the service
$>sudo service firewalld start // start the service
$>sudo service firewalld status // show the status
[CentOS 7]
$>sudo systemctl enable firewalld.service // enable start at boot
$>sudo systemctl disable firewalld.service // disable start at boot
$>sudo systemctl start firewalld.service // start the firewall
$>sudo systemctl stop firewalld.service // stop the firewall
$>sudo systemctl status firewalld.service // show the firewall status
[Start at boot]
$>sudo chkconfig firewalld on // enable start at boot
$>sudo chkconfig firewalld off // disable start at boot
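On the CentOS 7 nodes used here, the two commands that matter are stop (for the running system) and disable (for the next boot). A small check like the one below can confirm both. It is a sketch only: the function name `firewall_is_off` is an assumption, while `systemctl is-active` and `systemctl is-enabled` are standard systemd subcommands.

```shell
#!/bin/bash
# Hypothetical check: succeed only if firewalld is neither running nor enabled at boot.
firewall_is_off() {
    local active=$1 enabled=$2
    [ "$active" != "active" ] && [ "$enabled" != "enabled" ]
}

# Query systemd only where it is available (for example on the CentOS 7 nodes).
if command -v systemctl > /dev/null 2>&1; then
    if firewall_is_off "$(systemctl is-active firewalld.service 2> /dev/null)" \
                       "$(systemctl is-enabled firewalld.service 2> /dev/null)"; then
        echo "firewalld is stopped and disabled"
    else
        echo "run: sudo systemctl stop firewalld.service && sudo systemctl disable firewalld.service"
    fi
fi
```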
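Two batch scripts
Note: to make the scripts available globally, place them in /usr/local/bin. Configure them only on hadoop1 and hadoop2.
// create xcall.sh as the local user
$>touch ~/xcall.sh //centos
// move it to /usr/local/bin
$>sudo mv ~/xcall.sh /usr/local/bin
// make it executable
$>sudo chmod a+x /usr/local/bin/xcall.sh
// add the script content
$>sudo nano /usr/local/bin/xcall.sh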
Batch command script (xcall.sh)
#!/bin/bash
# Run the given command on hadoop1..hadoop8 over ssh
params="$*"
for (( i = 1 ; i <= 8 ; i++ )) ; do
    echo "============= hadoop$i $params ============="
    ssh "hadoop$i" "$params"
done
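Several clusters in this plan live on a subset of the nodes (ZooKeeper on hadoop3 to hadoop5, for example), so a variant of xcall.sh that takes a node range can be handy. This is a sketch rather than one of the article's two scripts: the function name `xcall_range` and the `SSH_CMD` variable are assumptions, and setting `SSH_CMD=echo` gives a dry run that only prints what would be sent.

```shell
#!/bin/bash
# Variant of xcall.sh that targets hadoop<first>..hadoop<last> only.
# SSH_CMD is an assumption of this sketch: "ssh" by default, "echo" for a dry run.
SSH_CMD=${SSH_CMD:-ssh}

xcall_range() {
    local first=$1 last=$2 i
    shift 2
    for (( i = first ; i <= last ; i++ )) ; do
        echo "============= hadoop$i $* ============="
        $SSH_CMD "hadoop$i" "$*"
    done
}

# Dry run: show what would be executed on the three ZooKeeper nodes
SSH_CMD=echo xcall_range 3 5 hostname
```

Dropping the `SSH_CMD=echo` prefix runs the command over ssh for real, which requires the passwordless login configured earlier.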
Batch sync script (xsync.sh): similar to the scp command
#!/bin/bash
# Copy a file or directory to the same absolute path on hadoop1..hadoop8
if [[ $# -lt 1 ]] ; then echo no params ; exit 1 ; fi
p=$1
dir=$(dirname "$p")
filename=$(basename "$p")
# Resolve the parent directory to a physical absolute path
cd "$dir" || exit 1
fullpath=$(pwd -P)
user=$(whoami)
for (( i = 1 ; i <= 8 ; i++ )) ; do
    echo "======= hadoop$i ======="
    # After the cd above, the source is addressed by its base name
    rsync -lr "$filename" "${user}@hadoop$i:$fullpath"
done
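The key step in xsync.sh is turning whatever path the user typed, relative or absolute, into a physical parent directory plus a base name, so that the file lands at the same absolute location on every node. That step can be sketched in isolation as follows; the function name `resolve_target` is an assumption for illustration.

```shell
#!/bin/bash
# Print the physical absolute parent directory and the base name of a path,
# the two values xsync.sh needs before it calls rsync.
resolve_target() {
    local p=$1 dir filename fullpath
    dir=$(dirname "$p")
    filename=$(basename "$p")
    # cd runs inside the command substitution, so the caller's directory is unchanged
    fullpath=$(cd "$dir" && pwd -P) || return 1
    echo "$fullpath $filename"
}

resolve_target /etc/profile
```

Only the parent directory has to exist for the resolution to succeed, which is why the function fails early when `cd` cannot enter it.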
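Cluster Environment Setup
Install the JDK
- Prepare the JDK: jdk-8u65-linux-x64.tar.gz. Upload it to /home/centos/localsoft on hadoop1; this directory holds all the installation packages
- Create a soft directory under the root directory (/) and change its owner and group to centos; all software is installed under this directory
// create the soft directory
[centos@hadoop1 /home/centos]$ sudo mkdir /soft
// change the owner and group (centos is the local user name)
[centos@hadoop1 /home/centos]$ sudo chown centos:centos /soft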
- Extract jdk-8u65-linux-x64.tar.gz into /soft and create a symbolic link
// extract from /home/centos/localsoft into /soft
[centos@hadoop1 /home/centos/localsoft]$ tar -xzvf jdk-8u65-linux-x64.tar.gz -C /soft
// create the symbolic link
[centos@hadoop1 /soft]$ ln -s /soft/jdk1.8.0_65 jdk
- Configure the environment variables in /etc/profile, then run source /etc/profile so they take effect immediately
// open profile
[centos@hadoop1 /home/centos]$ sudo nano /etc/profile
// environment variables
# jdk
export JAVA_HOME=/soft/jdk
export PATH=$PATH:$JAVA_HOME/bin
// apply immediately
[centos@hadoop1 /home/centos]$ source /etc/profile
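/etc/profile is edited again for each component in this guide, so it is easy to end up with duplicate export lines. One way to avoid that is to append a line only when it is not already present. The helper below is a sketch, with `append_once` as an assumed name, and it is demonstrated on a scratch file rather than on /etc/profile itself.

```shell
#!/bin/bash
# Append a line to a file only if that exact line is not already there.
append_once() {
    local file=$1 line=$2
    # -x matches whole lines, -F treats the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2> /dev/null || echo "$line" >> "$file"
}

# Demonstration on a scratch file; on a node the target would be /etc/profile
profile=$(mktemp)
append_once "$profile" 'export JAVA_HOME=/soft/jdk'
append_once "$profile" 'export PATH=$PATH:$JAVA_HOME/bin'
append_once "$profile" 'export JAVA_HOME=/soft/jdk'   # already present, nothing is added
cat "$profile"
```

The single quotes keep `$PATH` and `$JAVA_HOME` unexpanded, so the file receives the literal export lines shown above.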
- Verify that the installation and configuration succeeded
[centos@hadoop1 /home/centos]$ java -version
// output:
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
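- Repeat the steps above on the other hosts (hadoop2~hadoop8). The batch sync script xsync.sh can distribute the files to the other 7 host nodes
Hadoop Installation and Configuration (manual HA setup)
1. hadoop installation and configuration
- Prepare hadoop: hadoop-2.7.6.tar.gz. Extract it into /soft and create a symbolic link
// extract from /home/centos/localsoft into /soft
[centos@hadoop1 /home/centos/localsoft]$ tar -xzvf hadoop-2.7.6.tar.gz -C /soft
// create the symbolic link
[centos@hadoop1 /soft]$ ln -s /soft/hadoop-2.7.6 hadoop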
- Configure the environment variables in /etc/profile, run source /etc/profile to apply them immediately, and use hadoop version to check the installation
// open profile
[centos@hadoop1 /home/centos]$ sudo nano /etc/profile
// environment variables
# hadoop
export HADOOP_HOME=/soft/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
// apply immediately
[centos@hadoop1 /home/centos]$ source /etc/profile
// check the installation
[centos@hadoop1 /home/centos]$ hadoop version
Output:
Hadoop 2.7.6
Subversion https://shv@git-wip-us.apache.org/repos/asf/hadoop.git -r 085099c66cf28be31604560c376fa282e69282b8
Compiled by kshvachk on 2018-04-18T01:33Z
Compiled with protoc 2.5.0
From source with checksum 71e2695531cb3360ab74598755d036
This command was run using /soft/hadoop-2.7.6/share/hadoop/common/hadoop-common-2.7.6.jar
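Note: these steps are performed on hadoop1 only. There is no need to install and configure the other nodes yet; once all configuration is finished, it is distributed to the other nodes in one pass, which saves a lot of work.
2. Manual NameNode HA setup for hadoop
This is hadoop's native NameNode HA setup. It is later integrated with the zookeeper cluster to provide automatic failover (Yarn+NameNode).
- Go to /soft/hadoop/etc, copy the hadoop directory to full, ha, and pesudo, and create a symbolic link named hadoop that points to ha
[centos@hadoop1 /soft/hadoop/etc]$ cp -r hadoop ha
[centos@hadoop1 /soft/hadoop/etc]$ cp -r hadoop full
[centos@hadoop1 /soft/hadoop/etc]$ cp -r hadoop pesudo
// remove the original directory so that the name hadoop is free for the symbolic link
[centos@hadoop1 /soft/hadoop/etc]$ rm -r hadoop
// create the symbolic link
[centos@hadoop1 /soft/hadoop/etc]$ ln -s /soft/hadoop/etc/ha hadoop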
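Keeping several configuration directories next to each other means the active one can be changed by repointing a single symbolic link. The sketch below shows one way to do that and assumes that hadoop inside the etc directory is already a symbolic link (or does not exist); the function name `switch_conf` is made up for illustration, and the demo runs in a scratch directory rather than in /soft/hadoop/etc.

```shell
#!/bin/bash
# Point the "hadoop" symlink inside an etc directory at one of the profile directories.
switch_conf() {
    local etc_dir=$1 profile=$2
    if [ ! -d "$etc_dir/$profile" ]; then
        echo "no such profile: $profile" >&2
        return 1
    fi
    # Refuse to touch a real directory; only an existing symlink is replaced
    if [ -d "$etc_dir/hadoop" ] && [ ! -L "$etc_dir/hadoop" ]; then
        echo "$etc_dir/hadoop is a real directory, not a symlink" >&2
        return 1
    fi
    # -n replaces the link itself instead of creating a new link inside its target
    ln -sfn "$etc_dir/$profile" "$etc_dir/hadoop"
}

# Demonstration in a scratch directory
etc=$(mktemp -d)
mkdir "$etc/ha" "$etc/full"
switch_conf "$etc" ha
switch_conf "$etc" full
readlink "$etc/hadoop"
```

Without `-n`, the second call would follow the existing link and create a new link inside the ha directory instead of replacing it.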
- In the ha directory, configure four files: core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml
[core-site.xml]
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<!-- Configure a new local directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/centos/hadoop</value>
</property>
<property>
<name>ipc.client.connect.max.retries</name>
<value>20</value>
</property>
<property>
<name>ipc.client.connect.retry.interval</name>
<value>5000</value>
</property>
</configuration>
[hdfs-site.xml]
<configuration>
<!-- Configure the nameservice -->
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<!-- IDs of the two NameNodes under mycluster -->
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of each NameNode -->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>hadoop1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>hadoop2:8020</value>
</property>
<!-- Web UI address of each NameNode -->
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>hadoop1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>hadoop2:50070</value>
</property>
<!-- Shared edits directory for the NameNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop3:8485;hadoop4:8485;hadoop5:8485;hadoop6:8485;hadoop7:8485;hadoop8:8485/mycluster</value>
</property>
<!-- Java class that clients use to determine which NameNode is active -->
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>