Hadoop Fully Distributed Cluster Setup (Non-HA)
I. Preparation
1. Create virtual machines with static IPs (NAT networking) and fixed hostnames
- Create the VMs:
See any detailed tutorial on installing Linux in VMware
## Fix the hostname:
vi /etc/sysconfig/network
2. Disable the firewall, or open the required ports
service iptables stop     ## stop the firewall
chkconfig iptables off    ## disable it at boot
3. Install the required software: JDK and Hadoop
4. Configure passwordless SSH login [a key pair must be generated on every node]
(1) Generate the public/private key pair
ssh-keygen -t rsa
(2) Configure the hosts file (/etc/hosts) with the IP-to-hostname mapping:
192.168.121.101 node01
192.168.121.102 node02
192.168.121.103 node03
192.168.121.104 node04
...
After editing this file on node01, use scp to copy it to node02, node03, ...
(3) Append the public key to each node's authorized_keys
ssh-copy-id -i /root/.ssh/id_rsa.pub node01
ssh-copy-id -i /root/.ssh/id_rsa.pub node02
ssh-copy-id -i /root/.ssh/id_rsa.pub node03
ssh-copy-id -i /root/.ssh/id_rsa.pub node04
...
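Steps (2) and (3) can be scripted. A minimal sketch, assuming the four hostnames and the 192.168.121.x addressing shown above:

```shell
# Generate the /etc/hosts lines for the four nodes (IP scheme assumed from above)
hosts_entries=$(for i in 1 2 3 4; do
  printf '192.168.121.10%d node0%d\n' "$i" "$i"
done)
echo "$hosts_entries"    # paste these lines into /etc/hosts on node01

# The public key can be pushed in a loop as well (run on each node after
# ssh-keygen; this needs the live cluster, so it is left commented out here):
# for host in node01 node02 node03 node04; do
#   ssh-copy-id -i /root/.ssh/id_rsa.pub "$host"
# done
```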
5. Configure NTP so the cluster nodes keep synchronized time (optional)
- Install NTP:
yum install ntp
- On node01, edit /etc/ntp.conf:
## comment out the lines starting with "server" and add:
restrict 192.168.121.0 mask 255.255.255.0 nomodify notrap
server 127.127.1.0
fudge 127.127.1.0 stratum 10
- On node02, node03 and node04, edit /etc/ntp.conf as well:
## comment out the lines starting with "server" and add:
server node01
- Start the NTP service and enable it at boot:
service ntpd start && chkconfig ntpd on
6. [Recommended] Delete Hadoop's bundled documentation (share/doc); it takes up over 400 MB.
II. Cluster Plan
Role | node01 | node02 | node03 | node04
---|---|---|---|---
NameNode | ✓ | | |
SecondaryNameNode | | ✓ | |
DataNode | | ✓ | ✓ | ✓
ResourceManager | | ✓ | |
NodeManager | ✓ | ✓ | ✓ | ✓
III. Configuration Files
Seven files need to be configured:
- $HADOOP_HOME/etc/hadoop/hadoop-env.sh
- $HADOOP_HOME/etc/hadoop/yarn-env.sh
- $HADOOP_HOME/etc/hadoop/slaves [Hadoop 2.x] or $HADOOP_HOME/etc/hadoop/workers [Hadoop 3.x]
- $HADOOP_HOME/etc/hadoop/core-site.xml
- $HADOOP_HOME/etc/hadoop/hdfs-site.xml
- $HADOOP_HOME/etc/hadoop/yarn-site.xml
- $HADOOP_HOME/etc/hadoop/mapred-site.xml
1. Configure etc/hadoop/hadoop-env.sh
Hadoop reads the JDK location only from this file.
Hadoop 2.x only needs JAVA_HOME here; Hadoop 3.x enforces strict management of daemon roles, so the user running each role must also be declared:
export JAVA_HOME=/opt/app/jdk1.8.0_201
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
2. Configure etc/hadoop/yarn-env.sh
export JAVA_HOME=/opt/app/jdk1.8.0_201
3. Configure etc/hadoop/slaves (Hadoop 2.x) or etc/hadoop/workers (Hadoop 3.x)
node02
node03
node04
4. Configure etc/hadoop/core-site.xml
<configuration>
<!-- Note: the default port is 9000 in Hadoop 2.x and 9820 in Hadoop 3.x -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://node01:9820</value>
</property>
<!-- Note: create this temporary directory yourself -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/tmp/hadoop/full</value>
</property>
</configuration>
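As the comment above says, the temporary directory is not created automatically; it must exist on every node before the NameNode is formatted. A one-liner using the path from the config above:

```shell
# Create the hadoop.tmp.dir from core-site.xml (run on every node)
mkdir -p /opt/tmp/hadoop/full
```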
5. Configure etc/hadoop/hdfs-site.xml
<configuration>
<!-- Note: the replication factor defaults to 3 if not set -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<!-- Run the SecondaryNameNode on node02; in Hadoop 2.x the port is 50090 -->
<name>dfs.namenode.secondary.http-address</name>
<value>node02:9868</value>
</property>
<!-- Disable HDFS permission checking -->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>
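By default the NameNode and DataNodes store their data under hadoop.tmp.dir (in dfs/name and dfs/data subdirectories). To pin the locations explicitly, properties like the following could be added inside the configuration block above; the paths here are assumptions matching the hadoop.tmp.dir configured earlier:

```xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/opt/tmp/hadoop/full/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/opt/tmp/hadoop/full/dfs/data</value>
</property>
```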
6. Configure etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Run the ResourceManager on node02 -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node02</value>
</property>
</configuration>
7. Configure etc/hadoop/mapred-site.xml
<configuration>
<!-- Set the MapReduce framework name to yarn (run MR jobs on YARN) -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
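Note: on Hadoop 3.x, MapReduce jobs submitted to YARN sometimes fail with a classpath error ("Could not find or load main class ... MRAppMaster") unless HADOOP_MAPRED_HOME is also declared. A hedged sketch, added inside the configuration block above (the path assumes the install location used elsewhere in this guide):

```xml
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/opt/app/hadoop-3.2.0</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/opt/app/hadoop-3.2.0</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/opt/app/hadoop-3.2.0</value>
</property>
```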
8. Distribute Hadoop to the other nodes
scp -r /opt/app/hadoop-3.2.0 node02:/opt/app/hadoop-3.2.0
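The same copy has to be repeated for every remaining node; a sketch as a loop (echo makes this a dry run that only prints the commands; drop it to actually copy):

```shell
# Sketch: distribute the configured Hadoop directory to the remaining nodes
cmds=$(for host in node02 node03 node04; do
  echo scp -r /opt/app/hadoop-3.2.0 "$host":/opt/app/hadoop-3.2.0
done)
echo "$cmds"
```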
IV. Starting the Cluster
1. Format the NameNode (on node01, and only once)
bin/hdfs namenode -format
2. Start the NameNode, SecondaryNameNode and DataNodes
## on node01, start the namenode
sbin/hadoop-daemon.sh start namenode
## on node02, start the secondarynamenode
sbin/hadoop-daemon.sh start secondarynamenode
## on node02, node03 and node04, start the datanodes
sbin/hadoop-daemon.sh start datanode
## (on Hadoop 3.x this script is deprecated; the preferred form is: hdfs --daemon start namenode, etc.)
3. Start YARN: the ResourceManager and NodeManagers
## on node02, start the resourcemanager and a nodemanager
sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager
## on node01, node03 and node04, start a nodemanager
sbin/yarn-daemon.sh start nodemanager
## (on Hadoop 3.x this script is deprecated; the preferred form is: yarn --daemon start resourcemanager, etc.)
Note: if you configure the environment variables below, you no longer need to run commands from inside the Hadoop directory.
[root@node01 hadoop-3.2.0]# vi /etc/profile
## JDK environment variable
export JAVA_HOME=/opt/app/jdk1.8.0_201
## Hadoop environment variable
export HADOOP_HOME=/opt/app/hadoop-3.2.0
## set Hadoop's console log level to DEBUG
#export HADOOP_ROOT_LOGGER=DEBUG,console
## native library settings
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
## PATH
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
One-command start/stop: start-all.sh / stop-all.sh (with the environment variables above loaded via source /etc/profile, these can be run from any directory)
[root@node01 logs]# start-all.sh
Starting namenodes on [node01]
Starting datanodes
Starting secondary namenodes [node02]
2019-06-15 01:15:29,452 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting resourcemanager
Starting nodemanagers
[root@node01 logs]# stop-all.sh
Stopping namenodes on [node01]
Stopping datanodes
Stopping secondary namenodes [node02]
2019-06-15 01:21:10,936 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping nodemanagers
node03: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
node02: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
node04: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
Stopping resourcemanager
[root@node01 logs]#
Problem & solution (the NativeCodeLoader warning above):
[root@node01 ~]# cd /opt/app/hadoop-3.2.0/lib/native
[root@node01 native]# ls
examples libhadooppipes.a libhadoop.so.1.0.0 libnativetask.a libnativetask.so.1.0.0
libhadoop.a libhadoop.so libhadooputils.a libnativetask.so
[root@node01 native]# ldd libhadoop.so.1.0.0
./libhadoop.so.1.0.0: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ./libhadoop.so.1.0.0)
	linux-vdso.so.1 => (0x00007fff9bd8a000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f7f51dd7000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f7f51bb9000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f7f51825000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f7f52208000)
[root@node01 native]# ldd --version
ldd (GNU libc) 2.12
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
[root@node01 native]#
Conclusion: libhadoop.so was built against glibc 2.14, but this system ships glibc 2.12, so Hadoop cannot load the native library and falls back to its built-in Java implementations. The warning is harmless for a test cluster; to remove it, run on an OS with glibc >= 2.14 or recompile the native libraries from source on this system.
V. Verification
Use the jps command to check which processes are running, and ss -nal to inspect the listening ports:
[root@node01 hadoop-3.2.0]# jps
1426 NodeManager
1304 NameNode
1550 Jps
[root@node01 hadoop-3.2.0]# ss -nal
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 *:9870 *:*
LISTEN 0 128 *:59635 *:*
LISTEN 0 128 :::22 :::*
LISTEN 0 128 *:22 *:*
LISTEN 0 100 ::1:25 :::*
LISTEN 0 100 127.0.0.1:25 *:*
LISTEN 0 128 *:13562 *:*
LISTEN 0 128 192.168.121.101:9820 *:*
LISTEN 0 128 *:8040 *:*
LISTEN 0 128 *:8042 *:*
[root@node01 hadoop-3.2.0]#
View the web dashboards (Hadoop 3.x default ports):
- HDFS NameNode UI: http://node01:9870
- YARN ResourceManager UI: http://node02:8088