Building a Fully Distributed Hadoop Cluster in Docker Containers
Overview
- Physical machine: Windows 10
- Host: a CentOS 7 virtual machine with the Docker service installed
- Hadoop cluster nodes: three CentOS 7 containers, hadoop1, hadoop2, and hadoop3
- Components:
  - Container image: CentOS 7
  - Docker CE 24.0.7
  - JDK 1.8.0_181
  - Hadoop 3.1.3
1. Create the virtual machine
Install CentOS 7.
2. Install Docker
(1) Install the Docker service
yum -y install docker-ce
# assumes the Docker CE yum repository has already been added
(2) Start the Docker service
systemctl start docker
systemctl status docker
# check the service status
docker version
# check the version
Version output:
Client: Docker Engine - Community
Version: 24.0.7
...
Server: Docker Engine - Community
Engine:
Version: 24.0.7
...
3. Build the image
(1) Pull the base image
docker pull centos:7
docker images
# list local images
REPOSITORY TAG IMAGE ID CREATED SIZE
centos 7 eeb6ee3f44bd 2 years ago 204MB
(2) Build the image
a. Write the Dockerfile
vi Dockerfile
Dockerfile
FROM centos:7
MAINTAINER zyz
RUN sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-*
RUN sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://mirrors.aliyun.com|g' /etc/yum.repos.d/CentOS-*
RUN yum makecache
RUN yum update -y
RUN yum install -y openssh-server sudo
RUN sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config
RUN yum install -y openssh-clients
RUN echo "root:root" | chpasswd
RUN echo "root ALL=(ALL) ALL" >> /etc/sudoers
# -N '' supplies an empty passphrase so `docker build` stays non-interactive
RUN ssh-keygen -t dsa -N '' -f /etc/ssh/ssh_host_dsa_key
RUN ssh-keygen -t rsa -N '' -f /etc/ssh/ssh_host_rsa_key
RUN mkdir /var/run/sshd
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
# FROM: the base image to build on
# MAINTAINER: the image's author (maintainer)
# Block 2: configure yum, switching to the Aliyun mirror, rebuilding the cache, and updating
# Block 3: install the SSH server and client
# Block 4: generate the SSH host keys
# Block 5: start sshd and expose SSH's default port, 22
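The two `sed` substitutions in block 2 can be tried out on a file first. Below is a sketch against a minimal, made-up repo entry (not the full CentOS-Base.repo): the first edit comments out the `mirrorlist` line, the second repoints `baseurl` at the Aliyun mirror.

```shell
# A minimal, made-up repo entry for demonstration
cat > demo.repo <<'EOF'
[base]
mirrorlist=http://mirrorlist.centos.org/?release=7
#baseurl=http://mirror.centos.org/centos/7/os/x86_64/
EOF
# comment out mirrorlist, then switch baseurl to the Aliyun mirror
sed -i 's/mirrorlist/#mirrorlist/g' demo.repo
sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://mirrors.aliyun.com|g' demo.repo
cat demo.repo
```

Note that the `g` flag also rewrites the `mirrorlist.centos.org` hostname inside the URL; that is harmless because the whole line is now a comment.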
b. Build the image
docker build -t centos7-ssh .
# "." is the build context, i.e. the directory containing the Dockerfile
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
centos7-ssh latest d39095d60198 17 hours ago 1.42GB
4. Create the containers
(1) Create a bridge network
docker network create hadoop
docker network ls
# list networks
NETWORK ID NAME DRIVER SCOPE
371545b29a8d hadoop bridge local
(2) Create the containers
docker run -itd --network hadoop --name hadoop1 -p 50070:50070 -p 8088:8088 centos7-ssh
docker run -itd --network hadoop --name hadoop2 centos7-ssh
docker run -itd --network hadoop --name hadoop3 centos7-ssh
# -i: keep STDIN open (interactive), -t: allocate a pseudo-terminal, -d: run detached in the background
# --network: the network the container joins, --name: the container's name, -p: publish container ports on the host; centos7-ssh is the image to run
(3) Inspect the containers
docker ps
# list running containers
Container listing:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
71c3b5fa9846 centos7-ssh "/usr/sbin/sshd -D" 12 hours ago Up 10 hours 22/tcp hadoop3
a16e70f1373e centos7-ssh "/usr/sbin/sshd -D" 17 hours ago Up 10 hours 22/tcp hadoop2
bac46cc68c73 centos7-ssh "/usr/sbin/sshd -D" 17 hours ago Up 10 hours 0.0.0.0:8088->8088/tcp, :::8088->8088/tcp, 22/tcp, 0.0.0.0:50070->50070/tcp, :::50070->50070/tcp hadoop1
(4) Inspect the network
docker network inspect hadoop
Network details:
"Containers": {
"71c3b5fa98463d995affb206496c04ee6f2fdaedda15240dc490f79f8cad23f9": {
"Name": "hadoop3",
"EndpointID": "2710a60e5ef5ea4e590ed2faff7a9db5eca2e5ea960867f05cc818a665a3c4bf",
"MacAddress": "02:42:ac:13:00:04",
"IPv4Address": "172.19.0.4/16",
"IPv6Address": ""
},
"a16e70f1373ef80a53bee0fa0af01b861137ada82787bc909d708fe8774a6651": {
"Name": "hadoop2",
"EndpointID": "eee50c5aae298154e0811b6dfdc60fc46e9bf2d4a097b67b73582150748fd890",
"MacAddress": "02:42:ac:13:00:03",
"IPv4Address": "172.19.0.3/16",
"IPv6Address": ""
},
"bac46cc68c739292fa36b4a1b92ed9f0347f8d622a7129922b6ab7d009b618f9": {
"Name": "hadoop1",
"EndpointID": "209ace260c8f2161e0aa805f3b034c66b7892fb8fa9a2a1f7d903dc9100aff3f",
"MacAddress": "02:42:ac:13:00:02",
"IPv4Address": "172.19.0.2/16",
"IPv6Address": ""
}
}
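To pull just the name/IP pairs out of that JSON, `docker network inspect` accepts a Go template, e.g. something like `docker network inspect -f '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{"\n"}}{{end}}' hadoop`, or the saved output can simply be grepped. A sketch against an inlined, trimmed copy of the listing above (container IDs shortened):

```shell
# A trimmed copy of the Containers section shown above (IDs shortened)
cat > inspect.json <<'EOF'
{"Containers": {
  "71c3b5fa9846": {"Name": "hadoop3", "IPv4Address": "172.19.0.4/16"},
  "a16e70f1373e": {"Name": "hadoop2", "IPv4Address": "172.19.0.3/16"},
  "bac46cc68c73": {"Name": "hadoop1", "IPv4Address": "172.19.0.2/16"}
}}
EOF
# print only the name/IP pairs, one match per line
grep -oE '"(Name|IPv4Address)": "[^"]+"' inspect.json
```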
5. Install software in the containers
5.1 Attach to the containers
Open three terminals and attach to one container in each.
docker exec -it hadoop1 bash
docker exec -it hadoop2 bash
docker exec -it hadoop3 bash
5.2 Passwordless SSH
- Passwordless SSH from hadoop1
(1) Generate a key pair
ssh-keygen
# press Enter at every prompt
(2) Copy the public key to each node
ssh-copy-id hadoop1
ssh-copy-id hadoop2
ssh-copy-id hadoop3
(3) Test
ssh hadoop1
# confirm no password is required, then exit
ssh hadoop2
# confirm no password is required, then exit
ssh hadoop3
# confirm no password is required, then exit
- Passwordless SSH from hadoop2: repeat the steps above on hadoop2
- Passwordless SSH from hadoop3: repeat the steps above on hadoop3
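When repeating these steps on hadoop2 and hadoop3, the prompts can be skipped entirely: `-N ''` gives an empty passphrase and `-f` fixes the output path. A local sketch (the path `./demo_key` is made up for illustration; on the nodes the default `~/.ssh/id_rsa` would be used):

```shell
# Generate an RSA key pair non-interactively:
# -N '' sets an empty passphrase, -f the output path, -q silences the banner
ssh-keygen -t rsa -N '' -f ./demo_key -q
ls -l demo_key demo_key.pub
# the copy step can then be looped over the nodes, e.g.:
# for host in hadoop1 hadoop2 hadoop3; do ssh-copy-id "$host"; done
```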
5.3 Install the JDK
(1) Obtain the package
Copy it from the host into the container:
docker cp /package/jdk-8u181-linux-x64.tar.gz hadoop1:/package/
# the /package directory must be created beforehand
(2) Install
tar -zxvf /package/jdk-8u181-linux-x64.tar.gz -C /software/
# the /software directory must be created beforehand
(3) Configure
vi /etc/bashrc
export JAVA_HOME=/software/jdk1.8.0_181
export PATH=$PATH:$JAVA_HOME/bin
source /etc/bashrc
# apply the changes immediately
(4) Test
java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
(5) Copy to the other nodes
a. Copy to hadoop2
scp -r /software/jdk1.8.0_181/ hadoop2:/software/
scp /etc/bashrc hadoop2:/etc
Then run on hadoop2: source /etc/bashrc
b. Copy to hadoop3
scp -r /software/jdk1.8.0_181/ hadoop3:/software/
scp /etc/bashrc hadoop3:/etc
Then run on hadoop3: source /etc/bashrc
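Steps a and b differ only in the hostname, so they can be looped. The sketch below is a dry run: it prints the commands instead of executing them (drop the `echo` to actually copy):

```shell
# Dry run: print (rather than execute) the copy commands for each node
out=$(for host in hadoop2 hadoop3; do
  echo "scp -r /software/jdk1.8.0_181/ $host:/software/"
  echo "scp /etc/bashrc $host:/etc"
done)
echo "$out"
```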
5.4 Install Hadoop
(1) Obtain the package
Copy it from the host into the container:
docker cp /package/hadoop-3.1.3.tar.gz hadoop1:/package/
(2) Install
tar -zxvf /package/hadoop-3.1.3.tar.gz -C /software/
(3) Configure the environment
vi /etc/bashrc
export JAVA_HOME=/software/jdk1.8.0_181
export HADOOP_HOME=/software/hadoop-3.1.3 # new
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin # modified
# allow the daemons to run as the root user
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
source /etc/bashrc
# apply the changes immediately
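After sourcing the file you can sanity-check that the new entries really made it onto `PATH`. A sketch, recreating the two exports above:

```shell
# Recreate the exports from /etc/bashrc, then verify that the Hadoop
# bin directory appears among the colon-separated PATH components
export HADOOP_HOME=/software/hadoop-3.1.3
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) status=ok ;;
  *) status=missing ;;
esac
echo "hadoop on PATH: $status"
```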
(4) Test
hadoop version
Hadoop 3.1.3
...
(5) Configure Hadoop
- hadoop-env.sh
vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/software/jdk1.8.0_181
- core-site.xml
vi $HADOOP_HOME/etc/hadoop/core-site.xml
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/software/hadoop-3.1.3/data</value>
</property>
</configuration>
<!-- the data directory need not be created in advance; it is created automatically when the NameNode is formatted -->
- mapred-site.xml
vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- fixes "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster" -->
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<!--
Error: "is running 221518336B beyond the 'VIRTUAL' memory limit. Current usage: 74.0 MB of 1 GB physical memory used; 2.3 GB of 2.1 GB virtual memory used. Killing container."
Cause: the container exceeded its virtual memory allowance (physical memory times yarn.nodemanager.vmem-pmem-ratio, which defaults to 2.1, hence the 2.1 GB limit), so YARN killed it.
Fix: raise the memory settings:
-->
<property>
<name>mapreduce.map.memory.mb</name>
<value>1536</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3072</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2560M</value>
</property>
</configuration>
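A common alternative to raising the memory settings is to disable YARN's virtual-memory check altogether in yarn-site.xml; this is a sketch suitable for a test cluster, so verify the property against your Hadoop version before relying on it:

```xml
<!-- yarn-site.xml: skip the virtual memory check that killed the container -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
```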
- hdfs-site.xml
vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop1:50070</value>
</property>
</configuration>
- yarn-site.xml
vi $HADOOP_HOME/etc/hadoop/yarn-site.xml
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop1</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop1:8088</value>
</property>
<!-- fixes "Error: Could not find or load main class org.apache.hadoop.mapred.YarnChild" -->
<property>
<name>yarn.application.classpath</name>
<value>
${HADOOP_HOME}/etc/hadoop,
${HADOOP_HOME}/share/hadoop/common/*,
${HADOOP_HOME}/share/hadoop/common/lib/*,
${HADOOP_HOME}/share/hadoop/hdfs/*,
${HADOOP_HOME}/share/hadoop/hdfs/lib/*,
${HADOOP_HOME}/share/hadoop/mapreduce/*,
${HADOOP_HOME}/share/hadoop/mapreduce/lib/*,
${HADOOP_HOME}/share/hadoop/yarn/*,
${HADOOP_HOME}/share/hadoop/yarn/lib/*
</value>
</property>
</configuration>
- workers
vi $HADOOP_HOME/etc/hadoop/workers
hadoop1
hadoop2
hadoop3
(6) Copy Hadoop to the other nodes
scp -r /software/hadoop-3.1.3/ hadoop2:/software/
scp -r /software/hadoop-3.1.3/ hadoop3:/software/
(7) Format the NameNode
hdfs namenode -format
# run on hadoop1 only, and only once; reformatting a running cluster changes the clusterID and breaks the DataNodes
(8) Start the cluster
start-all.sh
jps
Process listing:
# hadoop1
3488 NodeManager
2881 DataNode
5425 Jps
3369 ResourceManager
2763 NameNode
3051 SecondaryNameNode
# hadoop2
630 DataNode
1064 Jps
732 NodeManager
# hadoop3
993 Jps
599 DataNode
701 NodeManager
HDFS web UI: http://10.10.0.100:50070
YARN web UI: http://10.10.0.100:8088
# 10.10.0.100 is the host VM's IP; the two ports reach hadoop1 through the -p mappings
(9) Word count test
a. Prepare the data
vi words1.txt
hello michael
hello julia
vi words2.txt
hello michael is julia father
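Before running the job, the expected counts can be computed locally with coreutils. This sketch pipes the same three lines through `tr`/`sort`/`uniq` and prints them in the same word-TAB-count shape the MapReduce job emits:

```shell
# Compute the expected word counts from the three sample lines:
# split on spaces, sort, count duplicates, print as word<TAB>count
counts=$(printf 'hello michael\nhello julia\nhello michael is julia father\n' \
  | tr ' ' '\n' | sort | uniq -c | awk '{printf "%s\t%s\n", $2, $1}')
echo "$counts"
```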
b. Upload to HDFS
hdfs dfs -mkdir -p /wordcount/input
hdfs dfs -put *.txt /wordcount/input/
hdfs dfs -ls -R /wordcount
drwxr-xr-x - root supergroup 0 2023-12-09 14:25 /wordcount/input
-rw-r--r-- 2 root supergroup 26 2023-12-09 14:25 /wordcount/input/words1.txt
-rw-r--r-- 2 root supergroup 30 2023-12-09 14:25 /wordcount/input/words2.txt
c. Run the word count job
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /wordcount/input /wordcount/output
hadoop fs -ls -R /wordcount/output/
-rw-r--r-- 2 root supergroup 0 2023-12-09 16:11 /wordcount/output/_SUCCESS
-rw-r--r-- 2 root supergroup 40 2023-12-09 16:11 /wordcount/output/part-r-00000
hadoop fs -cat /wordcount/output/part-r-00000
2023-12-09 16:14:54,373 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
father 1
hello 3
is 1
julia 2
michael 2
(10) Done