Setting up Hadoop 2.7.4 on virtual machines
Installing and configuring Hadoop 2.7.4 on CentOS 7.0
Preparing resources
Downloads:
- Hadoop from the official download page; pick whichever version suits your needs
- JDK from the official download page
Notes:
- Hadoop, the JDK, and CentOS should all be the same word size (all 64-bit or all 32-bit) to avoid unpredictable errors; 64-bit is recommended.
Linux virtual machine setup
System configuration:
- VMs: one master (Master.Hadoop) and two slaves (Slave1.Hadoop, Slave2.Hadoop)
- Network: NAT mode (what I used)
- Memory: 1024 MB per VM
- Partitioning: automatic
- Software selection: Minimal Install; be sure to add the Development Tools group
- Before proceeding, make sure the three VMs and the host machine can all ping one another (a quick check is sketched after this list).
- The Minimal Install does not include ifconfig; install net-tools:

    yum search ifconfig
    yum install net-tools.x86_64
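A minimal connectivity check, assuming the IP addresses used throughout this guide (they are listed in the table further down); run the pings from each VM and from the host:

    # show this machine's address (requires the net-tools package installed above)
    ifconfig
    # ping the other nodes by IP
    ping -c 3 192.168.202.128
    ping -c 3 192.168.202.129
    ping -c 3 192.168.202.130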
Finish installing the other two virtual machines:
- The hostnames of the two slaves can be changed to Slave1.Hadoop and Slave2.Hadoop to make them easier to tell apart (see the sketch below).
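On CentOS 7 the hostname can be set with hostnamectl; a minimal sketch, run on the corresponding slave:

    # on the first slave
    hostnamectl set-hostname Slave1.Hadoop
    # on the second slave
    hostnamectl set-hostname Slave2.Hadoop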
IP configuration of each VM after installation (for reference):

Host | IP address |
---|---|
master.hadoop | 192.168.202.128 |
slave1.hadoop | 192.168.202.129 |
slave2.hadoop | 192.168.202.130 |
Configuring the local hosts file
Note: the screenshot in the original post marked existing content that must be deleted here, otherwise startup will fail.
- Enter the command:

    vi /etc/hosts

- Copy the following entries into the file on every host:

    192.168.202.128 master.hadoop
    192.168.202.129 slave1.hadoop
    192.168.202.130 slave2.hadoop
- Use the following commands to test from the master host; similar commands can be used on the slaves:

    ping slave1.hadoop
    ping slave2.hadoop
Configuring passwordless login from the Master to all Slaves
Configure the following on the Master host:
- Run the following command to generate an SSH key pair; whenever it prompts for a passphrase, just press Enter:

    ssh-keygen
    # two files are generated in the default /root/.ssh/ directory
- Append id_rsa.pub to the authorized keys:

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- Change the permissions of the "authorized_keys" file:

    chmod 600 ~/.ssh/authorized_keys
- Edit the SSH daemon configuration and change the following three settings:

    vi /etc/ssh/sshd_config

    RSAAuthentication yes                     # enable RSA authentication
    PubkeyAuthentication yes                  # enable public/private key pair authentication
    AuthorizedKeysFile .ssh/authorized_keys   # public key file path (the file generated above)
- Restart the SSH service:

    service sshd restart
    # on CentOS 7 this is equivalent to: systemctl restart sshd
- Copy the public key to all the Slave machines:

    # scp ~/.ssh/id_rsa.pub <remote user>@<remote server IP>:~/
    scp ~/.ssh/id_rsa.pub root@192.168.202.129:~/
    scp ~/.ssh/id_rsa.pub root@192.168.202.130:~/
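As an alternative to the manual copy-and-append steps in the next part, ssh-copy-id (part of the OpenSSH client tools on CentOS 7) installs the key on a slave and sets the permissions in one step; a sketch using this guide's addresses:

    ssh-copy-id root@192.168.202.129
    ssh-copy-id root@192.168.202.130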
Configure the following on the Slave hosts:
- Create the .ssh directory on each slave and fix its permissions:

    mkdir ~/.ssh
    chmod 700 ~/.ssh
- Append the copied key to the "authorized_keys" file and fix its permissions:

    cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys
- Delete the now-unneeded .pub file:

    rm ~/id_rsa.pub
Test from the master host:

    ssh 192.168.202.129
    ssh 192.168.202.130
    # if you can log in to slave1 and slave2 without a password, the configuration succeeded
Installing the JDK and Hadoop
JDK installation:
- Create a java directory under /usr.
- Use a third-party transfer tool to copy the JDK archive to all three VMs (or use scp, as sketched after the commands below).
- Unpack it with the following commands:

    # run inside the /usr/java directory created above
    cd /usr/java
    tar zxvf jdk-8u45-linux-x64.tar.gz
    # the .gz file can be deleted after unpacking
    rm jdk-8u45-linux-x64.tar.gz
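If a third-party transfer tool is not handy, scp works too; a minimal sketch, assuming the archive is already in /usr/java on the master and that /usr/java exists on each slave:

    scp /usr/java/jdk-8u45-linux-x64.tar.gz root@192.168.202.129:/usr/java/
    scp /usr/java/jdk-8u45-linux-x64.tar.gz root@192.168.202.130:/usr/java/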
- Configure the JDK environment variables:

    vi /etc/profile
- Add the Java environment variables by copying the following lines to the bottom of the file:

    export JAVA_HOME=/usr/java/jdk1.8.0_45
    export JRE_HOME=/usr/java/jdk1.8.0_45/jre
    export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
    export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
- Make the configuration take effect:

    source /etc/profile
- Verify the installation; the configuration succeeded if output like the following appears:

    java -version
    java version "1.8.0_45"
    Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
    Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
Hadoop installation
- Transfer the downloaded Hadoop archive to the /usr directory.
- Unpack hadoop-2.7.4.tar.gz and rename the directory:

    cd /usr
    tar zxvf hadoop-2.7.4.tar.gz
    mv hadoop-2.7.4 hadoop
    # delete the hadoop-2.7.4.tar.gz file
    rm -rf hadoop-2.7.4.tar.gz
- Create a tmp directory under "/usr/hadoop":

    cd /usr/hadoop
    mkdir tmp
- Add the Hadoop installation path to "/etc/profile" by appending the following lines to the end of the file:

    vi /etc/profile

    export HADOOP_INSTALL=/usr/hadoop
    export PATH=${HADOOP_INSTALL}/bin:${HADOOP_INSTALL}/sbin:${PATH}
    export HADOOP_MAPRED_HOME=${HADOOP_INSTALL}
    export HADOOP_COMMON_HOME=${HADOOP_INSTALL}
    export HADOOP_HDFS_HOME=${HADOOP_INSTALL}
    export YARN_HOME=${HADOOP_INSTALL}
    export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_INSTALL}/lib/native
    export HADOOP_OPTS="-Djava.library.path=${HADOOP_INSTALL}/lib:${HADOOP_INSTALL}/lib/native"
- Reload "/etc/profile":

    source /etc/profile
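A quick way to confirm the new PATH entries took effect is the standard hadoop version command, which should report the release:

    hadoop version
    # expected first line: Hadoop 2.7.4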
Configuring Hadoop (configure on the Master host only for now; copy to the two Slave hosts once done)
- Set the Java environment variable in hadoop-env.sh and yarn-env.sh:

    cd /usr/hadoop/etc/hadoop/
    vi hadoop-env.sh
    # change JAVA_HOME (repeat the same change in yarn-env.sh)
    export JAVA_HOME=/usr/java/jdk1.8.0_45
- Configure the core-site.xml file; change its contents to the following:

    vi core-site.xml

    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://master.hadoop:9000</value>
      </property>
    </configuration>
- Configure the hdfs-site.xml file; change its contents to the following:

    vi hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///usr/hadoop/dfs/name</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///usr/hadoop/dfs/data</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.nameservices</name>
        <value>hadoop-cluster1</value>
      </property>
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master.hadoop:50090</value>
      </property>
      <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
      </property>
    </configuration>
- Configure the mapred-site.xml file. If the file does not exist, create it by hand (see the template sketch below), add the opening and closing configuration tags at the top and bottom, and copy the following properties in between:

    vi mapred-site.xml

    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
      <final>true</final>
    </property>
    <property>
      <name>mapreduce.jobtracker.http.address</name>
      <value>master.hadoop:50030</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>master.hadoop:10020</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>master.hadoop:19888</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value>http://master.hadoop:9001</value>
    </property>
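The unpacked Hadoop 2.7.4 tree normally ships a mapred-site.xml.template in the same directory, so one convenient way to create the file before editing it is to copy the template (check that the template exists in your distribution):

    cd /usr/hadoop/etc/hadoop
    cp mapred-site.xml.template mapred-site.xml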
- Configure the yarn-site.xml file; place the following properties inside its configuration element:

    vi yarn-site.xml

    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>master.hadoop</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.resourcemanager.address</name>
      <value>master.hadoop:8032</value>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>master.hadoop:8030</value>
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>
      <value>master.hadoop:8031</value>
    </property>
    <property>
      <name>yarn.resourcemanager.admin.address</name>
      <value>master.hadoop:8033</value>
    </property>
    <property>
      <name>yarn.resourcemanager.webapp.address</name>
      <value>master.hadoop:8088</value>
    </property>
Configuring the Hadoop cluster
- Copy the Hadoop directory configured on the Master to the two Slaves:

    scp -r /usr/hadoop root@192.168.202.129:/usr/
    scp -r /usr/hadoop root@192.168.202.130:/usr/
- Edit the slaves file on the Master host and change its contents to:

    cd /usr/hadoop/etc/hadoop
    vi slaves

    slave1.hadoop
    slave2.hadoop
- Format the HDFS file system.
- Note: format only once. Formatting repeatedly can leave the namenode and datanodes with mismatched clusterIDs, which prevents them from starting (a recovery sketch follows).

    # run the following command on the Master host
    hadoop namenode -format
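If the clusterID mismatch does occur, a common (data-destroying) recovery is to wipe the storage directories configured above on every node and format again; a sketch using this guide's paths, suitable only for a freshly built cluster:

    # on the master and on both slaves
    rm -rf /usr/hadoop/dfs/name /usr/hadoop/dfs/data /usr/hadoop/tmp
    # then, on the master only
    hadoop namenode -format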
- Start Hadoop. First turn off the machine firewall; the command differs between OS versions. On CentOS 7:

    systemctl stop firewalld.service      # stop the firewall (use start to bring it back up)
    systemctl disable firewalld.service   # keep the firewall from starting at boot
    firewall-cmd --state                  # check the current firewall state

- Then start the daemons:

    cd /usr/hadoop/sbin
    ./start-all.sh

    # the recommended way to start is instead:
    cd /usr/hadoop/sbin
    ./start-dfs.sh
    ./start-yarn.sh

- Output like the following should appear:

    Starting namenodes on [Master.Hadoop]
    Master.Hadoop: starting namenode, logging to /usr/hadoop/logs/hadoop-root-namenode-localhost.localdomain.out
    Slave2.Hadoop: starting datanode, logging to /usr/hadoop/logs/hadoop-root-datanode-Slave2.Hadoop.out
    Slave1.Hadoop: starting datanode, logging to /usr/hadoop/logs/hadoop-root-datanode-Slave1.Hadoop.out

    starting yarn daemons
    starting resourcemanager, logging to /usr/hadoop/logs/yarn-root-resourcemanager-localhost.localdomain.out
    Slave1.Hadoop: starting nodemanager, logging to /usr/hadoop/logs/yarn-root-nodemanager-Slave1.Hadoop.out
    Slave2.Hadoop: starting nodemanager, logging to /usr/hadoop/logs/yarn-root-nodemanager-Slave2.Hadoop.out
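To shut the cluster down later, the matching stop scripts live in the same sbin directory; a minimal sketch:

    cd /usr/hadoop/sbin
    ./stop-yarn.sh
    ./stop-dfs.sh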
- Verify Hadoop.
- 1. Run jps directly on the Master or on a Slave; output like the following should appear (the process IDs will differ):

    jps

    Master:
    3930 ResourceManager
    4506 Jps
    3693 NameNode

    Slave:
    2792 NodeManager
    2920 Jps
    2701 DataNode

- 2. Run the cluster report command; output similar to the following should appear (addresses and sizes will differ):

    hadoop dfsadmin -report

    Configured Capacity: 14382268416 (13.39 GB)
    Present Capacity: 10538565632 (9.81 GB)
    DFS Remaining: 10538557440 (9.81 GB)
    DFS Used: 8192 (8 KB)
    DFS Used%: 0.00%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0

    -------------------------------------------------
    Live datanodes (2):

    Name: 192.168.1.124:50010 (Slave2.Hadoop)
    Hostname: Slave2.Hadoop
    Decommission Status : Normal
    Configured Capacity: 7191134208 (6.70 GB)
    DFS Used: 4096 (4 KB)
    Non DFS Used: 1921933312 (1.79 GB)
    DFS Remaining: 5269196800 (4.91 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 73.27%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Thu Jul 02 10:45:04 CST 2015

    Name: 192.168.1.125:50010 (Slave1.Hadoop)
    Hostname: Slave1.Hadoop
    Decommission Status : Normal
    Configured Capacity: 7191134208 (6.70 GB)
    DFS Used: 4096 (4 KB)
    Non DFS Used: 1921769472 (1.79 GB)
    DFS Remaining: 5269360640 (4.91 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 73.28%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
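As an extra end-to-end check, you can also run a trivial HDFS operation and one of the bundled example jobs; a sketch assuming the default layout of the unpacked 2.7.4 distribution under /usr/hadoop:

    # write and list a file in HDFS
    hdfs dfs -mkdir -p /test
    hdfs dfs -put /etc/hosts /test/
    hdfs dfs -ls /test

    # run the bundled pi estimator on YARN
    hadoop jar /usr/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar pi 2 10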
- Access the web UI. CentOS 7 no longer uses the iptables service configuration file; it uses firewalld instead. After running the following command, the Hadoop web pages can be reached from a browser on the host machine:

    systemctl stop firewalld

- Open the following URL to reach the Hadoop overview page (the IP address is the master host's IP):

    http://192.168.202.128:50070/dfshealth.html#tab-overview
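The ResourceManager also serves a web UI on the port set earlier in yarn-site.xml (yarn.resourcemanager.webapp.address, 8088), so the cluster's applications should be viewable at:

    http://192.168.202.128:8088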