Installing Hadoop 3 from Binary Packages on CentOS 7

1. Hadoop Overview

1.1 Hadoop 3 core components

HDFS: distributed file system; solves massive-scale data storage.

YARN: cluster resource management and job scheduling framework; solves resource allocation and task scheduling.

MapReduce: distributed computing framework; solves massive-scale data computation.

1.2 Hadoop cluster overview

A Hadoop installation really consists of two clusters: HDFS and YARN.
The two clusters are logically separate (they neither affect nor depend on each other) but physically co-located (some of their daemons run on the same servers).
Both use a master/worker architecture by default.
1.2.1 HDFS
Master role: NameNode (NN)
Worker role: DataNode (DN)
Master's auxiliary role: SecondaryNameNode (SNN)
1.2.2 YARN
Master role: ResourceManager (RM)
Worker role: NodeManager (NM)

2. Environment and Preparation

2.1 Machine and role planning
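
The layout below is the role plan implied by the configuration used throughout the rest of this document:

IP               Hostname            HDFS roles                     YARN roles
192.168.1.131    hdp01.dialev.com    NameNode, DataNode             ResourceManager, NodeManager
192.168.1.132    hdp02.dialev.com    SecondaryNameNode, DataNode    NodeManager
192.168.1.133    hdp03.dialev.com    DataNode                       NodeManager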

2.2 Add hosts entries on every node

192.168.1.131 hdp01.dialev.com
192.168.1.132 hdp02.dialev.com
192.168.1.133 hdp03.dialev.com
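
A minimal way to append these entries on each node (assuming /etc/hosts has not already been edited):

cat >> /etc/hosts <<'EOF'
192.168.1.131 hdp01.dialev.com
192.168.1.132 hdp02.dialev.com
192.168.1.133 hdp03.dialev.com
EOF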

2.3 Disable the firewall
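
On CentOS 7 this means stopping and disabling firewalld on every node, for example:

systemctl stop firewalld      # stop the running firewall
systemctl disable firewalld   # keep it from starting at boot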

2.4 Passwordless SSH from hdp01 to all three machines

echo "StrictHostKeyChecking no" >~/.ssh/config
ssh-copy-id -i 192.168.1.13{1..3}
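
A quick check that passwordless login works from hdp01 to all three nodes:

for host in hdp0{1..3}.dialev.com; do
    ssh $host hostname   # should print each hostname without asking for a password
done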

2.5 Time synchronization

yum -y install ntpdate
ntpdate ntp.aliyun.com
echo '*/5 * * * * ntpdate ntp.aliyun.com >/dev/null 2>&1'  >> /var/spool/cron/root

2.6 Raise the per-user file descriptor limit

vim /etc/security/limits.conf
* soft nofile 65535
* hard nofile 65535
 
# Takes effect for new login sessions (log out and back in, or reboot)
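
After logging back in, the new limit can be confirmed with:

ulimit -n   # should print 65535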

2.7 Install the Java environment

tar xf jdk-8u65-linux-x64.tar.gz -C /usr/local/
cd /usr/local/
ln -sv jdk1.8.0_65/  java
 
vim /etc/profile.d/java.sh
export JAVA_HOME=/usr/local/java
export CLASSPATH=$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
 
source /etc/profile.d/java.sh
java -version

3. Install Hadoop

3.1 Unpack the distribution

The software packages for this document and the related Hadoop documents are collected in the following Baidu Netdisk share:

链接:https://pan.baidu.com/s/11F4THdIfgrULMn2gNcObRA?pwd=cjll

# Alternatively, download the version you are deploying from https://archive.apache.org/dist/hadoop/common/
tar xf hadoop-3.1.4.tar.gz -C /usr/local/
cd /usr/local/
ln -sv hadoop-3.1.4 hadoop

Directory layout:
├── bin      Core Hadoop admin and client scripts
├── etc      Hadoop configuration files
├── include  C/C++ header files for accessing HDFS or writing MapReduce programs
├── lib      Dynamic and static libraries that Hadoop exposes to programs
├── libexec  Shell helper files used by the individual services (log output, startup parameters, and other basics)
├── sbin     Admin scripts, mainly the start/stop scripts for the HDFS and YARN services
└── share    Compiled jars for each Hadoop module, including the official examples

3.2 Set Hadoop environment variables

# Reference: https://hadoop.apache.org/docs/r3.1.4/ (Configuration section, bottom of the left-hand menu)
cd /usr/local/hadoop/etc/hadoop
cp hadoop-env.sh hadoop-env.sh-bak
vim hadoop-env.sh
export JAVA_HOME=/usr/local/java
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
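
Note that the HADOOP_HOME and PATH exports inside hadoop-env.sh only affect Hadoop's own scripts, not your login shell. For the hadoop/hdfs/yarn commands used later to resolve from an interactive shell, one option is a profile entry analogous to the java.sh one above (the /etc/profile.d/hadoop.sh filename is just a suggested convention):

vim /etc/profile.d/hadoop.sh
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

source /etc/profile.d/hadoop.sh
hadoop version   # should report Hadoop 3.1.4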

3.3 Set the cluster defaults

cp core-site.xml  core-site.xml-bak
vim core-site.xml
<configuration>
    <!-- 指定HDFS老大(namenode)的通信地址 -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hdp01.dialev.com:8020</value>
    </property>
    <!-- 指定hadoop运行时产生文件的存储路径 -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/Hadoop/tmp</value>
    </property>
    <!-- 设置HDFS web UI访问用户 -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
    </property>
</configuration>      

3.4 Modify the SNN configuration

cp hdfs-site.xml hdfs-site.xml-bak
vim hdfs-site.xml
<configuration>
    <!-- 设置namenode的http通讯地址 -->
    <property>
       <name>dfs.namenode.http-address</name>
       <value>hdp01.dialev.com:50070</value>
    </property>
    <!-- 设置secondarynamenode的http通讯地址 -->
    <property>
       <name>dfs.namenode.secondary.http-address</name>
       <value>hdp02.dialev.com:50090</value>
    </property>
     <!-- 设置namenode存放的路径 -->
    <property>
       <name>dfs.namenode.name.dir</name>
       <value>/Hadoop/name</value>
    </property>
    <!-- 设置hdfs副本数量 -->
    <property>
       <name>dfs.replication</name>
       <value>2</value>
    </property>
    <!-- 设置datanode存放的路径,如果不指定则使用hadoop.tmp.dir所指路径 -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/Hadoop/data</value>
        </property>
</configuration>

3.5 MapReduce configuration

cp mapred-site.xml mapred-site.xml-bak
vim mapred-site.xml
<configuration>
    <!-- 通知框架MR使用YARN -->
    <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
    </property>
    <!-- MR App Mater 环境变量 -->
    <property>
      <name>yarn.app.mapreduce.am.env</name>
      <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    
    <!-- MR Map Task 环境变量 -->
    <property>
      <name>mapreduce.map.env</name>
      <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    
    <!-- MR Reduce Task 环境变量 -->
    <property>
      <name>mapreduce.reduce.env</name>
      <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
</configuration>

3.6 YARN configuration

cp yarn-site.xml yarn-site.xml-bak
vim yarn-site.xml
<configuration>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hdp01.dialev.com</value>
</property>

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

<!-- 是否将对容器实施物理内存限制 -->
<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>

<!-- 是否将对容器实施虚拟内存限制。 -->
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

<!-- 开启日志聚集 -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>

<!-- 保存的时间7天 -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
</configuration>

3.7 Configure the worker node addresses

vim workers
hdp01.dialev.com
hdp02.dialev.com
hdp03.dialev.com

3.8 Sync the configuration to the other nodes

cd /usr/local
scp -r -q hadoop-3.1.4 192.168.1.132:/usr/local/
scp -r -q hadoop-3.1.4 192.168.1.133:/usr/local/

# Create the symlink on nodes 2 and 3 (run in /usr/local on each node)
ln -sv hadoop-3.1.4 hadoop

4. Start Hadoop

4.1 Format the NameNode

Run this on hdp01.dialev.com, and only once. If it is run by mistake, you can delete the initialized directory and format again.

hdfs namenode -format
......
2022-12-26 16:40:03,355 INFO util.GSet: 0.029999999329447746% max memory 940.5 MB = 288.9 KB
2022-12-26 16:40:03,355 INFO util.GSet: capacity      = 2^15 = 32768 entries
2022-12-26 16:40:03,406 INFO namenode.FSImage: Allocated new BlockPoolId: BP-631728325-192.168.1.131-1672044003397
2022-12-26 16:40:03,437 INFO common.Storage: Storage directory /Hadoop/name has been successfully formatted.         # /Hadoop/name is the initialized directory; this line confirms the storage was formatted successfully.
2022-12-26 16:40:03,498 INFO namenode.FSImageFormatProtobuf: Saving image file /Hadoop/name/current/fsimage.ckpt_0000000000000000000 using no compression
2022-12-26 16:40:03,781 INFO namenode.FSImageFormatProtobuf: Image file /Hadoop/name/current/fsimage.ckpt_0000000000000000000 of size 391 bytes saved in 0 seconds .
2022-12-26 16:40:03,802 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2022-12-26 16:40:03,820 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid = 0 when meet shutdown.
2022-12-26 16:40:03,821 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hdp01.dialev.com/192.168.1.131
************************************************************/

4.2 Start the services

1. Bundled shell scripts
cd /usr/local/hadoop/sbin/
HDFS cluster:
    start-dfs.sh
    stop-dfs.sh
YARN cluster:
    start-yarn.sh
    stop-yarn.sh
Whole cluster (HDFS + YARN):
    start-all.sh
    stop-all.sh
jps  # check that the running daemons match the cluster role plan
# Log directory: /usr/local/hadoop/logs (by default, the logs directory under the install directory)
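
Assuming the role plan from section 2.1, jps on each node should show roughly the following daemons (plus Jps itself; PIDs omitted):

hdp01: NameNode, DataNode, ResourceManager, NodeManager
hdp02: SecondaryNameNode, DataNode, NodeManager
hdp03: DataNode, NodeManager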

2. Manual per-daemon control (for reference):
HDFS cluster
hdfs --daemon start namenode | datanode | secondarynamenode
hdfs --daemon stop namenode | datanode | secondarynamenode
YARN cluster
yarn --daemon start resourcemanager | nodemanager
yarn --daemon stop resourcemanager | nodemanager

5. Verification

5.1 Visit the web UIs

1. Print the cluster status
hdfs dfsadmin -report

2. Open the YARN management UI
http://192.168.1.131:8088/cluster/nodes   # port 8088 on the host running the RM

3. Open the NameNode management UI
http://192.168.1.131:50070/dfshealth.html#tab-overview  # host running the NN; the port comes from dfs.namenode.http-address in hdfs-site.xml. The default is 50070 in 2.x but 9870 in 3.x; it is 50070 here because I wrote the 2.x value without noticing the change in the newer version.

5.2 Test directory creation and file upload

hadoop fs -mkdir /bowen            # create a bowen directory under the HDFS root
hadoop fs -put yarn-env.sh /bowen  # upload a file into the bowen directory (yarn-env.sh is in /usr/local/hadoop/etc/hadoop)
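
Listing the directory confirms the upload:

hadoop fs -ls /bowen   # should show yarn-env.sh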

5.3 Test a MapReduce job

cd /usr/local/hadoop/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-3.1.4.jar pi 2 4
Number of Maps  = 2
Samples per Map = 4
Wrote input for Map #0
Wrote input for Map #1
Starting Job
2022-12-27 09:20:12,868 INFO client.RMProxy: Connecting to ResourceManager at hdp01.dialev.com/192.168.1.131:8032  # the job first connects to the RM to request resources
2022-12-27 09:20:14,091 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1672045511416_0001
2022-12-27 09:20:14,503 INFO input.FileInputFormat: Total input files to process : 2
2022-12-27 09:20:14,707 INFO mapreduce.JobSubmitter: number of splits:2
2022-12-27 09:20:15,349 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1672045511416_0001
2022-12-27 09:20:15,351 INFO mapreduce.JobSubmitter: Executing with tokens: []
2022-12-27 09:20:16,072 INFO conf.Configuration: resource-types.xml not found
2022-12-27 09:20:16,073 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2022-12-27 09:20:16,974 INFO impl.YarnClientImpl: Submitted application application_1672045511416_0001
2022-12-27 09:20:17,204 INFO mapreduce.Job: The url to track the job: http://hdp01.dialev.com:8088/proxy/application_1672045511416_0001/
2022-12-27 09:20:17,206 INFO mapreduce.Job: Running job: job_1672045511416_0001
2022-12-27 09:20:33,618 INFO mapreduce.Job: Job job_1672045511416_0001 running in uber mode : false
2022-12-27 09:20:33,621 INFO mapreduce.Job:  map 0% reduce 0%    # a MapReduce job has two phases: map and reduce
2022-12-27 09:20:47,862 INFO mapreduce.Job:  map 100% reduce 0%
2022-12-27 09:20:53,944 INFO mapreduce.Job:  map 100% reduce 100%
2022-12-27 09:20:53,968 INFO mapreduce.Job: Job job_1672045511416_0001 completed successfully
......

5.4 Cluster benchmark tests

1. Write test: write 5 files of 10 MB each
hadoop jar hadoop-mapreduce-client-jobclient-3.1.4-tests.jar TestDFSIO -write -nrFiles 5 -fileSize 10MB
......
2022-12-27 09:51:12,775 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
2022-12-27 09:51:12,775 INFO fs.TestDFSIO:             Date & time: Tue Dec 27 09:51:12 CST 2022
2022-12-27 09:51:12,775 INFO fs.TestDFSIO:         Number of files: 5      # number of files
2022-12-27 09:51:12,775 INFO fs.TestDFSIO:  Total MBytes processed: 50     # total data written
2022-12-27 09:51:12,775 INFO fs.TestDFSIO:       Throughput mb/sec: 12.49  # throughput
2022-12-27 09:51:12,775 INFO fs.TestDFSIO:  Average IO rate mb/sec: 15.46  # average IO rate
2022-12-27 09:51:12,775 INFO fs.TestDFSIO:   IO rate std deviation: 7.51   # IO rate standard deviation
2022-12-27 09:51:12,776 INFO fs.TestDFSIO:      Test exec time sec: 32.95  # execution time

2. Read test
hadoop jar hadoop-mapreduce-client-jobclient-3.1.4-tests.jar TestDFSIO -read -nrFiles 5 -fileSize 10MB
......
2022-12-27 09:54:23,826 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
2022-12-27 09:54:23,826 INFO fs.TestDFSIO:             Date & time: Tue Dec 27 09:54:23 CST 2022
2022-12-27 09:54:23,826 INFO fs.TestDFSIO:         Number of files: 5
2022-12-27 09:54:23,826 INFO fs.TestDFSIO:  Total MBytes processed: 50
2022-12-27 09:54:23,826 INFO fs.TestDFSIO:       Throughput mb/sec: 94.34
2022-12-27 09:54:23,827 INFO fs.TestDFSIO:  Average IO rate mb/sec: 101.26
2022-12-27 09:54:23,827 INFO fs.TestDFSIO:   IO rate std deviation: 30.07
2022-12-27 09:54:23,827 INFO fs.TestDFSIO:      Test exec time sec: 34.39

3. Clean up the test data (TestDFSIO writes under /benchmarks/TestDFSIO in HDFS)
hadoop jar hadoop-mapreduce-client-jobclient-3.1.4-tests.jar TestDFSIO -clean
