Setting Up the Hadoop Runtime Environment

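                                      Author: Yin Zhengjie (尹正杰)

Copyright notice: this is an original work. Reproduction is not permitted; violations will be pursued under the law.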

 

 

 

I. Installing the JDK

  Recommended reading:
    https://www.cnblogs.com/yinzhengjie/p/12199413.html

 

II. Installing Hadoop

1>. Open the official Apache Hadoop website and click "Download"

  Recommended reading:
    http://hadoop.apache.org/
    https://hadoop.apache.org/docs/r2.10.0/index.html
    http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/release/2.10.0/CHANGES.2.10.0.html
    https://hadoop.apache.org/docs/r3.1.3/index.html    
    https://hadoop.apache.org/docs/r3.1.3/hadoop-project-dist/hadoop-common/release/3.1.3/CHANGES.3.1.3.html

2>. Choose the Hadoop version to download

  Apache Hadoop releases download page:
    https://hadoop.apache.org/releases.html

3>. Download the Apache Hadoop software

  Download URLs:
    https://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz
    https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz
    https://downloads.apache.org/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz
[root@hadoop101.yinzhengjie.org.cn ~]# wget https://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz
--2020-03-10 18:24:27--  https://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz
Resolving mirror.bit.edu.cn (mirror.bit.edu.cn)... 114.247.56.117, 2001:da8:204:1205::22
Connecting to mirror.bit.edu.cn (mirror.bit.edu.cn)|114.247.56.117|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 392115733 (374M) [application/octet-stream]
Saving to: ‘hadoop-2.10.0.tar.gz’

100%[========================================================================================>] 392,115,733 11.3MB/s   in 34s    

2020-03-10 18:25:01 (11.0 MB/s) - ‘hadoop-2.10.0.tar.gz’ saved [392115733/392115733]

[root@hadoop101.yinzhengjie.org.cn ~]# 
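After the download finishes, it can be worth comparing the archive against the SHA-512 digest published for the release before unpacking it. The sketch below is only an illustration: the helper name verify_sha512 is hypothetical, it assumes the GNU coreutils sha512sum tool is available, and it takes the expected digest as a plain lowercase hex string because the layout of the published checksum file is not covered in this post. A throwaway file stands in for hadoop-2.10.0.tar.gz so the example runs without network access.

```shell
#!/bin/sh
# verify_sha512 FILE EXPECTED_HEX (hypothetical helper):
# succeeds only if FILE's SHA-512 digest equals EXPECTED_HEX.
verify_sha512() {
    file=$1
    expected=$2
    # sha512sum prints "<digest>  <file>"; keep only the digest column.
    actual=$(sha512sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Throwaway file standing in for the real tarball.
demo_dir=$(mktemp -d)
printf 'not a real tarball\n' > "${demo_dir}/demo.tar.gz"
good_digest=$(sha512sum "${demo_dir}/demo.tar.gz" | awk '{print $1}')

# An untouched file matches its digest.
if verify_sha512 "${demo_dir}/demo.tar.gz" "$good_digest"; then
    match_result=ok
else
    match_result=mismatch
fi

# Appending a single byte simulates a corrupted download.
printf 'x' >> "${demo_dir}/demo.tar.gz"
if verify_sha512 "${demo_dir}/demo.tar.gz" "$good_digest"; then
    tamper_result=ok
else
    tamper_result=mismatch
fi

echo "$match_result $tamper_result"
```

For the real archive, the second argument would be the digest copied from the release's checksum file, normalized to lowercase hex without spaces.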

4>. Extract the archive to the target directory

[root@hadoop101.yinzhengjie.org.cn ~]# tar -zxf hadoop-2.10.0.tar.gz -C /yinzhengjie/softwares/
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# ll /yinzhengjie/softwares/hadoop-2.10.0/
total 128
drwxr-xr-x 2 12334 systemd-journal    194 Oct 23 03:23 bin          # Basic scripts for managing and using Hadoop; the management scripts under sbin are built on top of these, and users can also call them directly
drwxr-xr-x 3 12334 systemd-journal     20 Oct 23 03:23 etc          # Hadoop configuration files
drwxr-xr-x 2 12334 systemd-journal    106 Oct 23 03:23 include      # Header files of the native programming libraries (the shared and static libraries themselves are under lib); they are written in C++ and are typically used by C++ programs that access HDFS or implement MapReduce jobs
drwxr-xr-x 3 12334 systemd-journal     20 Oct 23 03:23 lib          # Shared and static native libraries that Hadoop exposes, used together with the headers under include
drwxr-xr-x 2 12334 systemd-journal    239 Oct 23 03:23 libexec      # Shell configuration scripts for the individual services, used to set basics such as the log directory and startup parameters (for example JVM options)
-rw-r--r-- 1 12334 systemd-journal 106210 Oct 23 03:23 LICENSE.txt
-rw-r--r-- 1 12334 systemd-journal  15841 Oct 23 03:23 NOTICE.txt
-rw-r--r-- 1 12334 systemd-journal   1366 Oct 23 03:23 README.txt
drwxr-xr-x 3 12334 systemd-journal   4096 Oct 23 03:23 sbin         # Scripts that start or stop the Hadoop services
drwxr-xr-x 4 12334 systemd-journal     31 Oct 23 03:23 share        # Hadoop's dependency jars, documentation, and official examples
[root@hadoop101.yinzhengjie.org.cn ~]# 
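Two details of the extraction step are easy to trip over: tar -C fails when the target directory does not exist yet, and when tar runs as root it restores the numeric owner recorded in the archive, which is most likely why the listing shows the owner as 12334. The sketch below illustrates one way to handle both, under the assumption that GNU tar is in use. It builds a tiny stand-in archive (the name hadoop-demo is hypothetical) so that it runs without the real tarball.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)

# Build a small stand-in archive with a single top-level directory,
# similar in shape to hadoop-2.10.0.tar.gz.
mkdir -p "${demo_dir}/src/hadoop-demo/bin"
printf '#!/bin/sh\n' > "${demo_dir}/src/hadoop-demo/bin/hadoop"
tar -czf "${demo_dir}/hadoop-demo.tar.gz" -C "${demo_dir}/src" hadoop-demo

# Create the target directory first; tar -C does not create it.
target_dir="${demo_dir}/softwares"
mkdir -p "$target_dir"

# --no-same-owner makes the extracted files belong to the invoking user
# instead of the uid/gid stored in the archive (relevant when run as root).
tar -zxf "${demo_dir}/hadoop-demo.tar.gz" --no-same-owner -C "$target_dir"

ls -l "${target_dir}/hadoop-demo"
```

With the real archive, the equivalent would be to run mkdir -p /yinzhengjie/softwares before the tar command shown above, optionally adding --no-same-owner.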

5>. Add Hadoop to the environment variables

[root@hadoop101.yinzhengjie.org.cn ~]# cat /etc/profile.d/hadoop.sh 
#Add ${HADOOP_HOME} by yinzhengjie
export HADOOP_HOME=/yinzhengjie/softwares/hadoop-2.10.0
export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# source /etc/profile.d/hadoop.sh 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# echo $HADOOP_HOME
/yinzhengjie/softwares/hadoop-2.10.0
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# vim /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/hadoop-env.sh
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# grep ^export /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/hadoop-env.sh | grep JAVA_HOME
export JAVA_HOME=/yinzhengjie/softwares/jdk1.8.0_201
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hadoop version
Hadoop 2.10.0
Subversion ssh://git.corp.linkedin.com:29418/hadoop/hadoop.git -r e2f1f118e465e787d8567dfa6e2f3b72a0eb9194
Compiled by jhung on 2019-10-22T19:10Z
Compiled with protoc 2.5.0
From source with checksum 7b2d8877c5ce8c9a2cca5c7e81aa4026
This command was run using /yinzhengjie/softwares/hadoop-2.10.0/share/hadoop/common/hadoop-common-2.10.0.jar
[root@hadoop101.yinzhengjie.org.cn ~]# 
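The two configuration steps in this section can also be done without an interactive editor. The following is a minimal sketch, not the procedure used above: it writes a stand-in for /etc/profile.d/hadoop.sh that exports the variables so that child processes see them, and it rewrites the JAVA_HOME line of a stand-in hadoop-env.sh with sed. The temporary file locations are hypothetical, and the stand-in hadoop-env.sh assumes the stock file contains a line starting with "export JAVA_HOME=".

```shell
#!/bin/sh
demo_dir=$(mktemp -d)

# Stand-in for /etc/profile.d/hadoop.sh; "export" makes the variables
# visible to child processes, not only to the current shell.
cat > "${demo_dir}/hadoop.sh" <<'EOF'
export HADOOP_HOME=/yinzhengjie/softwares/hadoop-2.10.0
export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
EOF
. "${demo_dir}/hadoop.sh"

# A child shell inherits the exported value.
child_view=$(sh -c 'printf "%s" "$HADOOP_HOME"')

# Stand-in for etc/hadoop/hadoop-env.sh (assumed shape of the stock line).
printf '%s\n' '# The java implementation to use.' \
    'export JAVA_HOME=${JAVA_HOME}' > "${demo_dir}/hadoop-env.sh"

# Point JAVA_HOME at the JDK used in this post, without opening an editor.
java_home=/yinzhengjie/softwares/jdk1.8.0_201
sed "s|^export JAVA_HOME=.*|export JAVA_HOME=${java_home}|" \
    "${demo_dir}/hadoop-env.sh" > "${demo_dir}/hadoop-env.sh.new"
mv "${demo_dir}/hadoop-env.sh.new" "${demo_dir}/hadoop-env.sh"

java_line=$(grep '^export JAVA_HOME=' "${demo_dir}/hadoop-env.sh")
echo "$child_view"
echo "$java_line"
```

Writing the sed output to a new file and moving it into place avoids depending on the in-place flag, whose syntax differs between sed implementations.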

6>. Create a symbolic link (so that the setups for Hadoop's different run modes can coexist)

[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# ll
total 0
drwxr-xr-x 9 12334 systemd-journal 149 Oct 23 03:23 hadoop-2.10.0
drwxr-xr-x 7    10             143 245 Dec 16  2018 jdk1.8.0_201
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# 
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# cp -r hadoop-2.10.0 local-mode          # for local mode
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# 
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# cp -r hadoop-2.10.0 pseudo-mode          # for pseudo-distributed mode
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# 
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# cp -r hadoop-2.10.0 fully-mode           # for fully distributed mode
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# 
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# 
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# rm -rf hadoop-2.10.0
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# 
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# ln -sv pseudo-mode hadoop-2.10.0
‘hadoop-2.10.0’ -> ‘pseudo-mode’
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# 
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# ll
total 0
drwxr-xr-x 9 root root 149 Mar 10 23:41 fully-mode
lrwxrwxrwx 1 root root  11 Mar 10 23:42 hadoop-2.10.0 -> pseudo-mode
drwxr-xr-x 7   10  143 245 Dec 16  2018 jdk1.8.0_201
drwxr-xr-x 9 root root 149 Mar 10 23:38 local-mode
drwxr-xr-x 9 root root 149 Mar 10 23:41 pseudo-mode
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# 
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# 
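Because HADOOP_HOME points at the hadoop-2.10.0 link, switching between the three copies only requires repointing that link. The sketch below shows one possible way to do it; the function name switch_mode is hypothetical, and the demo runs in a temporary directory with empty stand-in folders rather than in /yinzhengjie/softwares.

```shell
#!/bin/sh
# switch_mode BASE_DIR MODE_DIR (hypothetical helper):
# repoint BASE_DIR/hadoop-2.10.0 at MODE_DIR.
switch_mode() {
    base_dir=$1
    mode_dir=$2
    if [ ! -d "${base_dir}/${mode_dir}" ]; then
        echo "no such mode directory: ${mode_dir}" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat the existing link as a file instead of following it
    #     (otherwise the new link would be created inside the old target).
    ln -sfn "$mode_dir" "${base_dir}/hadoop-2.10.0"
}

# Demo with empty stand-in directories.
demo_dir=$(mktemp -d)
mkdir "${demo_dir}/local-mode" "${demo_dir}/pseudo-mode" "${demo_dir}/fully-mode"

switch_mode "$demo_dir" pseudo-mode
first_target=$(readlink "${demo_dir}/hadoop-2.10.0")

switch_mode "$demo_dir" local-mode
second_target=$(readlink "${demo_dir}/hadoop-2.10.0")

echo "${first_target} -> ${second_target}"
```

The relative link target keeps the layout relocatable: moving the whole softwares directory does not break the link.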

 

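III. Deploying a Hadoop Cluster

  Hadoop supports three run modes: local mode (Local (Standalone) Mode), pseudo-distributed mode (Pseudo-Distributed Mode), and fully distributed mode (Fully-Distributed Mode).

  Local mode:
    HDFS is not used for storage; files live on the local file system.
    YARN is not used to request resources; scheduling is left to the local operating system.
    MapReduce also runs on the local operating system.
    In short, local mode starts no Hadoop daemons, and both storage and computation use local operating system resources. By default, Hadoop is configured to run in this non-distributed mode as a single Java process, which is useful for debugging.

  Pseudo-distributed mode:
    What it shares with local mode:
      Everything runs on a single node.
    How it differs from local mode:
      Each Hadoop daemon runs in its own Java process. In other words, Hadoop processes are started on the operating system, but all of them are placed on the same node.

  Fully distributed mode:
    What it shares with pseudo-distributed mode:
      Hadoop daemons have to be started.
    How it differs from pseudo-distributed mode:
      The Hadoop daemons are spread across multiple nodes instead of one. In other words, the Hadoop processes are started on different hosts, each running its own operating system.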
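One rough way to tell the three modes apart on a given installation is to look at the fs.defaultFS setting: it defaults to file:/// when nothing is configured, and the official single-node guide sets it to hdfs://localhost:9000 for pseudo-distributed operation. The sketch below encodes that heuristic. The function name classify_mode is hypothetical, the port in the fully distributed example is an assumption, and the result is only a guess, since a pseudo-distributed setup may also use the machine's real host name.

```shell
#!/bin/sh
# classify_mode FS_DEFAULT_FS (hypothetical helper):
# rough guess at the run mode from the value of fs.defaultFS.
classify_mode() {
    case $1 in
        file:*)                              echo local ;;
        hdfs://localhost*|hdfs://127.0.0.1*) echo pseudo-distributed ;;
        hdfs://*)                            echo fully-distributed ;;
        *)                                   echo unknown ;;
    esac
}

# Local file system: no HDFS involved.
local_guess=$(classify_mode 'file:///')
# HDFS served from the same machine.
pseudo_guess=$(classify_mode 'hdfs://localhost:9000')
# HDFS served from a named host (port 9000 is an assumption).
full_guess=$(classify_mode 'hdfs://hadoop101.yinzhengjie.org.cn:9000')

echo "$local_guess"
echo "$pseudo_guess"
echo "$full_guess"
```

On a real installation the value to classify could be read with hdfs getconf -confKey fs.defaultFS.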

    
  Recommended reading:
    http://hadoop.apache.org/docs/r2.10.0/
    http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html
    http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/ClusterSetup.html

1>. Local (standalone) mode

  Recommended reading:
    https://www.cnblogs.com/yinzhengjie2020/p/12423980.html
    http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html#Standalone_Operation

2>. Pseudo-distributed mode

  Recommended reading:
    https://www.cnblogs.com/yinzhengjie2020/p/12424154.html
    http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation

3>. Fully distributed mode

  Recommended reading:
    https://www.cnblogs.com/yinzhengjie2020/p/12424192.html
    http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html#Fully-Distributed_Operation
posted @ 2020-03-05 21:15  JasonYin2020