Hadoop Cluster: Pseudo-Distributed Operation


Author: 尹正杰

Copyright notice: original work, reproduction without permission is prohibited; violators will be held legally liable.

 

 

 

   Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.

 

I. Reviewing the official documentation

1>. Check the official documentation for your Apache Hadoop version

  Recommended reading:
    http://hadoop.apache.org/docs/r2.10.0/

2>. Check the Hadoop documentation on setting up a single-node cluster

  Recommended reading:
    http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation

3>. Hadoop configuration files explained

  Hadoop has two kinds of configuration files: default configuration files and site-specific (custom) configuration files. You only need to edit a site-specific file, changing the corresponding property, when you want to override a default value.

  Default configuration files:
    core-default.xml:
      Shipped under "${HADOOP_HOME}/share/doc/hadoop/hadoop-project-dist/hadoop-common/".
      Recommended reading:
        http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/core-default.xml

    hdfs-default.xml:
      Shipped under "${HADOOP_HOME}/share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/".
      Recommended reading:
        http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

    yarn-default.xml:
      Shipped under "${HADOOP_HOME}/share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/".
      Recommended reading:
        http://hadoop.apache.org/docs/r2.10.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

    mapred-default.xml:
      Shipped under "${HADOOP_HOME}/share/doc/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/".
      Recommended reading:
        http://hadoop.apache.org/docs/r2.10.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

  Site-specific configuration files:
    core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml are stored under "$HADOOP_HOME/etc/hadoop" by default.
    You can edit them to suit your project; any value you set there overrides the corresponding value in the default files listed above.
    Recommended reading:
      http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
[root@hadoop101.yinzhengjie.org.cn ~]# ll ${HADOOP_HOME}/share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml
-rw-r--r-- 1 root root 104408 Mar 12 03:27 /yinzhengjie/softwares/hadoop-2.10.0/share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# cat  ${HADOOP_HOME}/share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml | wc -l
3103
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# ll ${HADOOP_HOME}/share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
-rw-r--r-- 1 root root 160516 Mar 12 03:27 /yinzhengjie/softwares/hadoop-2.10.0/share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# cat ${HADOOP_HOME}/share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml | wc -l
4667
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# ll ${HADOOP_HOME}/share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
-rw-r--r-- 1 root root 132138 Mar 12 03:27 /yinzhengjie/softwares/hadoop-2.10.0/share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# cat ${HADOOP_HOME}/share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml | wc -l
3580
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# ll ${HADOOP_HOME}/share/doc/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
-rw-r--r-- 1 root root 74650 Mar 12 03:27 /yinzhengjie/softwares/hadoop-2.10.0/share/doc/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# cat ${HADOOP_HOME}/share/doc/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml | wc -l
2045
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 
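  With those default configuration files confirmed on disk, a property's default value can be looked up locally instead of online. A minimal sketch, assuming the bundled doc files shown above (the property name is only an example):

    # Print the definition of fs.defaultFS from the bundled core-default.xml
    grep -A 2 '<name>fs.defaultFS</name>' ${HADOOP_HOME}/share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml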

 

II. Hands-on: configuring HDFS in pseudo-distributed mode

1>. Edit core-site.xml

[root@hadoop101.yinzhengjie.org.cn ~]# vim /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/core-site.xml 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# cat /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/core-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- Specify the RPC address of the HDFS NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop101.yinzhengjie.org.cn:9000</value>
    </property>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/yinzhengjie/softwares/hadoop-2.10.0/data/tmp</value>
        <description>Directory where Hadoop stores data generated at runtime</description>
    </property>

</configuration>
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 
Purpose of core-site.xml:
    Defines system-level parameters such as the HDFS URL, Hadoop's temporary directory, and rack-awareness settings; values defined here override the defaults in core-default.xml.

Purpose of the fs.defaultFS parameter:
  Declares the NameNode address, i.e. the default (HDFS) filesystem. Once the file is edited, the effective value can be checked with the sketch below.

Purpose of the hadoop.tmp.dir parameter:
  Declares the base directory Hadoop uses for its working data.
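  A minimal sketch for confirming which value actually takes effect, using the standard hdfs getconf command:

    # Print the effective NameNode address as resolved from core-site.xml / core-default.xml
    hdfs getconf -confKey fs.defaultFS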

2>. Edit hdfs-site.xml

[root@hadoop101.yinzhengjie.org.cn ~]# vim /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/hdfs-site.xml 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# cat /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/hdfs-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- Specify the number of HDFS replicas -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
[root@hadoop101.yinzhengjie.org.cn ~]# 
Purpose of hdfs-site.xml:
  Holds HDFS-specific settings such as the number of file replicas, the block size, and whether permission checking is enforced; values defined here override the defaults in hdfs-default.xml.

Purpose of the dfs.replication parameter:
  For availability and redundancy, HDFS keeps multiple copies of each block on different nodes; the default is 3. In a pseudo-distributed environment with only one node, a single copy is all that can be kept, and that is what dfs.replication controls. Replication is a software-level backup; it can also be adjusted per file after the fact (see the sketch below).
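  Replication can also be changed for files that are already in HDFS, which helps if you forget to lower the default before loading data. A minimal sketch (the path is only an example):

    # Set the replication factor of an existing file to 1 and wait for the change to complete
    hdfs dfs -setrep -w 1 /messages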

3>. Format the NameNode

[root@hadoop101.yinzhengjie.org.cn ~]# mkdir -pv /yinzhengjie/softwares/hadoop-2.10.0/data/tmp/      # This is the runtime data directory we specified in the config file; it has to be created by hand.
mkdir: created directory ‘/yinzhengjie/softwares/hadoop-2.10.0/data/tmp/’
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# ll /yinzhengjie/softwares/hadoop-2.10.0/data/tmp/ -R
/yinzhengjie/softwares/hadoop-2.10.0/data/tmp/:
total 0
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hdfs namenode -format                          # Run the format command
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# ll /yinzhengjie/softwares/hadoop-2.10.0/data/tmp/ -R
/yinzhengjie/softwares/hadoop-2.10.0/data/tmp/:
total 0
drwxr-xr-x 3 root root 18 Mar 11 00:51 dfs

/yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs:
total 0
drwxr-xr-x 3 root root 21 Mar 11 00:51 name

/yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs/name:
total 0
drwxr-xr-x 2 root root 112 Mar 11 00:51 current

/yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs/name/current:
total 16
-rw-r--r-- 1 root root 323 Mar 11 00:51 fsimage_0000000000000000000        # A persistent checkpoint of the HDFS metadata, containing the serialized inode information for every directory and file in the filesystem.
-rw-r--r-- 1 root root  62 Mar 11 00:51 fsimage_0000000000000000000.md5     # MD5 checksum of the corresponding fsimage file
-rw-r--r-- 1 root root   2 Mar 11 00:51 seen_txid                  # Holds a single number: the transaction ID of the latest edits_ file
-rw-r--r-- 1 root root 217 Mar 11 00:51 VERSION                  # Describes the HDFS storage version information
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# cat /yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs/name/current/VERSION 
#Wed Mar 11 00:51:16 CST 2020
namespaceID=1733780700
clusterID=CID-50dd1d9b-c90a-4928-a554-d2b4e777cfcb
cTime=1583859076195
storageType=NAME_NODE
blockpoolID=BP-746062383-172.200.4.101-1583859076195
layoutVersion=-63
[root@hadoop101.yinzhengjie.org.cn ~]# 


Notes:
  namespaceID:
    A unique identifier for the filesystem namespace, created when the NameNode is first formatted.

  clusterID:
    A unique identifier assigned to the HDFS cluster as a whole. It matters especially for HDFS federation, where one cluster consists of multiple namespaces, each managed by its own NameNode.

  cTime:
    The creation time of the NameNode's storage system. For a freshly formatted filesystem this value is 0; it is updated to a new timestamp after a filesystem upgrade.

  storageType:
    Indicates that this storage directory contains NameNode data structures.

  blockpoolID:
    A unique identifier for the block pool, which contains all the blocks of the namespace managed by this NameNode.

  layoutVersion:
    A negative integer describing the version of HDFS's persistent data structures (the "layout"); it is unrelated to the Hadoop release version.
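  The clusterID recorded here is the value every DataNode must match (see the notes in step 7 below). A minimal sketch for pulling it out for later comparison, assuming the directory layout used in this post:

    # Print only the clusterID line from the NameNode's VERSION file
    grep clusterID /yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs/name/current/VERSION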

4>. Start the NameNode

[root@hadoop101.yinzhengjie.org.cn ~]# jps
10701 Jps
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hadoop-daemon.sh start namenode
starting namenode, logging to /yinzhengjie/softwares/pseudo-mode/logs/hadoop-root-namenode-hadoop101.yinzhengjie.org.cn.out
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# jps
10808 Jps
10731 NameNode
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# tree /yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs/name/
/yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs/name/
├── current
│   ├── edits_0000000000000000001-0000000000000000001      # An edit log segment that has already been rolled (closed)
│   ├── edits_inprogress_0000000000000000002            # The edit log segment currently open for writes
│   ├── fsimage_0000000000000000000
│   ├── fsimage_0000000000000000000.md5
│   ├── seen_txid
│   └── VERSION
└── in_use.lock                             # A lock file; the NameNode uses it to lock the storage directory so that no other NameNode instance can use the same directory at the same time.

1 directory, 7 files
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# ss -ntl
State      Recv-Q Send-Q    Local Address:Port    Peer Address:Port
LISTEN     0      128       172.200.4.101:9000               *:*
LISTEN     0      128                   *:50070               *:*
LISTEN     0      128                   *:22                  *:*
LISTEN     0      128                  :::22                 :::*
[root@hadoop101.yinzhengjie.org.cn ~]# 
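  Port 9000 is the NameNode RPC address configured in core-site.xml, and 50070 is the NameNode web UI. A minimal sketch for checking that the web UI answers (curl is assumed to be installed):

    # Expect an HTTP 200 status code from the NameNode web UI
    curl -s -o /dev/null -w '%{http_code}\n' http://hadoop101.yinzhengjie.org.cn:50070/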

5>. Start the DataNode

[root@hadoop101.yinzhengjie.org.cn ~]# jps
10832 Jps
10731 NameNode
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hadoop-daemon.sh start datanode
starting datanode, logging to /yinzhengjie/softwares/pseudo-mode/logs/hadoop-root-datanode-hadoop101.yinzhengjie.org.cn.out
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# jps
10857 DataNode
10731 NameNode
10940 Jps
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# ll /yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs/data/current/
total 4
drwx------ 4 root root  54 Mar 11 01:34 BP-746062383-172.200.4.101-1583859076195
-rw-r--r-- 1 root root 229 Mar 11 01:34 VERSION
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# ll /yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs/data/current/BP-746062383-172.200.4.101-1583859076195/current/
total 4
drwxr-xr-x 2 root root   6 Mar 11 01:34 finalized
drwxr-xr-x 2 root root   6 Mar 11 01:34 rbw
-rw-r--r-- 1 root root 144 Mar 11 01:34 VERSION
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# cat /yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs/data/current/VERSION 
#Wed Mar 11 01:34:46 CST 2020
storageID=DS-3676d3ef-2dec-4e4e-b461-078884bf2251
clusterID=CID-50dd1d9b-c90a-4928-a554-d2b4e777cfcb
cTime=0
datanodeUuid=ac57c3e3-a212-45a7-a541-1792e84725fa
storageType=DATA_NODE
layoutVersion=-57
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 


Notes:
  storageID:
    The storage ID, which identifies one storage location on a DataNode. A DataNode can be configured with multiple storage directories, and each directory gets its own storage ID.

  clusterID:
    The cluster ID, globally unique.

  cTime:
    The creation time of the DataNode's storage system. For freshly formatted storage this value is 0; it is updated to a new timestamp after a filesystem upgrade.

  datanodeUuid:
    The DataNode's unique identifier.

  storageType:
    The storage type.

  layoutVersion:
    A negative integer. It is normally only bumped when HDFS adds new features.
[root@hadoop101.yinzhengjie.org.cn ~]# cat /yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs/data/current/BP-746062383-172.200.4.101-1583859076195/current/VERSION 
#Wed Mar 11 01:34:46 CST 2020
namespaceID=1733780700
cTime=1583859076195
blockpoolID=BP-746062383-172.200.4.101-1583859076195
layoutVersion=-57
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 


Notes:
  namespaceID:
    A unique identifier for the filesystem namespace, created when the NameNode is first formatted.

  cTime:
    The creation time of the NameNode's storage system. For a freshly formatted filesystem this value is 0; it is updated to a new timestamp after a filesystem upgrade.

  blockpoolID:
    A unique identifier for the block pool, which contains all the blocks of the namespace managed by this NameNode.

  layoutVersion:
    A negative integer describing the version of HDFS's persistent data structures (the "layout"); it is unrelated to the Hadoop release version.

6>. Upload a local file to the HDFS cluster

[root@hadoop101.yinzhengjie.org.cn ~]# ll /yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs/data/current/BP-746062383-172.200.4.101-1583859076195/current/finalized/
total 0
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hdfs dfs -ls /
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hdfs dfs -put /var/log/messages /
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hdfs dfs -ls /
Found 1 items
-rw-r--r--   1 root supergroup       6881 2020-03-11 02:59 /messages
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# ll /yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs/data/current/BP-746062383-172.200.4.101-1583859076195/current/finalized/
total 0
drwxr-xr-x 3 root root 21 Mar 11 02:59 subdir0
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# ll /yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs/data/current/BP-746062383-172.200.4.101-1583859076195/current/finalized/subdir0/
total 0
drwxr-xr-x 2 root root 60 Mar 11 02:59 subdir0
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# ll /yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs/data/current/BP-746062383-172.200.4.101-1583859076195/current/finalized/subdir0/subdir0/
total 12
-rw-r--r-- 1 root root 6881 Mar 11 02:59 blk_1073741826
-rw-r--r-- 1 root root   63 Mar 11 02:59 blk_1073741826_1002.meta
[root@hadoop101.yinzhengjie.org.cn ~]# 
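  The block file on disk is simply the raw content of the uploaded file (a single block here, since /var/log/messages is far smaller than the block size). A minimal sketch for confirming the two match, reusing the block ID shown above:

    # Compare the file as read through HDFS with the block file stored by the DataNode; no output means identical
    hdfs dfs -cat /messages | diff - /yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs/data/current/BP-746062383-172.200.4.101-1583859076195/current/finalized/subdir0/subdir0/blk_1073741826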

7>. Notes

  In production, format the NameNode only once. Formatting it more than once has the following consequences:
    (1) Formatting a NameNode a second time wipes all of its previous metadata;
    (2) After re-formatting, the NameNode starts fine but the DataNode will not, because the DataNode's VERSION file still holds the clusterID from the previous format. You must either edit the clusterID by hand or delete the DataNode's data and start the DataNode again (see the sketch after this list).
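  A minimal sketch of the two fixes, assuming the directory layout used in this post (run on the DataNode host with its daemon stopped):

    # Option 1: throw away the DataNode's data so it re-registers against the newly formatted NameNode
    rm -rf /yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs/data

    # Option 2: copy the new clusterID from the NameNode's VERSION file into the DataNode's VERSION file by hand
    grep clusterID /yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs/name/current/VERSION
    vim /yinzhengjie/softwares/hadoop-2.10.0/data/tmp/dfs/data/current/VERSION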

 

III. Hands-on: configuring YARN in pseudo-distributed mode

1>. Edit yarn-env.sh

[root@hadoop101.yinzhengjie.org.cn ~]# echo ${JAVA_HOME}
/yinzhengjie/softwares/jdk1.8.0_201
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# grep ^export /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/yarn-env.sh | grep JAVA_HOME
export JAVA_HOME=/yinzhengjie/softwares/jdk1.8.0_201
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 

2>. Edit yarn-site.xml

[root@hadoop101.yinzhengjie.org.cn ~]# vim /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/yarn-site.xml 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# cat /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/yarn-site.xml 
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>The mechanism reducers use to fetch map output</description>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop101.yinzhengjie.org.cn</value>
        <description>Specify the address of the YARN ResourceManager</description>
    </property>

</configuration>
[root@hadoop101.yinzhengjie.org.cn ~]# 
Purpose of yarn-site.xml:
  Mainly holds scheduler-level and other YARN runtime parameters.

Purpose of the yarn.resourcemanager.hostname parameter:
  Specifies the hostname of the ResourceManager.

Purpose of the yarn.nodemanager.aux-services parameter:
  Tells the NodeManager to run the mapreduce_shuffle auxiliary service.

3>. Edit mapred-env.sh

[root@hadoop101.yinzhengjie.org.cn ~]# echo ${JAVA_HOME}
/yinzhengjie/softwares/jdk1.8.0_201
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# grep ^export /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/mapred-env.sh | grep JAVA_HOME
export JAVA_HOME=/yinzhengjie/softwares/jdk1.8.0_201
[root@hadoop101.yinzhengjie.org.cn ~]# 

4>. Create mapred-site.xml

[root@hadoop101.yinzhengjie.org.cn ~]# cp /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/mapred-site.xml.template /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/mapred-site.xml
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# vim /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/mapred-site.xml
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# cat /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>Run MapReduce on YARN</description>
    </property>

</configuration>
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 
Purpose of mapred-site.xml:
  Holds MapReduce-related settings, such as the default number of reduce tasks and the default memory limits for tasks; values defined here override the defaults in mapred-default.xml.

Purpose of the mapreduce.framework.name parameter:
  Selects the MapReduce execution framework. There are three options: local (the default, runs locally), classic (the first-generation Hadoop execution framework), and yarn (the second-generation framework). Here we use yarn, the current framework.

5>. Start the ResourceManager

[root@hadoop101.yinzhengjie.org.cn ~]# jps
12976 Jps
11397 DataNode
10731 NameNode
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /yinzhengjie/softwares/pseudo-mode/logs/yarn-root-resourcemanager-hadoop101.yinzhengjie.org.cn.out
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# jps
13011 ResourceManager
11397 DataNode
13240 Jps
10731 NameNode
[root@hadoop101.yinzhengjie.org.cn ~]# 

6>. Start the NodeManager

[root@hadoop101.yinzhengjie.org.cn ~]# jps
13011 ResourceManager
11397 DataNode
13271 Jps
10731 NameNode
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# yarn-daemon.sh start nodemanager
starting nodemanager, logging to /yinzhengjie/softwares/pseudo-mode/logs/yarn-root-nodemanager-hadoop101.yinzhengjie.org.cn.out
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# jps
13442 Jps
13011 ResourceManager
11397 DataNode
13306 NodeManager
10731 NameNode
[root@hadoop101.yinzhengjie.org.cn ~]# 
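  With the ResourceManager and NodeManager both running, you can confirm that the NodeManager has registered. A minimal sketch using the standard yarn CLI:

    # List the NodeManagers known to the ResourceManager; one RUNNING node is expected here
    yarn node -list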

7>. Upload local test files to the HDFS cluster and run the wordcount example

[root@hadoop101.yinzhengjie.org.cn ~]# hdfs dfs -ls /
Found 1 items
-rw-r--r--   1 root supergroup       6881 2020-03-11 02:59 /messages
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# ll bigdata/
total 0
drwxr-xr-x 2 root root 187 Mar 10 22:24 inputDir
drwxr-xr-x 2 root root  88 Mar 10 22:35 outputDir
drwxr-xr-x 2 root root  24 Mar 10 22:55 wcinput
drwxr-xr-x 2 root root  88 Mar 10 23:01 wcoutput
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hdfs dfs -put bigdata /
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hdfs dfs -ls /
Found 2 items
drwxr-xr-x   - root supergroup          0 2020-03-11 03:55 /bigdata
-rw-r--r--   1 root supergroup       6881 2020-03-11 02:59 /messages
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hdfs dfs -ls /bigdata
Found 4 items
drwxr-xr-x   - root supergroup          0 2020-03-11 03:55 /bigdata/inputDir
drwxr-xr-x   - root supergroup          0 2020-03-11 03:55 /bigdata/outputDir
drwxr-xr-x   - root supergroup          0 2020-03-11 03:55 /bigdata/wcinput
drwxr-xr-x   - root supergroup          0 2020-03-11 03:55 /bigdata/wcoutput
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.0.jar wordcount /bigdata/wcinput /outputDir
20/03/11 03:56:35 INFO client.RMProxy: Connecting to ResourceManager at hadoop101.yinzhengjie.org.cn/172.200.4.101:8032
20/03/11 03:56:36 INFO input.FileInputFormat: Total input files to process : 1
20/03/11 03:56:36 INFO mapreduce.JobSubmitter: number of splits:1
20/03/11 03:56:36 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
20/03/11 03:56:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1583868707312_0002
20/03/11 03:56:36 INFO conf.Configuration: resource-types.xml not found
20/03/11 03:56:36 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
20/03/11 03:56:36 INFO resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
20/03/11 03:56:36 INFO resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
20/03/11 03:56:36 INFO impl.YarnClientImpl: Submitted application application_1583868707312_0002
20/03/11 03:56:36 INFO mapreduce.Job: The url to track the job: http://hadoop101.yinzhengjie.org.cn:8088/proxy/application_1583868707312_0002/
20/03/11 03:56:36 INFO mapreduce.Job: Running job: job_1583868707312_0002
20/03/11 03:56:41 INFO mapreduce.Job: Job job_1583868707312_0002 running in uber mode : false
20/03/11 03:56:41 INFO mapreduce.Job:  map 0% reduce 0%
20/03/11 03:56:46 INFO mapreduce.Job:  map 100% reduce 0%
20/03/11 03:56:51 INFO mapreduce.Job:  map 100% reduce 100%
20/03/11 03:56:52 INFO mapreduce.Job: Job job_1583868707312_0002 completed successfully
20/03/11 03:56:52 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=1014
        FILE: Number of bytes written=412969
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=794
        HDFS: Number of bytes written=708
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=2585
        Total time spent by all reduces in occupied slots (ms)=2301
        Total time spent by all map tasks (ms)=2585
        Total time spent by all reduce tasks (ms)=2301
        Total vcore-milliseconds taken by all map tasks=2585
        Total vcore-milliseconds taken by all reduce tasks=2301
        Total megabyte-milliseconds taken by all map tasks=2647040
        Total megabyte-milliseconds taken by all reduce tasks=2356224
    Map-Reduce Framework
        Map input records=3
        Map output records=99
        Map output bytes=1057
        Map output materialized bytes=1014
        Input split bytes=132
        Combine input records=99
        Combine output records=75
        Reduce input groups=75
        Reduce shuffle bytes=1014
        Reduce input records=75
        Reduce output records=75
        Spilled Records=150
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=206
        CPU time spent (ms)=1320
        Physical memory (bytes) snapshot=501587968
        Virtual memory (bytes) snapshot=4330676224
        Total committed heap usage (bytes)=295698432
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=662
    File Output Format Counters 
        Bytes Written=708
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hdfs dfs -ls /
Found 4 items
drwxr-xr-x   - root supergroup          0 2020-03-11 03:55 /bigdata
-rw-r--r--   1 root supergroup       6881 2020-03-11 02:59 /messages
drwxr-xr-x   - root supergroup          0 2020-03-11 03:56 /outputDir
drwx------   - root supergroup          0 2020-03-11 03:56 /tmp
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hdfs dfs -ls /outputDir
Found 2 items
-rw-r--r--   1 root supergroup          0 2020-03-11 03:56 /outputDir/_SUCCESS
-rw-r--r--   1 root supergroup        708 2020-03-11 03:56 /outputDir/part-r-00000
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# 
[root@hadoop101.yinzhengjie.org.cn ~]# hdfs dfs -cat /outputDir/part-r-00000
Apache    1
Apache™    1
Hadoop    1
Hadoop®    1
It    1
Rather    1
The    2
a    3
across    1
allows    1
and    2
application    1
at    1
be    1
cluster    1
clusters    1
computation    1
computers    1
computers,    1
computing.    1
data    1
deliver    1
delivering    1
designed    2
detect    1
develops    1
distributed    2
each    2
failures    1
failures.    1
for    2
framework    1
from    1
handle    1
hardware    1
high-availability,    1
highly-available    1
is    3
itself    1
large    1
layer,    1
library    2
local    1
machines,    1
may    1
models.    1
of    6
offering    1
on    2
open-source    1
processing    1
programming    1
project    1
prone    1
reliable,    1
rely    1
scalable,    1
scale    1
servers    1
service    1
sets    1
simple    1
single    1
so    1
software    2
storage.    1
than    1
that    1
the    3
thousands    1
to    5
top    1
up    1
using    1
which    1
[root@hadoop101.yinzhengjie.org.cn ~]# 
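  To copy the wordcount results back to the local filesystem, the usual get command works. A minimal sketch (the local target directory name is only an example):

    # Download the whole output directory from HDFS into the current working directory
    hdfs dfs -get /outputDir ./wordcount-result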

8>. View the WebUI

  Open the ResourceManager WebUI in a browser to see the MapReduce applications:
    http://hadoop101.yinzhengjie.org.cn:8088/cluster/apps
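  The same information is exposed through the ResourceManager REST API, which is handy on hosts without a browser. A minimal sketch (curl is assumed to be installed):

    # Query the ResourceManager's cluster info endpoint
    curl -s http://hadoop101.yinzhengjie.org.cn:8088/ws/v1/cluster/info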

 
