0610 hadoop01

Hadoop, Day 1

1. Distributed Data Storage

 

2. What is HDFS?

  1. Massive data is stored on a cluster (multiple machines pooled as storage resources)
  2. The machines form an organized group (a master node and slave nodes)
  3. When a slave node starts, it registers with the master node and reports its own resources
  4. After receiving a slave node's registration, the master maintains the cluster state (the list of nodes and each node's storage capacity)
  5. To store data, a client first sends a storage request to the master node
  6. The master validates the request and returns storage locations to the client
  7. The client then asks the corresponding storage nodes to store the data
  8. Data is stored across the cluster with multiple replicas to guarantee its safety (see the fsck sketch below)
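
Once the cluster from section 4 is running, the replica placement described in steps 7 and 8 can be observed with the hdfs fsck tool; a minimal sketch (the file path here is just an example):

[root@hadoop01 hadoop-2.7.2]# bin/hdfs fsck /hadoop-2.7.2.tar.gz -files -blocks -locations

The output lists each block of the file and the DataNodes holding its replicas.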

3. HDFS Architecture

 

4. Building the Hadoop HDFS Storage Cluster

4.1. Cluster Plan

Hostname    IP address         Node roles
hadoop01    192.168.254.101    NameNode, DataNode
hadoop02    192.168.254.102    DataNode
hadoop03    192.168.254.103    DataNode

4.2. VM Cluster Environment Preparation (required on every machine)

1) Install the JDK and configure its environment variables (a command sketch for steps 1-4 follows this list)

2) Set the hostname

3) Map hostnames to IP addresses

4) Disable the firewall

5) Configure unified time synchronization (see section 4.3)
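
A minimal sketch of steps 1-4 on CentOS 6 (matching the service/chkconfig style used throughout this note); the JDK path and the hostnames/IPs come from the cluster plan above:

# 1) JDK environment variables, appended to /etc/profile (apply with: source /etc/profile)
export JAVA_HOME=/opt/module/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin

# 2) set the hostname (hadoop01 shown; persist it in /etc/sysconfig/network on CentOS 6)
hostname hadoop01

# 3) hostname-to-IP mappings, appended to /etc/hosts on every machine
192.168.254.101 hadoop01
192.168.254.102 hadoop02
192.168.254.103 hadoop03

# 4) stop the firewall and keep it off across reboots
service iptables stop
chkconfig iptables off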

4.3. NTP Time Synchronization Server Configuration

4.3.1. How It Works

hadoop01 acts as the NTP server for the cluster, serving time from its own local clock; hadoop02 and hadoop03 synchronize their clocks against hadoop01 instead of against an external time source.

4.3.2. Configure hadoop01 as the Time Synchronization Server

Check the status of the time synchronization service:

[root@hadoop01 ~]# service ntpd status

ntpd is stopped

4.3.3. The NTP Configuration File

[root@hadoop01 ~]# vim /etc/ntp.conf

Add:

restrict 192.168.254.101 nomodify notrap nopeer noquery

Modify:

restrict 192.168.254.1 mask 255.255.255.0 nomodify notrap

Comment out the following lines:

#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst

Add (127.127.1.0 is the NTP driver address for the machine's local clock; the fudge line assigns it stratum 10):

server 127.127.1.0
fudge 127.127.1.0 stratum 10

Start the NTP service:

[root@hadoop01 ~]# service ntpd start

Enable it at boot:

[root@hadoop01 ~]# chkconfig ntpd on

4.3.4. Time Commands

Check the current time:

[root@hadoop01 ~]# date

Thu Jun 10 10:45:39 EDT 2021

Manually synchronize another node against hadoop01 (ntpd must be stopped first, because ntpdate cannot run while ntpd is bound to the NTP port):

[root@hadoop03 ~]# service ntpd stop

[root@hadoop03 ~]# ntpdate hadoop01

10 Jun 11:23:57 ntpdate[2362]: step time server 192.168.254.101 offset 115685179.275931 sec

 

4.4. Client Synchronization Configuration

[root@hadoop02 ~]# vim /etc/ntp.conf

Modify:

# the administrative functions.
restrict 192.168.254.101 nomodify notrap nopeer noquery
# Hosts on local network are less restricted.
restrict 192.168.254.1 mask 255.255.255.0 nomodify notrap

 

Comment out:

#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst

Add (point the client at hadoop01 as its time server):

server 192.168.254.101

 

4.5. Configure the HDFS Cluster

4.5.1. Upload the Hadoop Package to the Server
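
A minimal sketch, assuming the tarball is on your local machine and /opt/software already exists on hadoop01:

scp hadoop-2.7.2.tar.gz root@hadoop01:/opt/software/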

 

4.5.2. Extract the Package

[root@hadoop01 software]# tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/

4.5.3. Configure core-site.xml (the core configuration file)

[root@hadoop01 hadoop]# vim /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml

<configuration>
        <!-- NameNode address -->
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hadoop01:9000</value>
        </property>
        <!-- Directory for Hadoop's runtime temporary files -->
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/opt/module/hadoop-2.7.2/data/tmp</value>
        </property>
</configuration>

 

4.5.4. Configure the Java Environment Variable

[root@hadoop01 hadoop]# vim /opt/module/hadoop-2.7.2/etc/hadoop/hadoop-env.sh

Modify:

export JAVA_HOME=/opt/module/jdk1.8.0_144

4.5.5. Configure hdfs-site.xml

[root@hadoop01 hadoop]# vim /opt/module/hadoop-2.7.2/etc/hadoop/hdfs-site.xml

<configuration>
        <!-- Number of HDFS replicas -->
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
        <!-- NameNode web UI address -->
        <property>
                <name>dfs.namenode.http-address</name>
                <value>hadoop01:50070</value>
        </property>
</configuration>

4.6. Configure YARN

4.6.1. Configure the JDK Path

[root@hadoop01 hadoop]# vim yarn-env.sh

export JAVA_HOME=/opt/module/jdk1.8.0_144

4.6.2. Configure yarn-site.xml

[root@hadoop01 hadoop]# vim /opt/module/hadoop-2.7.2/etc/hadoop/yarn-site.xml

<configuration>
        <!-- How reducers fetch data (shuffle service) -->
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <!-- YARN ResourceManager address -->
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>hadoop01</value>
        </property>
</configuration>

4.6.3. Configure the MapReduce JDK Environment

[root@hadoop01 hadoop]# vim mapred-env.sh

export JAVA_HOME=/opt/module/jdk1.8.0_144

4.6.4. Configure mapred-site.xml

[root@hadoop01 hadoop]# mv mapred-site.xml.template mapred-site.xml
[root@hadoop01 hadoop]# vim mapred-site.xml

<configuration>
        <!-- Run MapReduce on YARN -->
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

 

4.6.5. Configure the DataNode Nodes (the slaves file)

[root@hadoop01 hadoop]# vim slaves

hadoop01
hadoop02
hadoop03

4.6.6. Synchronize the Configuration Across the Cluster

[root@hadoop01 module]# xsync /opt/module/hadoop-2.7.2/
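
Note that xsync is not a standard Hadoop tool; it is typically a custom rsync wrapper script distributed with course environments. If it is not available, plain rsync to each slave node achieves the same result:

[root@hadoop01 module]# rsync -av /opt/module/hadoop-2.7.2/ hadoop02:/opt/module/hadoop-2.7.2/
[root@hadoop01 module]# rsync -av /opt/module/hadoop-2.7.2/ hadoop03:/opt/module/hadoop-2.7.2/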

4.6.7. Ways to Start the Cluster

Method 1: start or stop HDFS daemons on a single node

[root@hadoop01 sbin]# ./hadoop-daemon.sh start|stop namenode|datanode|secondarynamenode

Method 2: start or stop HDFS on all nodes at once

[root@hadoop01 sbin]# ./stop-dfs.sh

or:

[root@hadoop01 sbin]# ./start-dfs.sh

Starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-namenode-hadoop01.out
hadoop03: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-hadoop03.out
hadoop02: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-hadoop02.out
hadoop01: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-hadoop01.out

 

4.7. Start the Cluster

The NameNode must be formatted before the very first startup; do not run this command again on later startups:

[root@hadoop01 bin]# ./hadoop namenode -format

Start the HDFS cluster:

[root@hadoop01 sbin]# ./start-dfs.sh
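
YARN, configured in section 4.6, is started analogously with ./start-yarn.sh on hadoop01 (the ResourceManager node). Whether the expected daemons are running on each node can be checked with jps, a JDK tool; per the cluster plan, hadoop01 should list NameNode and DataNode, while hadoop02 and hadoop03 list only DataNode:

[root@hadoop01 sbin]# jps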

4.8. Check Cluster Storage

Open http://hadoop01:50070 in a browser (the NameNode web UI address configured in hdfs-site.xml) to see the cluster's total capacity, usage, and the list of live DataNodes.

5. HDFS Shell Operations

5.1. Basic Command Format

bin/hadoop fs <command>

5.2. Full Command List

Usage: hadoop fs [generic options]
        [-appendToFile <localsrc> ... <dst>]
        [-cat [-ignoreCrc] <src> ...]
        [-checksum <src> ...]
        [-chgrp [-R] GROUP PATH...]
        [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
        [-chown [-R] [OWNER][:[GROUP]] PATH...]
        [-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
        [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-count [-q] [-h] <path> ...]
        [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
        [-createSnapshot <snapshotDir> [<snapshotName>]]
        [-deleteSnapshot <snapshotDir> <snapshotName>]
        [-df [-h] [<path> ...]]
        [-du [-s] [-h] <path> ...]
        [-expunge]
        [-find <path> ... <expression> ...]
        [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-getfacl [-R] <path>]
        [-getfattr [-R] {-n name | -d} [-e en] <path>]
        [-getmerge [-nl] <src> <localdst>]
        [-help [cmd ...]]
        [-ls [-d] [-h] [-R] [<path> ...]]
        [-mkdir [-p] <path> ...]
        [-moveFromLocal <localsrc> ... <dst>]
        [-moveToLocal <src> <localdst>]
        [-mv <src> ... <dst>]
        [-put [-f] [-p] [-l] <localsrc> ... <dst>]
        [-renameSnapshot <snapshotDir> <oldName> <newName>]
        [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
        [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
        [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
        [-setfattr {-n name [-v value] | -x name} <path>]
        [-setrep [-R] [-w] <rep> <path> ...]
        [-stat [format] <path> ...]
        [-tail [-f] <file>]
        [-test -[defsz] <path>]
        [-text [-ignoreCrc] <src> ...]
        [-touchz <path> ...]
        [-truncate [-w] <length> <path> ...]
        [-usage [cmd ...]]

 

View the help for a specific command:

[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -help <command>

5.3. Common Commands

5.3.1. List Directory Contents

[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -ls /

5.3.2. Upload a File to HDFS

[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -put /opt/software/hadoop-2.7.2.tar.gz hdfs://hadoop01:9000/

5.3.3. -mkdir: Create a Directory

[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -mkdir -p /abc/dbf/ccc

5.3.4. Delete a Directory

[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -rm -r /abc
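
The reverse of uploading is downloading from HDFS with -get (shown with the tarball uploaded in 5.3.2; the local target directory is just an example):

[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -get /hadoop-2.7.2.tar.gz /tmp/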

 

6. HDFS Java Client Operations

6.1. Configure Windows 10 Environment Variables

Typically this means setting a HADOOP_HOME environment variable pointing to an unpacked Hadoop directory that contains winutils.exe under bin, and appending %HADOOP_HOME%\bin to the Path variable, so the Windows client can locate Hadoop's native helper binaries.

6.2. Create the Project and Import the Dependencies

<dependencies>
    <!-- unit testing -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <!-- logging -->
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.13.3</version>
    </dependency>
    <!-- hadoop-common: shared Hadoop classes -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.2</version>
    </dependency>
    <!-- hadoop-client: client-side APIs -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.2</version>
    </dependency>
    <!-- hadoop-hdfs -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.2</version>
    </dependency>
</dependencies>

6.3. Create the HdfsClient Class


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

import java.net.URI;

public class HdfsClient {

    @Test
    public void testMkdir() throws Exception {
        // create the configuration object
        Configuration configuration = new Configuration();
        // point it at the cluster
        configuration.set("fs.defaultFS", "hdfs://hadoop01:9000");

        // get the file system handle, connecting as user "root"
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop01:9000"), configuration, "root");

        // create the directory
        fs.mkdirs(new Path("/cc"));

        // close the file system handle
        fs.close();
    }
}
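
After the test runs, the new directory can be verified from the shell on the cluster:

[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -ls /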

 

Summary:

  1. Get a configuration object
  2. Set the configuration values
  3. Execute the file system operation
  4. Close the file system object

 
