HDFS Cluster Setup: Pseudo-Distributed Mode

Reference: the official Hadoop website

Prerequisites: a Java environment plus SSH. Hadoop is written in Java; Java code is mobile (the same bytecode runs on any JVM), whereas C++ is only portable at the source level and must be recompiled per platform.

Problem: SSH remote login has a drawback. When you start a JVM process on a remote node over SSH, the non-interactive SSH session does not load the environment variables defined in the profile file.

Hands-on verification:

Define an environment variable BIGDATA=hello in node1's profile and echo it locally on node1.

From node2, SSH into node3 and create a directory there; checking on node3 shows the directory was created, so remote command execution itself works.

From node2, run a remote command on node1 to echo that variable; nothing is printed, i.e. the profile configuration is not loaded.

As a workaround, you can load the profile explicitly in the remote command: `ssh root@192.168.182.111 'source /etc/profile; echo $BIGDATA'`

In other words, each node can read its own environment variables in a local login shell, but remote SSH commands cannot. Inside a Hadoop cluster this means JAVA_HOME has to be configured manually for Hadoop itself (in hadoop-env.sh), not only for the operating system.
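A minimal sketch of this experiment (assuming node01 and node02 from the hosts file below, and root SSH access between them):

```bash
# On node01: define the variable in the login profile
echo 'export BIGDATA=hello' >> /etc/profile
source /etc/profile
echo $BIGDATA          # prints "hello" in the local login shell

# From node02: a non-interactive SSH command does NOT load /etc/profile
ssh root@node01 'echo BIGDATA=$BIGDATA'                        # prints "BIGDATA="
ssh root@node01 'source /etc/profile; echo BIGDATA=$BIGDATA'   # prints "BIGDATA=hello"
```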

The official site describes three deployment modes: Local (Standalone), Pseudo-Distributed, and Fully-Distributed.

Since a multi-node setup involves operating on other nodes from the current one, it is best to set up passwordless SSH between the nodes beforehand.

Setup outline:

  • Infrastructure

  • Deployment configuration

  • Initialize and run

  • Command-line usage

Infrastructure

Operating system, environment, network, and required software

  1. Set the IP address and hostname
  2. Disable the firewall & SELinux
  3. Configure host mappings
  4. Synchronize time
  5. Install the JDK
  6. Set up passwordless SSH

Set the IP address and hostname

Configure the IP address, gateway, DNS, etc.

```bash
[root@localhost usr]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="c8477fcc-505d-44b4-ae50-fc60d0b43f0d"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.182.111
GATEWAY=192.168.182.2
NETMASK=255.255.255.0
DNS1=114.114.114.114
DNS2=8.8.8.8
```

Set the hostname

```bash
[root@localhost usr]# vim /etc/sysconfig/network
# Created by anaconda
NETWORKING=yes
HOSTNAME=node01
```
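Note that on CentOS 7 editing these files alone does not change the running hostname or apply the new IP. A small sketch of the usual follow-up commands (standard CentOS 7 tooling, using the node01 name and ens33 interface from above):

```bash
# apply the hostname immediately (also persists it to /etc/hostname)
hostnamectl set-hostname node01

# restart networking so the new static IP in ifcfg-ens33 takes effect
systemctl restart network

# verify
hostname
ip addr show ens33
```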

Configure the hosts file

```bash
[root@localhost ~]# vi /etc/hosts
127.0.0.1       localhost localhost.localdomain localhost4 localhost4.localdomain4
::1             localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.182.111 node01
192.168.182.112 node02
192.168.182.113 node03
192.168.182.114 node04
```

Disable the firewall & SELinux

Pitfall:

```bash
[root@localhost ~]# service iptables stop
Redirecting to /bin/systemctl stop iptables.service
Failed to stop iptables.service: Unit iptables.service not loaded.
[root@localhost ~]# chkconfig iptables off
error reading information on service iptables: No such file or directory
```

Solution:

Either install the legacy service with `yum install -y iptables-services`, or, since CentOS 7 replaced the iptables service with firewalld (which itself builds on iptables), simply stop firewalld with `systemctl stop firewalld`.
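For reference, the firewalld route looks like this; a sketch using the stock CentOS 7 commands:

```bash
# stop the firewall for this boot and keep it off after reboots
systemctl stop firewalld
systemctl disable firewalld

# confirm it is no longer running
systemctl status firewalld
firewall-cmd --state   # should report "not running"
```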

Disable SELinux

SELinux is essentially a security mechanism. For example, suppose it is 2023 and after a reboot the system clock rolls back to 2022; under the Linux security policy this can trigger SELinux and put the machine into a read-only state in which nothing can be modified. Some distributions ship with SELinux and some do not. This is really an operations topic, so we won't dig into it here; see [SELinux简介](# 附录一、SELinux简介) for details.

SELinux has three operating modes: disabled, permissive, and enforcing.

  • Disabled: SELinux is turned off and new resources are not labeled; if it is re-enabled later, all resources have to be relabeled, which is fairly slow.
  • Permissive: policy violations are not actually blocked; instead, a log entry is recorded for each violation.
  • Enforcing: the default, normal SELinux state; operations that violate the policy are actually denied.

Check the current mode

```bash
~]# getenforce
Enforcing
```

Temporarily switch to Permissive

```bash
# temporarily switch off enforcement
~]# setenforce 0
~]# getenforce
Permissive
```

Temporarily switch back to Enforcing

```bash
~]# setenforce 1
~]# getenforce
Enforcing
```

Use sestatus to view the full status

```bash
~]# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      30
```

Disable permanently

```bash
[root@localhost usr]# vim /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
#SELINUX=enforcing
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
```
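The change in /etc/selinux/config only takes effect after a reboot; a quick way to confirm it afterwards:

```bash
reboot
# after the machine comes back up:
getenforce    # should now print "Disabled"
sestatus      # SELinux status: disabled
```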

Time synchronization

Because the machines in the cluster rely on heartbeats, their clocks must be synchronized; otherwise heartbeat checks can easily misbehave.

```bash
[root@localhost usr]# yum install -y ntp
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
 * base: mirrors.aliyun.com
 * epel: mirrors.bfsu.edu.cn
 * extras: mirrors.aliyun.com
 * updates: mirrors.aliyun.com
Resolving Dependencies
........
ntpdate.x86_64 0:4.2.6p5-29.el7.centos.2
Complete!
```

Sync against Aliyun's time server

```bash
[root@localhost usr]# vim /etc/ntp.conf

# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).

driftfile /var/lib/ntp/drift

# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default nomodify notrap nopeer noquery

# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict ::1

# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server ntp1.aliyun.com
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst

#broadcast 192.168.1.255 autokey    # broadcast server
#broadcastclient                    # broadcast client
```

Restart ntpd and enable it at boot

```bash
[root@localhost ~]# service ntpd start
Redirecting to /bin/systemctl start ntpd.service
[root@localhost ~]# chkconfig ntpd on
Note: Forwarding request to 'systemctl enable ntpd.service'.
Created symlink from /etc/systemd/system/multi-user.target.wants/ntpd.service to /usr/lib/systemd/system/ntpd.service.
```
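To confirm the clock really is being synchronized, a sketch using tools from the ntp package installed above (ntp1.aliyun.com is the server configured in ntp.conf):

```bash
# list the peers ntpd is talking to and their offsets
ntpq -p

# one-shot manual sync against the Aliyun server (stop ntpd first, since
# ntpdate cannot run while ntpd holds UDP port 123)
systemctl stop ntpd
ntpdate ntp1.aliyun.com
systemctl start ntpd
```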

Install the JDK

Skipped here; the JDK was installed long ago.

Remember to set the environment variables.
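For completeness, a minimal sketch of the JDK variables in /etc/profile, assuming the JDK sits in /usr/local/lib/java/jdk1.8.0_212 as shown further down:

```bash
# appended to /etc/profile
export JAVA_HOME=/usr/local/lib/java/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin

# reload and verify
source /etc/profile
java -version
```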

Set up passwordless SSH

Now check that you can ssh to the localhost without a passphrase:

```bash
$ ssh localhost
```

If you cannot ssh to localhost without a passphrase, execute the following commands:

```bash
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
```

Even SSHing into itself prompts for a password, so passwordless login is clearly not configured yet:

```bash
[root@localhost ~]# ssh localhost
root@localhost's password:
Last login: Sun Feb 26 23:18:48 2023 from ::1
```

Locate the .ssh directory:

```bash
[root@localhost ~]# ll -a
total 40
dr-xr-x---.  5 root root 215 Feb 26 23:18 .
dr-xr-xr-x. 17 root root 224 Nov 24 16:57 ..
drwx------.  2 root root  25 Feb 26 23:18 .ssh
```

Generate the key pair and authorize it

```bash
[root@localhost .ssh]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:nDSSn3kmPOzwK5iCXmH/Xdh9F6oveIkwXCC1WPDuiHs root@localhost.localdomain
The key's randomart image is:
+---[RSA 2048]----+
|       .oo       |
|       .+o.      |
|      .+o+       |
|      .B *       |
|    o o.S o .    |
| . + oB =o . . . |
|  . o = .=.oo.o .|
|. o +Eo .+.+. . .|
|.. o. o....o.    |
+----[SHA256]-----+
[root@localhost .ssh]# ll
total 12
-rw-------. 1 root root 1679 Feb 26 23:25 id_rsa
-rw-r--r--. 1 root root  408 Feb 26 23:25 id_rsa.pub
-rw-r--r--. 1 root root  171 Feb 26 23:18 known_hosts
[root@localhost .ssh]# cat id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDMwB+mHmD89rIP0dIInwHMaToH0ZFWvksw5yfx5uatiRsNzVwk/BihDJYgr7NKZJVgKkCL5TCuTkKFQsL9cf/ypazOOnjTyuFEWm5XOLJbYAVA1T4cOtysSqK9GVC9HeFqk+bz5AGSR4QA3N5UzfQpXBfw5sl1b73qKBmyWkv0LcXRMexSeYYnof9rntOXVyWg7uFR2FTF4Lih+RWnWaMY/alGqjvvQq9lk+cqrvHytn+KNtIDko2PfK9W3K48rHYq27reAxa3YWKAn0qt2/bN2D5OzcbqpOntElvUEcq8uHyUFNTSdkcnawA0zz1IBQH86zms0mCaTKvKY9ZbCnRT root@localhost.localdomain
[root@localhost .ssh]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@localhost .ssh]# ssh localhost
Last login: Sun Feb 26 23:19:00 2023 from ::1
```

If host A wants to SSH into host B without a password:

A generates a public/private key pair locally, and A's public key is appended to B's ~/.ssh/authorized_keys file.

When B is a remote machine, this means the public key has to be distributed to it.
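A sketch of distributing the key from node01 to another node; node02 here is just one of the hosts mapped earlier:

```bash
# easiest: ssh-copy-id appends ~/.ssh/id_rsa.pub to the remote authorized_keys
ssh-copy-id root@node02

# equivalent manual form
cat ~/.ssh/id_rsa.pub | ssh root@node02 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 0600 ~/.ssh/authorized_keys'

# afterwards this should log in without a password prompt
ssh root@node02
```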

Deployment configuration

Pseudo-distributed (single node):

  • Deployment path

  • Configuration files

    • hadoop-env.sh
    • core-site.xml
    • hdfs-site.xml
    • slaves

Hadoop installation layout

```bash
[root@localhost opt]# whereis hadoop
hadoop: /usr/local/lib/hadoop
[root@localhost opt]# cd /usr/local/lib/hadoop
[root@localhost hadoop]# ll
total 330156
drwxr-xr-x. 9 courage courage       149 Sep 11  2019 hadoop-3.1.3
-rw-rw-r--. 1 courage courage 338075860 Nov 29 04:54 hadoop-3.1.3.tar.gz
[root@localhost hadoop]# cd hadoop-3.1.3/
[root@localhost hadoop-3.1.3]# ll
total 176
drwxr-xr-x. 2 courage courage    183 Sep 11  2019 bin        # Hadoop's own operational commands
drwxr-xr-x. 3 courage courage     20 Sep 11  2019 etc        # configuration
drwxr-xr-x. 2 courage courage    106 Sep 11  2019 include
drwxr-xr-x. 3 courage courage     20 Sep 11  2019 lib        # libraries
drwxr-xr-x. 4 courage courage    288 Sep 11  2019 libexec
-rw-rw-r--. 1 courage courage 147145 Sep  4  2019 LICENSE.txt
-rw-rw-r--. 1 courage courage  21867 Sep  4  2019 NOTICE.txt
-rw-rw-r--. 1 courage courage   1366 Sep  4  2019 README.txt
drwxr-xr-x. 3 courage courage   4096 Sep 11  2019 sbin       # daemon/cluster service scripts
drwxr-xr-x. 4 courage courage     31 Sep 11  2019 share      # bundled jars and docs
```

So that the hadoop commands can be invoked from anywhere, add Hadoop to the environment variables in /etc/profile (append new lines at the end):

```bash
[root@localhost bin]# vim /etc/profile
unset i
unset -f pathmunge
export JAVA_HOME=/usr/local/lib/java/jdk1.8.0_212   # JDK install directory
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export ZK_HOME=/usr/local/lib/zookeeper/apache-zookeeper-3.5.7-bin
export HADOOP_HOME=/usr/local/lib/hadoop/hadoop-3.1.3              # hadoop 1
export HADOOP_BIN=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin           # hadoop 2
export ZK_PATH=${ZK_HOME}/bin
export Rust_Home=/root/.cargo
export Rust_PATH=${Rust_Home}/bin
export PATH=$PATH:${JAVA_PATH}:${ZK_PATH}:${Rust_PATH}:${HADOOP_BIN}   # hadoop 3
[root@localhost bin]# . /etc/profile
[root@localhost bin]# hdfs
hdfs       hdfs.cmd
```

Hadoop configuration

```bash
[root@localhost hadoop]# vim hadoop-3.1.3/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/lib/java/jdk1.8.0_212

# define the NameNode address and port, etc.
[root@localhost hadoop]# vim core-site.xml
```

```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node01:9000</value>
    </property>
</configuration>
```

Configure the replication factor

etc/hadoop/hdfs-site.xml:

```xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```

The slaves file lists the hosts that run DataNodes; in Hadoop 3.x the file was renamed to workers.

```bash
[root@localhost hadoop]# vim workers
node01
```

SecondaryNameNode and storage-directory configuration

```bash
[root@localhost hadoop]# vim hdfs-site.xml
```

```xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/var/bigdata/hadoop/local/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/var/bigdata/hadoop/local/dfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node01:50090</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>/var/bigdata/hadoop/local/dfs/secondary</value>
    </property>
</configuration>
```

Initialize and run

```bash
[root@localhost current]# bin/hdfs namenode -format   # initialize HDFS
[root@localhost current]# ll
total 16
-rw-r--r--. 1 root root 391 Feb 28 05:37 fsimage_0000000000000000000
-rw-r--r--. 1 root root  62 Feb 28 05:37 fsimage_0000000000000000000.md5
-rw-r--r--. 1 root root   2 Feb 28 05:37 seen_txid
-rw-r--r--. 1 root root 213 Feb 28 05:37 VERSION
[root@localhost current]# pwd
/var/bigdata/hadoop/local/dfs/name/current
[root@localhost current]# cat VERSION
#Tue Feb 28 05:37:29 PST 2023
namespaceID=1019611069
clusterID=CID-a1b5b123-6b5b-48c4-a876-f27a34349d0b
cTime=1677591449697
storageType=NAME_NODE
blockpoolID=BP-278644475-127.0.0.1-1677591449697
layoutVersion=-64
```
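After formatting, the daemons still have to be started before the web UI below will respond. A sketch of the usual start-up and check; the HDFS_*_USER exports are an assumption for this root-based setup (Hadoop 3.x refuses to start the daemons as root without them):

```bash
# required when starting the daemons as root on Hadoop 3.x
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root

# start NameNode, DataNode and SecondaryNameNode
$HADOOP_HOME/sbin/start-dfs.sh

# verify the three JVM processes are running
jps
```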

Add the node-name mappings on the Windows side as well, so the browser can resolve hostnames like node01 (edit the hosts file under the directory below):

C:\windows\system32\drivers\etc

Open the NameNode web UI (on Hadoop 3.x it listens on port 9870 by default, i.e. http://node01:9870).

Create a file filled with text for the upload test that follows:

```bash
[root@localhost hadoop]# for i in `seq 1000000`; do echo "hello courage $i" >> data.txt; done
```

Upload the new file to HDFS, specifying the block size:

```bash
hdfs dfs -D dfs.blocksize=1048576 -put data.txt
```


```bash
[root@localhost subdir0]# pwd
/var/bigdata/hadoop/local/dfs/data/current/BP-278644475-127.0.0.1-1677591449697/current/finalized/subdir0/subdir0
[root@localhost subdir0]# vim blk_1073741828
```

Looking at the block files shows that HDFS does not care about the meaning of a file's contents; it simply splits the file by bytes, so a line can even be cut in half across two blocks.
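To see the same split from the HDFS side rather than the local disk, fsck can list every block of the file; the path below assumes the -put above landed data.txt in root's HDFS home directory (/user/root):

```bash
hdfs fsck /user/root/data.txt -files -blocks -locations
```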

