Hadoop Pseudo-Distributed Installation

A pseudo-distributed deployment needs only a single server. Before setting it up, disable SELinux and the firewall.

1. Install Java and configure environment variables

[root@node1 ~]# tar -xf jdk-8u144-linux-x64.gz -C /usr/
[root@node1 ~]# ln -sv /usr/jdk1.8.0_144/ /usr/java
"/usr/java" -> "/usr/jdk1.8.0_144/"
 
[root@node1 ~]# cat /etc/profile.d/java.sh
export JAVA_HOME=/usr/java
export PATH=$PATH:$JAVA_HOME/bin
 
[root@node1 ~]# source /etc/profile.d/java.sh
[root@node1 ~]# java -version                
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
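
As a quick sanity check, the profile snippet can be sourced in a subshell to confirm that `JAVA_HOME` and `PATH` come out as expected. A minimal sketch — the scratch file stands in for `/etc/profile.d/java.sh`, so it is safe to run anywhere:

```shell
# Write the same two export lines to a scratch file and source it in a
# subshell, leaving the current shell's environment untouched.
prof=$(mktemp)
cat > "$prof" <<'EOF'
export JAVA_HOME=/usr/java
export PATH=$PATH:$JAVA_HOME/bin
EOF
( . "$prof"
  echo "$JAVA_HOME"                                   # → /usr/java
  case ":$PATH:" in *:/usr/java/bin:*) echo "PATH ok";; esac )
rm -f "$prof"
```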

2. Install the Hadoop package and configure environment variables

[root@node1 ~]# tar xf hadoop-2.9.2.tar.gz -C /usr
[root@node1 ~]# ln -sv /usr/hadoop-2.9.2/ /usr/hadoop
"/usr/hadoop" -> "/usr/hadoop-2.9.2/"
 
[root@node1 ~]# cat /etc/profile.d/hadoop.sh
export HADOOP_HOME=/usr/hadoop-2.9.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
 
Update the JAVA_HOME variable in hadoop-env.sh, mapred-env.sh, and yarn-env.sh inside the Hadoop package:
[root@node1 ~]# grep 'export JAVA_HOME' /usr/hadoop/etc/hadoop/{hadoop-env.sh,mapred-env.sh,yarn-env.sh}
/usr/hadoop/etc/hadoop/hadoop-env.sh:export JAVA_HOME=/usr/java
/usr/hadoop/etc/hadoop/mapred-env.sh:export JAVA_HOME=/usr/java
/usr/hadoop/etc/hadoop/yarn-env.sh:export JAVA_HOME=/usr/java
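
Rather than editing the three files by hand, the JAVA_HOME line can be patched with `sed`. The sketch below runs against a scratch copy so it can be tried anywhere; on the real node the targets would be the three files under `/usr/hadoop/etc/hadoop/` shown above:

```shell
# Demo on a scratch file; substitute the real hadoop-env.sh / mapred-env.sh /
# yarn-env.sh paths when running on the node itself.
demo=$(mktemp)
printf 'export JAVA_HOME=${JAVA_HOME}\n' > "$demo"   # stock value shipped in hadoop-env.sh
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/java|' "$demo"
grep '^export JAVA_HOME=' "$demo"                    # → export JAVA_HOME=/usr/java
rm -f "$demo"
```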

3. Configure the hostname and hosts file

[root@localhost ~]# hostnamectl set-hostname node1
 
[root@localhost ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.159.129 node1  
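
Before starting any daemons, confirm that `node1` resolves to the static address and not to 127.0.0.1 — a NameNode bound to loopback is a classic failure mode. On the node itself `getent hosts node1` does this; the sketch below runs the same check against a scratch copy of the hosts file:

```shell
hosts=$(mktemp)
cat > "$hosts" <<'EOF'
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.159.129 node1
EOF
# Print the address node1 maps to; anything other than 192.168.159.129
# (in particular 127.0.0.1) means the hosts file needs fixing.
awk '$2 == "node1" { print $1 }' "$hosts"            # → 192.168.159.129
rm -f "$hosts"
```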

4. core-site.xml

<configuration>
    <property> 
        <name>fs.defaultFS</name>
        <value>hdfs://node1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/data/hadoop-local</value>
    </property>
</configuration>
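
A stray character in these files (for example a closing tag missing its `>`) makes every daemon die at startup with a cryptic parse error, so it is worth checking well-formedness after each edit. A minimal sketch using python3's stdlib parser (`xmllint --noout file.xml` works equally well where libxml2 is installed); the scratch file stands in for the real config:

```shell
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:9000</value>
    </property>
</configuration>
EOF
# Exits non-zero (and prints a ParseError) if the XML is malformed.
python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1])' "$cfg" \
  && echo "core-site.xml: well-formed"
rm -f "$cfg"
```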

5. hdfs-site.xml

<configuration>
    <property> 
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node1:50090</value>
    </property>
</configuration>

6. slaves

node1

7. mapred-site.xml

<configuration>
    <property> 
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

8. yarn-site.xml

<configuration>
    <property> 
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node1</value>
    </property>
</configuration>

9. Create the Hadoop data storage directory

mkdir -p /usr/data/hadoop-local

10. Format HDFS

/usr/hadoop/bin/hdfs namenode -format
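
Formatting is a one-time step: each `-format` run writes a fresh clusterID into the NameNode's VERSION file, and DataNode directories left over under hadoop.tmp.dir from a previous format will then refuse to register. If you ever need to re-format, clear the data directory first. A sketch of the check, using scratch copies of the two VERSION files (on the node the real paths are `.../dfs/name/current/VERSION` and `.../dfs/data/current/VERSION` under the hadoop.tmp.dir configured above):

```shell
# Both files should carry the same clusterID; a mismatch after a re-format
# explains a DataNode that starts and then immediately exits.
name=$(mktemp); data=$(mktemp)
echo 'clusterID=CID-example-1234' > "$name"
echo 'clusterID=CID-example-1234' > "$data"
if [ "$(grep clusterID "$name")" = "$(grep clusterID "$data")" ]; then
  echo "clusterID match"
else
  echo "clusterID mismatch: wipe the data dir or fix VERSION" >&2
fi
rm -f "$name" "$data"
```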

11. Start the components

[root@node1 hadoop-w]# /usr/hadoop-w/sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [node1]
node1: starting namenode, logging to /usr/hadoop-w/logs/hadoop-root-namenode-node1.out
node1: starting datanode, logging to /usr/hadoop-w/logs/hadoop-root-datanode-node1.out
Starting secondary namenodes [node1]
node1: starting secondarynamenode, logging to /usr/hadoop-w/logs/hadoop-root-secondarynamenode-node1.out
starting yarn daemons
starting resourcemanager, logging to /usr/hadoop-w/logs/yarn-root-resourcemanager-node1.out
node1: starting nodemanager, logging to /usr/hadoop-w/logs/yarn-root-nodemanager-node1.out

12. Check that each component is running

[root@node1 hadoop-w]# jps
3840 Jps
3430 ResourceManager
2264 JobHistoryServer
2985 NameNode
3116 DataNode
3532 NodeManager
3277 SecondaryNameNode
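
Note that `start-all.sh` does not launch the JobHistoryServer that appears in the jps output above; in Hadoop 2.x it is started (and stopped) with its own script. A sketch, assuming the `/usr/hadoop` symlink created earlier:

```shell
# Start the MapReduce JobHistoryServer separately from the HDFS/YARN daemons.
/usr/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
# To stop it later:
# /usr/hadoop/sbin/mr-jobhistory-daemon.sh stop historyserver
```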
  • All Hadoop storage paths, unless set explicitly, are created under the hadoop.tmp.dir configured in core-site.xml:
[root@node1 hadoop-w]# tree /usr/data/hadoop-local/
/usr/data/hadoop-local/
├── dfs
│   ├── data
│   │   ├── current
│   │   │   ├── BP-1191695345-192.168.159.129-1582271980457
│   │   │   │   ├── current
│   │   │   │   │   ├── dfsUsed
│   │   │   │   │   ├── finalized
│   │   │   │   │   ├── rbw
│   │   │   │   │   └── VERSION
│   │   │   │   ├── scanner.cursor
│   │   │   │   └── tmp
│   │   │   └── VERSION
│   │   └── in_use.lock
│   ├── name
│   │   ├── current
│   │   │   ├── edits_0000000000000000001-0000000000000000008
│   │   │   ├── edits_inprogress_0000000000000000009
│   │   │   ├── fsimage_0000000000000000000
│   │   │   ├── fsimage_0000000000000000000.md5
│   │   │   ├── seen_txid
│   │   │   └── VERSION
│   │   └── in_use.lock
│   └── namesecondary
│       └── in_use.lock
└── nm-local-dir
    ├── filecache
    ├── nmPrivate
    └── usercache

posted @ ForLivetoLearn