Hadoop Pseudo-Distributed Installation
A pseudo-distributed deployment needs only a single server. Before setting it up, disable SELinux and the firewall.
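A minimal sketch of disabling both, assuming CentOS 7 with systemd and firewalld (adjust the commands for your distribution); run as root:

```shell
# Stop the firewall now and keep it off across reboots (ignore if not installed):
systemctl stop firewalld 2>/dev/null || true
systemctl disable firewalld 2>/dev/null || true
# Put SELinux into permissive mode immediately:
setenforce 0 2>/dev/null || true
# Disable SELinux permanently (takes effect after the next reboot):
if [ -f /etc/selinux/config ]; then
    sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
fi
```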
1. Install Java and configure environment variables
```
[root@node1 ~]# tar -xf jdk-8u144-linux-x64.gz -C /usr/
[root@node1 ~]# ln -sv /usr/jdk1.8.0_144/ /usr/java
"/usr/java" -> "/usr/jdk1.8.0_144/"
[root@node1 ~]# cat /etc/profile.d/java.sh
export JAVA_HOME=/usr/java
export PATH=$PATH:$JAVA_HOME/bin
[root@node1 ~]# source /etc/profile.d/java.sh
[root@node1 ~]# java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
```
2. Install Hadoop and configure environment variables
```
[root@node1 ~]# tar xf hadoop-2.9.2.tar.gz -C /usr
[root@node1 ~]# ln -sv /usr/hadoop-2.9.2/ /usr/hadoop
"/usr/hadoop" -> "/usr/hadoop-2.9.2/"
[root@node1 ~]# cat /etc/profile.d/hadoop.sh
export HADOOP_HOME=/usr/hadoop-2.9.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

Set the JAVA_HOME variable in hadoop-env.sh, mapred-env.sh, and yarn-env.sh inside the Hadoop package:

```
[root@node1 ~]# grep 'export JAVA_HOME' /usr/hadoop/etc/hadoop/{hadoop-env.sh,mapred-env.sh,yarn-env.sh}
/usr/hadoop/etc/hadoop/hadoop-env.sh:export JAVA_HOME=/usr/java
/usr/hadoop/etc/hadoop/mapred-env.sh:export JAVA_HOME=/usr/java
/usr/hadoop/etc/hadoop/yarn-env.sh:export JAVA_HOME=/usr/java
```
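Rather than editing the three files by hand, the edit can be scripted. A sketch, assuming the stock `export JAVA_HOME=...` line that ships with Hadoop 2.x (the helper name `set_java_home` is my own):

```shell
# Rewrite the JAVA_HOME line in all three env scripts with one sed call.
set_java_home() {
    dir=${1:-/usr/hadoop/etc/hadoop}    # config directory, overridable for testing
    sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/java|' \
        "$dir"/hadoop-env.sh "$dir"/mapred-env.sh "$dir"/yarn-env.sh
}
# usage: set_java_home
```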
3. Configure the hostname and hosts file
```
[root@localhost ~]# hostnamectl set-hostname node1
[root@localhost ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.159.129 node1
```
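The start-dfs.sh/start-yarn.sh scripts log in over SSH even on a single node, so node1 needs passwordless SSH to itself. This step is not shown in the original; a typical setup looks like:

```shell
# Generate a key pair if one does not exist yet, then authorize it locally.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# verify: "ssh node1 hostname" should print node1 without a password prompt
```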
4. core-site.xml
```
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/data/hadoop-local</value>
    </property>
</configuration>
```
5. hdfs-site.xml
```
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node1:50090</value>
    </property>
</configuration>
```

Note: with only one DataNode, each block can hold at most one replica no matter what dfs.replication says; on a pseudo-distributed node, setting it to 1 avoids permanently under-replicated blocks.
6. slaves
```
node1
```
7. mapred-site.xml
```
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
```
8. yarn-site.xml
```
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node1</value>
    </property>
</configuration>
```
9. Create the Hadoop data directory

```
mkdir -p /usr/data/hadoop-local
```
10. Format the HDFS cluster

```
/usr/hadoop-w/bin/hdfs namenode -format
```
11. Start all components

```
[root@node1 hadoop-w]# /usr/hadoop-w/sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [node1]
node1: starting namenode, logging to /usr/hadoop-w/logs/hadoop-root-namenode-node1.out
node1: starting datanode, logging to /usr/hadoop-w/logs/hadoop-root-datanode-node1.out
Starting secondary namenodes [node1]
node1: starting secondarynamenode, logging to /usr/hadoop-w/logs/hadoop-root-secondarynamenode-node1.out
starting yarn daemons
starting resourcemanager, logging to /usr/hadoop-w/logs/yarn-root-resourcemanager-node1.out
node1: starting nodemanager, logging to /usr/hadoop-w/logs/yarn-root-nodemanager-node1.out
```
12. Check that each component is running

```
[root@node1 hadoop-w]# jps
3840 Jps
3430 ResourceManager
2264 JobHistoryServer
2985 NameNode
3116 DataNode
3532 NodeManager
3277 SecondaryNameNode
```

(JobHistoryServer is not launched by start-all.sh; it was presumably started separately with sbin/mr-jobhistory-daemon.sh start historyserver.)
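Eyeballing the jps output can be scripted. A small helper, reading `jps` output on stdin and reporting which expected daemons are present (the helper name and daemon list are my own, matching this setup):

```shell
# check_daemons NAME... : report each named daemon as running or MISSING,
# based on jps output supplied on stdin.
check_daemons() {
    out=$(cat)                           # capture the jps output once
    for d in "$@"; do
        if echo "$out" | grep -qw "$d"; then
            echo "$d: running"
        else
            echo "$d: MISSING"
        fi
    done
}
# usage: jps | check_daemons NameNode DataNode SecondaryNameNode ResourceManager NodeManager
```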
- Unless configured explicitly, all Hadoop storage paths are derived from hadoop.tmp.dir in core-site.xml:
```
[root@node1 hadoop-w]# tree /usr/data/hadoop-local/
/usr/data/hadoop-local/
├── dfs
│   ├── data
│   │   ├── current
│   │   │   ├── BP-1191695345-192.168.159.129-1582271980457
│   │   │   │   ├── current
│   │   │   │   │   ├── dfsUsed
│   │   │   │   │   ├── finalized
│   │   │   │   │   ├── rbw
│   │   │   │   │   └── VERSION
│   │   │   │   ├── scanner.cursor
│   │   │   │   └── tmp
│   │   │   └── VERSION
│   │   └── in_use.lock
│   ├── name
│   │   ├── current
│   │   │   ├── edits_0000000000000000001-0000000000000000008
│   │   │   ├── edits_inprogress_0000000000000000009
│   │   │   ├── fsimage_0000000000000000000
│   │   │   ├── fsimage_0000000000000000000.md5
│   │   │   ├── seen_txid
│   │   │   └── VERSION
│   │   └── in_use.lock
│   └── namesecondary
│       └── in_use.lock
└── nm-local-dir
    ├── filecache
    ├── nmPrivate
    └── usercache
```
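If you later want the NameNode metadata and DataNode blocks pinned outside hadoop.tmp.dir, the derived locations above can be set explicitly in hdfs-site.xml. The property names below are the standard HDFS ones; the paths are illustrative:

```
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/data/hadoop-local/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/data/hadoop-local/dfs/data</value>
</property>
```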
I am new to Linux and write a little down with each thing I learn; please bear with me if anything here is wrong!