Hadoop Cluster Setup
Cluster Plan
Host IP | Hostname | Roles |
---|---|---|
172.16.1.221 | hadoop01 | namenode, datanode, resourcemanager, nodemanager, jobhistoryserver |
172.16.1.222 | hadoop02 | secondarynamenode, datanode, nodemanager |
172.16.1.223 | hadoop03 | datanode, nodemanager |
Note: adjust the IPs and hostnames to your environment.
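For the hostnames in the table to resolve, every host normally needs matching name-to-IP entries. A minimal sketch, assuming the IPs from the plan above (printed here for review; as root you would append these lines to /etc/hosts on all three machines):

```shell
# Hostname-to-IP mapping from the cluster plan (not applied here,
# only printed; append to /etc/hosts on every host as root).
hosts_entries='172.16.1.221 hadoop01
172.16.1.222 hadoop02
172.16.1.223 hadoop03'
echo "$hosts_entries"
```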
Download
Download from the official site: http://hadoop.apache.org/releases.html
Extract
tar -zxvf /home/hadoop-2.7.6.tar.gz -C /usr/local
Configure Environment Variables
vi /etc/profile
# Hadoop environment
export HADOOP_HOME=/usr/local/hadoop-2.7.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Distribute to hadoop02 and hadoop03
scp /etc/profile hadoop02:/etc/
scp /etc/profile hadoop03:/etc/
Reload the Environment Variables
Run the following on each of the three hosts:
source /etc/profile
Configure the Six Core Configuration Files
- hadoop-env.sh
- core-site.xml
- hdfs-site.xml
- mapred-site.xml (created by copying mapred-site.xml.template)
- yarn-site.xml
- slaves
Configure hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_152/
Configure core-site.xml
<!-- Filesystem namespace: the entry point of the HDFS filesystem -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop01:9000</value>
</property>
<!-- Directory for temporary data generated by Hadoop at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-2.7.6/hadoopdata/tmp</value>
</property>
<!-- I/O buffer size (bytes) -->
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
Configure hdfs-site.xml
<!-- Number of block replicas -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- HDFS block size (128 MB) -->
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<!-- NameNode web UI address and port -->
<property>
<name>dfs.http.address</name>
<value>hadoop01:50070</value>
</property>
<!-- SecondaryNameNode web UI address and port -->
<property>
<name>dfs.secondary.http.address</name>
<value>hadoop02:50090</value>
</property>
<!-- NameNode metadata storage directory -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoopdata/dfs/name</value>
</property>
<!-- DataNode block data storage directory -->
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoopdata/dfs/data</value>
</property>
<!-- Filesystem checkpoint directory -->
<property>
<name>fs.checkpoint.dir</name>
<value>/home/hadoopdata/checkpoint/dfs/cname</value>
</property>
<!-- Checkpoint edits directory -->
<property>
<name>fs.checkpoint.edits.dir</name>
<value>/home/hadoopdata/checkpoint/dfs/cname</value>
</property>
<!-- Whether to enforce filesystem permissions -->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
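The dfs.blocksize value above is given in bytes: 134217728 is exactly 128 × 1024 × 1024, i.e. a 128 MB block. The arithmetic can be checked in the shell:

```shell
# dfs.blocksize is specified in bytes; 128 MB expressed in bytes:
blocksize=$((128 * 1024 * 1024))
echo "$blocksize"   # 134217728
```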
Configure mapred-site.xml
cp ./etc/hadoop/mapred-site.xml.template ./etc/hadoop/mapred-site.xml
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- Job history server internal RPC address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop01:10020</value>
</property>
<!-- Job history server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>
Configure yarn-site.xml
<!-- Auxiliary shuffle service required by MapReduce -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Host that runs the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop01</value>
</property>
<!-- ResourceManager internal RPC address -->
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop01:8032</value>
</property>
<!-- ResourceManager scheduler address -->
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop01:8030</value>
</property>
<!-- ResourceManager resource-tracker address (NodeManager registration and heartbeats) -->
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop01:8031</value>
</property>
<!-- ResourceManager admin address -->
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop01:8033</value>
</property>
<!-- ResourceManager web UI address -->
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop01:8088</value>
</property>
Configure slaves
hadoop01
hadoop02
hadoop03
Distribute
Copy the configured Hadoop directory to hadoop02 and hadoop03:
scp -r ../hadoop-2.7.6/ hadoop02:/usr/local/
scp -r ../hadoop-2.7.6/ hadoop03:/usr/local/
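With more workers, the two scp commands generalize to a loop. A sketch using the hostnames from the plan (the commands are only echoed here for preview; drop the echo to actually copy):

```shell
# Sketch: distribute the configured Hadoop directory to every worker host.
# 'echo "$cmd"' previews each command; replace it with $cmd to run them.
for host in hadoop02 hadoop03; do
  cmd="scp -r /usr/local/hadoop-2.7.6/ ${host}:/usr/local/"
  echo "$cmd"
done
```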
Format HDFS
On hadoop01, format HDFS:
hdfs namenode -format
Start and Stop
Start or stop everything
start-all.sh
stop-all.sh
Start or stop by module
start-dfs.sh
stop-dfs.sh
start-yarn.sh
stop-yarn.sh
Start or stop individual daemons
hadoop-daemon.sh start namenode/datanode/secondarynamenode
hadoop-daemon.sh stop namenode/datanode/secondarynamenode
hadoop-daemons.sh start datanode
hadoop-daemons.sh stop datanode
yarn-daemon.sh start resourcemanager/nodemanager
yarn-daemon.sh stop resourcemanager/nodemanager
yarn-daemons.sh start nodemanager
yarn-daemons.sh stop nodemanager
Start the Job History Server
mr-jobhistory-daemon.sh start historyserver
Check the Processes
jps
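If startup succeeded, jps on each host should list roughly the daemons assigned in the cluster plan at the top. A small lookup sketch of that expectation (hostnames and roles taken from the plan table; this is a checklist, not a live check):

```shell
# Expected daemons per host, derived from the cluster plan (not a live check).
expected_daemons() {
  case "$1" in
    hadoop01) echo "NameNode DataNode ResourceManager NodeManager JobHistoryServer" ;;
    hadoop02) echo "SecondaryNameNode DataNode NodeManager" ;;
    hadoop03) echo "DataNode NodeManager" ;;
  esac
}
expected_daemons hadoop02   # SecondaryNameNode DataNode NodeManager
```

Compare the jps output on each machine against the list for its hostname; a missing daemon usually means its log under $HADOOP_HOME/logs is worth checking.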
Tests
Upload and download test
hdfs dfs -put /home/words /
hdfs dfs -get /words /home/word
YARN test
yarn jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount /words /home/out/00
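The wordcount result can be sanity-checked locally with coreutils against the same kind of input. A sketch using a hypothetical sample file in /tmp (the test above uses /words in HDFS, uploaded from /home/words):

```shell
# Build a small sample input and count words locally;
# the MapReduce wordcount output for the same file should match these counts.
cat > /tmp/words <<'EOF'
hello hadoop
hello yarn
EOF
tr -s ' ' '\n' < /tmp/words | sort | uniq -c
```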
Web UI test
http://hadoop01:50070
http://hadoop01:50090
http://hadoop01:8088