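Hadoop Pseudo-Distributed Deployment
I consulted a lot of articles, but differences in environment mean the problems are not exactly the same either, so here is a record of how my own deployment went.
Environment: Tencent Cloud, CentOS 7
Software version: Hadoop 2.7.7 (I plan to add hbase2.1.5+phoenix5.0.0-HBase2.0 later)
There are already plenty of deployment walkthroughs online, so I only post the problems I ran into and my configuration.
At first I ran with the most basic configuration. After all the daemons had started, a MapReduce job got as far as "running job" and then hung, so I began changing all sorts of settings.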
- core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://172.21.x.x:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>
ps: note that fs.defaultFS must be set to the internal (private network) address.
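A quick way to double-check this setting is to read the value back out of the file and compare its host with the addresses actually bound on the machine. The following is only a sketch: `get_hadoop_prop` and `uri_host` are hypothetical helper names, not Hadoop tools, and the config path in the usage comment is an assumption about where Hadoop is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): print the <value> of one property
# from a Hadoop *-site.xml file. Assumes <value> directly follows <name>.
get_hadoop_prop() {
  local file="$1" name="$2"
  # Join all lines so the match works for one-line and pretty-printed files.
  # Dots in the name act as regex wildcards here, which is fine for a quick check.
  tr -d '\n' < "$file" \
    | grep -o "<name>${name}</name>[[:space:]]*<value>[^<]*</value>" \
    | sed -e 's/.*<value>//' -e 's/<\/value>.*//'
}

# Hypothetical helper: extract the host part of a URI such as hdfs://host:9000
uri_host() {
  local uri="$1"
  uri="${uri#*://}"   # drop the scheme
  uri="${uri%%/*}"    # drop any path
  echo "${uri%%:*}"   # drop the port
}

# Usage on the real machine (the CONF path is an assumption):
#   CONF="$HADOOP_HOME/etc/hadoop/core-site.xml"
#   host="$(uri_host "$(get_hadoop_prop "$CONF" fs.defaultFS)")"
#   hostname -I | tr ' ' '\n' | grep -qxF "$host" && echo "address is bound locally"
```

If Hadoop is already on the PATH, `hdfs getconf -confKey fs.defaultFS` prints the value that Hadoop itself resolves, which is a useful cross-check against the file.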
- hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/dfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/dfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
- mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>file:/home/hadoop/mapred/system</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>file:/home/hadoop/mapred/local</value>
    <final>true</final>
  </property>
</configuration>
- yarn-site.xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>mytccloud</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>mytccloud:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>mytccloud:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>mytccloud:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>mytccloud:8033</value>
  </property>
  <!-- the key must contain the aux-service name declared above (mapreduce_shuffle) -->
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
ps: note that I set yarn.resourcemanager.webapp.address to hostname + port, so the hostname also has to be mapped in hosts, otherwise the ResourceManager may fail to start.
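Whether the ResourceManager actually came up can be checked from the output of `jps`, the JDK tool that lists running JVM processes. The sketch below assumes the usual five daemons of a Hadoop 2.x pseudo-distributed setup started with start-dfs.sh and start-yarn.sh; `missing_daemons` is a hypothetical helper name, and it reads `jps` output from standard input so it can be tried without a running cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output on stdin ("<pid> <name>" per line)
# and print the name of every expected Hadoop daemon that is not listed.
missing_daemons() {
  local jps_output d
  jps_output="$(cat)"
  # Assumed daemon set for a Hadoop 2.x pseudo-distributed node.
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Exact match on the second column, so NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$jps_output" \
        | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$d"
    fi
  done
}

# Usage on the real machine:
#   jps | missing_daemons
```

An empty result means all five daemons are running. If ResourceManager is printed, its log under the Hadoop logs directory is the place to look for the reason.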
- hostname
On CentOS 7, change the hostname with: hostnamectl set-hostname mytccloud
- hosts: 172.21.x.x mytccloud (also mapped to the internal address; comment out all of the localhost entries)
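To confirm that the hostname really maps to the internal address and not to a leftover localhost entry, the hosts file can be inspected directly. This is a minimal sketch; `lookup_host` is a hypothetical helper name, and it only reads a hosts-format file, so it does not reflect DNS or other resolver sources.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first IP that a hosts-format file maps to the
# given hostname. Commented-out lines and trailing comments are ignored.
lookup_host() {
  local hosts_file="$1" host="$2"
  awk -v h="$host" '
    { sub(/#.*/, "") }   # strip comments; a fully commented line becomes empty
    { for (i = 2; i <= NF; i++) if ($i == h) { print $1; exit } }
  ' "$hosts_file"
}

# Usage on the real machine:
#   lookup_host /etc/hosts mytccloud    # should print the 172.21.x.x address
#   getent hosts mytccloud              # what the system resolver returns
```

If the first command prints 127.0.0.1, a localhost line that still lists the hostname is taking precedence over the internal mapping.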
- Directory structure
[hadoop@mytccloud hdfs]$ pwd
/home/hadoop/hdfs