Notes on the key configuration files in a Hadoop 2 cluster
Configuring HDFS high availability
1. Configure the HDFS configuration files
$ vi hdfs-site.xml
# Add the following:
<configuration>
  <!-- Logical nameservice ID; the name is arbitrary -->
  <property>
    <name>dfs.nameservices</name>
    <value>raphael</value>
  </property>
  <!-- The last part of the property name is the nameservice ID defined above;
       the value lists the IDs of the two NameNodes -->
  <property>
    <name>dfs.ha.namenodes.raphael</name>
    <value>node5,node8</value>
  </property>
  <!-- RPC addresses of node5 and node8 -->
  <property>
    <name>dfs.namenode.rpc-address.raphael.node5</name>
    <value>node5:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.raphael.node8</name>
    <value>node8:8020</value>
  </property>
  <!-- HTTP addresses of node5 and node8 -->
  <property>
    <name>dfs.namenode.http-address.raphael.node5</name>
    <value>node5:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.raphael.node8</name>
    <value>node8:50070</value>
  </property>
  <!-- Addresses of the three JournalNodes, followed by the journal ID.
       A name different from the nameservice is used here; reusing the
       nameservice ID as the journal ID is also permitted -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://node6:8485;node7:8485;node8:8485/raphael5200</value>
  </property>
  <!-- Class that HDFS clients use to locate the active NameNode -->
  <property>
    <name>dfs.client.failover.proxy.provider.raphael</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fence the previously active NameNode over SSH during a failover -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_dsa</value>
  </property>
  <!-- SSH connect timeout for fencing, in milliseconds -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <!-- Local directory where each JournalNode stores its edits -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/journalnode</value>
  </property>
  <!-- Enable automatic failover (relies on the ZooKeeper quorum set in core-site.xml) -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
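The property names in hdfs-site.xml are chained: every ID listed in dfs.ha.namenodes.<nameservice> needs its own dfs.namenode.rpc-address.<nameservice>.<id> entry, and a property pasted twice goes unnoticed easily. A minimal Python sketch of a consistency check follows. The embedded XML is a trimmed copy of the listing above, and the helper names load_properties and missing_rpc_addresses are hypothetical, not part of Hadoop.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed copy of the hdfs-site.xml listing, embedded for illustration only.
HDFS_SITE = """
<configuration>
  <property><name>dfs.nameservices</name><value>raphael</value></property>
  <property><name>dfs.ha.namenodes.raphael</name><value>node5,node8</value></property>
  <property><name>dfs.namenode.rpc-address.raphael.node5</name><value>node5:8020</value></property>
  <property><name>dfs.namenode.rpc-address.raphael.node8</name><value>node8:8020</value></property>
</configuration>
"""

def load_properties(xml_text):
    """Return (name -> value dict, sorted list of names that occur more than once)."""
    root = ET.fromstring(xml_text)
    pairs = [
        (prop.findtext("name").strip(), (prop.findtext("value") or "").strip())
        for prop in root.iter("property")
    ]
    counts = Counter(name for name, _ in pairs)
    duplicates = sorted(name for name, count in counts.items() if count > 1)
    return dict(pairs), duplicates

def missing_rpc_addresses(props):
    """List the rpc-address keys implied by the HA NameNode list but absent from props."""
    missing = []
    for ns in props.get("dfs.nameservices", "").split(","):
        ns = ns.strip()
        if not ns:
            continue
        for nn in props.get(f"dfs.ha.namenodes.{ns}", "").split(","):
            nn = nn.strip()
            if not nn:
                continue
            key = f"dfs.namenode.rpc-address.{ns}.{nn}"
            if key not in props:
                missing.append(key)
    return missing

props, duplicates = load_properties(HDFS_SITE)
print("duplicate properties:", duplicates)
print("missing rpc addresses:", missing_rpc_addresses(props))
```

To check a real file, read its contents and pass the text to load_properties instead of the embedded string.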
$ vi core-site.xml
<configuration>
  <!-- The value is the nameservice ID defined in hdfs-site.xml -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://raphael</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop</value>
  </property>
  <!-- The three ZooKeeper nodes -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>node5:2181,node6:2181,node7:2181</value>
  </property>
</configuration>
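Two values in core-site.xml have to line up with the rest of the cluster: the authority part of fs.defaultFS must be the nameservice ID, and ha.zookeeper.quorum is a comma-separated list of host:port pairs. The following Python sketch shows how those two strings break down. The function names are hypothetical, and falling back to port 2181 when an entry omits the port is an assumption of this sketch.

```python
from urllib.parse import urlparse

def nameservice_from_default_fs(default_fs):
    """Extract the nameservice ID from a value such as hdfs://raphael."""
    parsed = urlparse(default_fs)
    if parsed.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got {default_fs!r}")
    return parsed.netloc

def parse_quorum(quorum, default_port=2181):
    """Split a ZooKeeper quorum string into (host, port) tuples.

    default_port is an assumption of this sketch, used when an entry has no port.
    """
    servers = []
    for entry in quorum.split(","):
        host, _, port = entry.strip().partition(":")
        servers.append((host, int(port) if port else default_port))
    return servers

# Values copied from the core-site.xml and hdfs-site.xml listings.
nameservice = nameservice_from_default_fs("hdfs://raphael")
print(nameservice == "raphael")
print(parse_quorum("node5:2181,node6:2181,node7:2181"))
```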
Using YARN for job scheduling on the cluster
1. Configure yarn-site.xml
$ cd /usr/local/hadoop/
$ vim etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Cluster ID; kept distinct from the HDFS nameservice name here -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>raphael521</value>
  </property>
  <!-- The two ResourceManagers, placed here on the same hosts as the NameNodes -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>node5</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>node8</value>
  </property>
  <!-- ZooKeeper nodes -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>node6:2181,node7:2181,node8:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
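Each ID listed in yarn.resourcemanager.ha.rm-ids is expected to have a matching yarn.resourcemanager.hostname.<id> property. As a rough illustration, the Python sketch below resolves that mapping from a plain dict holding the values of the listing above. The function name resourcemanager_hosts is hypothetical.

```python
def resourcemanager_hosts(props):
    """Map each ResourceManager ID to its hostname; raise KeyError if one is missing."""
    hosts = {}
    for rm_id in props["yarn.resourcemanager.ha.rm-ids"].split(","):
        rm_id = rm_id.strip()
        key = f"yarn.resourcemanager.hostname.{rm_id}"
        if key not in props:
            raise KeyError(key)
        hosts[rm_id] = props[key]
    return hosts

# Values copied from the yarn-site.xml listing.
YARN_PROPS = {
    "yarn.resourcemanager.ha.enabled": "true",
    "yarn.resourcemanager.cluster-id": "raphael521",
    "yarn.resourcemanager.ha.rm-ids": "rm1,rm2",
    "yarn.resourcemanager.hostname.rm1": "node5",
    "yarn.resourcemanager.hostname.rm2": "node8",
}

print(resourcemanager_hosts(YARN_PROPS))
```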
2. Configure etc/hadoop/mapred-site.xml
$ vim etc/hadoop/mapred-site.xml
<configuration>
  <!-- Run MapReduce jobs on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>