With passwordless SSH login and the JDK and Hadoop paths already set up, we use master and slave1 as examples:
Configuring the Hadoop cluster
A total of 7 files need to be modified:
hadoop-2.6.0/etc/hadoop/hadoop-env.sh
hadoop-2.6.0/etc/hadoop/yarn-env.sh
hadoop-2.6.0/etc/hadoop/core-site.xml
hadoop-2.6.0/etc/hadoop/hdfs-site.xml
hadoop-2.6.0/etc/hadoop/mapred-site.xml
hadoop-2.6.0/etc/hadoop/yarn-site.xml
hadoop-2.6.0/etc/hadoop/slaves
1. hadoop-env.sh and yarn-env.sh
In these two files only JAVA_HOME needs to be changed, to the actual JDK directory on this machine.
Run the command
sudo gedit etc/hadoop/hadoop-env.sh (and vi etc/hadoop/yarn-env.sh)
Open the file, find the line below, and change it to your JDK directory (adjust to your own setup):
export JAVA_HOME=/home/hadoop/jdk_1.7
Add the following line to hadoop-env.sh:
export HADOOP_PREFIX=/home/hadoop/hadoop-2.6.0
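A quick way to confirm both files were edited is to grep for the exported variables (a minimal sketch, assuming the paths used above):
cd /home/hadoop/hadoop-2.6.0
grep -n "^export JAVA_HOME" etc/hadoop/hadoop-env.sh etc/hadoop/yarn-env.sh
grep -n "^export HADOOP_PREFIX" etc/hadoop/hadoop-env.sh
# both env files should show JAVA_HOME=/home/hadoop/jdk_1.7, and hadoop-env.sh should also show HADOOP_PREFIX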
2. core-site.xml
Modify it using the following as a reference:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
</configuration>
Note: if the directory /home/hadoop/tmp does not exist, create it manually with mkdir first.
For the complete list of core-site.xml parameters, see
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/core-default.xml
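To create the temporary directory from the note and confirm the setting is picked up, something like the following works once the Hadoop bin directory is on the PATH (it is added to /etc/profile in step 7 below):
mkdir -p /home/hadoop/tmp
hdfs getconf -confKey fs.defaultFS    # should print hdfs://master:9000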
3. hdfs-site.xml
Modify it using the following as a reference:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:50020</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:50075</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/data/datanode</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave1:9001</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
For the complete list of hdfs-site.xml parameters, see
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
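The name and data directories referenced above should also be created by hand (on master here; the slaves are covered in step 11):
mkdir -p /home/hadoop/data/namenode
mkdir -p /home/hadoop/data/datanode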
4. mapred-site.xml
Modify it using the following as a reference:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
For the complete list of mapred-site.xml parameters, see
http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
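Note that the 2.6.0 distribution ships only mapred-site.xml.template, so this file normally has to be created first; the history server that serves the two addresses above is started separately. A sketch, assuming the default layout:
cd /home/hadoop/hadoop-2.6.0/etc/hadoop
cp mapred-site.xml.template mapred-site.xml    # then edit it as shown above
# once the cluster is up, start the history server on master with:
# mr-jobhistory-daemon.sh start historyserver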
5. yarn-site.xml
Modify it using the following as a reference:
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8040</value>
</property>
</configuration>
For the complete list of yarn-site.xml parameters, see
http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
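Typos in these XML files are easy to make, so if xmllint is available it is worth running a quick well-formedness check over all of them (this catches broken tags, not wrong values, so addresses such as master:8025 still need to be checked by eye):
cd /home/hadoop/hadoop-2.6.0/etc/hadoop
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
  xmllint --noout "$f" && echo "$f OK"
done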
6. slaves
Run the command
$ gedit slaves
Edit the file and enter:
slave1
master
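The hostnames master and slave1 used in slaves and in the *-site.xml files must resolve on every node; if DNS is not available, add entries to /etc/hosts on each machine (the addresses below are placeholders, substitute your own):
# /etc/hosts on every node (example addresses)
192.168.187.102   master
192.168.187.103   slave1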
7. Edit /etc/profile to set the environment variables
Run the command
$ sudo gedit /etc/profile
8. Configure /etc/profile
Open /etc/profile and add the Hadoop entries. Note that for CLASSPATH and PATH, the Hadoop jar and bin paths are appended to the existing entries.
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.6.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH
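After saving, reload the profile in the current shell and confirm the hadoop commands are found:
source /etc/profile
which hadoop      # should point to /home/hadoop/hadoop-2.6.0/bin/hadoop
hadoop version    # should report Hadoop 2.6.0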
9. Distribute to the other machines in the cluster
scp -r hadoop-2.6.0/ hadoop@slave1:hadoop-2.6.0
Testing the Hadoop configuration
Start the NameNode on master to test.
10. Verification:
a. Run the format command
$ hdfs namenode -format
15/02/12 21:29:53 INFO namenode.FSImage: Allocated new BlockPoolId: BP-85825581-192.168.187.102-1423747793784
15/02/12 21:29:53 INFO common.Storage: Storage directory /home/hadoop/tmp/dfs/name has been successfully formatted.
When the output contains "has been successfully formatted", the format succeeded.
b. Run the command to start the HDFS daemons
$ start-dfs.sh
c. After startup completes, run jps to check the processes
$ jps
5161 SecondaryNameNode
4989 NameNode
If you see the two processes above, the master node started successfully.
Run the commands
$ start-yarn.sh
$ jps
5161 SecondaryNameNode
5320 ResourceManager
4989 NameNode
If you see the three processes above, YARN has started.
Run the commands
$ stop-dfs.sh
$ stop-yarn.sh
This stops the services that were just started.
11. Copy
Copy the Hadoop directory from master to slave1.
On the master machine:
cd    # go to the home directory first
scp -r hadoop-2.6.0 hadoop@slave1:/home/hadoop/
The Hadoop temporary directory (tmp) and data directory (data) on slave1 still need to be created manually first.
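Since passwordless SSH is already configured, those directories can be created on slave1 from master in one step (same paths as above):
ssh hadoop@slave1 'mkdir -p /home/hadoop/tmp /home/hadoop/data/datanode'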
Test: restart on the master node.
Run the commands
$ start-dfs.sh
$ start-yarn.sh
The master node should show the following 3 processes:
7482 ResourceManager
7335 SecondaryNameNode
7159 NameNode
slave1 and slave2 should show the following 2 processes:
2296 DataNode
2398 NodeManager
If you see the above, your cluster has been set up successfully.
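Beyond jps, the cluster can be checked end to end: hdfs dfsadmin -report shows which DataNodes registered with the NameNode, yarn node -list shows the NodeManagers, and the bundled examples jar runs a small MapReduce job (the jar path assumes the default 2.6.0 layout):
hdfs dfsadmin -report    # live datanodes should include slave1
yarn node -list          # NodeManagers registered with the ResourceManager
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 10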
The most important settings among these:
core-site.xml ---> hdfs://hostnameOrIp:9000
hdfs-site.xml ---> replication=1 // number of replicas
slaves ---> slave hostnames
When re-initializing for debugging: rm -rf data tmp
hdfs namenode -format
start-dfs.sh
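Put together, the reset sequence above looks like this (destructive: it wipes all HDFS data, so only use it on a test cluster, and run the rm on every node):
stop-yarn.sh
stop-dfs.sh
rm -rf /home/hadoop/data /home/hadoop/tmp    # repeat on slave1
hdfs namenode -format
start-dfs.sh
start-yarn.sh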