After passwordless SSH login and the JDK and Hadoop paths have been set up, we use master and slave1 as the example:

Configuring the Hadoop cluster

A total of 7 files need to be modified:

hadoop-2.6.0/etc/hadoop/hadoop-env.sh

hadoop-2.6.0/etc/hadoop/yarn-env.sh

hadoop-2.6.0/etc/hadoop/core-site.xml

hadoop-2.6.0/etc/hadoop/hdfs-site.xml

hadoop-2.6.0/etc/hadoop/mapred-site.xml

hadoop-2.6.0/etc/hadoop/yarn-site.xml

hadoop-2.6.0/etc/hadoop/slaves

 

1. hadoop-env.sh and yarn-env.sh

In both files, the main change is to set JAVA_HOME to the actual JDK directory on this machine.

Run the command:

$ sudo gedit etc/hadoop/hadoop-env.sh    (and likewise vi etc/hadoop/yarn-env.sh)

Open the file, find the line below, and change it to your JDK directory (adjust the path to your own setup):

export JAVA_HOME=/home/hadoop/jdk_1.7

In hadoop-env.sh, also add this line:

export HADOOP_PREFIX=/home/hadoop/hadoop-2.6.0
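If you prefer to make these changes non-interactively, a minimal sketch is to append the exports to the end of each file (when the scripts are sourced, the last assignment wins; the paths are the example values above, so adjust them to your setup):

$ echo 'export JAVA_HOME=/home/hadoop/jdk_1.7' >> etc/hadoop/hadoop-env.sh
$ echo 'export JAVA_HOME=/home/hadoop/jdk_1.7' >> etc/hadoop/yarn-env.sh
$ echo 'export HADOOP_PREFIX=/home/hadoop/hadoop-2.6.0' >> etc/hadoop/hadoop-env.sh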

 

2. core-site.xml

Modify it based on the following:

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

 <property>

   <name>fs.defaultFS</name>

   <value>hdfs://master:9000</value>       

 </property>

 <property>

     <name>hadoop.tmp.dir</name>

     <value>/home/hadoop/tmp</value>

 </property> 

</configuration> 

Note: if the /home/hadoop/tmp directory does not exist, create it manually with mkdir first.
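For example, matching the value configured above:

$ mkdir -p /home/hadoop/tmp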

For the complete list of core-site.xml parameters, see:

http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/core-default.xml

3. hdfs-site.xml

Modify it based on the following:

<?xml version="1.0" encoding="UTF-8"?>

   <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

   <configuration>

     <property>

       <name>dfs.datanode.ipc.address</name>

       <value>0.0.0.0:50020</value>

     </property>

     <property>

       <name>dfs.datanode.http.address</name>

      <value>0.0.0.0:50075</value>

    </property>  

<property>

<name>dfs.namenode.name.dir</name>

<value>file:/home/hadoop/data/namenode</value>

</property>

 

<property>

<name>dfs.datanode.data.dir</name>

<value>file:/home/hadoop/data/datanode</value>

</property>

<property>

<name>dfs.namenode.secondary.http-address</name>

<value>slave1:9001</value>

</property>    

    <property>

      <name>dfs.replication</name>

      <value>1</value>

    </property>

  <property>

   <name>dfs.permissions</name>

     <value>false</value>

  </property>

 </configuration>

 

For the complete list of hdfs-site.xml parameters, see:

http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
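Like hadoop.tmp.dir, the namenode and datanode directories configured above should be created before formatting, for example:

$ mkdir -p /home/hadoop/data/namenode /home/hadoop/data/datanode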

4. mapred-site.xml
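Note that the Hadoop 2.6.0 distribution ships only a template for this file; if etc/hadoop/mapred-site.xml does not exist yet, create it from the template first:

$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml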

Modify it based on the following:

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

    <property>

      <name>mapreduce.framework.name</name>

      <value>yarn</value>

</property>

<property>

        <name>mapreduce.jobhistory.address</name>

        <value>master:10020</value>

</property>

<property>

        <name>mapreduce.jobhistory.webapp.address</name>

        <value>master:19888</value>

 </property>

 </configuration>

For the complete list of mapred-site.xml parameters, see:

http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

5. yarn-site.xml

Modify it based on the following:

<?xml version="1.0"?>

<configuration>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<property>

<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

<value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

<property>

<name>yarn.resourcemanager.scheduler.address</name>

<value>master:8030</value>

</property>

<property>

<name>yarn.resourcemanager.resource-tracker.address</name>

<value>master8025</value>

</property>

<property>

<name>yarn.resourcemanager.address</name>

<value>master:8040</value>

 </property>

</configuration>

For the complete list of yarn-site.xml parameters, see:

http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
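After editing the four XML files, it is worth checking that they are still well-formed. One way, assuming the xmllint tool (from libxml2) is installed:

$ xmllint --noout etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml etc/hadoop/mapred-site.xml etc/hadoop/yarn-site.xml

No output means all four files parse cleanly.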

 

6. slaves

Run the command:

$ gedit etc/hadoop/slaves

Edit the file and enter:

slave1

master
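Equivalently, the file can be written without an editor:

$ printf 'slave1\nmaster\n' > etc/hadoop/slaves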

 

7. Edit /etc/profile to set environment variables

Run the command:

$ sudo gedit /etc/profile

 

8. Configure /etc/profile

Open /etc/profile and add the Hadoop entries below. Note that for CLASSPATH and PATH, the Hadoop paths are appended to the existing values:

export HADOOP_HOME=/home/hadoop/hadoop-2.6.0

export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.6.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar

export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH
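To make the new variables take effect in the current shell and confirm the hadoop command is on the PATH:

$ source /etc/profile
$ hadoop version

The second command should print the Hadoop 2.6.0 version banner.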

 

9. Distribute to the other machines in the cluster

$ scp -r hadoop-2.6.0/ hadoop@slave1:hadoop-2.6.0

Testing the Hadoop configuration

Start the NameNode on master to test.

 

10. Verification:

a. Run the format command:

$ hdfs namenode -format

15/02/12 21:29:53 INFO namenode.FSImage: Allocated new BlockPoolId: BP-85825581-192.168.187.102-1423747793784

15/02/12 21:29:53 INFO common.Storage: Storage directory /home/hadoop/tmp/dfs/name has been successfully formatted.

When the output contains "has been successfully formatted", the format succeeded.

 

 

b. Run the command to start the Hadoop cluster:

$ start-dfs.sh

c. After startup completes, run jps to check the processes:

$ jps

5161 SecondaryNameNode

4989 NameNode

If you see the two processes above, the master node is up.

Run the commands:

$ start-yarn.sh

$ jps

5161 SecondaryNameNode

5320 ResourceManager

4989 NameNode

If you see the 3 processes above, YARN has started successfully.
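As an extra check, the web UIs should also respond on the Hadoop 2.6.0 default ports (a browser works just as well; curl is assumed to be installed):

$ curl -s -o /dev/null -w '%{http_code}\n' http://master:50070    # NameNode web UI
$ curl -s -o /dev/null -w '%{http_code}\n' http://master:8088     # ResourceManager web UI

Both commands should print 200.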

Run the commands:

$ stop-dfs.sh

$ stop-yarn.sh

This stops the services that were just started.

 

11. Copy to slave1

Copy the hadoop directory on master to slave1.

On the master machine:

$ cd    (first, change to the home directory)

scp -r hadoop-2.6.0 hadoop@slave1:/home/hadoop/

The Hadoop temporary directory (tmp) and data directory (data) on slave1 still have to be created manually first.
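For example, from master, using the passwordless SSH set up earlier (paths match the XML configuration above):

$ ssh hadoop@slave1 'mkdir -p /home/hadoop/tmp /home/hadoop/data/namenode /home/hadoop/data/datanode'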

Test: on the master node, start the services again.

Run the commands:

$ start-dfs.sh

$ start-yarn.sh

 

The master node should show the following 3 processes:

7482 ResourceManager

7335 SecondaryNameNode

7159 NameNode

 

slave1 and slave2 should show the following 2 processes:

2296 DataNode

2398 NodeManager

If you see the above, your cluster has been set up successfully.
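You can also confirm from master that the DataNodes have registered with the NameNode:

$ hdfs dfsadmin -report

The report should list one live DataNode per slave.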

 

The most important settings among these:

core-site.xml ---> hdfs://hostnameOrIp:9000

hdfs-site.xml ---> replication=1    // replication factor

slaves ---> the worker hostnames

When debugging, reset with:

rm -rf data tmp

hdfs namenode -format

start-dfs.sh

posted on 2018-01-24 19:23 by NightRaven