Hadoop Cluster Configuration Guide

Contents

  Hadoop Cluster Configuration Guide

    1.      Preparation

    2.      Install the JDK

    3.      Set up passwordless SSH among the three machines

    4.      Install Hadoop

    5.      Configure the *-site.xml files (identical on all nodes)

    6.      Configure the master and slave node lists (identical on all nodes)

    7.      Format HDFS

    8.      Start Hadoop

    9.      Run the wordcount test

 

1.      Preparation

Machines: one master and two slaves

Software:

  Hadoop version 1.0.2

  JDK version 1.7.0_03

Installation path: /usr/local/webserver/hadoop

Data directory: /data/hadoop/

Cluster layout:

Hostname    IP address      Role

master      192.168.1.1     NameNode, JobTracker

slave1      192.168.1.2     DataNode, TaskTracker

slave2      192.168.1.3     DataNode, TaskTracker

 

 

First set the hostnames; each machine must have a unique hostname.

On 192.168.1.1:

[root@localhost src]# echo "master" > hostname

[root@localhost src]# mv hostname /etc/hostname

[root@localhost src]# hostname -F /etc/hostname

# verify

[root@localhost src]# hostname

master

 

 

On 192.168.1.2:

[root@localhost src]# echo "slave1" > hostname

[root@localhost src]# mv hostname /etc/hostname

[root@localhost src]# hostname -F /etc/hostname

# verify

[root@localhost src]# hostname

slave1

 

 

On 192.168.1.3:

[root@localhost src]# echo "slave2" > hostname

[root@localhost src]# mv hostname /etc/hostname

[root@localhost src]# hostname -F /etc/hostname

# verify

[root@localhost src]# hostname

slave2

 

 

Configure /etc/hosts on every machine so that the machines can reach one another by hostname, for example (a quick resolution check follows the file):

[root@localhost src]# vi /etc/hosts

# Do not remove the following line, or various programs

# that require network functionality will fail.

127.0.0.1               link-masterdb localhost.localdomain localhost

::1             localhost6.localdomain6 localhost6

#hadoop

192.168.1.1 master
192.168.1.2 slave1

192.168.1.3 slave2
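
A quick, optional check that the hostnames now resolve as intended; a sketch, ideally run from each of the three machines:

[root@localhost src]# ping -c 1 master

[root@localhost src]# ping -c 1 slave1

[root@localhost src]# ping -c 1 slave2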

 

2.      Install the JDK

Install the package:

[root@localhost src]# rpm -ivh jdk-7u3-linux-x64.rpm

 

 

Step 2: set the environment variables.

[root@localhost src]# vi /etc/profile

# append the following at the end

#set java environment

JAVA_HOME=/usr/java/jdk1.7.0_03

CLASSPATH=.:$JAVA_HOME/lib/tools.jar

PATH=$JAVA_HOME/bin:$PATH

export JAVA_HOME CLASSPATH PATH

 

 

Run the following command to make the environment variables take effect:

[root@localhost src]# source /etc/profile

 

 

Step 3: use echo in the terminal to check that the environment variables are set correctly.

[root@localhost src]# echo $JAVA_HOME

/usr/java/jdk1.7.0_03

[root@localhost src]# echo $CLASSPATH

.:/usr/java/jdk1.7.0_03/lib/tools.jar

[root@localhost src]# echo $PATH

/usr/java/jdk1.7.0_03/bin:/usr/java/jdk-1_5_0_02/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin

 

Step 4: check that the JDK installed successfully.

[root@localhost src]#  java -version

java version "1.7.0_03"

Java(TM) SE Runtime Environment (build 1.7.0_03-b04)

Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02, mixed mode)

If the JVM version and related information are printed, the installation succeeded.

 

3.      Set up passwordless SSH among the three machines

Note: in our test environment, StrictHostKeyChecking in /etc/ssh/ssh_config has already been set to no. The command below checks this setting; if yours differs, changing it will make the following steps go more smoothly.

[root@localhost src]# cat /etc/ssh/ssh_config |grep StrictHostKeyChecking

StrictHostKeyChecking no
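
If you would rather not touch the system-wide /etc/ssh/ssh_config, a per-user alternative (a sketch, limited to the cluster hosts) is to put the setting in ~/.ssh/config on each node:

# ~/.ssh/config

Host master slave1 slave2

    StrictHostKeyChecking no

[root@localhost src]# chmod 600 ~/.ssh/config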

 

 

Perform the following on "master".

Generate the key pair and copy it to the other nodes:

[root@localhost src]# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""

[root@localhost src]# cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

[root@localhost src]# scp -r ~/.ssh slave1:~/

[root@localhost src]# scp -r ~/.ssh slave2:~/
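
If ssh still asks for a password after this, the usual culprit is file permissions. A minimal fix, to run on every node:

# ~/.ssh must not be group/world writable, and authorized_keys should be private

[root@localhost src]# chmod 700 ~/.ssh

[root@localhost src]# chmod 600 ~/.ssh/authorized_keys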

 

 

Test that you can now log in without a password:

[root@localhost src]# ssh slave1

[root@localhost src]# ssh master

[root@localhost src]# ssh slave2

[root@localhost src]# ssh master

[root@localhost src]# exit

 

 

4.      Install Hadoop

[root@localhost src]# tar zxvf hadoop-1.0.2.tar.gz

[root@localhost src]# mkdir /usr/local/webserver

[root@localhost src]# mv hadoop-1.0.2 /usr/local/webserver/hadoop

[root@localhost src]# cd /usr/local/webserver/hadoop

 

 

Add export JAVA_HOME to conf/hadoop-env.sh, pointing at the JDK installed above:

[root@localhost src]# vim conf/hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_03

 

Test whether Hadoop was installed successfully:

[root@localhost hadoop]# bin/hadoop

Usage: hadoop [--config confdir] COMMAND

where COMMAND is one of:

  namenode -format     format the DFS filesystem

  secondarynamenode    run the DFS secondary namenode

  namenode             run the DFS namenode

  datanode             run a DFS datanode

  dfsadmin             run a DFS admin client

  mradmin              run a Map-Reduce admin client

  fsck                 run a DFS filesystem checking utility

  fs                   run a generic filesystem user client

  balancer             run a cluster balancing utility

  fetchdt              fetch a delegation token from the NameNode

  jobtracker           run the MapReduce job Tracker node

  pipes                run a Pipes job

  tasktracker          run a MapReduce task Tracker node

  historyserver        run job history servers as a standalone daemon

  job                  manipulate MapReduce jobs

  queue                get information regarding JobQueues

  version              print the version

  jar <jar>            run a jar file

  distcp <srcurl> <desturl> copy file or directories recursively

  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive

  classpath            prints the class path needed to get the

                       Hadoop jar and the required libraries

  daemonlog            get/set the log level for each daemon

 or

  CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.

 

 

5.      Configure the *-site.xml files (identical on all nodes)

There are three configuration files to edit: core-site.xml, hdfs-site.xml, and mapred-site.xml.

[root@localhost hadoop]# vim conf/core-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 

<!-- Put site-specific property overrides in this file. -->

 

<configuration>

<property>

        <name>fs.default.name</name>

        <value>hdfs://master:9000</value>

</property>

<property>

        <name>hadoop.tmp.dir</name>

        <value>/data/hadoop/hadoop_home/var</value>

</property>

</configuration>

 

 

 

[root@localhost hadoop]# vim conf/hdfs-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 

<!-- Put site-specific property overrides in this file. -->

 

<configuration>

<property>

        <name>dfs.name.dir</name>

        <value>/data/hadoop/name1,/data/hadoop/name2</value>

        <description> </description>

</property>

<property>

        <name>dfs.data.dir</name>

        <value>/data/hadoop/data1,/data/hadoop/data2</value>

</property>

<property>

        <name>dfs.replication</name>

        <value>2</value>

</property>

</configuration>
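
The directories above live on the local filesystem of each node. Hadoop can usually create them itself, but creating them up front avoids permission surprises. A sketch, assuming the paths used in this document (name1/name2 are only needed on the master, data1/data2 only on the slaves):

# on master

[root@localhost hadoop]# mkdir -p /data/hadoop/name1 /data/hadoop/name2 /data/hadoop/hadoop_home/var

# on slave1 and slave2

[root@localhost hadoop]# mkdir -p /data/hadoop/data1 /data/hadoop/data2 /data/hadoop/hadoop_home/var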

 

 

 

[root@localhost hadoop]# vim conf/mapred-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 

<!-- Put site-specific property overrides in this file. -->

 

<configuration>

<property>

        <name>mapred.job.tracker</name>

        <value>master:9001</value>

</property>

<property>

        <name>mapred.local.dir</name>

        <value>/data/hadoop/hadoop_home/var</value>

</property>

</configuration>

 

 

6.      Configure the master and slave node lists (identical on all nodes)

Edit conf/masters and conf/slaves to define the master and slave nodes, one hostname per line. Use hostnames rather than IP addresses where possible, and make sure the machines can reach each other by those names (a sketch for copying the configuration to the slaves follows these files).

[root@localhost hadoop]# vim conf/masters

master

 

[root@localhost hadoop]# vim conf/slaves

slave1

slave2
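
Sections 5 and 6 require identical settings on every node. One straightforward way to achieve that (a sketch, assuming the same /usr/local/webserver/hadoop path on all machines) is to copy the whole installation from the master to the slaves:

[root@localhost hadoop]# ssh slave1 "mkdir -p /usr/local/webserver"

[root@localhost hadoop]# ssh slave2 "mkdir -p /usr/local/webserver"

[root@localhost hadoop]# scp -r /usr/local/webserver/hadoop slave1:/usr/local/webserver/

[root@localhost hadoop]# scp -r /usr/local/webserver/hadoop slave2:/usr/local/webserver/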

 

7.      Format HDFS

With the configuration above in place, the last step before starting the Hadoop services is to format the NameNode (run this on the master only):

[root@localhost hadoop]# bin/hadoop namenode -format

12/04/17 11:21:55 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = localhost.localdomain/127.0.0.1

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 1.0.2

STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012

************************************************************/

12/04/17 11:21:55 INFO util.GSet: VM type       = 64-bit

12/04/17 11:21:55 INFO util.GSet: 2% max memory = 17.77875 MB

12/04/17 11:21:55 INFO util.GSet: capacity      = 2^21 = 2097152 entries

12/04/17 11:21:55 INFO util.GSet: recommended=2097152, actual=2097152

12/04/17 11:21:56 INFO namenode.FSNamesystem: fsOwner=root

12/04/17 11:21:56 INFO namenode.FSNamesystem: supergroup=supergroup

12/04/17 11:21:56 INFO namenode.FSNamesystem: isPermissionEnabled=true

12/04/17 11:21:56 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100

12/04/17 11:21:56 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)

12/04/17 11:21:56 INFO namenode.NameNode: Caching file names occuring more than 10 times

12/04/17 11:21:56 INFO common.Storage: Image file of size 110 saved in 0 seconds.

12/04/17 11:21:56 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.

12/04/17 11:21:56 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1

************************************************************/

 

8.      Start Hadoop

Then use start-all.sh to start all of the services: the NameNode, DataNodes, SecondaryNameNode, JobTracker, and TaskTrackers.

[root@localhost hadoop]# bin/start-all.sh

starting namenode, logging to /usr/local/webserver/hadoop/libexec/../logs/hadoop-root-namenode-localhost.localdomain.out

localhost: starting datanode, logging to /usr/local/webserver/hadoop/libexec/../logs/hadoop-root-datanode-localhost.localdomain.out

localhost: starting secondarynamenode, logging to /usr/local/webserver/hadoop/libexec/../logs/hadoop-root-secondarynamenode-localhost.localdomain.out

starting jobtracker, logging to /usr/local/webserver/hadoop/libexec/../logs/hadoop-root-jobtracker-localhost.localdomain.out

localhost: starting tasktracker, logging to /usr/local/webserver/hadoop/libexec/../logs/hadoop-root-tasktracker-localhost.localdomain.out

 

Done! Check that everything is running.

After startup, you can open the following URLs to check that the services are healthy (when browsing from another machine, replace localhost with the address of the node running the corresponding daemon); a jps check on each node follows the list.

     http://localhost:50030/ - Hadoop JobTracker (MapReduce administration) UI

     http://localhost:50060/ - Hadoop TaskTracker status

     http://localhost:50070/ - Hadoop DFS (NameNode) status
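
Besides the web UIs, a quick way to confirm the daemons are up is jps, which ships with the JDK. A sketch (the full path is used over ssh because a non-interactive shell may not load /etc/profile):

# on master, expect NameNode, SecondaryNameNode and JobTracker

[root@localhost hadoop]# jps

# on the slaves, expect DataNode and TaskTracker

[root@localhost hadoop]# ssh slave1 /usr/java/jdk1.7.0_03/bin/jps

[root@localhost hadoop]# ssh slave2 /usr/java/jdk1.7.0_03/bin/jps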

 

9.      Run the wordcount test

[root@localhost hadoop]# mkdir input

[root@localhost hadoop]# echo "hello world" >> input/a.txt

[root@localhost hadoop]# echo "hello hadoop" >> input/b.txt

 

 

Copy the local data into HDFS:

[root@localhost hadoop]# bin/hadoop fs -put input in

 

 

Run the test job:

[root hadoop]# bin/hadoop jar hadoop-examples-1.0.2.jar wordcount in out

****hdfs://localhost:9000/user/root/in

12/04/16 20:37:27 INFO input.FileInputFormat: Total input paths to process : 2

12/04/16 20:37:27 INFO util.NativeCodeLoader: Loaded the native-hadoop library

12/04/16 20:37:27 WARN snappy.LoadSnappy: Snappy native library not loaded

12/04/16 20:37:27 INFO mapred.JobClient: Running job: job_201204161711_0008

12/04/16 20:37:28 INFO mapred.JobClient:  map 0% reduce 0%

12/04/16 20:37:42 INFO mapred.JobClient:  map 100% reduce 0%

12/04/16 20:37:54 INFO mapred.JobClient:  map 100% reduce 100%

12/04/16 20:37:59 INFO mapred.JobClient: Job complete: job_201204161711_0008

12/04/16 20:37:59 INFO mapred.JobClient: Counters: 29

12/04/16 20:37:59 INFO mapred.JobClient:   Job Counters

12/04/16 20:37:59 INFO mapred.JobClient:     Launched reduce tasks=1

12/04/16 20:37:59 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=19236

12/04/16 20:37:59 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

12/04/16 20:37:59 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

12/04/16 20:37:59 INFO mapred.JobClient:     Launched map tasks=2

12/04/16 20:37:59 INFO mapred.JobClient:     Data-local map tasks=2

12/04/16 20:37:59 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=11028

12/04/16 20:37:59 INFO mapred.JobClient:   File Output Format Counters

12/04/16 20:37:59 INFO mapred.JobClient:     Bytes Written=25

12/04/16 20:37:59 INFO mapred.JobClient:   FileSystemCounters

12/04/16 20:37:59 INFO mapred.JobClient:     FILE_BYTES_READ=55

12/04/16 20:37:59 INFO mapred.JobClient:     HDFS_BYTES_READ=235

12/04/16 20:37:59 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=64792

12/04/16 20:37:59 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=25

12/04/16 20:37:59 INFO mapred.JobClient:   File Input Format Counters

12/04/16 20:37:59 INFO mapred.JobClient:     Bytes Read=25

12/04/16 20:37:59 INFO mapred.JobClient:   Map-Reduce Framework

12/04/16 20:37:59 INFO mapred.JobClient:     Map output materialized bytes=61

12/04/16 20:37:59 INFO mapred.JobClient:     Map input records=2

12/04/16 20:37:59 INFO mapred.JobClient:     Reduce shuffle bytes=31

12/04/16 20:37:59 INFO mapred.JobClient:     Spilled Records=8

12/04/16 20:37:59 INFO mapred.JobClient:     Map output bytes=41

12/04/16 20:37:59 INFO mapred.JobClient:     CPU time spent (ms)=2900

12/04/16 20:37:59 INFO mapred.JobClient:     Total committed heap usage (bytes)=602996736

12/04/16 20:37:59 INFO mapred.JobClient:     Combine input records=4

12/04/16 20:37:59 INFO mapred.JobClient:     SPLIT_RAW_BYTES=210

12/04/16 20:37:59 INFO mapred.JobClient:     Reduce input records=4

12/04/16 20:37:59 INFO mapred.JobClient:     Reduce input groups=3

12/04/16 20:37:59 INFO mapred.JobClient:     Combine output records=4

12/04/16 20:37:59 INFO mapred.JobClient:     Physical memory (bytes) snapshot=499847168

12/04/16 20:37:59 INFO mapred.JobClient:     Reduce output records=3

12/04/16 20:37:59 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1718542336

12/04/16 20:37:59 INFO mapred.JobClient:     Map output records=4

 

 

 

View the results:

[root@localhost hadoop]# bin/hadoop fs -cat out/*

hadoop  1

hello   2

world   1

cat: File does not exist: /user/root/out/_logs
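
The cat message about out/_logs is harmless: _logs is a directory that the job creates inside the output path, and cat cannot print directories. To copy the results to the local filesystem, or to delete the output directory before re-running the job, something like the following (a sketch) works:

[root@localhost hadoop]# bin/hadoop fs -get out ./wordcount-out

[root@localhost hadoop]# bin/hadoop fs -rmr out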

 

 

 

