Setting Up a Hadoop 1.2.1 Cluster on Virtual Machines

Virtual machine configuration:

 

NAT network configuration reference:

Preparation before installing Hadoop (on every host):

Configure sudo (optional):

[root@hadoop01 hadoop]# chmod u+w /etc/sudoers
[root@hadoop01 hadoop]# vi /etc/sudoers
Add the following line (hadoop is the user granted passwordless sudo):
hadoop    ALL=(ALL)       NOPASSWD: ALL
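
If /etc/sudoers was made writable as above, its permissions can be restored once the edit is done (optional):

[root@hadoop01 hadoop]# chmod u-w /etc/sudoers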
 

Hostname setup:

[root@hadoop03 hadoop]# vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.153.101 hadoop01
192.168.153.102 hadoop02
192.168.153.103 hadoop03
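
The /etc/hosts entries only cover name resolution; each VM's own hostname should also match its entry. Assuming a CentOS/RHEL 6-style system (which the chkconfig/service commands below suggest), this is done roughly as follows, shown here for hadoop01:

[root@hadoop01 hadoop]# vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop01
[root@hadoop01 hadoop]# hostname hadoop01     # apply immediately, without a reboot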

Disable iptables:

[root@hadoop01 hadoop]# sudo chkconfig iptables off
[root@hadoop01 hadoop]# sudo /etc/init.d/iptables stop
iptables: Flushing firewall rules:                         [  OK  ]
iptables: Setting chains to policy ACCEPT: filter          [  OK  ]
iptables: Unloading modules:                               [  OK  ]
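
To confirm that the firewall is stopped now and will stay off after a reboot, the standard CentOS/RHEL 6 service tools can be used (a quick check, output omitted):

[hadoop@hadoop01 ~]$ sudo service iptables status
[hadoop@hadoop01 ~]$ sudo chkconfig --list iptables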

Disable SELinux:

[root@hadoop02 install]# sudo vi /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
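
The change in /etc/selinux/config only takes effect after a reboot; to relax SELinux in the running session as well, something like this works on CentOS/RHEL 6:

[hadoop@hadoop02 install]$ sudo setenforce 0     # switch to permissive mode for the current boot
[hadoop@hadoop02 install]$ getenforce            # reports Permissive now, Disabled after the next reboot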

NTP service configuration:

All nodes in the cluster should keep their system clocks synchronized so that logs and timestamps remain comparable across machines.
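
A minimal sketch for a CentOS/RHEL 6-style system (package and service names are assumptions; adjust for your distribution):

[hadoop@hadoop01 ~]$ sudo yum install -y ntp
[hadoop@hadoop01 ~]$ sudo chkconfig ntpd on
[hadoop@hadoop01 ~]$ sudo service ntpd start
[hadoop@hadoop01 ~]$ ntpq -p     # verify that time sources are being polled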

Configure passwordless SSH login:

Generate a public/private key pair on each host:

[hadoop@hadoop01 ~]$ mkdir ~/.ssh 
[hadoop@hadoop01 ~]$ chmod 700 ~/.ssh 
[hadoop@hadoop01 ~]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
 
[hadoop@hadoop02 ~]$ mkdir ~/.ssh 
[hadoop@hadoop02 ~]$ chmod 700 ~/.ssh 
[hadoop@hadoop02 ~]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
 
[hadoop@hadoop03 ~]$ mkdir ~/.ssh 
[hadoop@hadoop03 ~]$ chmod 700 ~/.ssh 
[hadoop@hadoop03 ~]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

 

Copy each slave host's id_dsa.pub to the master host:

[hadoop@hadoop02 .ssh]$ scp id_dsa.pub hadoop@hadoop01:/home/hadoop/.ssh/id_dsa.pub.hadoop02
[hadoop@hadoop03 .ssh]$ scp id_dsa.pub hadoop@hadoop01:/home/hadoop/.ssh/id_dsa.pub.hadoop03

 

On the master, concatenate all the id_dsa.pub files into authorized_keys:

[hadoop@hadoop01 .ssh]$  cat id_dsa.pub >> authorized_keys
[hadoop@hadoop01 .ssh]$  cat id_dsa.pub.hadoop02 >> authorized_keys
[hadoop@hadoop01 .ssh]$  cat id_dsa.pub.hadoop03 >> authorized_keys

 

Distribute authorized_keys from the master to each slave host:

[hadoop@hadoop01 .ssh]$ scp authorized_keys hadoop@hadoop02:/home/hadoop/.ssh/authorized_keys
[hadoop@hadoop01 .ssh]$ scp authorized_keys hadoop@hadoop03:/home/hadoop/.ssh/authorized_keys
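
sshd is strict about key file permissions, so it may also be necessary to tighten authorized_keys on every host, and then verify that each node can log in to the others without a password prompt (a quick check):

[hadoop@hadoop01 .ssh]$ chmod 600 ~/.ssh/authorized_keys
[hadoop@hadoop01 .ssh]$ ssh hadoop02 hostname     # should print hadoop02 with no password prompt
[hadoop@hadoop01 .ssh]$ ssh hadoop03 hostname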

 

Install the JDK:

sudo tar -xvf jdk-7u55-linux-x64.gz
sudo chown -R root:root jdk1.7.0_55
After extraction a jdk1.7.0_55 directory is created; confirm that its permissions are drwxr-xr-x (755).
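
Since /etc/profile below points JAVA_HOME at /usr/lib/java/jdk1.7.0_55, the extracted directory presumably has to end up there; a sketch under that assumption (run from the directory where the JDK was extracted):

[hadoop@hadoop01 ~]$ sudo mkdir -p /usr/lib/java
[hadoop@hadoop01 ~]$ sudo mv jdk1.7.0_55 /usr/lib/java/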
 
sudo vi /etc/profile
export JAVA_HOME=/usr/lib/java/jdk1.7.0_55
export PATH=$JAVA_HOME/bin:$PATH

[hadoop@hadoop01 java]$ source /etc/profile
[hadoop@hadoop01 java]$ java -version
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)

 

Install Hadoop:

 

Extract hadoop-1.2.1.tar.gz:

[hadoop@hadoop01 hadoop]$ tar -xvf hadoop-1.2.1.tar.gz 

Grant ownership of the entire hadoop directory to the hadoop user:

[hadoop@hadoop01 local]$ sudo chown -R hadoop:hadoop ./hadoop/
 

Edit the configuration files:

 

Edit the hadoop-env.sh file:

 
Add:
# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/java/jdk1.7.0_55

 

 

Edit core-site.xml:

[hadoop@hadoop01 hadoop-1.2.1]$ vi ./conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
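
hadoop.tmp.dir points at /usr/local/hadoop/tmp; Hadoop creates it on demand, but only if the user running the daemons can write to the parent directory, so creating it explicitly is a cheap safeguard:

[hadoop@hadoop01 hadoop]$ mkdir -p /usr/local/hadoop/tmp
[hadoop@hadoop01 hadoop]$ ls -ld /usr/local/hadoop/tmp     # should be owned by hadoop:hadoop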

Edit hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

Edit mapred-site.xml:

[hadoop@hadoop01 hadoop-1.2.1]$ sudo vi ./conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop01:9001</value>
    <description>The host and port that the MapReduce JobTracker runs at.</description>
  </property>
</configuration>
 

Edit the masters and slaves files:

[hadoop@hadoop01 hadoop-1.2.1]$ sudo vi ./conf/masters
hadoop01
 
[hadoop@hadoop01 hadoop-1.2.1]$  sudo vi ./conf/slaves 
hadoop01
hadoop02
hadoop03
 

Copy Hadoop to the other nodes:

 
[hadoop@hadoop01 hadoop]$ sudo scp -r hadoop-1.2.1 root@hadoop02:/usr/local/hadoop/
 
[hadoop@hadoop01 hadoop]$ sudo scp -r hadoop-1.2.1 root@hadoop03:/usr/local/hadoop/
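
Because the copy runs as root, the files on hadoop02 and hadoop03 end up owned by root; as on the master, ownership presumably needs to be handed back to the hadoop user on each slave (a sketch):

[hadoop@hadoop02 ~]$ sudo chown -R hadoop:hadoop /usr/local/hadoop/
[hadoop@hadoop03 ~]$ sudo chown -R hadoop:hadoop /usr/local/hadoop/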
 

 

Format the distributed file system:

[hadoop@hadoop01 hadoop-1.2.1]$ sudo bin/hadoop namenode -format
15/03/15 15:56:19 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop01/192.168.153.101
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.7.0_55
************************************************************/
15/03/15 15:56:20 INFO util.GSet: Computing capacity for map BlocksMap
15/03/15 15:56:20 INFO util.GSet: VM type       = 64-bit
15/03/15 15:56:20 INFO util.GSet: 2.0% max memory = 1013645312
15/03/15 15:56:20 INFO util.GSet: capacity      = 2^21 = 2097152 entries
15/03/15 15:56:20 INFO util.GSet: recommended=2097152, actual=2097152
15/03/15 15:56:20 INFO namenode.FSNamesystem: fsOwner=root
15/03/15 15:56:20 INFO namenode.FSNamesystem: supergroup=supergroup
15/03/15 15:56:20 INFO namenode.FSNamesystem: isPermissionEnabled=true
15/03/15 15:56:20 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
15/03/15 15:56:20 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
15/03/15 15:56:20 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
15/03/15 15:56:20 INFO namenode.NameNode: Caching file names occuring more than 10 times 
15/03/15 15:56:21 INFO common.Storage: Image file /usr/local/hadoop/tmp/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
15/03/15 15:56:21 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/usr/local/hadoop/tmp/dfs/name/current/edits
15/03/15 15:56:21 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/usr/local/hadoop/tmp/dfs/name/current/edits
15/03/15 15:56:21 INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.
15/03/15 15:56:21 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop01/192.168.153.101
************************************************************/

Start the daemons:

 

[hadoop@hadoop01 ~]$ start-all.sh 
starting namenode, logging to /usr/local/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-namenode-hadoop01.out
hadoop02: starting datanode, logging to /usr/local/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-hadoop02.out
hadoop03: starting datanode, logging to /usr/local/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-hadoop03.out
hadoop01: starting datanode, logging to /usr/local/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-hadoop01.out
hadoop01: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-secondarynamenode-hadoop01.out
starting jobtracker, logging to /usr/local/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-jobtracker-hadoop01.out
hadoop02: starting tasktracker, logging to /usr/local/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-hadoop02.out
hadoop03: starting tasktracker, logging to /usr/local/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-hadoop03.out
hadoop01: starting tasktracker, logging to /usr/local/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-hadoop01.out

 

Check the running daemons:

[hadoop@hadoop01 local]$ jps
6312 TaskTracker
7604 Jps
6105 SecondaryNameNode
5871 NameNode
6190 JobTracker
5991 DataNode
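
Besides jps, the web UIs give a quick health check; in Hadoop 1.x the NameNode UI listens on port 50070 and the JobTracker UI on port 50030 by default, so the following (or simply opening the URLs in a browser) should return their status pages:

[hadoop@hadoop01 ~]$ curl -sI http://hadoop01:50070/     # NameNode web UI
[hadoop@hadoop01 ~]$ curl -sI http://hadoop01:50030/     # JobTracker web UI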

 

If the hadoop-1.x installation directory was not chowned to the hadoop user beforehand (note that the NameNode above was formatted via sudo, so HDFS's fsOwner is root), you may need to grant the hadoop user ownership of its HDFS directories:

[hadoop@hadoop01 hadoop-1.2.1]$ sudo ./bin/hadoop fs -chown -R hadoop /userdir4hadoop

 

Resolving "Warning: $HADOOP_HOME is deprecated." when running hadoop commands:

Hadoop itself checks HADOOP_HOME, specifically in bin/hadoop and bin/hadoop-config.sh. hadoop-config.sh contains the following:
if [ "$HADOOP_HOME_WARN_SUPPRESS" = "" ] && [ "$HADOOP_HOME" != "" ]; then 
  echo "Warning: \$HADOOP_HOME is deprecated." 1>&2  
  echo 1>&2  
fi  

 

To suppress the warning, add an environment variable:

## suppress "Warning: $HADOOP_HOME is deprecated."
export HADOOP_HOME_WARN_SUPPRESS=1
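
Where the variable is set is a matter of preference; appending it to the hadoop user's ~/.bashrc (or to /etc/profile next to JAVA_HOME) is one option, sketched here:

[hadoop@hadoop01 ~]$ echo 'export HADOOP_HOME_WARN_SUPPRESS=1' >> ~/.bashrc
[hadoop@hadoop01 ~]$ source ~/.bashrc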

 

Testing the bundled WordCount example from Eclipse:

/**
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 *  Unless required by applicable law or agreed to in writing, software
 *  distributed under the License is distributed on an "AS IS" BASIS,
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *  See the License for the specific language governing permissions and
 *  limitations under the License.
 */


package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper 
       extends Mapper<Object, Text, Text, IntWritable>{
    
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
      
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }
  
  public static class IntSumReducer 
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, 
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

 

15/03/15 21:39:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/15 21:39:37 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/03/15 21:39:37 INFO input.FileInputFormat: Total input paths to process : 1
15/03/15 21:39:37 WARN snappy.LoadSnappy: Snappy native library not loaded
15/03/15 21:39:37 INFO mapred.JobClient: Running job: job_local1689089896_0001
15/03/15 21:39:37 INFO mapred.LocalJobRunner: Waiting for map tasks
15/03/15 21:39:37 INFO mapred.LocalJobRunner: Starting task: attempt_local1689089896_0001_m_000000_0
15/03/15 21:39:37 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
15/03/15 21:39:37 INFO mapred.MapTask: Processing split: hdfs://192.168.153.101:9000/dir4userhadoop/test/in/hadoop-config.sh:0+2643
15/03/15 21:39:37 INFO mapred.MapTask: io.sort.mb = 100
15/03/15 21:39:37 INFO mapred.MapTask: data buffer = 79691776/99614720
15/03/15 21:39:37 INFO mapred.MapTask: record buffer = 262144/327680
15/03/15 21:39:37 INFO mapred.MapTask: Starting flush of map output
15/03/15 21:39:37 INFO mapred.MapTask: Finished spill 0
15/03/15 21:39:37 INFO mapred.Task: Task:attempt_local1689089896_0001_m_000000_0 is done. And is in the process of commiting
15/03/15 21:39:37 INFO mapred.LocalJobRunner: 
15/03/15 21:39:37 INFO mapred.Task: Task 'attempt_local1689089896_0001_m_000000_0' done.
15/03/15 21:39:37 INFO mapred.LocalJobRunner: Finishing task: attempt_local1689089896_0001_m_000000_0
15/03/15 21:39:37 INFO mapred.LocalJobRunner: Map task executor complete.
15/03/15 21:39:37 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
15/03/15 21:39:37 INFO mapred.LocalJobRunner: 
15/03/15 21:39:37 INFO mapred.Merger: Merging 1 sorted segments
15/03/15 21:39:37 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 3182 bytes
15/03/15 21:39:37 INFO mapred.LocalJobRunner: 
15/03/15 21:39:37 INFO mapred.Task: Task:attempt_local1689089896_0001_r_000000_0 is done. And is in the process of commiting
15/03/15 21:39:37 INFO mapred.LocalJobRunner: 
15/03/15 21:39:37 INFO mapred.Task: Task attempt_local1689089896_0001_r_000000_0 is allowed to commit now
15/03/15 21:39:37 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1689089896_0001_r_000000_0' to hdfs://192.168.153.101:9000/dir4userhadoop/test/out2
15/03/15 21:39:37 INFO mapred.LocalJobRunner: reduce > reduce
15/03/15 21:39:37 INFO mapred.Task: Task 'attempt_local1689089896_0001_r_000000_0' done.
15/03/15 21:39:38 INFO mapred.JobClient:  map 100% reduce 100%
15/03/15 21:39:38 INFO mapred.JobClient: Job complete: job_local1689089896_0001
15/03/15 21:39:38 INFO mapred.JobClient: Counters: 19
15/03/15 21:39:38 INFO mapred.JobClient:   File Output Format Counters 
15/03/15 21:39:38 INFO mapred.JobClient:     Bytes Written=2334
15/03/15 21:39:38 INFO mapred.JobClient:   File Input Format Counters 
15/03/15 21:39:38 INFO mapred.JobClient:     Bytes Read=2643
15/03/15 21:39:38 INFO mapred.JobClient:   FileSystemCounters
15/03/15 21:39:38 INFO mapred.JobClient:     FILE_BYTES_READ=3578
15/03/15 21:39:38 INFO mapred.JobClient:     HDFS_BYTES_READ=5286
15/03/15 21:39:38 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=143646
15/03/15 21:39:38 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2334
15/03/15 21:39:38 INFO mapred.JobClient:   Map-Reduce Framework
15/03/15 21:39:38 INFO mapred.JobClient:     Map output materialized bytes=3186
15/03/15 21:39:38 INFO mapred.JobClient:     Map input records=86
15/03/15 21:39:38 INFO mapred.JobClient:     Reduce shuffle bytes=0
15/03/15 21:39:38 INFO mapred.JobClient:     Spilled Records=424
15/03/15 21:39:38 INFO mapred.JobClient:     Map output bytes=3993
15/03/15 21:39:38 INFO mapred.JobClient:     Total committed heap usage (bytes)=452984832
15/03/15 21:39:38 INFO mapred.JobClient:     Combine input records=366
15/03/15 21:39:38 INFO mapred.JobClient:     SPLIT_RAW_BYTES=132
15/03/15 21:39:38 INFO mapred.JobClient:     Reduce input records=212
15/03/15 21:39:38 INFO mapred.JobClient:     Reduce input groups=212
15/03/15 21:39:38 INFO mapred.JobClient:     Combine output records=212
15/03/15 21:39:38 INFO mapred.JobClient:     Reduce output records=212
15/03/15 21:39:38 INFO mapred.JobClient:     Map output records=366
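
The log above comes from running the job inside Eclipse with the LocalJobRunner (note the job_local... job id) against HDFS. The same WordCount can also be exercised from the command line with the examples jar that ships with Hadoop 1.2.1; the HDFS paths below mirror the ones in the log but are otherwise arbitrary:

[hadoop@hadoop01 hadoop-1.2.1]$ bin/hadoop fs -mkdir /dir4userhadoop/test/in
[hadoop@hadoop01 hadoop-1.2.1]$ bin/hadoop fs -put conf/hadoop-env.sh /dir4userhadoop/test/in/
[hadoop@hadoop01 hadoop-1.2.1]$ bin/hadoop jar hadoop-examples-1.2.1.jar wordcount /dir4userhadoop/test/in /dir4userhadoop/test/out-cli
[hadoop@hadoop01 hadoop-1.2.1]$ bin/hadoop fs -cat /dir4userhadoop/test/out-cli/part-r-00000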

 

 

 
