Hadoop-1.2.1 Cluster Setup on Virtual Machines
VM configuration:
NAT network configuration reference:
Preparation before installing Hadoop (on every host):
Configure sudo (optional):
[root@hadoop01 hadoop]# chmod u+w /etc/sudoers
[root@hadoop01 hadoop]# vi /etc/sudoers
Add this line: hadoop ALL=(ALL) NOPASSWD: ALL (hadoop is the user granted passwordless sudo).
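A quick sanity check (a sketch; it assumes the hadoop user has already been created, e.g. with useradd):
# Switch to the hadoop user and run a no-op through sudo in non-interactive mode;
# if the sudoers entry is correct, no password prompt appears.
su - hadoop
sudo -n true && echo "passwordless sudo OK"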
Hostname setup:
[root@hadoop03 hadoop]# vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.153.101 hadoop01
192.168.153.102 hadoop02
192.168.153.103 hadoop03
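After editing /etc/hosts on every host, a minimal resolution check (hostnames as configured above):
# One ping per node; each should resolve to its 192.168.153.x address
for h in hadoop01 hadoop02 hadoop03; do ping -c 1 $h; done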
Disable iptables:
[root@hadoop01 hadoop]# sudo chkconfig iptables off
[root@hadoop01 hadoop]# sudo /etc/init.d/iptables stop
iptables: Flushing firewall rules:                         [  OK  ]
iptables: Setting chains to policy ACCEPT: filter          [  OK  ]
iptables: Unloading modules:                               [  OK  ]
Disable SELinux:
[root@hadoop02 install]# sudo vi /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
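The config file only takes effect after a reboot; to avoid rebooting right away, SELinux can be switched to permissive mode for the current session (an optional extra step, not needed if you reboot):
# Put the running system into permissive mode (a full disable still requires the reboot)
sudo setenforce 0
# Verify the current mode
getenforce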
NTP service configuration:
Reference:
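The reference link is not reproduced here; as a rough sketch for CentOS 6 (assuming the ntp package is installed and the master's /etc/ntp.conf allows clients from the 192.168.153.0/24 subnet):
# On hadoop01 (time source for the cluster)
sudo chkconfig ntpd on
sudo service ntpd start
# On hadoop02/hadoop03: one-off sync against the master (before starting a local ntpd, if any)
sudo ntpdate hadoop01
# Check peer status
ntpq -p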
Configure passwordless SSH login:
Generate a public and private key pair on every host:
[hadoop@hadoop01 ~]$ mkdir ~/.ssh
[hadoop@hadoop01 ~]$ chmod 700 ~/.ssh
[hadoop@hadoop01 ~]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
[hadoop@hadoop02 ~]$ mkdir ~/.ssh
[hadoop@hadoop02 ~]$ chmod 700 ~/.ssh
[hadoop@hadoop02 ~]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
[hadoop@hadoop03 ~]$ mkdir ~/.ssh
[hadoop@hadoop03 ~]$ chmod 700 ~/.ssh
[hadoop@hadoop03 ~]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Copy id_dsa.pub from each slave host to the master host:
[hadoop@hadoop02 .ssh]$ scp id_dsa.pub hadoop@hadoop01:/home/hadoop/.ssh/id_dsa.pub.hadoop02
[hadoop@hadoop03 .ssh]$ scp id_dsa.pub hadoop@hadoop01:/home/hadoop/.ssh/id_dsa.pub.hadoop03
Collect all the id_dsa.pub keys into authorized_keys on the master:
[hadoop@hadoop01 .ssh]$ cat id_dsa.pub >> authorized_keys
[hadoop@hadoop01 .ssh]$ cat id_dsa.pub.hadoop02 >> authorized_keys
[hadoop@hadoop01 .ssh]$ cat id_dsa.pub.hadoop03 >> authorized_keys
Distribute authorized_keys from the master host to each slave host:
[hadoop@hadoop01 .ssh]$ scp authorized_keys hadoop@hadoop02:/home/hadoop/.ssh/authorized_keys
[hadoop@hadoop01 .ssh]$ scp authorized_keys hadoop@hadoop03:/home/hadoop/.ssh/authorized_keys
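Before moving on it is worth tightening permissions and testing the logins; sshd ignores an authorized_keys file that is group- or world-writable (a short sketch, run on every host):
# authorized_keys must be writable only by its owner
chmod 600 ~/.ssh/authorized_keys
# From hadoop01, each of these should print the hostname without asking for a password
ssh hadoop01 hostname
ssh hadoop02 hostname
ssh hadoop03 hostname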
Install the JDK:
sudo tar -xvf jdk-7u55-linux-x64.gz
sudo chown -R root:root jdk1.7.0_55
A jdk1.7.0_55 directory is created after extraction; confirm its permissions are drwxr-xr-x (755).
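One way to check (and, if needed, fix) the permissions, assuming the archive was extracted under /usr/lib/java as the JAVA_HOME below implies:
ls -ld /usr/lib/java/jdk1.7.0_55
# If it is not readable by all users, grant read plus directory-execute recursively
sudo chmod -R a+rX /usr/lib/java/jdk1.7.0_55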
vi /etc/profile
export JAVA_HOME=/usr/lib/java/jdk1.7.0_55
export PATH=$JAVA_HOME/bin:$PATH
[hadoop@hadoop01 java]$ source /etc/profile
[hadoop@hadoop01 java]$ java -version
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
Install Hadoop:
Extract hadoop-1.2.1.tar.gz:
[hadoop@hadoop01 hadoop]$ tar -xvf hadoop-1.2.1.tar.gz
Grant ownership of the entire hadoop directory to the hadoop user:
[hadoop@hadoop01 local]$ sudo chown -R hadoop:hadoop ./hadoop/
Edit the configuration files:
Edit hadoop-env.sh, adding:
# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/java/jdk1.7.0_55
Edit core-site.xml:
[hadoop@hadoop01 hadoop-1.2.1]$ vi ./conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
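Since the /usr/local/hadoop/tmp directory configured above may not exist yet, it is safer to create it with the right owner on every node (an extra step not spelled out above, matching the hadoop.tmp.dir path):
sudo mkdir -p /usr/local/hadoop/tmp
sudo chown -R hadoop:hadoop /usr/local/hadoop/tmp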
Edit hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
Edit mapred-site.xml:
[hadoop@hadoop01 hadoop-1.2.1]$ sudo vi ./conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop01:9001</value>
    <description>The host and port that the MapReduce JobTracker runs at.</description>
  </property>
</configuration>
Edit the masters and slaves files:
[hadoop@hadoop01 hadoop-1.2.1]$ sudo vi ./conf/masters
hadoop01
[hadoop@hadoop01 hadoop-1.2.1]$ sudo vi ./conf/slaves
hadoop01
hadoop02
hadoop03
Copy hadoop to each node:
[hadoop@hadoop01 hadoop]$ sudo scp -r hadoop-1.2.1 root@hadoop02:/usr/local/hadoop/
[hadoop@hadoop01 hadoop]$ sudo scp -r hadoop-1.2.1 root@hadoop03:/usr/local/hadoop/
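Because the copy is done as root, the files on hadoop02/hadoop03 end up owned by root; a follow-up chown keeps the layout consistent with the master (a sketch, assuming root SSH access as used by the scp above):
ssh root@hadoop02 "chown -R hadoop:hadoop /usr/local/hadoop"
ssh root@hadoop03 "chown -R hadoop:hadoop /usr/local/hadoop"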
Format the distributed file system:
[hadoop@hadoop01 hadoop-1.2.1]$ sudo bin/hadoop namenode -format
15/03/15 15:56:19 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop01/192.168.153.101
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.7.0_55
************************************************************/
15/03/15 15:56:20 INFO util.GSet: Computing capacity for map BlocksMap
15/03/15 15:56:20 INFO util.GSet: VM type       = 64-bit
15/03/15 15:56:20 INFO util.GSet: 2.0% max memory = 1013645312
15/03/15 15:56:20 INFO util.GSet: capacity      = 2^21 = 2097152 entries
15/03/15 15:56:20 INFO util.GSet: recommended=2097152, actual=2097152
15/03/15 15:56:20 INFO namenode.FSNamesystem: fsOwner=root
15/03/15 15:56:20 INFO namenode.FSNamesystem: supergroup=supergroup
15/03/15 15:56:20 INFO namenode.FSNamesystem: isPermissionEnabled=true
15/03/15 15:56:20 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
15/03/15 15:56:20 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
15/03/15 15:56:20 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
15/03/15 15:56:20 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/03/15 15:56:21 INFO common.Storage: Image file /usr/local/hadoop/tmp/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
15/03/15 15:56:21 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/usr/local/hadoop/tmp/dfs/name/current/edits
15/03/15 15:56:21 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/usr/local/hadoop/tmp/dfs/name/current/edits
15/03/15 15:56:21 INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.
15/03/15 15:56:21 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop01/192.168.153.101
************************************************************/
Start the daemons:
[hadoop@hadoop01 ~]$ start-all.sh
starting namenode, logging to /usr/local/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-namenode-hadoop01.out
hadoop02: starting datanode, logging to /usr/local/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-hadoop02.out
hadoop03: starting datanode, logging to /usr/local/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-hadoop03.out
hadoop01: starting datanode, logging to /usr/local/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-hadoop01.out
hadoop01: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-secondarynamenode-hadoop01.out
starting jobtracker, logging to /usr/local/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-jobtracker-hadoop01.out
hadoop02: starting tasktracker, logging to /usr/local/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-hadoop02.out
hadoop03: starting tasktracker, logging to /usr/local/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-hadoop03.out
hadoop01: starting tasktracker, logging to /usr/local/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-hadoop01.out
Check the daemons:
[hadoop@hadoop01 local]$ jps
6312 TaskTracker
7604 Jps
6105 SecondaryNameNode
5871 NameNode
6190 JobTracker
5991 DataNode
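Besides jps, the cluster state can be checked from the master; all three DataNodes should be reported as live, and the standard Hadoop 1.x web UIs are also available:
[hadoop@hadoop01 hadoop-1.2.1]$ bin/hadoop dfsadmin -report
# Web UIs (default ports): NameNode http://hadoop01:50070/   JobTracker http://hadoop01:50030/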
If the hadoop-1.x installation directory was not chowned to the hadoop user earlier, you may also need to chown the HDFS directory to the hadoop user:
[hadoop@hadoop01 hadoop-1.2.1]$ sudo ./bin/hadoop fs -chown -R hadoop /userdir4hadoop
Fix the "Warning: $HADOOP_HOME is deprecated." message printed by the hadoop command:
Hadoop checks HADOOP_HOME itself, in bin/hadoop and bin/hadoop-config.sh; hadoop-config.sh contains the following:
if [ "$HADOOP_HOME_WARN_SUPPRESS" = "" ] && [ "$HADOOP_HOME" != "" ]; then echo "Warning: \$HADOOP_HOME is deprecated." 1>&2 echo 1>&2 fi
Add an environment variable:
## solve Warning: $HADOOP_HOME is deprecated.
export HADOOP_HOME_WARN_SUPPRESS=1
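One way to wire this in (a sketch; it assumes the install path used above and puts everything in /etc/profile alongside JAVA_HOME):
export HADOOP_HOME=/usr/local/hadoop/hadoop-1.2.1
export HADOOP_HOME_WARN_SUPPRESS=1
export PATH=$HADOOP_HOME/bin:$PATH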
Testing the bundled WordCount example in Eclipse:
/**
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
Console output when the job is run from Eclipse (LocalJobRunner against the cluster's HDFS):
15/03/15 21:39:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/15 21:39:37 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/03/15 21:39:37 INFO input.FileInputFormat: Total input paths to process : 1
15/03/15 21:39:37 WARN snappy.LoadSnappy: Snappy native library not loaded
15/03/15 21:39:37 INFO mapred.JobClient: Running job: job_local1689089896_0001
15/03/15 21:39:37 INFO mapred.LocalJobRunner: Waiting for map tasks
15/03/15 21:39:37 INFO mapred.LocalJobRunner: Starting task: attempt_local1689089896_0001_m_000000_0
15/03/15 21:39:37 INFO mapred.Task: Using ResourceCalculatorPlugin : null
15/03/15 21:39:37 INFO mapred.MapTask: Processing split: hdfs://192.168.153.101:9000/dir4userhadoop/test/in/hadoop-config.sh:0+2643
15/03/15 21:39:37 INFO mapred.MapTask: io.sort.mb = 100
15/03/15 21:39:37 INFO mapred.MapTask: data buffer = 79691776/99614720
15/03/15 21:39:37 INFO mapred.MapTask: record buffer = 262144/327680
15/03/15 21:39:37 INFO mapred.MapTask: Starting flush of map output
15/03/15 21:39:37 INFO mapred.MapTask: Finished spill 0
15/03/15 21:39:37 INFO mapred.Task: Task:attempt_local1689089896_0001_m_000000_0 is done. And is in the process of commiting
15/03/15 21:39:37 INFO mapred.LocalJobRunner:
15/03/15 21:39:37 INFO mapred.Task: Task 'attempt_local1689089896_0001_m_000000_0' done.
15/03/15 21:39:37 INFO mapred.LocalJobRunner: Finishing task: attempt_local1689089896_0001_m_000000_0
15/03/15 21:39:37 INFO mapred.LocalJobRunner: Map task executor complete.
15/03/15 21:39:37 INFO mapred.Task: Using ResourceCalculatorPlugin : null
15/03/15 21:39:37 INFO mapred.LocalJobRunner:
15/03/15 21:39:37 INFO mapred.Merger: Merging 1 sorted segments
15/03/15 21:39:37 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 3182 bytes
15/03/15 21:39:37 INFO mapred.LocalJobRunner:
15/03/15 21:39:37 INFO mapred.Task: Task:attempt_local1689089896_0001_r_000000_0 is done. And is in the process of commiting
15/03/15 21:39:37 INFO mapred.LocalJobRunner:
15/03/15 21:39:37 INFO mapred.Task: Task attempt_local1689089896_0001_r_000000_0 is allowed to commit now
15/03/15 21:39:37 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1689089896_0001_r_000000_0' to hdfs://192.168.153.101:9000/dir4userhadoop/test/out2
15/03/15 21:39:37 INFO mapred.LocalJobRunner: reduce > reduce
15/03/15 21:39:37 INFO mapred.Task: Task 'attempt_local1689089896_0001_r_000000_0' done.
15/03/15 21:39:38 INFO mapred.JobClient:  map 100% reduce 100%
15/03/15 21:39:38 INFO mapred.JobClient: Job complete: job_local1689089896_0001
15/03/15 21:39:38 INFO mapred.JobClient: Counters: 19
15/03/15 21:39:38 INFO mapred.JobClient:   File Output Format Counters
15/03/15 21:39:38 INFO mapred.JobClient:     Bytes Written=2334
15/03/15 21:39:38 INFO mapred.JobClient:   File Input Format Counters
15/03/15 21:39:38 INFO mapred.JobClient:     Bytes Read=2643
15/03/15 21:39:38 INFO mapred.JobClient:   FileSystemCounters
15/03/15 21:39:38 INFO mapred.JobClient:     FILE_BYTES_READ=3578
15/03/15 21:39:38 INFO mapred.JobClient:     HDFS_BYTES_READ=5286
15/03/15 21:39:38 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=143646
15/03/15 21:39:38 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2334
15/03/15 21:39:38 INFO mapred.JobClient:   Map-Reduce Framework
15/03/15 21:39:38 INFO mapred.JobClient:     Map output materialized bytes=3186
15/03/15 21:39:38 INFO mapred.JobClient:     Map input records=86
15/03/15 21:39:38 INFO mapred.JobClient:     Reduce shuffle bytes=0
15/03/15 21:39:38 INFO mapred.JobClient:     Spilled Records=424
15/03/15 21:39:38 INFO mapred.JobClient:     Map output bytes=3993
15/03/15 21:39:38 INFO mapred.JobClient:     Total committed heap usage (bytes)=452984832
15/03/15 21:39:38 INFO mapred.JobClient:     Combine input records=366
15/03/15 21:39:38 INFO mapred.JobClient:     SPLIT_RAW_BYTES=132
15/03/15 21:39:38 INFO mapred.JobClient:     Reduce input records=212
15/03/15 21:39:38 INFO mapred.JobClient:     Reduce input groups=212
15/03/15 21:39:38 INFO mapred.JobClient:     Combine output records=212
15/03/15 21:39:38 INFO mapred.JobClient:     Reduce output records=212
15/03/15 21:39:38 INFO mapred.JobClient:     Map output records=366
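For comparison, the same WordCount can be submitted from the command line with the examples jar shipped in the distribution root; the HDFS paths below mirror the ones in the log above and are only illustrative:
# Put some input into HDFS and run the bundled example, then print the result
[hadoop@hadoop01 hadoop-1.2.1]$ bin/hadoop fs -mkdir /dir4userhadoop/test/in
[hadoop@hadoop01 hadoop-1.2.1]$ bin/hadoop fs -put conf/hadoop-env.sh /dir4userhadoop/test/in/
[hadoop@hadoop01 hadoop-1.2.1]$ bin/hadoop jar hadoop-examples-1.2.1.jar wordcount /dir4userhadoop/test/in /dir4userhadoop/test/out3
[hadoop@hadoop01 hadoop-1.2.1]$ bin/hadoop fs -cat /dir4userhadoop/test/out3/part-r-00000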