1. Configure the Hadoop environment variables. This article uses Hadoop 2.5.2 as the example.
Download Hadoop 2.5.2 and unpack it, then configure the environment variables as follows (if the change does not take effect, restart the machine):
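As a minimal sketch of the variables in question, for a Windows command prompt (the install path D:\hadoop-2.5.2 is an assumption; substitute wherever you actually unpacked Hadoop):

```bat
rem Point HADOOP_HOME at the unpacked Hadoop directory (hypothetical path)
setx HADOOP_HOME "D:\hadoop-2.5.2"
rem Add Hadoop's bin directory to PATH so winutils.exe and hadoop.cmd are found
setx PATH "%PATH%;%HADOOP_HOME%\bin"
```

setx writes the variables persistently, which is why a restart (or at least a new command prompt / Eclipse restart) may be needed before they take effect.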
Copy the winutils.exe file into Hadoop's bin directory.
The Hadoop 2.x releases do not ship winutils.exe; without this file the following error is reported:
2. Install the Eclipse plugin
Early Hadoop 1.x releases shipped this plugin, but Hadoop 2 does not, so it has to be downloaded from GitHub. Here we use hadoop-eclipse-plugin-2.5.2.jar.
Copy hadoop-eclipse-plugin-2.5.2.jar into Eclipse's dropins directory and restart Eclipse:
3. Configure the Hadoop plugin
Set "Hadoop installation directory" to the Hadoop root directory.
Open the Hadoop connection view via Window -> Show View -> Other -> MapReduce Tools, as shown in the figure below.
Configure the connection to Hadoop.
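The DFS Master host and port entered in the plugin's connection dialog must match the NameNode address in the cluster's core-site.xml. As a hedged example, for the address hdfs://192.168.107.167:9000 used by the WordCount code in this article, the host would be 192.168.107.167 and the port 9000, corresponding to a core-site.xml entry like:

```xml
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.107.167:9000</value>
</property>
```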
4. Check the connection to the server
If the files and directories on the Hadoop server are displayed, the connection is established.
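Independently of Eclipse, the connection can also be sanity-checked from a command prompt (a sketch; this assumes the hadoop command is on the PATH and the address matches your cluster):

```bat
hadoop fs -ls hdfs://192.168.107.167:9000/
```

If this lists the HDFS root directory, the NameNode is reachable.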
5. Create a new MapReduce project
The Hadoop-related jar files are automatically added to the project.
6. Run the WordCount program
The WordCount program shipped with the Hadoop 2.5.2 source is located at:
hadoop-2.5.2-src\hadoop-mapreduce-project\hadoop-mapreduce-examples\src\main\java\org\apache\hadoop\examples\WordCount.java
(The main method has been slightly modified.)
package mapreduce;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class WordCountMap extends
            Mapper<LongWritable, Text, Text, IntWritable> {

        private final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer token = new StringTokenizer(line);
            while (token.hasMoreTokens()) {
                word.set(token.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class WordCountReduce extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJarByClass(WordCount.class);
        job.setJobName("wordcount");

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job,
                new Path("hdfs://192.168.107.167:9000/input/test"));
        FileOutputFormat.setOutputPath(job,
                new Path("hdfs://192.168.107.167:9000/output/test"));

        job.waitForCompletion(true);
    }
}
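To see what the map and reduce steps above compute without involving the Hadoop runtime, here is a hypothetical local sketch: it uses the same StringTokenizer tokenization as the mapper and sums per-word counts the way the reducer does, all in plain Java.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Local, Hadoop-free sketch of the WordCount logic:
// "map" = tokenize the text, "reduce" = sum the count per word.
public class LocalWordCount {

    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer token = new StringTokenizer(text); // same tokenizer as the mapper
        while (token.hasMoreTokens()) {
            // merge plays the role of the reducer: sum the 1s emitted per word
            counts.merge(token.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {eclipse=1, hadoop=1, hello=2} in some map order
        System.out.println(countWords("hello hadoop hello eclipse"));
    }
}
```

The real job differs only in plumbing: the tokens are written as (word, 1) pairs to the MapReduce framework, which groups them by key before the reducer sums them.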
Running the code above against Hadoop from Eclipse reports the following error:
Copy the source file hadoop-2.5.2-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio\NativeIO.java
into the project as org.apache.hadoop.io.nativeio.NativeIO, go to line 570, and change the statement there to return true directly, as shown in the figure below:
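For reference, the edit at line 570 amounts to replacing the body of the Windows access-check method in the copied NativeIO.java so that it always succeeds (a sketch of the patched method, not a standalone class; method name and signature are as found in the Hadoop 2.5.x source):

```java
// In the Windows inner class of the copied NativeIO.java:
public static boolean access(String path, AccessRight desiredAccess)
        throws IOException {
    return true; // skip the native access0() check that fails on Windows
}
```

This is a local workaround for development only; it disables the native file-access check rather than fixing it.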
After this modification, the program runs and produces the following result:
Postscript
Directories cannot be edited from Eclipse, and the following error is reported when running the MapReduce program:
The cause is that permission checking is enabled on the file system; it can be disabled by adding the following configuration to hdfs-site.xml:
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>