Setting up a Hadoop development environment with Eclipse on Windows
1. Overall environment and setup
1.1 The Hadoop 1.0.4 cluster is deployed across four VMware virtual machines, all of which connect to the host through NAT.
The /etc/hosts configuration used on every cluster node:

# this machine
127.0.0.1 localhost
# namenode, secondary namenode
192.168.1.11 master
# datanode
192.168.1.12 slave1
# datanode
192.168.1.13 slave2
# datanode
192.168.1.14 slave3
Add the entries above to C:\Windows\System32\drivers\etc\hosts on the Windows side; after that, ping master works from Windows as well.
1.2 Copy the Hadoop installation configured on Linux over to the Windows machine; it is needed when setting up the Eclipse development environment. Also download a Hadoop Eclipse plugin (I could not find one matching my exact Hadoop version, so I used hadoop-eclipse-plugin-1.2.1.jar). Put the plugin jar into the plugins directory of your Eclipse installation (mine is D:\eclipseSoft\plugins\), then restart Eclipse.
1.3 Set the Hadoop installation directory
1.3.1 Set the Hadoop installation directory. Open Window --> Preferences --> Hadoop Map/Reduce and simply point it at the directory holding the Windows copy of the Hadoop installation we configured on Linux.
1.3.2 Set up a Map/Reduce Location
Open Window --> Open Perspective --> Other, select Map/Reduce, and click OK. A Map/Reduce Locations view now appears at the bottom right, as shown in the figure below:
Right-click in the blank area of that view and choose New Hadoop location.
Then switch to the Advanced parameters tab and change the settings so they match what is written in the cluster's configuration files.
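Concretely, the location's DFS Master must match fs.default.name in core-site.xml, and its Map/Reduce Master must match mapred.job.tracker in mapred-site.xml. A sketch of the relevant entries, assuming the commonly used ports 9000 and 9001 (check your own files for the actual values):

```xml
<!-- core-site.xml: this is the DFS Master host/port in the Eclipse location -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:9000</value>
</property>

<!-- mapred-site.xml: this is the Map/Reduce Master host/port in the Eclipse location -->
<property>
  <name>mapred.job.tracker</name>
  <value>master:9001</value>
</property>
```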
1.3.3 Use DFS Locations to browse and operate on HDFS
2. Testing the environment with WordCount
2.1 Create a project
Choose Map/Reduce Project; the required Hadoop jars are pulled in automatically from the Hadoop installation directory configured earlier.
2.2 Write the test code (word count)
package wordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountNew extends Configured implements Tool {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the line on whitespace and emit (word, 1) for every token.
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the ones emitted for each word.
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Job job = new Job(getConf());
        job.setJarByClass(WordCountNew.class);
        job.setJobName("wordcount");

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, new Path("input"));
        FileOutputFormat.setOutputPath(job, new Path("output"));

        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int ret = ToolRunner.run(new WordCountNew(), args);
        System.exit(ret);
    }
}
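To see what the job computes without a cluster, the map-then-reduce data flow above can be mirrored in plain Java. WordCountSketch is a hypothetical helper for illustration only, not part of the job:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountSketch {

    // Mirrors the job's data flow in-process: the tokenizing loop is the
    // "map" step, the running per-word sum is the "reduce" step.
    public static Map<String, Integer> count(String line) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            String word = tokenizer.nextToken();   // map emits (word, 1)
            Integer sum = counts.get(word);        // reduce sums the ones
            counts.put(word, sum == null ? 1 : sum + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints something like {eclipse=1, hadoop=1, hello=2}
        System.out.println(count("hello hadoop hello eclipse"));
    }
}
```

On the cluster, the shuffle between Map and Reduce does the grouping that the HashMap does here.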
2.3 Permission error
16/04/08 16:31:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/08 16:31:06 ERROR security.UserGroupInformation: PriviledgedActionException as:Amei cause:java.io.IOException: Failed to set permissions of path: \home\hadoop_admin\tmp\mapred\staging\Amei-1505897053\.staging to 0700
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \home\hadoop_admin\tmp\mapred\staging\Amei-1505897053\.staging to 0700
	at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
	at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
	at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
	at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
	at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
	at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Unknown Source)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
	at wordcount.WordCountNew.run(WordCountNew.java:73)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at wordcount.WordCountNew.main(WordCountNew.java:79)
Fix (from http://blog.csdn.net/mengfei86/article/details/8155544): on Windows, Hadoop 1.x cannot chmod the local staging directory and the check in the stack trace fails. The usual workaround is to modify org.apache.hadoop.fs.FileUtil.checkReturnValue so it no longer throws on a failed permission change, recompile it into hadoop-core-1.0.4.jar, and replace that jar on the project's classpath.
2.4 Path error
Fixes:
Write the input file's path as an absolute HDFS path (from the root), e.g. /user/hadoop_admin/input.
Alternatively, make your Windows user name match the Linux user name, so that the relative path resolves to the right home directory.
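The reason the bare "input" path fails: with no scheme or leading slash, Hadoop resolves it against the submitting user's HDFS home directory, /user/<username>/. When the job is submitted as the Windows user Amei but the files live under /user/hadoop_admin, the job looks in the wrong place. The resolution rule can be sketched with java.net.URI (master:9000 and hadoop_admin are the values from the cluster above):

```java
import java.net.URI;

public class HdfsPathSketch {
    public static void main(String[] args) {
        // A relative path is resolved against the user's HDFS home directory.
        URI home = URI.create("hdfs://master:9000/user/hadoop_admin/");
        System.out.println(home.resolve("input"));
        // prints hdfs://master:9000/user/hadoop_admin/input

        // A different submitting user gets a different home directory,
        // hence the "input path does not exist" error:
        URI otherHome = URI.create("hdfs://master:9000/user/Amei/");
        System.out.println(otherHome.resolve("input"));
        // prints hdfs://master:9000/user/Amei/input
    }
}
```

An absolute path like /user/hadoop_admin/input sidesteps this resolution entirely, which is why it works regardless of the submitting user.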
2.5 Output: