首先得有一台部署好了的BigInsights服务器
一、安装插件(注意Eclipse的版本,这里是Juno-4.2)
Eclipse -> Help -> Install New Software -> Add
填写插件名与插件下载地址,下载地址可以根据服务器引导获取
在浏览器中 http:// 加上服务器IP与8080端口号进入服务器,找到引导,如下图:
二、新建工程
Eclipse -> New -> BigInsights -> BigInsights Project
Eclipse -> New -> BigInsights -> Java MapReduce Program
假如创建了一个工程aaaaaa,想把WordCount程序放在这个项目中,就如下图:
1. Mapper类
后面四个选项表示输入与输出的<K, V>类型,WordCount程序通常是Text为输入,IntWritable为输出
2. Reducer类
3. Driver类
三、编码(注意包名)
1. Mapper
package znufe.wordcount; import java.io.IOException; import java.util.StringTokenizer; import org.apache.hadoop.io.Text; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.mapreduce.Mapper; public class WordMapper extends Mapper<Object, Text, Text, IntWritable> { private final static IntWritable one = new IntWritable(1); private Text word = new Text(); @Override public void map(Object key, Text value, Context context) throws IOException, InterruptedException { StringTokenizer itr = new StringTokenizer(value.toString()); while (itr.hasMoreTokens()) { word.set(itr.nextToken()); context.write(word, one); } } }
2. Reducer
package znufe.wordcount; import java.io.IOException; import org.apache.hadoop.io.Text; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.mapreduce.Reducer; public class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable> { private IntWritable result = new IntWritable(); public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int sum = 0; for (IntWritable val : values) { sum += val.get(); } result.set(sum); context.write(key, result); } }
3. Driver
package znufe.wordcount; import org.apache.hadoop.io.Text; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import org.apache.hadoop.util.GenericOptionsParser; public class WordMain { public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); // Use programArgs array to retrieve program arguments. String[] programArgs = new GenericOptionsParser(conf, args) .getRemainingArgs(); /** * 这里必须有输入和输出 */ if (programArgs.length != 2) { System.out.println("Usage: wordcount <in> <out>"); System.exit(2); } Job job = new Job(conf, "word count"); job.setJarByClass(WordMain.class); //主类 job.setMapperClass(WordMapper.class); //Mapper job.setCombinerClass(WordReducer.class); //作业合成类 job.setReducerClass(WordReducer.class); //Reducer job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); // TODO: Update the input path for the location of the inputs of the map-reduce job. FileInputFormat.addInputPath(job, new Path(programArgs[0])); // TODO: Update the output path for the output directory of the map-reduce job. FileOutputFormat.setOutputPath(job, new Path(programArgs[1])); // Submit the job and wait for it to finish. //job.waitForCompletion(true); // Submit and return immediately: // job.submit(); System.exit(job.waitForCompletion(true) ? 0 : 1); //等待完成退出 } }
四、发布与部署
1. 右键工程名然后选择BigInsights Application Publish
2. 接着进入服务器中Applications分页,如图
3. 进入Manage子页并在My Applications文件夹中找到刚Publish的工程
4. 部署工程
我的工程是Test_BI,点击Deploy完成部署
五、上传测试文件并运行
1. 上传测试文件
进入Files分页,点击Upload按钮,上传完毕之后可以点击左边的Move按钮放到指定路径
2. 选择路径运行程序
进入到Applications分页,Run子页找到工程,选择输入与输出的路径点击Run按钮即可,注:输出路径必须是不存在的(程序会自行新建)
3. 查看结果
进入Files分页,根据之前填写的输出路径找到结果
PS:这个结果有点奇怪,代码貌似没问题,没找到是啥原因,不过创建工程到运行的流程是没问题的。