An Analysis of OutputFormat in Hadoop

1. OutputFormat

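OutputFormat describes the output specification of a MapReduce job. Its main tasks are:

  1. Validate the job's output specification, for example by checking whether the output directory already exists.

  2. Provide a RecordWriter implementation that writes the job's results to files in the file system.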
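
The two tasks above can be sketched in plain Java. This is only an illustration built on java.nio, not the Hadoop API: the class OutputSpecSketch, its method checkOutputSpecs(Path), and the nested LineRecordWriter are hypothetical stand-ins, and the tab-separated line layout is an assumption chosen for the example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Simplified stand-in, NOT the Hadoop API: models the two tasks in plain Java.
class OutputSpecSketch {

    // Task 1: refuse to proceed if the output directory already exists,
    // so that earlier results are never overwritten.
    static void checkOutputSpecs(Path outputDir) throws IOException {
        if (Files.exists(outputDir)) {
            throw new IOException("Output directory " + outputDir + " already exists");
        }
    }

    // Task 2: a writer that emits one "key<TAB>value" line per record
    // (a hypothetical analogue of a RecordWriter).
    static final class LineRecordWriter<K, V> implements AutoCloseable {
        private final BufferedWriter out;

        LineRecordWriter(Path file) throws IOException {
            // Create the output directory on first use, then open the file.
            Files.createDirectories(file.getParent());
            this.out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
        }

        void write(K key, V value) throws IOException {
            out.write(key + "\t" + value);
            out.newLine();
        }

        @Override
        public void close() throws IOException {
            out.close();
        }
    }
}
```

A caller would run checkOutputSpecs once before any work starts, then open a LineRecordWriter on a file inside the output directory and call write for each record. Running the check a second time against the same directory fails, which is the protection against overwriting that the first task describes.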

OutputFormat consists mainly of three abstract methods. The annotated source code below describes what each one does:

import java.io.IOException;

public abstract class OutputFormat<K, V> {

  /**
   * Get the {@link RecordWriter} for the given task.
   * The RecordWriter is what writes the task's output key-value pairs.
   *
   * @param context the information about the current task.
   * @return a {@link RecordWriter} to write the output for the job.
   * @throws IOException
   */
  public abstract RecordWriter<K, V> getRecordWriter(TaskAttemptContext context)
          throws IOException, InterruptedException;

  /**
   * Check for validity of the output-specification for the job.
   *
   * <p>This is to validate the output specification for the job when the
   * job is submitted. Typically checks that the output directory does not
   * already exist, throwing an exception when it does, so that output is
   * not overwritten.</p>
   *
   * @param context information about the job
   * @throws IOException when output should not be attempted
   */
  public abstract void checkOutputSpecs(JobContext context)
          throws IOException, InterruptedException;

  /**
   * Get the output committer for this output format. This is responsible
   * for ensuring the output is committed correctly.
   *
   * @param context the task context
   * @return an output committer
   * @throws IOException
   * @throws InterruptedException
   */
  public abstract OutputCommitter getOutputCommitter(TaskAttemptContext context)
          throws IOException, InterruptedException;
}
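
To make "committed correctly" more concrete, here is a minimal sketch of the write-then-commit idea in plain Java. It is not Hadoop's OutputCommitter: the class CommitSketch and its methods write, commit, and abort are hypothetical. The sketch assumes a task writes into a staging directory and the output only becomes visible once the task succeeds; the directory name _temporary is used purely for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Simplified stand-in, NOT Hadoop's OutputCommitter: shows write-then-commit.
class CommitSketch {
    private final Path tempFile;   // where the task attempt writes
    private final Path finalFile;  // where committed output becomes visible

    CommitSketch(Path outputDir, String fileName) throws IOException {
        // Assumption: staging happens in a subdirectory of the output directory.
        Path tempDir = outputDir.resolve("_temporary");
        Files.createDirectories(tempDir);
        this.tempFile = tempDir.resolve(fileName);
        this.finalFile = outputDir.resolve(fileName);
    }

    // Append one record to the staging file; nothing is visible to readers yet.
    void write(String record) throws IOException {
        byte[] bytes = (record + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        Files.write(tempFile, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // On success: promote the staging file to its final location in one step.
    void commit() throws IOException {
        Files.move(tempFile, finalFile, StandardCopyOption.ATOMIC_MOVE);
    }

    // On failure: discard the partial output so it never appears in the results.
    void abort() throws IOException {
        Files.deleteIfExists(tempFile);
    }
}
```

The point of the design is that a failed or retried task never leaves half-written files in the final output location: readers see either the complete file after commit, or nothing at all after abort.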

posted on 2014-05-02 14:59 by 月下美妞1314