I. OutputFormat

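OutputFormat describes the output format of a MapReduce job. It has two main responsibilities:

1. Validate the job's output specification, for example by checking whether the output directory already exists.

2. Provide a RecordWriter implementation that writes the job's results to files in the file system.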
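The two responsibilities above can be sketched in plain Java without any Hadoop dependency. This is only an illustration under stated assumptions: `OutputFormatSketch`, `LineRecordWriter`, the tab-separated line layout, and the `part-00000` file name are stand-ins invented for this sketch and are not the Hadoop classes themselves.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical stand-ins written for this sketch; these are NOT Hadoop classes.
class OutputFormatSketch {

    // Responsibility 1: refuse to run if the output directory already exists,
    // so that earlier results are never overwritten.
    static void checkOutputSpecs(Path outputDir) throws IOException {
        if (Files.exists(outputDir)) {
            throw new IOException("Output directory " + outputDir + " already exists");
        }
    }

    // Responsibility 2: a tiny record writer that emits one
    // "key<TAB>value" line per record (the line layout is an assumption).
    static final class LineRecordWriter implements AutoCloseable {
        private final BufferedWriter out;

        LineRecordWriter(Path file) throws IOException {
            Files.createDirectories(file.getParent());
            this.out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
        }

        void write(String key, String value) throws IOException {
            out.write(key);
            out.write('\t');
            out.write(value);
            out.write('\n');
        }

        @Override
        public void close() throws IOException {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("of-sketch");
        Path outputDir = base.resolve("out");

        // Passes: the directory does not exist yet.
        checkOutputSpecs(outputDir);

        try (LineRecordWriter writer = new LineRecordWriter(outputDir.resolve("part-00000"))) {
            writer.write("hello", "2");
            writer.write("world", "1");
        }

        List<String> lines = Files.readAllLines(outputDir.resolve("part-00000"));
        System.out.println(lines);

        // Fails now: the directory was created by the first run.
        try {
            checkOutputSpecs(outputDir);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running the check a second time against the same directory is rejected, which mirrors why a MapReduce job refuses to start when its output directory is left over from a previous run.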

OutputFormat consists mainly of three abstract methods. The annotated source below explains what each one does:

 package org.apache.hadoop.mapreduce;

 import java.io.IOException;

 public abstract class OutputFormat<K, V> {

   /**
    * Get the {@link RecordWriter} for the given task.
    * Note: the returned RecordWriter is what writes the task's key-value pairs.
    * @param context the information about the current task.
    * @return a {@link RecordWriter} to write the output for the job.
    * @throws IOException
    */
   public abstract RecordWriter<K, V> getRecordWriter(TaskAttemptContext context)
       throws IOException, InterruptedException;

   /**
    * Check for validity of the output-specification for the job.
    * <p>This is to validate the output specification for the job when the
    * job is submitted. Typically checks that it does not already exist,
    * throwing an exception when it already exists, so that output is not
    * overwritten.</p>
    * Note: validation happens at job submission. In practice it checks whether
    * the output directory already exists and throws an exception if it does,
    * so that the earlier output is not overwritten.
    * @param context information about the job
    * @throws IOException when output should not be attempted
    */
   public abstract void checkOutputSpecs(JobContext context)
       throws IOException, InterruptedException;

   /**
    * Get the output committer for this output format. This is responsible
    * for ensuring the output is committed correctly.
    * Note: returns an OutputCommitter object, which makes sure the output is
    * committed correctly.
    * @param context the task context
    * @return an output committer
    * @throws IOException
    * @throws InterruptedException
    */
   public abstract OutputCommitter getOutputCommitter(TaskAttemptContext context)
       throws IOException, InterruptedException;
 }
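To make "committed correctly" more concrete, here is a minimal sketch of one common commit strategy: write to a staging location first and move the file into the final directory only when the task succeeds. It uses only `java.nio`. The class `CommitSketch`, its method names, and the `_temporary` staging directory name are assumptions made for this illustration, and the sketch is not Hadoop's OutputCommitter API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical stand-in for the committer idea; NOT Hadoop's OutputCommitter API.
class CommitSketch {
    private final Path finalDir;
    private final Path tempDir;

    CommitSketch(Path finalDir) {
        this.finalDir = finalDir;
        // Assumed name for the staging directory used by this sketch.
        this.tempDir = finalDir.resolve("_temporary");
    }

    // Returns the path a running task attempt should write to.
    Path setupTask(String fileName) throws IOException {
        Files.createDirectories(tempDir);
        return tempDir.resolve(fileName);
    }

    // Promotes a finished file from the staging directory to the final directory.
    void commitTask(String fileName) throws IOException {
        Files.move(tempDir.resolve(fileName), finalDir.resolve(fileName),
                StandardCopyOption.REPLACE_EXISTING);
    }

    // Discards the staged output of a failed attempt.
    void abortTask(String fileName) throws IOException {
        Files.deleteIfExists(tempDir.resolve(fileName));
    }

    public static void main(String[] args) throws IOException {
        Path finalDir = Files.createTempDirectory("commit-sketch");
        CommitSketch committer = new CommitSketch(finalDir);

        Path staged = committer.setupTask("part-00000");
        Files.writeString(staged, "hello\t2\n");
        System.out.println("visible before commit: "
                + Files.exists(finalDir.resolve("part-00000")));

        committer.commitTask("part-00000");
        System.out.println("visible after commit: "
                + Files.exists(finalDir.resolve("part-00000")));
    }
}
```

The point of the design is that readers of the final directory never see half-written output: a failed attempt is cleaned up with `abortTask`, and only a successful one is promoted with `commitTask`.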