java.lang.Object
  org.apache.hadoop.mapred.FileOutputFormat&lt;K,V&gt;
public abstract class FileOutputFormat<K,V>
extends Object
implements OutputFormat<K,V>

A base class for OutputFormat implementations that write job outputs to files.
| Constructor Summary | |
|---|---|
| FileOutputFormat() | |
| Method Summary | |
|---|---|
| void | checkOutputSpecs(FileSystem ignored, JobConf job)<br>Check for validity of the output-specification for the job. |
| static boolean | getCompressOutput(JobConf conf)<br>Is the job output compressed? |
| static Class<? extends CompressionCodec> | getOutputCompressorClass(JobConf conf, Class<? extends CompressionCodec> defaultValue)<br>Get the CompressionCodec for compressing the job outputs. |
| static Path | getOutputPath(JobConf conf)<br>Get the Path to the output directory for the map-reduce job. |
| abstract RecordWriter<K,V> | getRecordWriter(FileSystem ignored, JobConf job, String name, Progressable progress)<br>Get the RecordWriter for the given job. |
| protected static Path | getTaskOutputPath(JobConf conf, String name)<br>Helper function to create the task's temporary output directory and return the path to the task's output file. |
| static Path | getWorkOutputPath(JobConf conf)<br>Get the Path to the task's temporary output directory for the map-reduce job (see Tasks' Side-Effect Files). |
| static void | setCompressOutput(JobConf conf, boolean compress)<br>Set whether the output of the job is compressed. |
| static void | setOutputCompressorClass(JobConf conf, Class<? extends CompressionCodec> codecClass)<br>Set the CompressionCodec to be used to compress job outputs. |
| static void | setOutputPath(JobConf conf, Path outputDir)<br>Set the Path of the output directory for the map-reduce job. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail | 
|---|
public FileOutputFormat()
| Method Detail | 
|---|
public static void setCompressOutput(JobConf conf,
                                     boolean compress)

Set whether the output of the job is compressed.

Parameters:
conf - the JobConf to modify
compress - should the output of the job be compressed?

public static boolean getCompressOutput(JobConf conf)

Is the job output compressed?

Parameters:
conf - the JobConf to look in
Returns:
true if the job output should be compressed, false otherwise
public static void setOutputCompressorClass(JobConf conf,
                                            Class<? extends CompressionCodec> codecClass)

Set the CompressionCodec to be used to compress job outputs.

Parameters:
conf - the JobConf to modify
codecClass - the CompressionCodec to be used to compress the job outputs
public static Class<? extends CompressionCodec> getOutputCompressorClass(JobConf conf,
                                                                         Class<? extends CompressionCodec> defaultValue)

Get the CompressionCodec for compressing the job outputs.

Parameters:
conf - the JobConf to look in
defaultValue - the CompressionCodec to return if not set
Returns:
the CompressionCodec to be used to compress the job outputs
Throws:
IllegalArgumentException - if the class was specified, but not found
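A driver-side sketch of how these static setters and getters work together, assuming an illustrative output path and the GzipCodec shipped with Hadoop:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

public class OutputConfigExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf();

        // Where the job's final output files will land (illustrative path).
        FileOutputFormat.setOutputPath(conf, new Path("/user/hadoop/out"));

        // Compress the job output, using gzip as the codec.
        FileOutputFormat.setCompressOutput(conf, true);
        FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);

        // The matching getters read the same settings back out of the JobConf.
        Path out = FileOutputFormat.getOutputPath(conf);
        boolean compress = FileOutputFormat.getCompressOutput(conf);
        Class<? extends CompressionCodec> codec =
            FileOutputFormat.getOutputCompressorClass(conf, GzipCodec.class);
        System.out.println(out + " " + compress + " " + codec.getName());
    }
}
```

Since all of these are static helpers keyed off the JobConf, the same getters can be called later from inside an OutputFormat running on a task.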
public abstract RecordWriter<K,V> getRecordWriter(FileSystem ignored,
                                                  JobConf job,
                                                  String name,
                                                  Progressable progress)
                                           throws IOException

Get the RecordWriter for the given job.

Specified by:
getRecordWriter in interface OutputFormat<K,V>
Parameters:
job - configuration for the job whose output is being written.
name - the unique name for this part of the output.
progress - mechanism for reporting progress while writing to file.
Returns:
the RecordWriter to write the output for the job.
Throws:
IOException
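To illustrate what a concrete subclass supplies, here is a hypothetical TabSeparatedOutputFormat; the class name and record layout are not part of Hadoop (TextOutputFormat is the real analogue in this package), but the structure follows the contract above:

```java
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Progressable;

// Hypothetical subclass: writes one "key<TAB>value" line per record.
public class TabSeparatedOutputFormat<K, V> extends FileOutputFormat<K, V> {
    public RecordWriter<K, V> getRecordWriter(FileSystem ignored,
                                              JobConf job,
                                              String name,
                                              Progressable progress) throws IOException {
        // getTaskOutputPath places the file in the task's temporary directory,
        // so simultaneous (e.g. speculative) attempts never clobber each other.
        Path file = getTaskOutputPath(job, name);
        FileSystem fs = file.getFileSystem(job);
        final DataOutputStream out = fs.create(file, progress);
        return new RecordWriter<K, V>() {
            public void write(K key, V value) throws IOException {
                out.writeBytes(key + "\t" + value + "\n");
            }
            public void close(Reporter reporter) throws IOException {
                out.close();
            }
        };
    }
}
```

Note how the `ignored` FileSystem parameter is unused; the FileSystem is derived from the output Path and the JobConf instead.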
public void checkOutputSpecs(FileSystem ignored,
                             JobConf job)
                      throws FileAlreadyExistsException,
                             InvalidJobConfException,
                             IOException

Check for validity of the output-specification for the job. This is called to validate the output specification when the job is submitted. Typically it checks that the output directory does not already exist, throwing an exception when it does, so that output is not overwritten.

Specified by:
checkOutputSpecs in interface OutputFormat<K,V>
Parameters:
job - job configuration.
Throws:
IOException - when output should not be attempted
FileAlreadyExistsException
InvalidJobConfException
public static void setOutputPath(JobConf conf,
                                 Path outputDir)

Set the Path of the output directory for the map-reduce job.

Parameters:
conf - The configuration of the job.
outputDir - the Path of the output directory for the map-reduce job.

public static Path getOutputPath(JobConf conf)

Get the Path to the output directory for the map-reduce job.

Returns:
the Path to the output directory for the map-reduce job.
See Also:
getWorkOutputPath(JobConf)

public static Path getWorkOutputPath(JobConf conf)

Get the Path to the task's temporary output directory for the map-reduce job.
 Some applications need to create/write-to side-files, which differ from the actual job-outputs.
In such cases there could be issues with 2 instances of the same TIP (running simultaneously e.g. speculative tasks) trying to open/write-to the same file (path) on HDFS. Hence the application-writer will have to pick unique names per task-attempt (e.g. using the attemptid, say attempt_200709221812_0001_m_000000_0), not just per TIP.
To get around this the Map-Reduce framework helps the application-writer out by maintaining a special ${mapred.output.dir}/_temporary/_${taskid} sub-directory for each task-attempt on HDFS where the output of the task-attempt goes. On successful completion of the task-attempt the files in the ${mapred.output.dir}/_temporary/_${taskid} (only) are promoted to ${mapred.output.dir}. Of course, the framework discards the sub-directory of unsuccessful task-attempts. This is completely transparent to the application.
The application-writer can take advantage of this by creating any side-files required in ${mapred.work.output.dir} during execution of the task, i.e. via getWorkOutputPath(JobConf), and the framework will move them out similarly, so there is no need to pick unique paths per task-attempt.
Note: the value of ${mapred.work.output.dir} during execution of a particular task-attempt is actually ${mapred.output.dir}/_temporary/_${taskid}, and this value is set by the map-reduce framework. So, just create any side-files in the path returned by getWorkOutputPath(JobConf) from a map/reduce task to take advantage of this feature.
The entire discussion holds true for maps of jobs with reducer=NONE (i.e. 0 reduces) since output of the map, in that case, goes directly to HDFS.
Returns:
the Path to the task's temporary output directory for the map-reduce job.
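A sketch of a map task writing a side-file under the work output directory, per the discussion above; the mapper class and side-file name are hypothetical:

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper that writes a side-file alongside its normal output.
public class SideFileMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private JobConf job;

    public void configure(JobConf job) {
        this.job = job;
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        // ${mapred.work.output.dir}: the attempt-private directory. Files created
        // here are promoted to ${mapred.output.dir} only if this attempt succeeds,
        // so no per-attempt unique naming is needed.
        Path workDir = FileOutputFormat.getWorkOutputPath(job);
        Path sideFile = new Path(workDir, "side-data");  // name is illustrative
        FileSystem fs = sideFile.getFileSystem(job);
        FSDataOutputStream out = fs.create(sideFile);
        out.writeBytes(value.toString());
        out.close();

        output.collect(new Text(value), new IntWritable(1));
    }
}
```

The same pattern applies in a reducer, and (as noted above) in map-only jobs whose map output goes directly to HDFS.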
protected static Path getTaskOutputPath(JobConf conf,
                                        String name)
                                 throws IOException

Helper function to create the task's temporary output directory and return the path to the task's output file.

Parameters:
conf - job-configuration
name - temporary task-output filename
Returns:
the path to the task's temporary output file
Throws:
IOException