java.lang.Object
    org.apache.hadoop.mapred.FileInputFormat<K,V>
public abstract class FileInputFormat<K,V>
A base class for file-based InputFormat.
 
 
FileInputFormat is the base class for all file-based
 InputFormats. It provides a generic implementation of
 getSplits(JobConf, int).
 Subclasses of FileInputFormat can also override the
 isSplitable(FileSystem, Path) method to ensure that input files are
 not split up and are each processed as a whole by a Mapper.
| Field Summary | |
|---|---|
| static org.apache.commons.logging.Log | LOG |

| Constructor Summary |
|---|
| FileInputFormat() |

| Method Summary | |
|---|---|
| static void | addInputPath(JobConf conf, Path path): Add a Path to the list of inputs for the map-reduce job. |
| static void | addInputPaths(JobConf conf, String commaSeparatedPaths): Add the given comma-separated paths to the list of inputs for the map-reduce job. |
| protected long | computeSplitSize(long goalSize, long minSize, long blockSize) |
| protected int | getBlockIndex(BlockLocation[] blkLocations, long offset) |
| static PathFilter | getInputPathFilter(JobConf conf): Get a PathFilter instance of the filter set for the input paths. |
| static Path[] | getInputPaths(JobConf conf): Get the list of input Paths for the map-reduce job. |
| abstract RecordReader<K,V> | getRecordReader(InputSplit split, JobConf job, Reporter reporter): Get the RecordReader for the given InputSplit. |
| InputSplit[] | getSplits(JobConf job, int numSplits): Splits files returned by listStatus(JobConf) when they're too big. |
| protected boolean | isSplitable(FileSystem fs, Path filename): Is the given filename splitable? Usually true, but not if the file is stream-compressed. |
| protected Path[] | listPaths(JobConf job): Deprecated. Use listStatus(JobConf) instead. |
| protected FileStatus[] | listStatus(JobConf job): List input directories. |
| static void | setInputPathFilter(JobConf conf, Class<? extends PathFilter> filter): Set a PathFilter to be applied to the input paths for the map-reduce job. |
| static void | setInputPaths(JobConf conf, Path... inputPaths): Set the array of Paths as the list of inputs for the map-reduce job. |
| static void | setInputPaths(JobConf conf, String commaSeparatedPaths): Sets the given comma-separated paths as the list of inputs for the map-reduce job. |
| protected void | setMinSplitSize(long minSplitSize) |
| void | validateInput(JobConf job): Deprecated. |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Field Detail | 
|---|
public static final org.apache.commons.logging.Log LOG
| Constructor Detail | 
|---|
public FileInputFormat()
| Method Detail | 
|---|
protected void setMinSplitSize(long minSplitSize)
protected boolean isSplitable(FileSystem fs,
                              Path filename)
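Is the given filename splitable? Usually true, but not if the file is stream-compressed.
FileInputFormat implementations can override this method and return
 false to ensure that individual input files are never split up,
 so that each Mapper processes an entire file.
Parameters:
    fs - the file system that the file is on
    filename - the file name to check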
public abstract RecordReader<K,V> getRecordReader(InputSplit split,
                                                  JobConf job,
                                                  Reporter reporter)
                                           throws IOException
Description copied from interface: InputFormat
Get the RecordReader for the given InputSplit.
 It is the responsibility of the RecordReader to respect
 record boundaries while processing the logical split, so that it presents a
 record-oriented view to the individual task.
Specified by:
    getRecordReader in interface InputFormat<K,V>
Parameters:
    split - the InputSplit
    job - the job that this split belongs to
Returns:
    a RecordReader
Throws:
    IOException
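public static void setInputPathFilter(JobConf conf,
                                      Class<? extends PathFilter> filter)
Set a PathFilter to be applied to the input paths for the map-reduce job.
Parameters:
    filter - the PathFilter class to use for filtering the input paths.
public static PathFilter getInputPathFilter(JobConf conf)
Get a PathFilter instance of the filter set for the input paths.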
protected FileStatus[] listStatus(JobConf job)
                           throws IOException
List input directories.
Parameters:
    job - the job to list input paths for
Throws:
    IOException - if zero items.
@Deprecated
protected Path[] listPaths(JobConf job)
                    throws IOException
Deprecated. Use listStatus(JobConf) instead.
Parameters:
    job - the job to list input paths for
Throws:
    IOException - if zero items.
@Deprecated
public void validateInput(JobConf job)
                   throws IOException
Deprecated.
Description copied from interface: InputFormat
This method is used to validate the input directories when a job is
 submitted so that the JobClient can fail early, with a useful
 error message, in case of errors, for example when an input directory does not exist.
Specified by:
    validateInput in interface InputFormat<K,V>
Parameters:
    job - job configuration.
Throws:
    InvalidInputException - if the job does not have valid input
    IOException
public InputSplit[] getSplits(JobConf job,
                              int numSplits)
                       throws IOException
Splits files returned by listStatus(JobConf) when
 they're too big.
Specified by:
    getSplits in interface InputFormat<K,V>
Parameters:
    job - job configuration.
    numSplits - the desired number of splits, a hint.
Returns:
    an array of InputSplits for the job.
Throws:
    IOException
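
The description above says only that files are split "when they're too big". As a rough illustration of that idea, the following stand-alone sketch cuts a file of a given length into (offset, length) ranges of at most a given split size. The class `FileChunkSketch` and its `chunk` method are hypothetical names that are not part of the Hadoop API, and the logic is a simplification: the real getSplits implementation may treat empty files, small trailing remainders, and block locations differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not part of org.apache.hadoop.mapred.
class FileChunkSketch {

    /**
     * Returns {offset, length} pairs that together cover fileLength bytes.
     * A non-splitable file is returned as one range covering the whole file.
     */
    static List<long[]> chunk(long fileLength, long splitSize, boolean splitable) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        if (fileLength <= 0) {
            return ranges; // simplification: an empty file yields no ranges here
        }
        if (!splitable) {
            ranges.add(new long[] {0L, fileLength});
            return ranges;
        }
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            long length = Math.min(splitSize, fileLength - offset);
            ranges.add(new long[] {offset, length});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // A 250-byte file with a 100-byte split size gives ranges of 100, 100 and 50 bytes.
        for (long[] r : chunk(250L, 100L, true)) {
            System.out.println("offset=" + r[0] + " length=" + r[1]);
        }
    }
}
```

Passing `false` for `splitable` mirrors the effect of an isSplitable override that returns false: the whole file stays in one range.
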
protected long computeSplitSize(long goalSize,
                                long minSize,
                                long blockSize)
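
This page gives no description for computeSplitSize. The sketch below assumes the formula max(minSize, min(goalSize, blockSize)), which is how this method is commonly described for the Hadoop source; treat the formula as an assumption and check it against the release in use. `SplitSizeSketch` is a hypothetical stand-alone class, not part of the Hadoop API.

```java
// Hypothetical stand-alone sketch; the formula is an assumption, not stated on this page.
class SplitSizeSketch {

    /** Caps the goal size at the block size, then enforces the minimum size. */
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Goal larger than a block: the block size wins.
        System.out.println(computeSplitSize(256 * mb, 1L, 64 * mb) / mb);
        // Goal smaller than a block: the goal wins.
        System.out.println(computeSplitSize(16 * mb, 1L, 64 * mb) / mb);
        // Minimum larger than both: the minimum wins.
        System.out.println(computeSplitSize(16 * mb, 128 * mb, 64 * mb) / mb);
    }
}
```

Under this assumption, a subclass that calls setMinSplitSize with a value above the block size would get splits larger than one block.
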
protected int getBlockIndex(BlockLocation[] blkLocations,
                            long offset)
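
getBlockIndex is likewise undocumented on this page. Judging only from its name and parameters, it plausibly returns the index of the block that contains a given byte offset. The sketch below illustrates that reading with plain arrays standing in for BlockLocation[]; the class `BlockIndexSketch`, the array-based signature, and the exception thrown for an out-of-range offset are all assumptions made for illustration.

```java
// Hypothetical stand-alone sketch; plain arrays stand in for BlockLocation[].
class BlockIndexSketch {

    /** Returns the index i such that blockOffsets[i] <= offset < blockOffsets[i] + blockLengths[i]. */
    static int getBlockIndex(long[] blockOffsets, long[] blockLengths, long offset) {
        for (int i = 0; i < blockOffsets.length; i++) {
            long start = blockOffsets[i];
            long end = start + blockLengths[i];
            if (start <= offset && offset < end) {
                return i;
            }
        }
        // Assumed behavior: an offset past the last block is treated as a caller error.
        throw new IllegalArgumentException("Offset " + offset + " is outside of the file");
    }

    public static void main(String[] args) {
        long[] offsets = {0L, 64L, 128L};
        long[] lengths = {64L, 64L, 32L};
        System.out.println(getBlockIndex(offsets, lengths, 70L));
    }
}
```
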
public static void setInputPaths(JobConf conf,
                                 String commaSeparatedPaths)
Sets the given comma-separated paths as the list of inputs for the map-reduce job.
Parameters:
    conf - Configuration of the job
    commaSeparatedPaths - Comma-separated paths to be set as
        the list of inputs for the map-reduce job.
public static void addInputPaths(JobConf conf,
                                 String commaSeparatedPaths)
Add the given comma-separated paths to the list of inputs for the map-reduce job.
Parameters:
    conf - The configuration of the job
    commaSeparatedPaths - Comma-separated paths to be added to
        the list of inputs for the map-reduce job.
public static void setInputPaths(JobConf conf,
                                 Path... inputPaths)
Set the array of Paths as the list of inputs
 for the map-reduce job.
Parameters:
    conf - Configuration of the job.
    inputPaths - the Paths of the input directories/files
        for the map-reduce job.
public static void addInputPath(JobConf conf,
                                Path path)
Add a Path to the list of inputs for the map-reduce job.
Parameters:
    conf - The configuration of the job
    path - Path to be added to the list of inputs for
        the map-reduce job.
public static Path[] getInputPaths(JobConf conf)
Get the list of input Paths for the map-reduce job.
Parameters:
    conf - The configuration of the job
Returns:
    the list of input Paths for the map-reduce job.