org.apache.hadoop.streaming
Class StreamXmlRecordReader
java.lang.Object
  
org.apache.hadoop.streaming.StreamBaseRecordReader
      
org.apache.hadoop.streaming.StreamXmlRecordReader
- All Implemented Interfaces: 
 - RecordReader<Text,Text>
 
public class StreamXmlRecordReader
- extends StreamBaseRecordReader
 
Interprets XML fragments as Mapper input records.
  Each value is an XML subtree delimited by configurable begin and end tags.
  A key could be taken from a certain attribute in the XML subtree,
  but extracting it is left to the stream processor application.
  StreamXmlRecordReader understands the following name-value properties:
    String begin (chars marking beginning of record)
    String end   (chars marking end of record)
    int maxrec   (maximum record size)
    int lookahead(maximum lookahead to sync CDATA)
    boolean slowmatch
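
The role of the begin and end properties can be pictured with a small sketch that does not depend on Hadoop. It is not this class's actual implementation: the class name XmlRecordSketch and the method extractRecords are hypothetical, the input is held in memory as a String rather than read from a split, and maxrec, lookahead and slowmatch are not modelled at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of org.apache.hadoop.streaming.
public class XmlRecordSketch {

    /**
     * Returns every substring of input that starts at an occurrence of the
     * begin marker and runs through the next occurrence of the end marker.
     * Both markers are included in the returned record.
     */
    public static List<String> extractRecords(String input, String begin, String end) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = input.indexOf(begin, pos);
            if (start < 0) {
                break; // no further record starts
            }
            int stop = input.indexOf(end, start + begin.length());
            if (stop < 0) {
                break; // record is not terminated; drop it
            }
            stop += end.length();
            records.add(input.substring(start, stop));
            pos = stop; // continue scanning after this record
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<doc><page id=\"1\">a</page>\n<page id=\"2\">b</page></doc>";
        // The begin marker is a plain character sequence, so "<page" also
        // matches opening tags that carry attributes.
        for (String record : extractRecords(xml, "<page", "</page>")) {
            System.out.println(record);
        }
    }
}
```

Because the markers are matched as character sequences rather than parsed as XML, anything outside a begin/end pair (here the enclosing doc element) never reaches the Mapper in this sketch.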
 
 
 
 
 
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
StreamXmlRecordReader
public StreamXmlRecordReader(FSDataInputStream in,
                             FileSplit split,
                             Reporter reporter,
                             JobConf job,
                             FileSystem fs)
                      throws IOException
- Throws:
 IOException
init
public void init()
          throws IOException
- Throws:
 IOException
 
next
public boolean next(Text key,
                    Text value)
             throws IOException
- Description copied from class: StreamBaseRecordReader
- Read a record. Implementation should call numRecStats at the end.
- Specified by:
 next in interface RecordReader<Text,Text>
- Specified by:
 next in class StreamBaseRecordReader
 
- Parameters:
 key - the key to read data into
 value - the value to read data into
- Returns:
 true iff a key/value was read, false if at EOF
- Throws:
 IOException
 
 
seekNextRecordBoundary
public void seekNextRecordBoundary()
                            throws IOException
- Description copied from class: StreamBaseRecordReader
- Implementation should seek forward in_ to the first byte of the next record.
  The initial byte offset in the stream is arbitrary.
- Specified by:
 seekNextRecordBoundary in class StreamBaseRecordReader
 
- Throws:
 IOException
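
The contract described for seekNextRecordBoundary (start at an arbitrary byte offset, end at the first byte of the next record) can be sketched over an in-memory byte array. This is an illustration under stated assumptions, not the method's actual code: BoundarySeekSketch and seekNextBoundary are hypothetical names, and a record is assumed to begin wherever the begin marker occurs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of org.apache.hadoop.streaming.
public class BoundarySeekSketch {

    /**
     * Returns the offset of the first occurrence of marker at or after
     * offset from, or -1 if the marker does not occur again.
     */
    public static int seekNextBoundary(byte[] data, int from, byte[] marker) {
        for (int i = Math.max(from, 0); i + marker.length <= data.length; i++) {
            int j = 0;
            while (j < marker.length && data[i + j] == marker[j]) {
                j++;
            }
            if (j == marker.length) {
                return i; // first byte of the next record
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The data starts in the middle of a tag, as a split might.
        byte[] data = "ge><page>a</page><page>b</page>".getBytes(StandardCharsets.UTF_8);
        byte[] begin = "<page>".getBytes(StandardCharsets.UTF_8);
        System.out.println(seekNextBoundary(data, 0, begin)); // prints 3
        System.out.println(seekNextBoundary(data, 4, begin)); // prints 17
    }
}
```

Starting the search at an arbitrary offset is what lets a reader begin partway through a file and still pick up at a clean record boundary.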
 
 
Copyright © 2008 The Apache Software Foundation