org.apache.hadoop.mapred
Class TaskTracker

java.lang.Object
  extended by org.apache.hadoop.mapred.TaskTracker
All Implemented Interfaces:
Runnable

public class TaskTracker
extends Object
implements Runnable

TaskTracker is a process that starts and tracks MapReduce (MR) Tasks in a networked environment. It contacts the JobTracker to receive Task assignments and to report results.


Nested Class Summary
static class TaskTracker.Child
          The main() for child processes.
static class TaskTracker.MapOutputServlet
          This class is used in TaskTracker's Jetty to serve the map outputs to other nodes.
 class TaskTracker.TaskTrackerMetrics
           
 
Field Summary
static int CLUSTER_INCREMENT
           
static long COUNTER_UPDATE_INTERVAL
           
static int FILE_NOT_FOUND
           
static int HEARTBEAT_INTERVAL_MIN
           
static org.apache.commons.logging.Log LOG
           
static String MAP_OUTPUT_LENGTH
          The custom http header used for the map output length.
static float MAX_INMEM_FILESIZE_FRACTION
          Constant denoting the maximum size of a map output file (as a fraction of the total size of the file system) that we will try to keep in memory.
static float MAX_INMEM_FILESYS_USE
          Constant denoting when a merge of in-memory files will be triggered.
static String RAW_MAP_OUTPUT_LENGTH
          The custom http header used for the "raw" map output length.
static int SUCCESS
           
static String TEMP_DIR_NAME
          Temporary directory name
static long versionID
          Protocol version history: version 2 added the new method getMapOutputs; version 3 made progress() return a boolean; version 4 replaced TaskUmbilicalProtocol.progress(String, float, String, org.apache.hadoop.mapred.TaskStatus.Phase, Counters) with statusUpdate(String, TaskStatus); version 5 changed the counters representation for HADOOP-2248; version 6 changed the TaskStatus representation for HADOOP-2208; version 7 changed the done API (via HADOOP-3140).
static String WORKDIR
           
 
Constructor Summary
TaskTracker(JobConf conf)
          Start with the local machine name, and the default JobTracker
 
Method Summary
 void cleanupStorage()
          Removes all contents of temporary storage.
 void close()
          Close down the TaskTracker and all its components.
 void done(String taskid, boolean shouldPromote)
          Deprecated. 
 void done(TaskAttemptID taskid, boolean shouldPromote)
          The task is done.
 void fsError(String taskid, String msg)
          Deprecated. 
 void fsError(TaskAttemptID taskId, String message)
          A child task had a local filesystem error.
 org.apache.hadoop.mapred.InterTrackerProtocol getJobClient()
          The connection to the JobTracker, used by the TaskRunner for locating remote files.
 TaskCompletionEvent[] getMapCompletionEvents(JobID jobId, int fromEventId, int maxLocs)
          Called by a reduce task to get the map output locations for finished maps.
 TaskCompletionEvent[] getMapCompletionEvents(String jobid, int fromid, int maxlocs)
          Deprecated. 
 long getProtocolVersion(String protocol, long clientVersion)
          Return protocol version corresponding to protocol interface.
 org.apache.hadoop.mapred.Task getTask(String id)
          Deprecated. 
 org.apache.hadoop.mapred.Task getTask(TaskAttemptID taskid)
          Called upon startup by the child process, to fetch Task data.
 TaskTracker.TaskTrackerMetrics getTaskTrackerMetrics()
           
 InetSocketAddress getTaskTrackerReportAddress()
          Return the address and port to which the TaskTracker is bound.
 boolean isIdle()
          Is this task tracker idle?
static void main(String[] argv)
          Start the TaskTracker, point toward the indicated JobTracker
 void mapOutputLost(String taskid, String msg)
          Deprecated. 
 void mapOutputLost(TaskAttemptID taskid, String errorMsg)
          A completed map task's output has been lost.
 boolean ping(String taskid)
          Deprecated. 
 boolean ping(TaskAttemptID taskid)
          Child checking to see if we're alive.
 void reportDiagnosticInfo(String taskid, String info)
          Deprecated. 
 void reportDiagnosticInfo(TaskAttemptID taskid, String info)
          Called when the task dies before completion, and we want to report back diagnostic info
 void run()
          The server retry loop.
 void shuffleError(String taskid, String msg)
          Deprecated. 
 void shuffleError(TaskAttemptID taskId, String message)
          A reduce-task failed to shuffle the map-outputs.
 void shutdown()
           
 boolean statusUpdate(String taskid, org.apache.hadoop.mapred.TaskStatus status)
          Deprecated. 
 boolean statusUpdate(TaskAttemptID taskid, org.apache.hadoop.mapred.TaskStatus taskStatus)
          Called periodically to report Task progress, from 0.0 to 1.0.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG

HEARTBEAT_INTERVAL_MIN

public static final int HEARTBEAT_INTERVAL_MIN
See Also:
Constant Field Values

CLUSTER_INCREMENT

public static final int CLUSTER_INCREMENT
See Also:
Constant Field Values

COUNTER_UPDATE_INTERVAL

public static final long COUNTER_UPDATE_INTERVAL
See Also:
Constant Field Values

MAX_INMEM_FILESYS_USE

public static final float MAX_INMEM_FILESYS_USE
Constant denoting when a merge of in-memory files will be triggered.

See Also:
Constant Field Values

MAX_INMEM_FILESIZE_FRACTION

public static final float MAX_INMEM_FILESIZE_FRACTION
Constant denoting the maximum size of a map output file (as a fraction of the total size of the file system) that we will try to keep in memory. Ideally, this should be a factor of MAX_INMEM_FILESYS_USE.

See Also:
Constant Field Values
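
As a rough illustration of how two fractions like these could work together, the sketch below checks whether a single map output is small enough to keep in memory and whether total usage has reached the point where a merge would start. The class name, both helper methods, and the values 0.5 and 0.25 are assumptions made for this example only; the real values are listed under Constant Field Values, and the actual shuffle logic may differ.

```java
// Hypothetical sketch: names and threshold values are illustrative assumptions,
// not the values or logic used by TaskTracker itself.
final class InMemThresholdSketch {
    // Assumed stand-in for MAX_INMEM_FILESYS_USE.
    static final float MAX_USE = 0.5f;
    // Assumed stand-in for MAX_INMEM_FILESIZE_FRACTION (here, half of MAX_USE).
    static final float MAX_FILE_FRACTION = 0.25f;

    /** True if one map output of fileSize bytes is small enough to keep in memory. */
    static boolean fitsInMemory(long fileSize, long fsCapacity) {
        return fileSize <= (long) (fsCapacity * (double) MAX_FILE_FRACTION);
    }

    /** True once the bytes already held reach the merge threshold. */
    static boolean shouldMerge(long usedBytes, long fsCapacity) {
        return usedBytes >= (long) (fsCapacity * (double) MAX_USE);
    }

    public static void main(String[] args) {
        long capacity = 1000L;
        System.out.println("200-byte output fits: " + fitsInMemory(200, capacity));
        System.out.println("300-byte output fits: " + fitsInMemory(300, capacity));
        System.out.println("merge at 500 bytes used: " + shouldMerge(500, capacity));
    }
}
```

Keeping the per-file fraction below the merge threshold means that more than one output can be held before a merge is triggered.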

SUCCESS

public static final int SUCCESS
See Also:
Constant Field Values

FILE_NOT_FOUND

public static final int FILE_NOT_FOUND
See Also:
Constant Field Values

MAP_OUTPUT_LENGTH

public static final String MAP_OUTPUT_LENGTH
The custom http header used for the map output length.

See Also:
Constant Field Values

RAW_MAP_OUTPUT_LENGTH

public static final String RAW_MAP_OUTPUT_LENGTH
The custom http header used for the "raw" map output length.

See Also:
Constant Field Values
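
A client fetching map output over HTTP would read both length headers from the response before consuming the body. The sketch below shows one defensive way to parse them. The header strings and the helper method are assumptions for illustration; consult Constant Field Values for the actual header names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: the header names below are assumed, not taken from this page.
final class MapOutputHeaderSketch {
    static final String MAP_OUTPUT_LENGTH = "Map-Output-Length";         // assumed value
    static final String RAW_MAP_OUTPUT_LENGTH = "Raw-Map-Output-Length"; // assumed value

    /** Parses a length header, returning -1 if it is missing or not a number. */
    static long lengthHeader(Map<String, String> headers, String name) {
        String value = headers.get(name);
        if (value == null) {
            return -1L;
        }
        try {
            return Long.parseLong(value.trim());
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    public static void main(String[] args) {
        // HTTP header names are case-insensitive, so use a case-insensitive map.
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        headers.put("map-output-length", "1024");
        headers.put("raw-map-output-length", "4096");
        System.out.println("length: " + lengthHeader(headers, MAP_OUTPUT_LENGTH));
        System.out.println("raw length: " + lengthHeader(headers, RAW_MAP_OUTPUT_LENGTH));
    }
}
```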

TEMP_DIR_NAME

public static final String TEMP_DIR_NAME
Temporary directory name

See Also:
Constant Field Values

WORKDIR

public static final String WORKDIR
See Also:
Constant Field Values

versionID

public static final long versionID
Protocol version history:
Version 2: added the new method getMapOutputs.
Version 3: progress() returns a boolean.
Version 4: replaced TaskUmbilicalProtocol.progress(String, float, String, org.apache.hadoop.mapred.TaskStatus.Phase, Counters) with statusUpdate(String, TaskStatus).
Version 5: changed the counters representation for HADOOP-2248.
Version 6: changed the TaskStatus representation for HADOOP-2208.
Version 7: changed the done API (via HADOOP-3140). It now expects whether or not the task's output needs to be promoted.
Version 8: changed {job|tip|task}id's to use their corresponding objects rather than strings.
Version 9: changed the counter representation for HADOOP-1915.

See Also:
Constant Field Values
Constructor Detail

TaskTracker

public TaskTracker(JobConf conf)
            throws IOException
Start with the local machine name, and the default JobTracker

Throws:
IOException
Method Detail

getTaskTrackerMetrics

public TaskTracker.TaskTrackerMetrics getTaskTrackerMetrics()

getProtocolVersion

public long getProtocolVersion(String protocol,
                               long clientVersion)
                        throws IOException
Description copied from interface: VersionedProtocol
Return protocol version corresponding to protocol interface.

Parameters:
protocol - The classname of the protocol interface
clientVersion - The version of the protocol that the client speaks
Returns:
the version that the server will speak
Throws:
IOException
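
The version exchange described above can be sketched as a simple comparison: the server reports the version it speaks, and the caller decides whether that matches its own. Everything in this block is a stand-alone illustration; the class, the compatible() helper, and the hard-coded version 9 are assumptions, not Hadoop's RPC code.

```java
// Hypothetical sketch of a protocol version handshake; not Hadoop's RPC implementation.
final class VersionCheckSketch {
    // Assumed server version for the example.
    static final long SERVER_VERSION = 9L;

    /** Server side: report the version this server speaks. */
    static long getProtocolVersion(String protocol, long clientVersion) {
        return SERVER_VERSION;
    }

    /** Client side: proceed only if the server speaks the client's version. */
    static boolean compatible(String protocol, long clientVersion) {
        return getProtocolVersion(protocol, clientVersion) == clientVersion;
    }

    public static void main(String[] args) {
        String protocol = "example.Protocol"; // placeholder protocol class name
        System.out.println("client v9 compatible: " + compatible(protocol, 9L));
        System.out.println("client v8 compatible: " + compatible(protocol, 8L));
    }
}
```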

cleanupStorage

public void cleanupStorage()
                    throws IOException
Removes all contents of temporary storage. Called upon startup, to remove any leftovers from a previous run.

Throws:
IOException

shutdown

public void shutdown()
              throws IOException
Throws:
IOException

close

public void close()
           throws IOException
Close down the TaskTracker and all its components. We must also shut down any running tasks or threads, and clean up disk space. A new TaskTracker within the same process space might be restarted, so everything must be clean.

Throws:
IOException

getJobClient

public org.apache.hadoop.mapred.InterTrackerProtocol getJobClient()
The connection to the JobTracker, used by the TaskRunner for locating remote files.


getTaskTrackerReportAddress

public InetSocketAddress getTaskTrackerReportAddress()
Return the address and port to which the TaskTracker is bound.


run

public void run()
The server retry loop. This while-loop attempts to connect to the JobTracker. It only loops when the old TaskTracker has gone bad (its state is stale somehow) and we need to reinitialize everything.

Specified by:
run in interface Runnable
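
The retry behavior described for run() follows a common pattern: initialize, serve until the service loop exits, and start over only if it exited because state went stale. A minimal sketch of that pattern follows. The State enum, the method name, and the use of Runnable and Supplier are assumptions for illustration and do not reflect TaskTracker's internal code.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical sketch of a "reinitialize on stale state" retry loop.
final class RetryLoopSketch {
    enum State { NORMAL, STALE }

    /**
     * Initializes, runs the service loop, and repeats only while the loop
     * reports STALE. Returns how many times initialization ran.
     */
    static int runWithReinit(Runnable initialize, Supplier<State> serviceLoop) {
        int initializations = 0;
        State result;
        do {
            initialize.run();
            initializations++;
            result = serviceLoop.get();
        } while (result == State.STALE);
        return initializations;
    }

    public static void main(String[] args) {
        // Simulate two stale exits followed by a normal exit.
        Iterator<State> outcomes =
            Arrays.asList(State.STALE, State.STALE, State.NORMAL).iterator();
        int count = runWithReinit(() -> System.out.println("initializing"), outcomes::next);
        System.out.println("initializations: " + count);
    }
}
```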

getTask

@Deprecated
public org.apache.hadoop.mapred.Task getTask(String id)
                                      throws IOException
Deprecated. 

Throws:
IOException

getTask

public org.apache.hadoop.mapred.Task getTask(TaskAttemptID taskid)
                                      throws IOException
Called upon startup by the child process, to fetch Task data.

Throws:
IOException

statusUpdate

@Deprecated
public boolean statusUpdate(String taskid,
                            org.apache.hadoop.mapred.TaskStatus status)
                     throws IOException
Deprecated. 

Throws:
IOException

statusUpdate

public boolean statusUpdate(TaskAttemptID taskid,
                            org.apache.hadoop.mapred.TaskStatus taskStatus)
                     throws IOException
Called periodically to report Task progress, from 0.0 to 1.0.

Parameters:
taskid - task-id of the child
taskStatus - status of the child
Returns:
True if the task is known
Throws:
IOException
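
From the child's side, the contract above suggests a loop that reports a progress value between 0.0 and 1.0 and stops if the tracker no longer knows the task. The sketch below uses a hypothetical Umbilical interface with a simplified signature (a float in place of TaskStatus) so that it runs without Hadoop on the classpath.

```java
// Hypothetical sketch: Umbilical is a simplified stand-in, not the real protocol interface.
final class StatusUpdateSketch {
    interface Umbilical {
        /** Returns true if the task is known to the tracker. */
        boolean statusUpdate(String taskId, float progress);
    }

    /** Keeps a progress value inside the documented 0.0 to 1.0 range. */
    static float clamp(float progress) {
        return Math.max(0.0f, Math.min(1.0f, progress));
    }

    /**
     * Reports progress after each unit of work. Returns false as soon as the
     * tracker reports that the task is unknown, true if all units were reported.
     */
    static boolean reportAll(Umbilical tracker, String taskId, int totalUnits) {
        for (int done = 1; done <= totalUnits; done++) {
            float progress = clamp((float) done / totalUnits);
            if (!tracker.statusUpdate(taskId, progress)) {
                return false; // the tracker no longer knows this task, so stop
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Umbilical printing = (id, p) -> {
            System.out.println(id + " progress " + p);
            return true;
        };
        System.out.println("completed: " + reportAll(printing, "example-task", 4));
    }
}
```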

reportDiagnosticInfo

@Deprecated
public void reportDiagnosticInfo(String taskid,
                                 String info)
                          throws IOException
Deprecated. 

Throws:
IOException

reportDiagnosticInfo

public void reportDiagnosticInfo(TaskAttemptID taskid,
                                 String info)
                          throws IOException
Called when the task dies before completion and we want to report back diagnostic info.

Parameters:
taskid - the id of the task involved
info - the text to report
Throws:
IOException

ping

@Deprecated
public boolean ping(String taskid)
             throws IOException
Deprecated. 

Throws:
IOException

ping

public boolean ping(TaskAttemptID taskid)
             throws IOException
Child checking to see if we're alive. Normally does nothing.

Returns:
True if the task is known
Throws:
IOException

done

@Deprecated
public void done(String taskid,
                 boolean shouldPromote)
          throws IOException
Deprecated. 

Throws:
IOException

done

public void done(TaskAttemptID taskid,
                 boolean shouldPromote)
          throws IOException
The task is done.

Parameters:
taskid - task's id
shouldPromote - whether to promote the task's output or not
Throws:
IOException

shuffleError

@Deprecated
public void shuffleError(String taskid,
                         String msg)
                  throws IOException
Deprecated. 

Throws:
IOException

shuffleError

public void shuffleError(TaskAttemptID taskId,
                         String message)
                  throws IOException
A reduce-task failed to shuffle the map-outputs. Kill the task.

Throws:
IOException

fsError

@Deprecated
public void fsError(String taskid,
                    String msg)
             throws IOException
Deprecated. 

Throws:
IOException

fsError

public void fsError(TaskAttemptID taskId,
                    String message)
             throws IOException
A child task had a local filesystem error. Kill the task.

Throws:
IOException

getMapCompletionEvents

@Deprecated
public TaskCompletionEvent[] getMapCompletionEvents(String jobid,
                                                    int fromid,
                                                    int maxlocs)
                                             throws IOException
Deprecated. 

Throws:
IOException

getMapCompletionEvents

public TaskCompletionEvent[] getMapCompletionEvents(JobID jobId,
                                                    int fromEventId,
                                                    int maxLocs)
                                             throws IOException
Called by a reduce task to get the map output locations for finished maps.

Parameters:
fromEventId - the index starting from which the locations should be fetched
maxLocs - the max number of locations to fetch
Returns:
an array of TaskCompletionEvent
Throws:
IOException
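
The fromEventId and maxLocs parameters suggest a paging pattern: ask for at most maxLocs events starting at an index, then advance the index by the number received. The sketch below imitates that pattern with plain strings in place of TaskCompletionEvent. The class and both methods are assumptions for illustration, and maxLocs is assumed to be positive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of paging through completion events; strings stand in for events.
final class CompletionEventPollSketch {
    /** Stand-in for the tracker call: up to maxLocs events starting at fromEventId. */
    static String[] getEvents(List<String> all, int fromEventId, int maxLocs) {
        int from = Math.min(fromEventId, all.size());
        int to = Math.min(all.size(), from + maxLocs);
        return all.subList(from, to).toArray(new String[0]);
    }

    /** Fetches batches until an empty one arrives, advancing the start index each time. */
    static List<String> fetchAll(List<String> all, int maxLocs) {
        List<String> seen = new ArrayList<>();
        int fromEventId = 0;
        while (true) {
            String[] batch = getEvents(all, fromEventId, maxLocs);
            if (batch.length == 0) {
                break; // nothing new; a real reducer would typically poll again later
            }
            seen.addAll(Arrays.asList(batch));
            fromEventId += batch.length;
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> finishedMaps = Arrays.asList("map0", "map1", "map2", "map3", "map4");
        System.out.println(fetchAll(finishedMaps, 2));
    }
}
```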

mapOutputLost

@Deprecated
public void mapOutputLost(String taskid,
                          String errorMsg)
                   throws IOException
Deprecated. 

Throws:
IOException

mapOutputLost

public void mapOutputLost(TaskAttemptID taskid,
                          String errorMsg)
                   throws IOException
A completed map task's output has been lost.

Throws:
IOException

isIdle

public boolean isIdle()
Is this task tracker idle?

Returns:
true if this task tracker has finished and cleaned up all of its tasks

main

public static void main(String[] argv)
                 throws Exception
Start the TaskTracker, point toward the indicated JobTracker

Throws:
Exception


Copyright © 2008 The Apache Software Foundation