org.apache.hadoop.dfs
Class Balancer

java.lang.Object
  extended by org.apache.hadoop.dfs.Balancer
All Implemented Interfaces:
Configurable, Tool

public class Balancer
extends Object
implements Tool

The balancer is a tool that balances disk space usage on an HDFS cluster when some datanodes become full or when new empty nodes join the cluster. The tool is deployed as an application program that can be run by the cluster administrator on a live HDFS cluster while applications adding and deleting files.

SYNOPSIS

 To start:
      bin/start-balancer.sh [-threshold ]
      Example: bin/ start-balancer.sh 
                     start the balancer with a default threshold of 10%
               bin/ start-balancer.sh -threshold 5
                     start the balancer with a threshold of 5%
 To stop:
      bin/ stop-balancer.sh
 

DESCRIPTION

The threshold parameter is a fraction in the range of (0%, 100%) with a default value of 10%. The threshold sets a target for whether the cluster is balanced. A cluster is balanced if for each datanode, the utilization of the node (ratio of used space at the node to total capacity of the node) differs from the utilization of the (ratio of used space in the cluster to total capacity of the cluster) by no more than the threshold value. The smaller the threshold, the more balanced a cluster will become. It takes more time to run the balancer for small threshold values. Also for a very small threshold the cluster may not be able to reach the balanced state when applications write and delete files concurrently.

The tool moves blocks from highly utilized datanodes to poorly utilized datanodes iteratively. In each iteration a datanode moves or receives no more than the lesser of 10G bytes or the threshold fraction of its capacity. Each iteration runs no more than 20 minutes. At the end of each iteration, the balancer obtains updated datanodes information from the namenode.

A system property that limits the balancer's use of bandwidth is defined in the default configuration file:

 
   dfs.balance.bandwidthPerSec
   1048576
   Specifies the maximum bandwidth that each datanode 
 can utilize for the balancing purpose in term of the number of bytes 
 per second. 
 
 

This property determines the maximum speed at which a block will be moved from one datanode to another. The default value is 1MB/s. The higher the bandwidth, the faster a cluster can reach the balanced state, but with greater competition with application processes. If an administrator changes the value of this property in the configuration file, the change is observed when HDFS is next restarted.

MONITERING BALANCER PROGRESS

After the balancer is started, an output file name where the balancer progress will be recorded is printed on the screen. The administrator can monitor the running of the balancer by reading the output file. The output shows the balancer's status iteration by iteration. In each iteration it prints the starting time, the iteration number, the total number of bytes that have been moved in the previous iterations, the total number of bytes that are left to move in order for the cluster to be balanced, and the number of bytes that are being moved in this iteration. Normally "Bytes Already Moved" is increasing while "Bytes Left To Move" is decreasing.

Running multiple instances of the balancer in an HDFS cluster is prohibited by the tool.

The balancer automatically exits when any of the following five conditions is satisfied:

  1. The cluster is balanced;
  2. No block can be moved;
  3. No block has been moved for five consecutive iterations;
  4. An IOException occurs while communicating with the namenode;
  5. Another balancer is running.

Upon exit, a balancer returns an exit code and prints one of the following messages to the output file in corresponding to the above exit reasons:

  1. The cluster is balanced. Exiting
  2. No block can be moved. Exiting...
  3. No block has been moved for 3 iterations. Exiting...
  4. Received an IO exception: failure reason. Exiting...
  5. Another balancer is running. Exiting...

The administrator can interrupt the execution of the balancer at any time by running the command "stop-balancer.sh" on the machine where the balancer is running.


Field Summary
static int ALREADY_RUNNING
           
static int ILLEGAL_ARGS
           
static int IO_EXCEPTION
           
static int NO_MOVE_BLOCK
           
static int NO_MOVE_PROGRESS
           
static int SUCCESS
           
 
Method Summary
 Configuration getConf()
          return this balancer's configuration
static void main(String[] args)
          Run a balancer
 int run(String[] args)
          main method of Balancer
 void setConf(Configuration conf)
          set this balancer's configuration
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SUCCESS

public static final int SUCCESS
See Also:
Constant Field Values

ALREADY_RUNNING

public static final int ALREADY_RUNNING
See Also:
Constant Field Values

NO_MOVE_BLOCK

public static final int NO_MOVE_BLOCK
See Also:
Constant Field Values

NO_MOVE_PROGRESS

public static final int NO_MOVE_PROGRESS
See Also:
Constant Field Values

IO_EXCEPTION

public static final int IO_EXCEPTION
See Also:
Constant Field Values

ILLEGAL_ARGS

public static final int ILLEGAL_ARGS
See Also:
Constant Field Values
Method Detail

main

public static void main(String[] args)
Run a balancer

Parameters:
args -

run

public int run(String[] args)
        throws Exception
main method of Balancer

Specified by:
run in interface Tool
Parameters:
args - arguments to a Balancer
Returns:
exit code.
Throws:
any - exception occurs during datanode balancing
Exception

getConf

public Configuration getConf()
return this balancer's configuration

Specified by:
getConf in interface Configurable

setConf

public void setConf(Configuration conf)
set this balancer's configuration

Specified by:
setConf in interface Configurable


Copyright © 2008 The Apache Software Foundation