As nodes are added to and removed from a Hadoop cluster, storage usage across DataNodes can become uneven: some DataNodes' disks are almost full while others' are almost empty.
How to balance data across DataNodes in HDFS?
Hadoop provides a tool, the balancer, that redistributes data by moving HDFS blocks between DataNodes.
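The Hadoop documentation describes a cluster as balanced when each DataNode's utilization (used space divided by the node's capacity) differs from the whole cluster's utilization (total used space divided by total capacity) by no more than a threshold, which defaults to 10 percentage points. Below is a minimal Python sketch of that check. The `DataNode` class and the `classify` and `is_balanced` helpers are hypothetical illustrations, not Hadoop APIs. The sketch also simplifies the real balancer, which further splits nodes into above-average and below-average groups and then plans block moves.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataNode:
    # Hypothetical stand-in for a DataNode's storage report.
    name: str
    used_bytes: int
    capacity_bytes: int

    @property
    def utilization(self) -> float:
        # Percentage of this node's capacity currently in use.
        return 100.0 * self.used_bytes / self.capacity_bytes


def cluster_utilization(nodes: List[DataNode]) -> float:
    # Cluster-wide utilization: total used space over total capacity, in percent.
    total_used = sum(n.used_bytes for n in nodes)
    total_capacity = sum(n.capacity_bytes for n in nodes)
    return 100.0 * total_used / total_capacity


def classify(nodes: List[DataNode], threshold: float = 10.0) -> Dict[str, str]:
    """Label each node 'over', 'under', or 'ok' relative to the cluster average."""
    avg = cluster_utilization(nodes)
    labels: Dict[str, str] = {}
    for n in nodes:
        if n.utilization > avg + threshold:
            labels[n.name] = "over"    # a candidate source of block moves
        elif n.utilization < avg - threshold:
            labels[n.name] = "under"   # a candidate target of block moves
        else:
            labels[n.name] = "ok"      # within the threshold of the average
    return labels


def is_balanced(nodes: List[DataNode], threshold: float = 10.0) -> bool:
    # Balanced means every node is within the threshold of the cluster average.
    return all(label == "ok" for label in classify(nodes, threshold).values())


if __name__ == "__main__":
    nodes = [
        DataNode("dn1", 90, 100),
        DataNode("dn2", 10, 100),
        DataNode("dn3", 50, 100),
    ]
    print(classify(nodes))
    print(is_balanced(nodes))
```

In this example the cluster is 50% full. dn1 (90%) exceeds the upper bound of 60%, and dn2 (10%) falls below the lower bound of 40%, so the cluster counts as unbalanced under the default threshold.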
A brief introduction to the Hadoop balancer: balancer.
The design and discussion of the balancer: HADOOP-1652.
The command to start the balancer, run as the HDFS administrator (superuser):
hadoop balancer
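In Hadoop 2.x and later, `hdfs balancer` is the preferred form of the command. `hadoop balancer` still runs but prints a deprecation warning. Both forms accept `-threshold <percent>` to override the default 10%. The Python sketch below shows one way to script the launch. It assumes the chosen script is on `PATH` and that the process runs as the HDFS superuser. The helper names `balancer_command` and `run_balancer` are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def balancer_command(threshold: Optional[float] = None,
                     use_hdfs_script: bool = True) -> List[str]:
    # Hadoop 2.x+ uses `hdfs balancer`; older releases use `hadoop balancer`.
    cmd = ["hdfs", "balancer"] if use_hdfs_script else ["hadoop", "balancer"]
    if threshold is not None:
        # Format 5.0 as "5" and 7.5 as "7.5".
        cmd += ["-threshold", f"{threshold:g}"]
    return cmd


def run_balancer(threshold: Optional[float] = None,
                 use_hdfs_script: bool = True) -> int:
    # Assumption: the caller is the HDFS superuser; otherwise the balancer fails.
    cmd = balancer_command(threshold, use_hdfs_script)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    # The balancer runs until the cluster is balanced or it cannot make progress.
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    print(" ".join(balancer_command(threshold=5)))
```

Keeping command construction separate from execution makes the argument list easy to inspect or log before starting a long-running balancer job.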