On one of my HDFS clusters, `hdfs dfsadmin -report` reports:
Under replicated blocks: 139016
Blocks with corrupt replicas: 9
Missing blocks: 0
The under-replicated blocks will be re-replicated automatically by HDFS after some time. But how should the missing blocks and the blocks with corrupt replicas be handled?
Understanding these blocks
A block is “with corrupt replicas” in HDFS if it has at least one corrupt replica along with at least one live replica. As such, a block having corrupt replicas does not mean its data is unavailable, but it does indicate an increased chance that the data may become unavailable.
If none of a block’s replicas are live, the block is called a missing block by HDFS, not a block with corrupt replicas.
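To see the current state of blocks and replicas on a cluster, the standard `hdfs fsck` tool can be used. A minimal sketch (the path `/path/to/file` is a placeholder for a file you want to inspect):

```shell
# Check overall filesystem health; the summary at the end reports
# missing, corrupt, and under-replicated block counts
hdfs fsck /

# Show the blocks backing a specific file and where each replica lives
# (/path/to/file is a placeholder)
hdfs fsck /path/to/file -files -blocks -locations
```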
Here is a list of potential causes, and actions you may take, to handle the missing or corrupt blocks:
- HDFS automatically repairs corrupt replicas in the background by copying from a live replica. If this repeatedly fails, it may indicate a problem with the underlying storage or filesystem of a DataNode. Use the HDFS fsck command to identify which files contain corrupt blocks.
- Some DataNodes are down, and the only replicas of the missing blocks are on those DataNodes.
- The corrupt/missing blocks are from files with a replication factor of 1. New replicas cannot be created because the only replica of the block is missing.
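To find out which files the causes above actually affect, fsck can list the files that currently have missing or corrupt blocks:

```shell
# Print the files whose blocks are currently missing or corrupt;
# each output line maps a corrupt block to the file that owns it
hdfs fsck / -list-corruptfileblocks
```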
Some suggestions from the HDP documentation:
- For critical data, use a replication factor of 3 (the HDFS default).
- Bring up the failed DataNodes with missing or corrupt blocks.
- Identify the files associated with the missing or corrupt blocks by running the Hadoop fsck command.
- Delete the corrupt files and recover them from backup, if backups exist.
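The recovery steps above can be sketched as the following commands. The paths `/data/corrupt-file`, `/local/backup/file`, and `/data/critical` are placeholders for your own files:

```shell
# Step 1: identify the affected files
hdfs fsck / -list-corruptfileblocks

# Step 2: remove the unrecoverable file and restore it from a backup
# (/data/corrupt-file and /local/backup/file are placeholders)
hdfs dfs -rm /data/corrupt-file
hdfs dfs -put /local/backup/file /data/corrupt-file

# Alternatively, fsck can act on corrupt files in bulk:
#   -move  moves corrupt files to /lost+found
#   -delete deletes ALL corrupt files -- use with care
hdfs fsck / -delete

# Step 3: raise the replication factor of critical data to 3;
# -w waits until re-replication completes
hdfs dfs -setrep -w 3 /data/critical
```

Note that `hdfs fsck / -delete` removes every corrupt file under the given path, so make sure backups exist before running it.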