How to handle missing blocks and blocks with corrupt replicas in HDFS?

On one of our HDFS clusters, hdfs dfsadmin -report shows:

Under replicated blocks: 139016
Blocks with corrupt replicas: 9
Missing blocks: 0
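
To track these counters over time, the summary lines can be pulled out of the report text. The following is a minimal Python sketch, not an official tool. The helper name parse_block_counters is hypothetical, and it assumes the labels appear exactly as in the output above (the wording may differ between Hadoop versions).

```python
import re

# Labels copied from the report shown above; other Hadoop versions may word them differently.
LABELS = ("Under replicated blocks", "Blocks with corrupt replicas", "Missing blocks")

def parse_block_counters(report_text):
    """Return a dict mapping each known label found in the report to its integer count."""
    counters = {}
    for line in report_text.splitlines():
        # Match lines of the form "<label>: <integer>" and nothing else.
        match = re.match(r"^\s*([^:]+):\s*(\d+)\s*$", line)
        if match and match.group(1).strip() in LABELS:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters

# Sample text taken from the question above.
sample = """Under replicated blocks: 139016
Blocks with corrupt replicas: 9
Missing blocks: 0
"""
print(parse_block_counters(sample))
```

In practice the text would come from the command itself, for example the stdout of subprocess.run(["hdfs", "dfsadmin", "-report"], capture_output=True, text=True).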

The under-replicated blocks are re-replicated automatically after some time.

How to handle the missing blocks and blocks with corrupt replicas in HDFS?

asked Apr 26, 2015 by Eric Z Ma (44,280 points)

1 Answer

Best answer

Understanding these blocks

HDFS calls a block corrupt if it has at least one corrupt replica along with at least one live replica. A corrupt block therefore does not mean that data is unavailable, but it does indicate an increased chance that data may become unavailable.

If none of a block's replicas are live, the block is called a missing block by HDFS, not a corrupt block.
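
The two definitions above can be written down as a small decision rule. This is only an illustrative Python sketch: classify_block is a hypothetical helper, not an HDFS API, and the "healthy" label is a name chosen here for the remaining case.

```python
def classify_block(live_replicas, corrupt_replicas):
    """Classify a block from its replica counts, following the definitions above."""
    if live_replicas == 0:
        # No live replica at all: HDFS reports the block as missing, not corrupt.
        return "missing"
    if corrupt_replicas >= 1:
        # At least one corrupt replica alongside at least one live replica.
        return "corrupt"
    return "healthy"

print(classify_block(live_replicas=2, corrupt_replicas=1))
print(classify_block(live_replicas=0, corrupt_replicas=1))
print(classify_block(live_replicas=3, corrupt_replicas=0))
```

Note that the sketch ignores under-replication, which HDFS counts separately.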

Potential causes

Here are potential causes of missing or corrupt blocks, and what to check for each:

HDFS automatically fixes corrupt blocks in the background. If this automatic repair fails, there may be a problem with the underlying storage or filesystem of a DataNode. Use the HDFS fsck command to identify which files contain corrupt blocks.

Some DataNodes are down, and the only replicas of the missing blocks are on those DataNodes.

The corrupt/missing blocks are from files with a replication factor of 1. New replicas cannot be created because the only replica of the block is missing.
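
To identify the affected files, hdfs fsck accepts a -list-corruptfileblocks option. Below is a hedged Python sketch that runs it and extracts block/file pairs. The function names are hypothetical, and the parser assumes each result line starts with a block ID of the form blk_... followed by whitespace and the file path; check this against the output of your Hadoop version. The sample text is made up for illustration and is not real cluster output.

```python
import subprocess

def parse_corrupt_file_blocks(fsck_output):
    """Return (block_id, file_path) pairs from fsck -list-corruptfileblocks output.

    Assumes lines of the form "blk_<id><whitespace><path>"; other lines are ignored.
    """
    pairs = []
    for line in fsck_output.splitlines():
        line = line.strip()
        if line.startswith("blk_"):
            parts = line.split(None, 1)  # split once so paths containing spaces survive
            if len(parts) == 2:
                pairs.append((parts[0], parts[1]))
    return pairs

def list_corrupt_files(path="/"):
    """Run fsck on a live cluster (requires the hdfs command on PATH)."""
    result = subprocess.run(
        ["hdfs", "fsck", path, "-list-corruptfileblocks"],
        capture_output=True, text=True,
    )
    return parse_corrupt_file_blocks(result.stdout)

# Illustrative sample only; block IDs and paths are invented.
sample_output = (
    "The list of corrupt files under path '/' are:\n"
    "blk_1073741825\t/data/a.txt\n"
    "blk_1073741900\t/data/b with space.txt\n"
    "The filesystem under path '/' has 2 CORRUPT files\n"
)
for block_id, file_path in parse_corrupt_file_blocks(sample_output):
    print(block_id, file_path)
```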

Possible remedies

Some suggestions from the HDP documentation:

For critical data, use a replication factor of 3.

Bring up the failed DataNodes with missing or corrupt blocks.

Identify the files associated with the missing or corrupt blocks by running the HDFS fsck command.

Delete the corrupt files and recover them from backup, if one exists.
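
The remedies above map onto a few hdfs commands. The following Python sketch only builds the command lines and prints them by default, since the fsck -delete step is destructive. The helper names and the path /data/important are hypothetical; -setrep, -files, -blocks, -locations and -delete are standard options, but verify them against the documentation for your Hadoop version before running anything.

```python
import subprocess

def remedy_commands(path, replication=3):
    """Build (but do not run) hdfs commands corresponding to the remedies above."""
    return {
        # Raise the replication factor and wait (-w) until it is reached.
        "set_replication": ["hdfs", "dfs", "-setrep", "-w", str(replication), path],
        # Show files, their blocks, and the DataNodes holding each replica.
        "inspect": ["hdfs", "fsck", path, "-files", "-blocks", "-locations"],
        # Delete corrupted files; do this only after confirming a backup exists.
        "delete_corrupt": ["hdfs", "fsck", path, "-delete"],
    }

def run(command, dry_run=True):
    """Print the command in dry-run mode, otherwise execute it."""
    if dry_run:
        print("would run:", " ".join(command))
        return None
    return subprocess.run(command, check=True)

# Hypothetical path, shown in dry-run mode so nothing is changed.
for name, command in remedy_commands("/data/important").items():
    run(command)
```

Keeping dry_run=True as the default is a deliberate choice here: the delete step removes files and should be an explicit decision.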

Reference: HDP doc and Cloudera doc.

answered Apr 26, 2015 by Eric Z Ma (44,280 points)
edited May 21, 2015 by Eric Z Ma


Copyright © SysTutorials. User contributions licensed under cc-wiki with attribution required.