How to handle missing blocks and blocks with corrupt replicas in HDFS?

On one HDFS cluster, hdfs dfsadmin -report shows:

Under replicated blocks: 139016
Blocks with corrupt replicas: 9
Missing blocks: 0

The “Under replicated blocks” will be re-replicated by HDFS automatically after some time.
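The counts above come straight from the NameNode, so they can be re-checked at any time. A minimal way to pull out just these summary lines and watch whether the under-replicated count is going down (the grep pattern is only an illustration):

    # Summary counters from the NameNode, same numbers as in the report above
    hdfs dfsadmin -report | grep -E 'Under replicated|corrupt|Missing'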

How to handle the missing blocks and blocks with corrupt replicas in HDFS?

Understanding these blocks

A block is “with corrupt replicas” in HDFS if it has at least one corrupt replica along with at least one live replica. As such, a block with corrupt replicas does not indicate unavailable data, but it does indicate an increased chance that the data may become unavailable.

If none of a block’s replicas are live, the block is called a missing block by HDFS, not a block with corrupt replicas.
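To see which files these two counters refer to, fsck can list them directly. A small sketch, assuming the whole namespace / is checked (narrow the path to the directories you care about):

    # List files that currently have corrupt or missing blocks
    hdfs fsck / -list-corruptfileblocks

    # Full health check; affected paths are flagged as CORRUPT or MISSING in the output
    hdfs fsck /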

Potential causes

Here is a list of potential causes and the actions you may take to handle the missing or corrupt blocks:

  • HDFS automatically repairs corrupt replicas in the background by re-replicating from a healthy copy. If this keeps failing, it may indicate a problem with the underlying storage or filesystem of a DataNode. Use the HDFS fsck command to identify which files contain corrupt blocks (see the fsck example after this list).
  • Some DataNodes are down, and the only replicas of the missing blocks are on those DataNodes.
  • The corrupt/missing blocks are from files with a replication factor of 1. New replicas cannot be created because the only replica of the block is missing.
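A hedged sketch of how to investigate these causes with fsck (the path / is illustrative, and the exact output format differs slightly between Hadoop releases):

    # Show each affected file, its blocks, and the DataNodes holding the replicas
    hdfs fsck / -files -blocks -locations

    # Look for files stored with a replication factor of 1
    # (older releases print "repl=1" per block; newer ones print "Live_repl=", so adjust the pattern)
    hdfs fsck / -files -blocks | grep 'repl=1'

The -locations output makes it easy to tell whether all replicas of a block sit on DataNodes that are currently down.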

Possible remedies

Some suggestions from the HDP documentation:

  • For critical data, use a replication factor of 3.
  • Bring up the failed DataNodes with missing or corrupt blocks.
  • Identify the files associated with the missing or corrupt blocks by running the Hadoop fsck command.
  • Delete the corrupt files and recover them from backup, if a backup exists (see the example commands after this list).
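A minimal command-line sketch of these remedies; /path/to/critical/data and /path/of/corrupt/file are placeholders for your own paths:

    # Raise the replication factor of critical data to 3 and wait until it is done
    hdfs dfs -setrep -w 3 /path/to/critical/data

    # Identify the files with missing or corrupt blocks
    hdfs fsck / -list-corruptfileblocks

    # After confirming a backup exists, remove a corrupt file and restore it from backup
    hdfs dfs -rm /path/of/corrupt/file

    # Alternatively, have fsck move the corrupt files to /lost+found or delete them outright
    hdfs fsck / -move
    hdfs fsck / -delete

Be careful with fsck -delete: it permanently removes the affected files, so only run it when a backup is available or the data can be regenerated.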

Reference: HDP doc and Cloudera doc.

2 comments:

  1. Hi. I believe there’s a mistake in the description of corrupt blocks: “A block is called corrupt by HDFS if it has at least one corrupt replica along.” I think that all replicas must be corrupted to be marked as corrupt rather than at least one.

    JIRA HDFS-7281 explains the following:
    1. A block is missing if and only if all DNs of its expected replicas are dead.
    2. A block is corrupted if and only if all its available replicas are corrupted. So if a block has 3 replicas; one of the DN is dead, the other two replicas are corrupted; it will be marked as corrupted.

    1. Hi Jim, yes, the description is not accurate. I fixed it to make it clearer – “A block is “with corrupt replicas” in HDFS if it has at least one corrupt replica along with at least one live replica.”

      It refers to the count from the HDFS report.
