How to handle missing blocks and blocks with corrupt replicas in HDFS?

By Eric Ma, Mar 24, 2018 (updated Feb 20, 2020)

Running hdfs dfsadmin -report on one of our HDFS clusters reports:

Under replicated blocks: 139016
Blocks with corrupt replicas: 9
Missing blocks: 0
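To watch these counters over time, the report text can be parsed with a small script. The following is a minimal sketch in Python, assuming the counters appear one per line in the "label: number" form shown above; the function name parse_block_counters is made up for illustration.

```python
import re

# Counter labels exactly as they appear in the report excerpt above.
COUNTERS = (
    "Under replicated blocks",
    "Blocks with corrupt replicas",
    "Missing blocks",
)


def parse_block_counters(report_text):
    """Return {label: count} for each block counter found in the report text."""
    counts = {}
    for label in COUNTERS:
        # Match a whole line of the form "<label>: <number>".
        pattern = r"^\s*" + re.escape(label) + r":\s*(\d+)\s*$"
        match = re.search(pattern, report_text, re.MULTILINE)
        if match:
            counts[label] = int(match.group(1))
    return counts


sample = """Under replicated blocks: 139016
Blocks with corrupt replicas: 9
Missing blocks: 0
"""

if __name__ == "__main__":
    print(parse_block_counters(sample))
```

On a real cluster, the text could come from subprocess.run(["hdfs", "dfsadmin", "-report"], capture_output=True, text=True).stdout instead of the sample string.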

HDFS re-replicates the “Under replicated blocks” automatically after some time.

How to handle the missing blocks and blocks with corrupt replicas in HDFS?

Understanding these blocks

A block is “with corrupt replicas” in HDFS if it has at least one corrupt replica along with at least one live replica. As such, a block with corrupt replicas does not mean that data is unavailable, but it does indicate an increased chance that data may become unavailable.

If none of a block’s replicas are live, the block is called a missing block by HDFS, not a block with corrupt replicas.
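The two definitions above can be written down as a tiny decision function. This is only an illustration of the wording in this post, not how the NameNode is implemented; the function name and the "healthy" label are assumptions, and under-replication is ignored.

```python
def classify_block(live_replicas, corrupt_replicas):
    """Classify a block from its replica counts, following the definitions above."""
    if live_replicas < 0 or corrupt_replicas < 0:
        raise ValueError("replica counts must be non-negative")
    if live_replicas == 0:
        # No live replica at all: HDFS calls this a missing block.
        return "missing"
    if corrupt_replicas > 0:
        # At least one corrupt replica and at least one live replica.
        return "with corrupt replicas"
    # Hypothetical label for a block with only live replicas.
    return "healthy"


if __name__ == "__main__":
    for live, corrupt in [(2, 1), (0, 3), (3, 0)]:
        print(live, corrupt, classify_block(live, corrupt))
```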

Potential causes

Here are potential causes of missing or corrupt blocks, with actions you can take to handle them:

HDFS normally fixes corrupt blocks automatically in the background. If it fails to do so, there may be a problem with the underlying storage or filesystem of a DataNode. Use the HDFS fsck command to identify which files contain corrupt blocks.
Some DataNodes are down, and the only replicas of the missing blocks are on those DataNodes.
The corrupt or missing blocks belong to files with a replication factor of 1. New replicas cannot be created because the only replica of the block is missing.
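To find the affected files, hdfs fsck / -list-corruptfileblocks lists the corrupt blocks together with the files they belong to. The sketch below groups such output by file. It assumes each result line has the form "blk_<id> <path>" separated by whitespace, which may differ between Hadoop versions; the sample paths and block IDs are made up.

```python
def parse_corrupt_file_blocks(fsck_output):
    """Map each file path to the list of corrupt block IDs reported for it."""
    files = {}
    for line in fsck_output.splitlines():
        # Split into at most two fields so paths containing spaces stay intact.
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].startswith("blk_"):
            block_id, path = parts
            files.setdefault(path, []).append(block_id)
    return files


# Hypothetical sample output; real paths and block IDs will differ.
sample_output = """The list of corrupt files under path '/' are:
blk_1073741825\t/user/example/data/part-00000
blk_1073741830\t/user/example/data/part-00000
blk_1073741901\t/user/example/logs/app.log
The filesystem under path '/' has 2 CORRUPT files
"""

if __name__ == "__main__":
    for path, blocks in parse_corrupt_file_blocks(sample_output).items():
        print(path, blocks)
```

For a single file, hdfs fsck <path> -files -blocks -locations additionally shows which DataNodes hold each replica, which helps to tell a storage problem on one node from a cluster-wide one.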

Possible remedies

Some suggestions from the HDP documentation:

For critical data, use a replication factor of 3.
Bring up the failed DataNodes that hold the missing or corrupt blocks.
Identify the files associated with the missing or corrupt blocks by running the Hadoop fsck command.
Delete the corrupt files and recover them from backup, if one exists.
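Two of the remedies above map directly to HDFS commands: hdfs dfs -setrep raises the replication factor of a path, and hdfs fsck with -delete removes corrupt files. The sketch below only builds and prints the command lines so they can be reviewed before being run by hand; the helper names and the example paths are hypothetical. Note that -delete is destructive, so make sure a backup exists first.

```python
import shlex


def setrep_command(path, replication=3):
    """Command that sets the replication factor of a path and waits (-w) for it."""
    return ["hdfs", "dfs", "-setrep", "-w", str(replication), path]


def fsck_delete_command(path):
    """Command that deletes the corrupt files under a path. Destructive."""
    return ["hdfs", "fsck", path, "-delete"]


def render(command):
    """Render an argument list as a shell-safe string for review."""
    return " ".join(shlex.quote(arg) for arg in command)


if __name__ == "__main__":
    # Hypothetical paths, used only for illustration.
    print(render(setrep_command("/data/critical")))
    print(render(fsck_delete_command("/data/corrupt-dir")))
```

Printing the commands rather than executing them is deliberate here: deleting files is hard to undo, so a manual review step seems safer.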

Reference: HDP doc and Cloudera doc.


2 Comments

Jim Jones says:

Feb 9, 2020 at 1:33 am

Hi. I believe there’s a mistake in the description of corrupt blocks: “A block is called corrupt by HDFS if it has at least one corrupt replica along.” I think that all replicas must be corrupted to be marked as corrupt rather than at least one.

JIRA HDFS-7281 explains the following:
1. A block is missing if and only if all DNs of its expected replicas are dead.
2. A block is corrupted if and only if all its available replicas are corrupted. So if a block has 3 replicas; one of the DN is dead, the other two replicas are corrupted; it will be marked as corrupted.

1. Eric Ma says:
  
  Feb 20, 2020 at 8:14 pm
  
  Hi Jim, yes, the description is not accurate. I fixed it to make it clearer – “A block is “with corrupt replicas” in HDFS if it has at least one corrupt replica along with at least one live replica.”
  
  It refers to the count from the HDFS report.
