Understanding corrupt and missing blocks
HDFS calls a block corrupt if it has at least one corrupt replica along with at least one live replica. A corrupt block therefore does not mean the data is unavailable, but it does indicate an increased risk that the data may become unavailable.
If none of a block's replicas are live, HDFS calls the block missing rather than corrupt.
The following are potential causes of missing or corrupt blocks, and actions you can take to handle them:
HDFS automatically repairs corrupt blocks in the background by re-replicating them from a live replica. If this repair fails, it may indicate a problem with the underlying storage or filesystem of a DataNode. Use the hdfs fsck command to identify which files contain corrupt blocks.
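A minimal sketch of such a check using the fsck command; the path `/user/data` is illustrative, and the commands must be run against a live cluster with appropriate HDFS permissions:

```shell
# Summarize overall filesystem health, including corrupt and missing block counts
hdfs fsck /

# List only the files that currently contain corrupt blocks
hdfs fsck / -list-corruptfileblocks

# Show per-block detail (block IDs and replica locations) for a suspect path
hdfs fsck /user/data -files -blocks -locations
```

A healthy filesystem ends the summary with "The filesystem under path '/' is HEALTHY"; corrupt or missing blocks change the status to CORRUPT.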
Some DataNodes are down, and the missing replicas exist only on those DataNodes.
The corrupt or missing blocks belong to files with a replication factor of 1, so no new replicas can be created once the only replica is lost.
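To guard against the single-replica case above, the replication factor of existing files can be inspected and raised with the HDFS shell; the path below is illustrative:

```shell
# List files under the path; the second column shows each file's replication factor
hdfs dfs -ls /user/data

# Raise replication to 3 and wait (-w) until the new replicas have been created
hdfs dfs -setrep -w 3 /user/data
```

Note that -setrep only changes existing files; the default for newly created files is controlled by the dfs.replication property in hdfs-site.xml.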
Suggestions from the HDP documentation:
For critical data, use a replication factor of 3.
Bring the failed DataNodes that hold the missing or corrupt replicas back up.
Identify the files associated with the missing or corrupt blocks by running the hdfs fsck command.
Delete the corrupt files and restore them from backup, if one exists.
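The last two steps above can be sketched as follows. The file path is illustrative, and fsck's -delete flag permanently removes every file with corrupt blocks, so use it only after confirming a backup exists:

```shell
# Identify the files that contain corrupt blocks
hdfs fsck / -list-corruptfileblocks

# Remove one specific corrupt file, then restore it from backup
hdfs dfs -rm /user/data/part-00000

# Or let fsck delete all files with corrupt blocks in one pass
hdfs fsck / -delete
```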
Reference: HDP doc and Cloudera doc.