HDFS stays in safe mode because of reported blocks not reaching 0.9990 of total blocks

By Eric Ma | Mar 24, 2018 | Feb 9, 2019

After a node failure and a restart of HDFS, the NameNode reports:

“The reported blocks 1968810 needs additional 5071 blocks to reach the threshold 0.9990 of total blocks 1975856. Safe mode will be turned off automatically.”

in the log.
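The figures in that message can be checked by hand. The sketch below assumes the NameNode truncates total blocks × threshold to an integer and then asks for one block more than the difference. That formula is inferred from the numbers in the log line above, not taken from Hadoop documentation, and the function name `blocks_needed` is made up for illustration.

```shell
# blocks_needed REPORTED TOTAL THRESHOLD
# Prints how many more blocks must be reported before the threshold is met.
# Assumed formula (it reproduces the log line): int(total * pct) - reported + 1
blocks_needed() {
    awk -v reported="$1" -v total="$2" -v pct="$3" 'BEGIN {
        thr = int(total * pct)          # threshold expressed as a block count
        if (reported >= thr) print 0    # nothing more to wait for
        else print thr - reported + 1
    }'
}

blocks_needed 1968810 1975856 0.9990   # prints 5071
```

Here int(1975856 × 0.9990) is 1973880, and 1973880 - 1968810 + 1 gives the 5071 shown in the log.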

Why does this happen, and how can it be fixed?

Why the NameNode stays in safe mode:

At startup time, the namenode reads its namespace from disk (the
FSImage and edits files). This includes all the HDFS filenames and
block lists that it should know, but not the mappings of block
replicas to datanodes. Then it waits in safe mode for all or most of
the datanodes to send their Initial Block Reports, which let the
namenode build its map of which blocks have replicas in which
datanodes. It keeps waiting until dfs.namenode.safemode.threshold-pct
of the blocks that it knows about from FSImage have been reported from
at least dfs.namenode.replication.min (default 1) datanodes [so that’s
a third config parameter I didn’t mention earlier]. If this threshold
is achieved, it will post a log that it is ready to leave safe mode,
wait for dfs.namenode.safemode.extension seconds, then automatically
leave safe mode and generate replication requests for any
under-replicated blocks (by default, those with replication < 3).

and

If it doesn’t reach the “safe replication for all known blocks”
threshold, then it will not leave safe mode automatically. It logs
the condition and waits for an admin to decide what to do, because
generally it means whole datanodes or sets of datanodes did not come
up or are not able to communicate with the namenode. Hadoop wants a
human to look at the situation before hadoop starts trying to madly
generate re-replication commands for under-replicated blocks, and
deleting blocks with zero replicas available.

By Matthew Foley.
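The three settings named in the quote can be read back with `hdfs getconf -confKey`. This is a minimal sketch: the function name `show_safemode_settings` is hypothetical, and the values printed are those resolved from the configuration files on the host where it runs, which may differ from what a running NameNode was started with.

```shell
# Print the safe-mode settings named above, as resolved from the local
# Hadoop configuration (hdfs-site.xml plus the built-in defaults).
show_safemode_settings() {
    for key in \
        dfs.namenode.safemode.threshold-pct \
        dfs.namenode.replication.min \
        dfs.namenode.safemode.extension
    do
        printf '%s = %s\n' "$key" "$(hdfs getconf -confKey "$key")"
    done
}

# Usage, on a host with the Hadoop client installed:
#   show_safemode_settings
```

When reading the output, note that hdfs-default.xml describes dfs.namenode.safemode.extension in milliseconds (default 30000), so the value is probably not in seconds.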

If you are sure that the blocks will never be reported, you can force the NameNode to leave safe mode with:

hdfs dfsadmin -safemode leave
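Before forcing the NameNode out of safe mode, it may help to confirm the current state and see which DataNodes are missing. The sketch below uses `hdfs dfsadmin -safemode get` and `hdfs dfsadmin -report -dead` (`hdfs dfsadmin` is the current form of the `dfsadmin` command). The wrapper name `safemode_precheck` is hypothetical, and the match on "Safe mode is ON" assumes the usual wording of the status line.

```shell
# Show the safe-mode status and, if safe mode is on, the DataNodes that the
# NameNode currently considers dead. Returns non-zero if the query fails.
safemode_precheck() {
    status="$(hdfs dfsadmin -safemode get)" || return 1
    echo "$status"
    case "$status" in
        *"Safe mode is ON"*) hdfs dfsadmin -report -dead ;;
    esac
}

# Usage:
#   safemode_precheck
#   hdfs dfsadmin -safemode leave   # only after reviewing the output above
```

Keeping the check separate from the `leave` command leaves the final decision with the administrator, in line with the reasoning quoted above.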

You may then run hdfs fsck / -move or hdfs fsck / -delete to move or delete the corrupted files, provided you are sure you will no longer need them.
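Since both options are destructive, it is worth listing the affected files first. A minimal sketch using the `-list-corruptfileblocks` option of `hdfs fsck`; the function name `list_corrupt_files` and the default path `/` are choices made for illustration.

```shell
# List files with missing or corrupt blocks under a path (default: /).
list_corrupt_files() {
    hdfs fsck "${1:-/}" -list-corruptfileblocks
}

# Usage:
#   list_corrupt_files /      # review what would be affected
#   hdfs fsck / -move         # move the affected files to /lost+found
#   hdfs fsck / -delete       # or delete them outright
```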


2 Comments

Yedukondalu says:

Feb 6, 2019 at 12:21 pm

Don’t we lose data if we delete corrupted files?

1. Eric Z Ma says:
  
  Feb 9, 2019 at 9:44 pm
  
  Yes, the operations remove files — “move or delete corrupted files”.
