After a node failure and a restart of HDFS, the NameNode reports the following in its log:
“The reported blocks 1968810 needs additional 5071 blocks to reach the threshold 0.9990 of total blocks 1975856. Safe mode will be turned off automatically.”
Why does this happen, and how can it be fixed?
Why the NameNode stays in safe mode:
At startup time, the namenode reads its namespace from disk (the
FSImage and edits files). This includes all the HDFS filenames and
block lists that it should know, but not the mappings of block
replicas to datanodes. Then it waits in safe mode for all or most of
the datanodes to send their Initial Block Reports, which let the
namenode build its map of which blocks have replicas in which
datanodes. It keeps waiting until dfs.namenode.safemode.threshold-pct
of the blocks that it knows about from FSImage have been reported from
at least dfs.namenode.replication.min (default 1) datanodes [so that’s
a third config parameter I didn’t mention earlier]. If this threshold
is achieved, it will post a log that it is ready to leave safe mode,
wait for dfs.namenode.safemode.extension seconds, then automatically
leave safe mode and generate replication requests for any
under-replicated blocks (by default, those with replication < 3).
If it doesn’t reach the “safe replication for all known blocks”
threshold, then it will not leave safe mode automatically. It logs
the condition and waits for an admin to decide what to do, because
generally it means whole datanodes or sets of datanodes did not come
up or are not able to communicate with the namenode. Hadoop wants a
human to look at the situation before hadoop starts trying to madly
generate re-replication commands for under-replicated blocks, and
deleting blocks with zero replicas available.
By Matthew Foley.
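The numbers in the log message can be checked against this explanation with a little arithmetic. The sketch below is only an illustration: blocks_short is a hypothetical helper name, not a Hadoop command, and it assumes the NameNode rounds the required block count down. The threshold is passed in ten-thousandths (0.9990 becomes 9990) so that integer arithmetic is enough.

```shell
# Sketch: how many more blocks must be reported to reach the threshold.
# blocks_short is a hypothetical helper, not part of Hadoop.
# $1 = reported blocks, $2 = total blocks, $3 = threshold * 10000
blocks_short() {
  # Required block count, rounded down (an assumption about the NameNode).
  required=$(( $2 * $3 / 10000 ))
  short=$(( required - $1 ))
  # Never report a negative shortfall once the threshold is met.
  if [ "$short" -lt 0 ]; then short=0; fi
  echo "$short"
}

# Values taken from the log message above.
blocks_short 1968810 1975856 9990
```

With these inputs the required count is 1973880 and the plain difference is 5070. The log says 5071, one more than that; presumably the NameNode adds one when it formats the message, but that is an assumption and not something the log itself confirms.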
If you are sure that the missing blocks will never be reported, you can force the NameNode to leave safe mode with
hdfs dfsadmin -safemode leave
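Since the explanation above stresses that a human should look at the cluster first, it may be worth checking the current state before forcing the NameNode out of safe mode. The following is a minimal sketch, assuming the hdfs client is on the PATH and the commands are run as a user with HDFS admin rights. safemode_is_on is a hypothetical helper that only inspects the text printed by the real command hdfs dfsadmin -safemode get.

```shell
# safemode_is_on is a hypothetical helper: it reads the output of
# `hdfs dfsadmin -safemode get` on stdin and succeeds if safe mode is ON.
safemode_is_on() {
  grep -q 'Safe mode is ON'
}

# Only talk to the cluster if the hdfs client is actually installed.
if command -v hdfs >/dev/null 2>&1; then
  # Shows live and dead DataNodes, which helps spot the nodes
  # that never sent their block reports.
  hdfs dfsadmin -report
  if hdfs dfsadmin -safemode get | safemode_is_on; then
    echo "NameNode is still in safe mode"
  fi
fi
```

If the report shows dead DataNodes that can still be brought back, restarting them is usually preferable to forcing the NameNode out of safe mode, because their block reports may be enough to reach the threshold.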
You can then run
hdfs fsck / -move or
hdfs fsck / -delete
to move the corrupted files to /lost+found or to delete them.
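Because -delete cannot be undone, it can help to list the affected files first. A small sketch, again assuming the hdfs client is installed and can reach the NameNode; list_corrupt is a hypothetical wrapper name, while -list-corruptfileblocks is an existing fsck option.

```shell
# list_corrupt is a hypothetical wrapper around the real fsck option.
# $1 = HDFS path to check; defaults to the root directory.
list_corrupt() {
  hdfs fsck "${1:-/}" -list-corruptfileblocks
}

# Example (needs the hdfs client and a reachable NameNode):
# list_corrupt /user
```

Reviewing this list shows which files would be affected before anything is moved or deleted.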