How to force a checkpointing of metadata in HDFS?

HDFS SecondaraNameNode log shows 2017-08-06 10:54:14,488 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint java.io.IOException: Inconsistent checkpoint fields. LV = -63 namespaceID = 1920275013 cTime = 0 ; clusterId = CID-f38880ba-3415-4277-8abf-b5c2848b7a63 ; blockpoolId = BP-578888813-10.6.1.2-1497278556180. Expecting respectively: -63; 263120692; 0; CID-d22222fd-e28a-4b2d-bd2a-f60e1f0ad1b1; BP-622207878-10.6.1.2-1497242227638. at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:134) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:531) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:395) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:361) at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:357) It seems the checkpoint […]

How to put files with spaces in names into HDFS?

I got this error when I tried to save a file with a space in its name into HDFS: $ hdfs dfs -put -f “/home/u1/testa/test a” “/u1/testa/test a” put: unexpected URISyntaxException while the HDFS seems allow spaces in its file names: https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/filesystem/model.html . How to achieve the effect of saving the files with spaces in […]

How to set the replication factor for one file when it is uploaded by `hdfs dfs -put` command line in HDFS?

When uploading a file by the hdfs dfs -put command line in HDFS, how to set a replication factor instead of the global one for that file? For example, HDFS’s global replication factor is 3. For some temporary files, I would like to save just one copy for faster uploading and saving disk space. The […]

How to make Fedora Linux not clean some files in /tmp/?

On my Fedora 20, I find that the system automatically clean up file under /tmp/. This is convenient. However, it cause some problems for some programs. For example, HDFS puts its DataNode pid file under /tmp/ by default like hadoop-hadoop-datanode.pid. After it is cleaned up, the hadoop-daemon.sh script will consider there is no DataNode running. […]

How to handle missing blocks and blocks with corrupt replicas in HDFS?

One of HDFS cluster’s hdfs dfsadmin -report reports: Under replicated blocks: 139016 Blocks with corrupt replicas: 9 Missing blocks: 0 The “Under replicated blocks” can be re-replicated automatically after some time. How to handle the missing blocks and blocks with corrupt replicas in HDFS? Understanding these blocks A block is called corrupt by HDFS if […]

HDFS stays in safe mode because of reported blocks not reaching 0.9990 of total blocks

After a node failure and restarting the HDFS, the NameNode reports: “The reported blocks 1968810 needs additional 5071 blocks to reach the threshold 0.9990 of total blocks 1975856. Safe mode will be turned off automatically.” in the log. Why this happens? And how to fix it? About why the NameNode stays in the safe mode: […]

How to add a new HDFS NameNode metadata directory to an existing cluster?

We have a running HDFS cluster. Currently, the NameNode metadata data directory has only one directory configured in hdfs-site.xml: <property> <name>dfs.namenode.name.dir</name> <value>file:///home/hadoop/hdfs/</value> <description>NameNode directory for namespace and transaction logs storage.</description> </property> We would like to add a new directory for dfs.namenode.name.dir to make replicas of the metadata on a separated disk for higher data reliability. […]

How to change an running HDFS cluster’s replication factor?

Now, I have a running HDFS cluster storing lost files. I want to change its default replication factor. How to change it? What will happen after it is changed? For example, I change from 2 to 3. Will HDFS automatically re-replicate the data chunks? First, the replication factor is client decided. Second, the replication factor […]

How to change number of replications of certain files in HDFS?

The HDFS has a configuration in hdfs-site.xml to set the global replication number of blocks with the “dfs.replication” property. However, there are some “hot” files that are access by many nodes. How to increase the number of blocks for these certain files in HDFS? You can the replication number of certain file to 10: hdfs […]

How to set the data replication factor of Hadoop HDFS?

How to set the data replication factor of Hadoop HDFS in Hadoop 2 (YARN)? The default replication factor in HDFS is controlled by the dfs.replication property. The value is 3 by default. To change the replication factor, you can add a dfs.replication property settings in the hdfs-site.xml configuration file of Hadoop: <property> <name>dfs.replication</name> <value>1</value> <description>Replication […]