How to find out all files with replication factor 1 in HDFS?

The hdfs dfsadmin -report output shows that there are blocks with replication factor 1: "Missing blocks (with replication factor 1): 7". How can these files be found? You can run hdfs fsck to list all files with their replication counts and grep for those with replication factor […]
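
The filtering step described above can be sketched in Python. This is a minimal sketch, not the post's own method: the sample text is a hypothetical stand-in for `hdfs fsck / -files -blocks` output, and the `repl=` / `Live_repl=` field names are assumptions about the layout, which varies between Hadoop versions. The function name `files_with_replication` is also made up for illustration.

```python
import re

# Hypothetical sample of `hdfs fsck / -files -blocks` output. The real layout
# differs between Hadoop versions, so this format is an assumption.
SAMPLE_FSCK = """\
/data/a.txt 1366 bytes, 1 block(s):  OK
0. BP-1:blk_1001_1 len=1366 repl=1
/data/b.txt 2048 bytes, 1 block(s):  OK
0. BP-1:blk_1002_2 len=2048 repl=3
/data/c.txt 4096 bytes, 2 block(s):  OK
0. BP-1:blk_1003_3 len=2048 Live_repl=1
1. BP-1:blk_1004_4 len=2048 Live_repl=1
"""

# A file header line: "<path> <n> bytes, <m> block(s)" (paths without spaces only).
FILE_LINE = re.compile(r"^(/\S*) \d+ bytes, \d+ block\(s\)")
# A block line carries the replica count as "repl=N" or "Live_repl=N".
REPL = re.compile(r"\b(?:Live_)?repl=(\d+)")


def files_with_replication(fsck_output, factor):
    """Return paths that have at least one block reported with `factor` replicas."""
    matches = []
    current = None
    for line in fsck_output.splitlines():
        header = FILE_LINE.match(line)
        if header:
            current = header.group(1)
            continue
        found = REPL.search(line)
        if found and current and int(found.group(1)) == factor:
            if current not in matches:
                matches.append(current)
    return matches
```

On a real cluster the input would be the captured text of the fsck command rather than `SAMPLE_FSCK`, and the two regular expressions may need adjusting to the installed version's output.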

How to set the replication factor for one file when it is uploaded by `hdfs dfs -put` command line in HDFS?

When uploading a file with the hdfs dfs -put command, how can I set a replication factor for that file that differs from the global one? For example, the global replication factor of HDFS is 3. For some temporary files, I would like to keep just one copy for faster uploads and to save disk space. The […]
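
One common way to do this is to pass the `dfs.replication` property as a generic `-D` option on the command line. The sketch below only builds the argument list; the helper name `put_with_replication` and the paths in the usage note are hypothetical, and whether this matches the truncated answer above is an assumption.

```python
def put_with_replication(local_path, hdfs_path, replication):
    """Build an `hdfs dfs -put` command that overrides dfs.replication for one upload."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return [
        "hdfs", "dfs",
        "-D", f"dfs.replication={replication}",  # per-command override of the default
        "-put", local_path, hdfs_path,
    ]
```

On a machine with a configured Hadoop client, the list could be passed to `subprocess.run(cmd, check=True)`, for example with `cmd = put_with_replication("tmp.dat", "/tmp/tmp.dat", 1)`.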

How to handle missing blocks and blocks with corrupt replicas in HDFS?

The hdfs dfsadmin -report output of an HDFS cluster shows: Under replicated blocks: 139016; Blocks with corrupt replicas: 9; Missing blocks: 0. The under-replicated blocks can be re-replicated automatically after some time. How should the missing blocks and the blocks with corrupt replicas be handled? Understanding these blocks: a block is called corrupt by HDFS if […]

HDFS stays in safe mode because of reported blocks not reaching 0.9990 of total blocks

After a node failure and a restart of HDFS, the NameNode reports in its log: “The reported blocks 1968810 needs additional 5071 blocks to reach the threshold 0.9990 of total blocks 1975856. Safe mode will be turned off automatically.” Why does this happen, and how can it be fixed? Why the NameNode stays in safe mode: […]
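
The numbers in the quoted message can be checked with a little arithmetic. The sketch below assumes that the NameNode needs the smallest whole number of reported blocks that is at least threshold × total. That reading is an assumption, not taken from the post, but it reproduces the 5071 in the log. The function name is made up for illustration.

```python
import math


def additional_blocks_needed(reported, total, threshold):
    """Blocks still missing before `reported` reaches `threshold` of `total`.

    Assumption: the target is the smallest integer >= total * threshold.
    """
    target = math.ceil(total * threshold)
    return max(0, target - reported)


# Values from the quoted log message.
needed = additional_blocks_needed(1968810, 1975856, 0.9990)
print(needed)
```

With the values from the log, 0.9990 × 1975856 = 1973880.144, so the target is 1973881 blocks and 1973881 − 1968810 = 5071 are still needed.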

How to set replication factors for HDFS directories?

Is it possible to give a specific directory in HDFS a replication factor that differs from the default one? This should apply to the existing files and also to new files created in that directory, which would simplify administration. We can set the replication factor of /tmp/ to […]

How to check the replication factor of a file in HDFS?

A related question: how to find the replication factors of files in an HDFS cluster? Method 1: use the HDFS command line to ls the file. The second column of the output shows the replication factor of the file. For example, $ hdfs dfs -ls /usr/GroupStorage/data1/out.txt -rw-r--r-- 3 hadoop zma 11906625598 2014-10-22 […]
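
Reading the second column can also be done programmatically. The following is a minimal sketch that parses one line of `hdfs dfs -ls` output. The sample lines are hypothetical (they are not the truncated output above), and the eight-field layout and the `-` shown for directories are assumptions about the listing format.

```python
def replication_from_ls_line(line):
    """Return the replication factor from one `hdfs dfs -ls` line, or None for a directory."""
    fields = line.split()
    # Assumed layout: permissions, replication, owner, group, size, date, time, path.
    if len(fields) < 8:
        raise ValueError(f"unexpected ls line: {line!r}")
    replication = fields[1]
    return None if replication == "-" else int(replication)


# Hypothetical sample lines, not real cluster output.
FILE_LINE = "-rw-r--r--   3 alice staff       1024 2020-01-01 00:00 /data/example.txt"
DIR_LINE = "drwxr-xr-x   - alice staff          0 2020-01-01 00:00 /data"
```

Calling `replication_from_ls_line(FILE_LINE)` gives the integer in the second column, while directories, which have no replication factor of their own, yield `None`.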

How to set the data replication factor of Hadoop HDFS?

How to set the data replication factor of Hadoop HDFS in Hadoop 2 (YARN)? The default replication factor in HDFS is controlled by the dfs.replication property, whose value is 3 by default. To change the replication factor, add a dfs.replication property setting to the hdfs-site.xml configuration file of Hadoop: <property> <name>dfs.replication</name> <value>1</value> <description>Replication […]
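
To confirm which value a given hdfs-site.xml actually sets, the file can be parsed directly. This is a minimal sketch using Python's standard library. The sample XML is a hypothetical file containing only the property discussed above, and the helper name `configured_replication` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml content with only the property discussed above.
SAMPLE_HDFS_SITE = """\
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
"""


def configured_replication(xml_text, default=3):
    """Return dfs.replication from hdfs-site.xml text, or `default` if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "dfs.replication":
            return int(prop.findtext("value", "").strip())
    return default
```

The fallback of 3 mirrors the default stated above. A real check would read the text from the cluster's own hdfs-site.xml instead of `SAMPLE_HDFS_SITE`.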

Hadoop 2 (YARN) default configuration values

Where to check the default Hadoop 2 (YARN) configuration values for: HDFS: hdfs-site.xml YARN: yarn-site.xml MapReduce: mapred-site.xml Default Hadoop 2 (YARN) configuration values for Hadoop 2.2.0 from Apache Hadoop website: HDFS: YARN: MapReduce: Answered by Eric Z Ma.

Good introductions to Hadoop 2.0 (YARN)?

Which introductions to Hadoop 2.0 (YARN) are recommended? Pointers to web pages are welcome. These are good ones that I have found: The SoCC '13 paper “Apache Hadoop YARN: Yet Another Resource Negotiator” by Vinod Kumar Vavilapalli et al.: The introduction from Hortonworks by Arun Murthy: The “Official” one from Apache Hadoop website (very brief): Answered […]