Checking HDFS File Replication Factor
When managing HDFS clusters, you often need to verify the replication factor of specific files to ensure data redundancy meets your requirements. Here are the practical methods to check this.
Using hdfs dfs -ls
The most straightforward way is to list the file with hdfs dfs -ls:
hdfs dfs -ls /usr/GroupStorage/data1/out.txt
Output:
-rw-r--r-- 3 hadoop zma 11906625598 2014-10-22 18:35 /usr/GroupStorage/data1/out.txt
The second column (the number 3 in this example) is the replication factor: this file has 3 replicas across the cluster. For directories, this column shows a dash, since replication applies only to files.
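Because the replication factor occupies a fixed field, it is easy to pull out with awk. A minimal sketch, using the sample listing line above as stand-in data (on a real cluster you would pipe hdfs dfs -ls into the awk filter instead):

```shell
# Pull the replication factor (second field) out of hdfs dfs -ls output.
# The sample line from above serves as stand-in data here.
line='-rw-r--r-- 3 hadoop zma 11906625598 2014-10-22 18:35 /usr/GroupStorage/data1/out.txt'
rep=$(printf '%s\n' "$line" | awk '$1 ~ /^-/ { print $2 }')
echo "$rep"   # prints 3 for this sample line
```

The `$1 ~ /^-/` condition keeps only regular files, since directory entries begin with `d` and carry no replication factor.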
Using hdfs dfs -stat
For a cleaner, more parseable output, use the -stat flag with the %r format specifier:
hdfs dfs -stat %r /usr/GroupStorage/data1/out.txt
Output:
3
This is particularly useful when scripting or piping output to other tools, as it returns only the replication factor without additional formatting.
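For example, a script can compare the reported factor against a required minimum. A minimal sketch; the check_rep helper and the threshold are illustrative, not part of the HDFS CLI, and the live hdfs call is shown commented out since it needs a running cluster:

```shell
# Minimal replication check for use in scripts.
check_rep() {
    # $1 = observed replication factor, $2 = required minimum
    [ "$1" -ge "$2" ]
}

# On a live cluster (commented out; needs a working HDFS client):
# rep=$(hdfs dfs -stat %r /usr/GroupStorage/data1/out.txt)
# check_rep "$rep" 3 || echo "under-replicated: out.txt"

check_rep 3 3 && echo "replication OK"
```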
Checking Multiple Files
To check the replication factor for all files in a directory, include the filename in the format string so you can tell the results apart:
hdfs dfs -stat "%n %r" /usr/GroupStorage/data1/*
Note that shell globbing is not recursive: a pattern such as /usr/GroupStorage/data1/*/ expands only one level of subdirectories. For a full recursive view, use hdfs dfs -ls -R and read the replication factor from the second column of the listing.
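The recursive approach can be combined into one pipeline: list recursively, keep only regular files (directories show a dash in the replication column), and print factor plus full path. A sketch using sample listing lines as stand-in data for real `-ls -R` output:

```shell
# Replication factor and full path for every file in a recursive listing.
# Directory entries begin with 'd' and show '-' in the replication column,
# so the filter keeps only regular files.
sample='-rw-r--r-- 3 hadoop zma 11906625598 2014-10-22 18:35 /usr/GroupStorage/data1/out.txt
drwxr-xr-x - hadoop zma 0 2014-10-22 18:40 /usr/GroupStorage/data1/archive'
printf '%s\n' "$sample" | awk '$1 ~ /^-/ { print $2, $NF }'

# On a live cluster:
# hdfs dfs -ls -R /usr/GroupStorage/data1 | awk '$1 ~ /^-/ { print $2, $NF }'
```

Printing `$NF` assumes paths contain no embedded spaces, which is the common case on HDFS.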
Checking Replication on the NameNode
If you need to inspect replication details from the NameNode perspective, use the fsck command:
hdfs fsck /usr/GroupStorage/data1/out.txt
This provides detailed information about block distribution and replication status across the cluster, including which DataNodes hold each block replica. It’s useful for troubleshooting under-replicated or over-replicated files.
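fsck also accepts flags such as -files, -blocks, and -locations for per-block detail, and its summary includes an under-replicated block count that is easy to extract in scripts. A minimal sketch using abridged, illustrative sample report lines as stand-in data (on a live cluster, pipe `hdfs fsck <path>` into the filter instead):

```shell
# Extract the under-replicated block count from an fsck summary.
# The report text below is an abridged, illustrative stand-in.
report='Total blocks (validated): 89 (avg. block size 139950801 B)
Minimally replicated blocks: 89 (100.0 %)
Under-replicated blocks: 2 (2.2 %)'
under=$(printf '%s\n' "$report" | awk '/^Under-replicated blocks/ { print $3 }')
echo "$under"   # prints 2 for this sample report
```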
Common Format Specifiers for -stat
The -stat command supports several useful format specifiers beyond %r:
%b - file size in bytes
%n - filename
%o - block size
%r - replication factor
%u - owner
%g - group
Combine them for custom output:
hdfs dfs -stat "%n has replication %r and block size %o" /usr/GroupStorage/data1/out.txt
Output:
out.txt has replication 3 and block size 134217728
Changing Replication Factor
If you need to modify the replication factor:
hdfs dfs -setrep -w 2 /usr/GroupStorage/data1/out.txt
The -w flag waits for the replication to complete before returning. Without it, the command returns immediately and replication happens asynchronously.
When the target is a directory, setrep applies to all files under it recursively; the -R flag is accepted for backward compatibility and has no additional effect:
hdfs dfs -setrep -w 2 /usr/GroupStorage/data1/
Quick Reference
hdfs dfs -ls <path> - replication factor in the second column of the listing
hdfs dfs -stat %r <path> - replication factor only, convenient for scripts
hdfs fsck <path> - block-level replication and placement details
hdfs dfs -setrep -w <n> <path> - change the replication factor
Practice in a test environment before making changes on production systems.
Hadoop Cluster Health Monitoring
Regular health checks prevent small issues from becoming cluster-wide problems. Monitor HDFS capacity utilization, ensure DataNode heartbeats are current, and watch for under-replicated blocks, which indicate a potential data-loss risk.
Key monitoring commands include hdfs dfsadmin -report for cluster overview, yarn node -list for NodeManager status, and hdfs fsck / for filesystem consistency checks. Set up automated alerts for critical metrics like disk usage above 85% or failed NodeManagers.
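As a sketch of such an alert, the DFS Used% line of hdfs dfsadmin -report can be extracted and compared against the threshold. The report text and values below are illustrative stand-ins; on a live cluster, capture the real command output instead:

```shell
# Sketch of an automated disk-usage alert, with an 85% threshold as above.
# The report text is an illustrative stand-in for `hdfs dfsadmin -report`.
report='Configured Capacity: 1099511627776 (1 TB)
DFS Used%: 87.50%'
used=$(printf '%s\n' "$report" | awk '/^DFS Used%/ { gsub(/%/, "", $3); print $3 }')
# Compare in awk, since the shell cannot compare floating-point values.
if awk -v u="$used" 'BEGIN { exit !(u > 85) }'; then
    echo "ALERT: DFS usage ${used}% is above 85%"
fi
```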
Quick Verification
After changing a replication factor, confirm the new value with hdfs dfs -stat %r on the path, then run hdfs fsck on it to verify that no blocks remain under- or over-replicated. If the reported value does not match what you set, check the NameNode logs for errors and consult the documentation for your Hadoop version.

Hello Eric,
I want to find all files with a replication factor of 1 and change it to 3.
I am unable to get the complete path of these files and directories, so I cannot change it. Is there a way to get a list (including the complete path) of all files with RF 1, so that I can change their replication to 3?
Regards
Wert.
You can find the files with a replication factor of 1 using the method introduced at https://www.systutorials.com/how-to-find-out-all-files-with-replication-factor-1-in-hdfs/ , then set the new factor on them. A script can automate the process.
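Building on the reply above, one possible sketch of such a script: filter a recursive listing for files whose replication column equals 1, then feed the paths to setrep. The sample lines stand in for real `hdfs dfs -ls -R /` output, and the live pipeline is shown commented out (it assumes paths without embedded spaces):

```shell
# Sketch: list files with replication factor 1, then raise them to 3.
# Sample lines stand in for real `hdfs dfs -ls -R /` output; the filter
# keeps regular files (permissions start with -) whose second column is 1.
sample='-rw-r--r-- 1 hadoop zma 1024 2014-10-22 18:35 /data/a.txt
-rw-r--r-- 3 hadoop zma 2048 2014-10-22 18:35 /data/b.txt'
printf '%s\n' "$sample" | awk '$1 ~ /^-/ && $2 == 1 { print $NF }'

# On a live cluster:
# hdfs dfs -ls -R / | awk '$1 ~ /^-/ && $2 == 1 { print $NF }' \
#   | xargs -n 100 hdfs dfs -setrep 3
```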