Is it possible to set the replication factor for a specific directory in HDFS to a value different from the default replication factor? The setting should apply to the existing files in that directory and also to new files created in it.
This would simplify administration. For example, we could set the replication factor of /tmp/ to 2, and all new files and subdirectories under it would get a replication factor of 2 by default.
There is a feature request but the feature is not yet available: https://issues.apache.org/jira/browse/HDFS-199
So, if all files in a directory like “/tmp/” are set to 2 replicas while the default is 3, new files created in /tmp/ will still have 3 replicas.
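One way to observe this behavior is to compare the replication factor of an old file and a newly written file. The sketch below uses the %r format of hdfs dfs -stat, which prints a file's replication factor. The helper name replication_of and the file names old.txt and new.txt are made up for illustration, and the result depends on your cluster's default.

```shell
#!/usr/bin/env bash
# Sketch: print the replication factor of one HDFS file.
# `%r` in `hdfs dfs -stat` is the file's replication factor.
replication_of() {
    hdfs dfs -stat %r "$1"
}

# Hypothetical check (not called automatically). Assumes /tmp/old.txt
# already exists in HDFS and new.txt exists on the local disk.
demo_new_file_replication() {
    hdfs dfs -setrep 2 /tmp/             # change the existing files
    hdfs dfs -put new.txt /tmp/new.txt   # write a new file afterwards
    replication_of /tmp/old.txt          # factor set by -setrep
    replication_of /tmp/new.txt          # factor the client wrote with
}
```

Calling demo_new_file_replication on a test cluster should make the difference between the two files visible.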
To achieve a similar effect, you may need a daemon script that sets the replication factor repeatedly, like
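# Re-apply a replication factor of 2 to everything under /tmp/ every 30 seconds.
# -w waits until the replication change has completed before continuing.
while true; do
    hdfs dfs -setrep -w 2 /tmp/
    sleep 30
done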
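The loop above calls -setrep on the whole directory each time, even when nothing has changed. A possible refinement, sketched below, is to list the files first and call -setrep only on those whose replication factor differs from the target. The function name fix_replication is made up for illustration, and the sketch assumes that paths contain no whitespace, since it takes the path from the eighth column of the hdfs dfs -ls -R output.

```shell
#!/usr/bin/env bash
# Sketch: re-apply a target replication factor only to files that differ.
# Assumption: no path under the directory contains whitespace.

# Usage: fix_replication TARGET DIR
fix_replication() {
    local target="$1" dir="$2"
    # In `hdfs dfs -ls -R` output, regular files start with "-", the second
    # column is the replication factor, and the eighth column is the path.
    hdfs dfs -ls -R "$dir" |
        awk -v t="$target" '/^-/ && $2 != t { print $8 }' |
        while read -r path; do
            # Redirect stdin so the command cannot consume the file list.
            hdfs dfs -setrep "$target" "$path" < /dev/null
        done
}
```

It could replace the -setrep line in the loop, for example: while true; do fix_replication 2 /tmp/; sleep 30; done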