How to set replication factors for HDFS directories?

Is it possible to set the replication factor of a specific directory in HDFS to a value different from the default replication factor? This should change the replication factors of existing files and also apply to new files created in that directory.

This would simplify administration. For example, we could set the replication factor of /tmp/ to 2, and all new files and subdirectories created under it would get a replication factor of 2 by default.

There is a feature request for this, but the feature is not available yet: https://issues.apache.org/jira/browse/HDFS-199

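So, even if all files in a directory like “/tmp/” are set to 2 replicas while the default is 3, new files created in /tmp/ will still have 3 replicas. The replication factor is a per-file attribute, and setting it on a directory only changes the files that already exist under it.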
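To see this behavior on a cluster, you can print the replication factor of individual files. The following is a minimal sketch, assuming the `hdfs` command is on the PATH; `show_replication` and the `HDFS_BIN` variable are hypothetical names introduced here for illustration, while `%r` is the replication format specifier of `hdfs dfs -stat`.

```shell
#!/usr/bin/env bash
# HDFS_BIN is a hypothetical override for the hdfs executable (defaults to "hdfs").
HDFS_BIN="${HDFS_BIN:-hdfs}"

# Print "<path> <replication factor>" for each HDFS file path given.
show_replication() {
    local path
    for path in "$@"; do
        printf '%s %s\n' "$path" "$("$HDFS_BIN" dfs -stat %r "$path")"
    done
}
```

For example, `show_replication /tmp/old-file /tmp/new-file` (both paths are placeholders) lets you compare a file that existed when the replication factor was changed with one created afterwards.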

To achieve a similar effect, you may need a daemon script that re-applies the replication factor continuously, like

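# Every 30 seconds, set the replication factor of all files under /tmp/ to 2.
# -w waits until the replication change completes, which may take a long time.
while true; do
    hdfs dfs -setrep -w 2 /tmp/
    sleep 30
done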
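If you control the clients that write into the directory, another option is to set the replication factor at write time instead of fixing it afterwards. The sketch below assumes the generic `-D` option is used to override `dfs.replication` for a single command; `put_with_replication` and `HDFS_BIN` are hypothetical names used only for this example, and this only covers files uploaded through this helper, not files written by other clients.

```shell
#!/usr/bin/env bash
# HDFS_BIN is a hypothetical override for the hdfs executable (defaults to "hdfs").
HDFS_BIN="${HDFS_BIN:-hdfs}"

# Upload a local file to HDFS with the given replication factor.
# Usage: put_with_replication <replication> <local-src> <hdfs-dest>
put_with_replication() {
    local rep="$1" src="$2" dest="$3"
    "$HDFS_BIN" dfs -D dfs.replication="$rep" -put "$src" "$dest"
}
```

For example, `put_with_replication 2 data.log /tmp/` (with `data.log` as a placeholder file name) would upload the file with 2 replicas, so the loop above does not need to correct it later.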

Eric Ma

Eric is a systems guy. Eric is interested in building high-performance and scalable distributed systems and related technologies. The views or opinions expressed here are solely Eric's own and do not necessarily represent those of any third parties.
