How to estimate the memory usage of the HDFS NameNode for an HDFS cluster?

HDFS keeps the metadata of all files and blocks in the NameNode's memory. How can I estimate the NameNode's memory usage for an HDFS cluster?

Each file and each block consumes roughly 150 bytes of metadata in NameNode memory, so you can estimate the total usage from the number of files and the number of blocks.

For example, assume the block size is 64MB. Then 100 million 1MB files (100TB in total) occupy 100 million blocks (a file smaller than the block size still takes one block), so their metadata is

100M * 150B + 100M * 150B = 30GB

1 million 64GB files (64PB in total, assuming HDFS could scale that far) occupy 64GB/64MB = 1024 blocks each, so they will have metadata of

1M * 150B + 1M * (64GB/64MB) * 150B ≈ 150MB + 153.6GB ≈ 154GB

That is larger in absolute terms, but per terabyte stored it is only about 2.4MB, versus roughly 300MB per terabyte for the 1MB files.
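The estimates above can be sanity-checked with a small script. This is a rough sketch assuming a flat ~150 bytes per file object and per block object (the real per-object cost varies by Hadoop version and JVM settings):

```python
BYTES_PER_OBJECT = 150        # rough metadata cost per file or per block
BLOCK_SIZE = 64 * 1024**2     # 64MB block size, as in the examples above

def namenode_memory(num_files, file_size):
    """Estimate NameNode metadata bytes for num_files files of file_size bytes each."""
    # Each file needs at least one block, even if it is smaller than BLOCK_SIZE.
    blocks_per_file = max(1, -(-file_size // BLOCK_SIZE))  # ceiling division
    num_blocks = num_files * blocks_per_file
    return (num_files + num_blocks) * BYTES_PER_OBJECT

# 100 million 1MB files (100TB total): one block each -> ~30GB of metadata
print(namenode_memory(100_000_000, 1024**2) / 10**9, "GB")

# 1 million 64GB files (64PB total): 1024 blocks each -> ~154GB of metadata
print(namenode_memory(1_000_000, 64 * 1024**3) / 10**9, "GB")
```

Running it reproduces the two figures and makes it easy to plug in your own cluster's file-count and file-size mix.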

So you can also see why HDFS needs a lot of NameNode memory to handle many small files: the metadata cost grows with the number of files and blocks, not with the number of bytes stored.

Answered by Eric Z Ma.

