HDFS keeps the metadata for all files and blocks in the memory of the NameNode. How can you estimate how much memory the NameNode of an HDFS cluster needs?
As a rule of thumb, each file and each block takes roughly 150 bytes of metadata on the NameNode, so the estimate is (number of files + number of blocks) * 150B.
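The rule above is a single line of arithmetic. Here is a minimal Python sketch, assuming the 150-byte figure holds for every file and block object. The function name `estimate_namenode_memory` is purely illustrative and not part of any Hadoop API.

```python
# Rough rule of thumb (an assumption, not an exact figure): each file
# object and each block object costs about 150 bytes of NameNode memory.
BYTES_PER_OBJECT = 150


def estimate_namenode_memory(num_files: int, num_blocks: int,
                             bytes_per_object: int = BYTES_PER_OBJECT) -> int:
    """Return the estimated NameNode metadata size in bytes (hypothetical helper)."""
    return (num_files + num_blocks) * bytes_per_object


if __name__ == "__main__":
    # 100 million 1MB files with a 64MB block size: one block per file.
    estimate = estimate_namenode_memory(100_000_000, 100_000_000)
    print(f"{estimate / 10**9:.0f}GB")  # decimal gigabytes; prints 30GB
```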
For example, assume a 64MB block size (all sizes here use decimal units). 100 million 1MB files (100TB in total) each fit in a single block, giving 100 million blocks, so their metadata takes
100M * 150B + 100M * 150B = 30GB
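Likewise, 1 million 64GB files (64PB in total, assuming HDFS could scale that far) are each split into 64GB/64MB = 1,000 blocks, or 1 billion blocks in total, so their metadata takes
1M * 150B + 1M * (64GB/64MB) * 150B = 150MB + 150GB ≈ 150GB
That is only about 2.3MB of NameNode memory per TB stored, compared with about 300MB per TB for the 1MB files.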
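The block counts in these examples assume each file size is an exact multiple of the block size. When it is not, the last, partial block still needs its own metadata entry, so the count is a ceiling division. A small sketch, assuming decimal units; `blocks_for_file` is a hypothetical helper name.

```python
MB = 10**6  # decimal megabyte, matching the figures in the text
GB = 10**9


def blocks_for_file(file_size: int, block_size: int) -> int:
    """Number of blocks a file of file_size bytes is split into.

    Uses integer ceiling division: a trailing partial block smaller than
    block_size still counts as a whole block object on the NameNode.
    """
    return -(-file_size // block_size)


if __name__ == "__main__":
    block = 64 * MB
    for size in (1 * MB, 65 * MB, 64 * GB):
        print(f"{size // MB}MB file -> {blocks_for_file(size, block)} block(s)")
```

Under the 150-byte rule, a 65MB file therefore costs one file object plus two block objects.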
So HDFS needs a lot of NameNode memory to handle many small files: the metadata cost grows with the number of files and blocks, not with the amount of data stored.
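To see the small-file effect directly, the sketch below estimates the metadata for the same 100TB stored at different file sizes, reusing the 150-byte rule of thumb. It assumes decimal units, equally sized files, and a total that divides evenly by the file size; `metadata_for_dataset` is a hypothetical name.

```python
BYTES_PER_OBJECT = 150  # assumed rule of thumb: ~150 bytes per file or block object
MB = 10**6  # decimal units
TB = 10**12


def metadata_for_dataset(total_bytes: int, file_size: int, block_size: int) -> int:
    """Estimated NameNode metadata (bytes) for total_bytes split into equal files.

    Assumes total_bytes is an exact multiple of file_size.
    """
    num_files = total_bytes // file_size
    blocks_each = -(-file_size // block_size)  # ceiling division
    return (num_files + num_files * blocks_each) * BYTES_PER_OBJECT


if __name__ == "__main__":
    block = 64 * MB
    for size in (1 * MB, 8 * MB, 64 * MB):
        gb = metadata_for_dataset(100 * TB, size, block) / 10**9
        print(f"100TB as {size // MB}MB files: ~{gb:.2f}GB of metadata")
```

Under these assumptions, packing the 1MB files into block-sized 64MB files cuts the estimate by a factor of 64, from 30GB to under 0.5GB.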