How to find the DataNodes that actually store a file in HDFS?

A file in HDFS may be split into many blocks, and the replicas of each block are stored on different DataNodes. So how do you find the DataNodes that actually store a given file?
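To make the splitting concrete, here is the arithmetic for the file examined in the fsck report below. This is a small sketch, assuming the 134217728-byte (128 MB) block size and replication factor of 2 that the report shows. It reproduces the block count, the size of the final partial block, and the total number of block copies that end up spread across the DataNodes.

```shell
# Worked example: how one file is cut into fixed-size HDFS blocks.
# Values come from the fsck report in this post (block size and replication
# are the ones that report shows; other clusters may use different settings).
file_size=12448905476
block_size=134217728
replication=2

blocks=$(( (file_size + block_size - 1) / block_size ))   # ceiling division
last_block=$(( file_size - (blocks - 1) * block_size ))   # size of the final, partial block
replicas=$(( blocks * replication ))                      # block copies stored across DataNodes

echo "blocks=$blocks last_block=$last_block replicas=$replicas"
```

The result matches the report: 93 blocks, with the last one (block 92) only 100874500 bytes long, and 186 block replicas that the NameNode places on the cluster's DataNodes.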

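You can use the fsck tool from the Hadoop command-line utilities (hadoop fsck, or hdfs fsck on Hadoop 2 and later, where hadoop fsck is deprecated). With the -files -blocks -locations options it lists every block of the file, and the bracketed list at the end of each block line gives the DataNodes (host:port) that hold that block's replicas. Here is an example: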

$ hadoop fsck /user/aaa/ -files -locations -blocks

Connecting to namenode via http://dstore-170:50070
FSCK started by hadoop (auth:SIMPLE) from / for path /user/path/to/file.gz at Fri Oct 17 12:25:55 HKT 2014
/user/path/to/file.gz 12448905476 bytes, 93 block(s):  OK
0. BP-1960069741- len=134217728 repl=2 [,]
1. BP-1960069741- len=134217728 repl=2 [,]
2. BP-1960069741- len=134217728 repl=2 [,]
3. BP-1960069741- len=134217728 repl=2 [,]
4. BP-1960069741- len=134217728 repl=2 [,]
...
91. BP-1960069741- len=134217728 repl=2 [,]
92. BP-1960069741- len=100874500 repl=2 [,]

 Total size:	12448905476 B
 Total dirs:	0
 Total files:	1
 Total symlinks:		0
 Total blocks (validated):	93 (avg. block size 133859198 B)
 Minimally replicated blocks:	93 (100.0 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	2
 Average block replication:	2.0
 Corrupt blocks:		0
 Missing replicas:		0 (0.0 %)
 Number of data-nodes:		10
 Number of racks:		1
FSCK ended at Fri Oct 17 12:25:55 HKT 2014 in 1 milliseconds

The filesystem under path '/user/aaa/' is HEALTHY
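For a large file the per-block listing gets long (93 lines here), while the question is usually just "which DataNodes?". Below is a minimal sketch that reduces the fsck output to the set of distinct DataNode addresses. It assumes the block-line format shown above ("<n>. <block id> len=<bytes> repl=<r> [host:port, ...]") and that addresses are printed as host:port; list_datanodes is a hypothetical helper written for this post, not part of Hadoop. Later releases format the location list differently (for example wrapping each entry in DatanodeInfoWithStorage[...]), so check the pattern against your version's output.

```shell
# list_datanodes (hypothetical helper, not part of Hadoop):
# read `fsck <path> -files -blocks -locations` output on stdin and print each
# distinct DataNode address (host:port) that holds at least one replica.
# Assumption: block lines look like "<n>. <block id> len=<bytes> repl=<r> [host:port, ...]".
list_datanodes() {
  grep -E '^[0-9]+\. .*repl=[0-9]+' |   # keep only the per-block lines
    sed -E 's/^.*repl=[0-9]+//' |        # drop everything up to the location list
    grep -oE '[A-Za-z0-9._-]+:[0-9]+' |  # pull out each host:port entry
    sort -u                              # report each DataNode once
}
```

Usage: hdfs fsck /user/path/to/file.gz -files -blocks -locations | list_datanodes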

Eric Ma

Eric is a systems guy. Eric is interested in building high-performance and scalable distributed systems and related technologies. The views or opinions expressed here are solely Eric's own and do not necessarily represent those of any third parties.
