How to find the DataNodes that actually store a file in HDFS?

By Eric Ma, Mar 24, 2018

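A file in HDFS may be split into many blocks, and each block is replicated on several DataNodes. So how do you find the DataNodes that actually store a given file?

You can use the fsck tool that ships with Hadoop (hadoop fsck; newer releases also provide it as hdfs fsck). With the -files, -blocks and -locations options, it lists every block of the file together with the DataNodes that hold its replicas. Here is an example: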

$ hadoop fsck /user/aaa/file.name -files -locations -blocks

Connecting to namenode via http://dstore-170:50070
FSCK started by hadoop (auth:SIMPLE) from /10.0.3.170 for path /user/aaa/file.name at Fri Oct 17 12:25:55 HKT 2014
/user/aaa/file.name 12448905476 bytes, 93 block(s):  OK
0. BP-1960069741-10.0.3.170-1410430543652:blk_1074365040_625145 len=134217728 repl=2 [10.0.3.173:50010, 10.0.3.174:50010]
1. BP-1960069741-10.0.3.170-1410430543652:blk_1074365041_625146 len=134217728 repl=2 [10.0.3.175:50010, 10.0.3.174:50010]
2. BP-1960069741-10.0.3.170-1410430543652:blk_1074365042_625147 len=134217728 repl=2 [10.0.3.175:50010, 10.0.3.174:50010]
3. BP-1960069741-10.0.3.170-1410430543652:blk_1074365043_625148 len=134217728 repl=2 [10.0.3.175:50010, 10.0.3.174:50010]
4. BP-1960069741-10.0.3.170-1410430543652:blk_1074365044_625149 len=134217728 repl=2 [10.0.3.181:50010, 10.0.3.174:50010]
...
91. BP-1960069741-10.0.3.170-1410430543652:blk_1074365131_625236 len=134217728 repl=2 [10.0.3.175:50010, 10.0.3.174:50010]
92. BP-1960069741-10.0.3.170-1410430543652:blk_1074365132_625237 len=100874500 repl=2 [10.0.3.181:50010, 10.0.3.174:50010]

Status: HEALTHY
 Total size:	12448905476 B
 Total dirs:	0
 Total files:	1
 Total symlinks:		0
 Total blocks (validated):	93 (avg. block size 133859198 B)
 Minimally replicated blocks:	93 (100.0 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	2
 Average block replication:	2.0
 Corrupt blocks:		0
 Missing replicas:		0 (0.0 %)
 Number of data-nodes:		10
 Number of racks:		1
FSCK ended at Fri Oct 17 12:25:55 HKT 2014 in 1 milliseconds


The filesystem under path '/user/aaa/file.name' is HEALTHY
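
For a file with many blocks, it can help to condense the report into a per-DataNode summary. The following is a minimal sketch, not part of Hadoop itself: the function name datanodes_from_fsck is hypothetical, and it assumes replica locations are printed as IPv4 ip:port pairs, as in the output above.

```shell
#!/usr/bin/env bash
# Summarize which DataNodes hold replicas of a file, and how many each.
# Reads the output of `fsck -files -blocks -locations` on stdin.
# Assumption: replica locations appear as IPv4 ip:port (e.g. 10.0.3.173:50010).
datanodes_from_fsck() {
  # Other IPs in the report (the client address, the block pool ID)
  # are not followed by ":port", so the pattern skips them.
  grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}:[0-9]+' \
    | sort \
    | uniq -c \
    | sort -rn   # DataNodes holding the most replicas first
}

# Usage, with the example path from above:
#   hadoop fsck /user/aaa/file.name -files -blocks -locations | datanodes_from_fsck
```

Each output line is a replica count followed by a DataNode address, so the second column is the list of DataNodes that store the file.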
