The HDFS SecondaryNameNode log shows 2017-08-06 10:54:14,488 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint java.io.IOException: Inconsistent checkpoint fields. LV = -63 namespaceID = 1920275013 cTime = 0 ; clusterId = CID-f38880ba-3415-4277-8abf-b5c2848b7a63 ; blockpoolId = BP-578888813-10.6.1.2-1497278556180. Expecting respectively: -63; 263120692; 0; CID-d22222fd-e28a-4b2d-bd2a-f60e1f0ad1b1; BP-622207878-10.6.1.2-1497242227638. at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:134) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:531) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:395) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:361) at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:357) It seems the checkpoint
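The namespaceID/clusterId mismatch in this error typically means the NameNode was reformatted while the SecondaryNameNode kept checkpoint data from the old cluster. A minimal sketch of one possible fix, assuming the default checkpoint location under hadoop.tmp.dir (verify dfs.namenode.checkpoint.dir in your hdfs-site.xml before deleting anything):

# On the SecondaryNameNode host: stop the daemon, clear the stale
# checkpoint data, then restart so it re-syncs from the NameNode.
hadoop-daemon.sh stop secondarynamenode
rm -rf /tmp/hadoop-hadoop/dfs/namesecondary/*   # assumed default path; check your config first
hadoop-daemon.sh start secondarynamenode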
Read more
Tag: dfs
How to estimate the memory usage of HDFS NameNode for an HDFS cluster?
HDFS stores the metadata of files and blocks in the memory of the NameNode. How to estimate the memory usage of the HDFS NameNode for an HDFS cluster? Each file and each block has around 150 bytes of metadata on the NameNode, so you can do the calculation based on this. For example, assume the block size is
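As a rough back-of-the-envelope sketch based on that 150-byte rule of thumb (the input numbers below are made up for illustration):

# Estimate NameNode heap: ~150 bytes per file object and per block object.
awk 'BEGIN {
  files  = 100000000              # assumed: 100 million files
  blocks = files * 1.5            # assumed: 1.5 blocks per file on average
  bytes  = (files + blocks) * 150 # ~150 bytes of metadata per object
  printf "Estimated NameNode heap: %.1f GB\n", bytes / (1024^3)
}'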
Read more
How to put files with spaces in names into HDFS?
I got this error when I tried to save a file with a space in its name into HDFS: $ hdfs dfs -put -f "/home/u1/testa/test a" "/u1/testa/test a" put: unexpected URISyntaxException while HDFS seems to allow spaces in its file names: https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/filesystem/model.html . How to achieve the effect of saving files with spaces in
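One workaround, since the destination argument is parsed as a URI, is to percent-encode the space in the HDFS path; a sketch reusing the paths from the question:

# Quote the local path as-is; encode the space in the HDFS URI as %20.
hdfs dfs -put -f "/home/u1/testa/test a" "/u1/testa/test%20a"
# The listing should then show the file name with a real space in it.
hdfs dfs -ls "/u1/testa/"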
Read more
How to manually kill HDFS DataNodes?
stop-dfs.sh reports that there are no DataNodes running on some nodes, like hdfs-node-000208: no datanode to stop However, there are DataNode processes running there. How to clean up these processes on many (hundreds of) nodes? You may use this piece of bash script: for i in `cat hadoop/etc/hadoop/slaves`; do echo $i; ssh $i 'jps | grep
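The excerpt cuts the loop off; a sketch of how it might be completed, assuming passwordless SSH to each slave and jps available on its PATH:

# For every host in the slaves file, find the DataNode JVM and kill it.
for i in `cat hadoop/etc/hadoop/slaves`; do
  echo $i
  ssh $i 'jps | grep DataNode | cut -d " " -f 1 | xargs -r kill'
done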
Read more
How to set the replication factor for one file when it is uploaded by the `hdfs dfs -put` command line in HDFS?
When uploading a file by the hdfs dfs -put command line in HDFS, how to set a replication factor instead of the global one for that file? For example, HDFS’s global replication factor is 3. For some temporary files, I would like to save just one copy for faster uploading and to save disk space. The
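A per-upload sketch using the generic -D option to override dfs.replication just for this command (the file names are placeholders):

# Upload with replication factor 1 instead of the cluster default of 3.
hdfs dfs -D dfs.replication=1 -put ./tmp-data.txt /u1/tmp-data.txt
# The second column of -ls output should now show 1 for this file.
hdfs dfs -ls /u1/tmp-data.txt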
Read more
HDFS stays in safe mode because of reported blocks not reaching 0.9990 of total blocks
After a node failure and restarting HDFS, the NameNode reports in its log: “The reported blocks 1968810 needs additional 5071 blocks to reach the threshold 0.9990 of total blocks 1975856. Safe mode will be turned off automatically.” Why does this happen? And how to fix it? About why the NameNode stays in safe mode:
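While the DataNodes re-report their blocks, these standard dfsadmin/fsck commands help monitor progress; the last one forces an exit from safe mode and should only be used if the missing blocks are acceptable:

hdfs dfsadmin -safemode get    # check whether the NameNode is still in safe mode
hdfs fsck / | tail -n 20       # summarize missing and under-replicated blocks
hdfs dfsadmin -safemode leave  # force-leave safe mode (use with care)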
Read more
How to set replication factors for HDFS directories?
Is it possible to set the replication factor for a specific directory in HDFS to one that is different from the default replication factor? This should set not only the existing files’ replication factors but also those of new files created in that directory. This can simplify administration. We can set the replication factor of /tmp/ to
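A sketch using /tmp/ as in the question (note that setrep changes files that already exist; files created later still follow the client’s dfs.replication unless it is overridden at write time):

# Recursively set replication factor 2 on everything currently under /tmp/.
hdfs dfs -setrep -R 2 /tmp/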
Read more
How to add a new HDFS NameNode metadata directory to an existing cluster?
We have a running HDFS cluster. Currently, the NameNode metadata directory has only one directory configured in hdfs-site.xml: <property> <name>dfs.namenode.name.dir</name> <value>file:///home/hadoop/hdfs/</value> <description>NameNode directory for namespace and transaction logs storage.</description> </property> We would like to add a new directory to dfs.namenode.name.dir to keep replicas of the metadata on a separate disk for higher data reliability.
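A sketch of what the updated property might look like, assuming the second disk is mounted at the hypothetical path /mnt/disk2; dfs.namenode.name.dir accepts a comma-separated list, and the NameNode writes its metadata to every listed directory:

<property>
  <name>dfs.namenode.name.dir</name>
  <!-- Comma-separated list: metadata is replicated to each directory. -->
  <value>file:///home/hadoop/hdfs/,file:///mnt/disk2/hdfs-name/</value>
  <description>NameNode directories for namespace and transaction logs storage.</description>
</property>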
Read more
How to check the replication factor of a file in HDFS?
A related question: how to find the replication factors of files in an HDFS cluster? Method 1: use the HDFS command line to ls the file. The second column of the output will show the replication factor of the file. For example, $ hdfs dfs -ls /usr/GroupStorage/data1/out.txt -rw-r--r-- 3 hadoop zma 11906625598 2014-10-22
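Besides reading the second column of the -ls output (3 in the sample above), hdfs dfs -stat can print the replication factor directly; a short sketch:

# %r expands to the file's replication factor.
hdfs dfs -stat "%r" /usr/GroupStorage/data1/out.txt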
Read more
How to change the number of replicas of certain files in HDFS?
HDFS has a configuration entry in hdfs-site.xml, the “dfs.replication” property, that sets the global replication number of blocks. However, some “hot” files are accessed by many nodes. How to increase the number of replicas of these particular files in HDFS? You can set the replication factor of a certain file to 10: hdfs
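A sketch of the command the excerpt leads into; the -w flag waits until the extra replicas have actually been created:

# Raise the replication factor of a "hot" file to 10 and wait for completion.
hdfs dfs -setrep -w 10 /path/to/hot/file   # placeholder path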
Read more
How to set the data replication factor of Hadoop HDFS?
How to set the data replication factor of Hadoop HDFS in Hadoop 2 (YARN)? The default replication factor in HDFS is controlled by the dfs.replication property; the value is 3 by default. To change the replication factor, you can add a dfs.replication property setting in the hdfs-site.xml configuration file of Hadoop: <property> <name>dfs.replication</name> <value>1</value> <description>Replication
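The truncated property block above, completed as it would typically appear in hdfs-site.xml (the description wording is paraphrased):

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Replication factor: the default number of replicas kept for each block.</description>
</property>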
Read more
How to choose the number of mappers and reducers in Hadoop
How to choose the number of mappers and reducers in Hadoop to get good job performance? The Hadoop Wiki gives a discussion on this: http://wiki.apache.org/hadoop/HowManyMapsAndReduces Some valuable points: About the number of Maps: The number of maps is usually driven by the number of DFS blocks in the input files. Although that causes people to
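The map count is mostly driven by the input splits, but the reducer count can be set per job; a sketch using the generic -D option (the jar name and paths are assumptions):

# Run a job with 8 reducers; the number of maps follows the input splits.
hadoop jar hadoop-mapreduce-examples.jar wordcount \
  -D mapreduce.job.reduces=8 /input /output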
Read more
How to force metadata checkpointing in HDFS
Metadata checkpointing in HDFS is done by the Secondary NameNode, which merges the fsimage and the edits log files periodically and keeps the edits log size within a limit. For various reasons, the checkpointing by the Secondary NameNode may fail. For one example, the HDFS SecondaryNameNode reports errors in its log as follows. 2017-08-06 10:54:14,488
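One way to force a checkpoint manually is to save the namespace on the NameNode, which requires entering safe mode first (a sketch; writes are blocked while safe mode is on):

hdfs dfsadmin -safemode enter   # block writes so the namespace is stable
hdfs dfsadmin -saveNamespace    # merge the edits log into a fresh fsimage
hdfs dfsadmin -safemode leave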
Read more
Hadoop Installation Tutorial (Hadoop 2.x)
Hadoop 2 or YARN is the new version of Hadoop. It adds the YARN resource manager in addition to the HDFS and MapReduce components. Hadoop MapReduce is a programming model and software framework for writing applications; it is an open-source variant of MapReduce, which Google initially designed and implemented for processing and generating large data
Read more
Hadoop Installation Tutorial (Hadoop 1.x)
Update: if you are new to Hadoop and trying to install a cluster, please check the newer version: Hadoop Installation Tutorial (Hadoop 2.x). Hadoop mainly consists of two parts: Hadoop MapReduce and HDFS. Hadoop MapReduce is a programming model and software framework for writing applications; it is an open-source variant of MapReduce that was initially designed
Read more
Hadoop Default Ports
Hadoop’s NameNode and DataNodes expose a bunch of TCP ports used by Hadoop’s daemons to communicate with each other or to listen directly to users’ requests. This port information is needed by both Hadoop users and cluster administrators to write programs or configure firewalls/gateways accordingly. A post written by Philip Zeyliger on Cloudera’s blog summarizes the
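A quick way to check which daemon ports are actually listening on a given node (a sketch; for reference, 50070 and 50010 are the classic NameNode web UI and DataNode data-transfer defaults in Hadoop 1.x/2.x):

# List listening TCP ports owned by Java (i.e., Hadoop daemon) processes.
netstat -tlnp 2>/dev/null | grep java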
Read more
A Simple Sort Benchmark on Hadoop
After installing Hadoop, we usually run some benchmark programs to test whether the system works well. In the Hadoop installation tutorial post, we showed a very simple example that greps strings from a small set of files. In this post, we introduce the Sort program for testing and benchmarking Hadoop. The Sort program is also
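A sketch of the classic random-write-then-sort sequence using the bundled examples jar (the jar file name varies across Hadoop versions):

# Generate random input, then sort it with MapReduce.
hadoop jar hadoop-mapreduce-examples.jar randomwriter /benchmark/rand
hadoop jar hadoop-mapreduce-examples.jar sort /benchmark/rand /benchmark/rand-sorted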
Read more
mrcc – A Distributed C Compiler System on MapReduce
The mrcc project’s homepage is here: mrcc project. Abstract: mrcc is an open-source compilation system that uses MapReduce to distribute C code compilation across the servers of a cloud computing platform. mrcc is built to use Hadoop by default, but it is easy to port it to other cloud computing platforms, such as MRlite,
Read more