Setting Up Standalone (Local) Hadoop

Hadoop is designed to run on hundreds to thousands of computers in a cluster. By default, however, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is especially useful for debugging, since debugging a distributed job is a nightmare. This post shows how to set up a standalone Hadoop environment.

1. Hadoop package and software installation

Follow the instructions in the “1. Install needed packages” part of Hadoop Installation Tutorial to install the packages. Then follow “4. Hadoop Configurations” to configure hadoop-env.sh (this file only).
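As a rough sketch of what that configuration usually comes down to: in a stock Hadoop distribution, the setting hadoop-env.sh needs is JAVA_HOME. The path below is only a placeholder (an assumption, not taken from the tutorial), so replace it with the location of your own JDK.

```shell
# Fragment for conf/hadoop-env.sh.
# The default path is a placeholder (an assumption); point it at your own JDK.
# If JAVA_HOME is already set in the environment, that value is kept.
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/default-java}"
```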

2. Just run Hadoop!

Simply run Hadoop jobs whose input and output are in local directories. We use a simple example to show how to start a Hadoop job.

The example finds and displays every match of the given regular expression. Output is written to the given output directory.

$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-mapred-examples-0.21.0.jar grep input output '[a-z.]+'
$ cat output/*
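To get a feel for what the output should look like, here is a rough local approximation of the same job using ordinary Unix tools. It is only a sketch, not how Hadoop implements the example: the function name local_grep_counts is made up for illustration, and it assumes a grep that supports -o and -h (GNU and BSD grep both do). It prints each matched string with the number of times it occurs, most frequent first.

```shell
#!/bin/sh
# local_grep_counts DIR REGEX
# Count every match of REGEX in the files under DIR and print
# "count<TAB>match" lines, highest count first.
# Hypothetical helper for illustration; it is not part of Hadoop.
local_grep_counts() {
    grep -ohE "$2" "$1"/* |   # -o: print only the matches, -h: no file names
        sort |                # group identical matches together
        uniq -c |             # prefix each distinct match with its count
        sort -rn |            # most frequent matches first
        awk '{ c = $1; sub(/^ *[0-9]+ /, ""); print c "\t" $0 }'
}
```

For example, `local_grep_counts input '[a-z.]+'` can be compared against `cat output/*` after the Hadoop job finishes. The order of matches that share the same count may differ.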

The jar file’s name may be different depending on the Hadoop distribution’s version.
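One way to cope with the varying jar name is to look it up with a glob rather than typing it. The sketch below does that and also removes a stale output directory first, because a Hadoop job fails if its output directory already exists. The function names and the glob pattern are assumptions for illustration; the pattern is meant to cover names such as hadoop-mapred-examples-0.21.0.jar and hadoop-examples-1.0.4.jar, so check it against your own distribution.

```shell
#!/bin/sh
# Print the first examples jar found in the given directory (default: current).
# The glob is an assumption; adjust it if your distribution names the jar differently.
find_examples_jar() {
    dir="${1:-.}"
    for jar in "$dir"/hadoop-*examples*.jar; do
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    return 1
}

# Re-run the grep example from the Hadoop directory with a clean output directory.
rerun_grep_example() {
    jar="$(find_examples_jar .)" || { echo "no examples jar found" >&2; return 1; }
    rm -rf output    # the job refuses to start if the output directory exists
    bin/hadoop jar "$jar" grep input output '[a-z.]+'
}
```

After sourcing the file from the Hadoop directory, calling `rerun_grep_example` repeats the job above without having to clean up by hand.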

Simple, isn't it? Enjoy it, then go further and try the Fully-distributed Hadoop Installation.

Eric Zhiqiang Ma

Eric is interested in building high-performance and scalable distributed systems and related technologies. The views or opinions expressed here are solely Eric's own and do not necessarily represent those of any third parties.
