Setting Up Standalone (Local) Hadoop

Hadoop is designed to run on hundreds to thousands of computers in a cluster. By default, however, Hadoop is configured to run in non-distributed mode as a single Java process. This is especially useful for debugging, since debugging a distributed system is much harder. This post shows how to set up a standalone Hadoop environment.

1. Hadoop package and software installation

Follow the instructions in the “1. Install needed packages” part of the Hadoop Installation Tutorial to install the packages. Then follow “4. Hadoop Configurations” to configure Hadoop (this file only).

2. Just run Hadoop!

In standalone mode, Hadoop jobs read their input from, and write their output to, local directories. A simple example shows how to start a Hadoop job.

The example finds and displays every match of the given regular expression. Output is written to the given output directory.

$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-mapred-examples-0.21.0.jar grep input output '[a-z.]+'
$ cat output/*

The jar file’s name may differ depending on the version of the Hadoop distribution.
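Two things tend to trip up a second run of the commands above: the examples jar has a different name in another release, and Hadoop refuses to start a job whose output directory already exists. The sketch below is one way to handle both, not part of Hadoop itself. The function names find_examples_jar and clear_output_dir are hypothetical, and the glob hadoop-*examples*.jar is an assumption about how the jar is named; adjust it if your distribution uses a different scheme.

```shell
# Hypothetical helper: print the path of the first examples jar found in a
# directory. The glob is an assumption about the jar naming scheme.
find_examples_jar() {
    search_dir="${1:-.}"
    for candidate in "$search_dir"/hadoop-*examples*.jar; do
        # If nothing matches, the pattern stays unexpanded and -f fails.
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: delete a stale local output directory so that the
# job can be run again. Requires an explicit directory argument.
clear_output_dir() {
    out_dir="${1:?usage: clear_output_dir DIR}"
    if [ -e "$out_dir" ]; then
        rm -rf -- "$out_dir"
    fi
}
```

With both functions defined in the current shell, and assuming you are in the Hadoop installation directory as in the example above, a rerun could look like this:

$ clear_output_dir output
$ bin/hadoop jar "$(find_examples_jar .)" grep input output '[a-z.]+'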

Simple, isn’t it? Enjoy it, then go further and try the Fully-distributed Hadoop Installation.

