When compiling MapReduce jobs against a Hadoop installation, you need the correct classpath so javac can resolve the Hadoop dependencies. The yarn classpath command prints exactly that classpath for you.
Getting the classpath
Run this command to output the full classpath:
yarn classpath
If yarn isn’t in your $PATH, use the full path:
$HADOOP_HOME/bin/yarn classpath
Replace $HADOOP_HOME with your actual Hadoop installation directory. The command outputs a colon-separated list of JAR files and directories that Hadoop requires.
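The exact entries depend on your installation, but the output has roughly this shape (illustrative paths for an install under /opt/hadoop):
/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/mapreduce/*:/opt/hadoop/share/hadoop/yarn/*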
Compiling with javac
To compile your MapReduce code, pass the classpath directly to javac using command substitution:
javac -cp $(yarn classpath) MyMapReduceJob.java
This captures the output of yarn classpath and uses it as the compilation classpath.
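Quoting the substitution is a good habit, since it keeps the classpath intact even if a path contains spaces:
javac -cp "$(yarn classpath)" MyMapReduceJob.java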
For multiple source files:
javac -cp "$(yarn classpath)" -d build/ src/*.java
This compiles all Java files in src/ and places class files in the build/ directory.
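If you plan to submit the result to a cluster, packaging the class files into a JAR is the natural follow-up. A minimal sketch (MyJob.jar is just an example name, matching the submission command shown later; the mkdir is there because older JDKs don't create the -d directory for you):
mkdir -p build
javac -cp "$(yarn classpath)" -d build/ src/*.java
jar cf MyJob.jar -C build/ .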
Setting the classpath persistently
For shell scripts or repeated compilation, export the classpath as an environment variable:
export HADOOP_CLASSPATH="$(yarn classpath)"
javac -cp "$HADOOP_CLASSPATH" MyMapReduceJob.java
You can add this export to your .bashrc or .bash_profile if you work with Hadoop frequently. Be aware, though, that HADOOP_CLASSPATH has special meaning to Hadoop: it is a user classpath that Hadoop's own scripts add to the classpath of every hadoop and yarn command at runtime. If you only need the value for compilation, any ordinary shell variable fed from yarn classpath avoids that side effect.
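A minimal sketch of that approach, where the variable name MR_COMPILE_CP is arbitrary and has no special meaning to Hadoop:
MR_COMPILE_CP="$(yarn classpath)"
javac -cp "$MR_COMPILE_CP" MyMapReduceJob.java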
Verifying and debugging classpath issues
If compilation fails with “cannot find symbol” errors, verify the classpath is correct:
yarn classpath
echo $?
Check that the command exits with status 0 and prints paths without errors. The default output usually contains wildcard entries such as share/hadoop/common/*; expanded (via yarn classpath --glob on reasonably recent Hadoop versions), it should include JARs like:
hadoop-common-*.jar
hadoop-mapreduce-client-core-*.jar
hadoop-hdfs-client-*.jar
hadoop-yarn-*.jar
Verify that the listed paths actually exist. Because of the wildcard entries, run the check against the expanded form:
yarn classpath --glob | tr ':' '\n' | while read -r jar; do
  [ -e "$jar" ] || echo "Missing: $jar"
done
If paths are missing, your Hadoop installation may be incomplete. Check that $HADOOP_HOME is set correctly:
echo $HADOOP_HOME
ls -la $HADOOP_HOME/share/hadoop/
The share/hadoop/ directory should contain subdirectories like common/, hdfs/, mapreduce/, and yarn/.
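For reference, on a Hadoop 3.x tarball install that listing typically shows something like the following (exact contents vary by version):
ls $HADOOP_HOME/share/hadoop/
client  common  hdfs  mapreduce  tools  yarn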
Using build tools instead
For new projects, Maven or Gradle are strongly preferred over manual classpath management. They handle dependency resolution automatically and make your project reproducible:
Maven:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>3.3.6</version>
</dependency>
Gradle:
dependencies {
    implementation 'org.apache.hadoop:hadoop-mapreduce-client-core:3.3.6'
}
You'll also typically need hadoop-common, and often hadoop-hdfs-client, depending on what your job uses.
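Once the build tool owns the dependencies, compiling no longer involves yarn classpath at all. Assuming a standard Maven project layout, for example:
mvn package
The packaged JAR lands in target/, named after your artifactId and version.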
Runtime vs. compilation classpath
When you actually submit a MapReduce job to the cluster, use hadoop jar:
hadoop jar MyJob.jar com.example.MyJobClass input/ output/
The hadoop jar command automatically includes the correct runtime classpath. You don’t need to manually specify it for job submission — the cluster handles classpath management through the Hadoop configuration.
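If a job needs extra third-party JARs at runtime, the generic -libjars option ships them with the submission. Note that this only takes effect when the main class parses generic options, typically by running through ToolRunner; extra-lib.jar is a placeholder name:
hadoop jar MyJob.jar com.example.MyJobClass -libjars extra-lib.jar input/ output/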
The yarn classpath command is primarily useful for development-time compilation. At runtime, Hadoop manages the classpath internally based on the installed version and configuration.
