How to skip the mapper function in Hadoop

In Hadoop, I need to skip the mapper function and directly execute the reducer function.

We are doing this to improve Hadoop performance. If the Hadoop framework is used to analyze the same data sets, the mappers' output will be the same across different kinds of jobs. To avoid recomputing the same results, I am planning to run the mapper function once, store its output as a cache, and reuse it in future jobs by bypassing the mapper function and directly executing the reducer function on the preprocessed mapper results.

Any help or pointer will be appreciated.

(Sorry for my bad English.)

asked Jan 18, 2015 by kanabargi sanjeev (170 points)

1 Answer

 
Best answer

I feel your requirement does not fit into the MapReduce model well.

If you must use MapReduce/HDFS, you may consider using multiple MapReduce jobs:

The first MapReduce job stores the shuffle results in HDFS (using a reduce function that simply outputs its input) so that they can be reused.

The other MapReduce jobs have only map tasks, which read as input the data that the first MapReduce job wrote to HDFS. You must carefully organize and partition the data in HDFS to simulate the reduce tasks' semantics if your applications rely on those semantics.

It can work but is ugly.
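To make the idea concrete, here is a minimal plain-Java sketch of the "map once, reduce many times" pattern. It is only a simulation under stated assumptions: it does not use any Hadoop API, the word-count style map step and the two reducers (`sum`, `repeated`) are hypothetical examples, and the in-memory `cached` map stands in for the grouped data that the first job would write to HDFS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Plain-Java simulation (no Hadoop APIs): run the "map" step once, keep its
// grouped output, then feed that cached output to several "reduce" functions.
class CachedMapOutputSketch {

    // Hypothetical map step: emit (word, 1) per token and group the values
    // by key, which is what the shuffle would hand to the reducers.
    static Map<String, List<Integer>> mapAndGroup(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // skip blank lines
                }
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        return grouped;
    }

    // Apply any reduce function to the cached, grouped map output.
    static <R> Map<String, R> reduce(Map<String, List<Integer>> grouped,
                                     Function<List<Integer>, R> reducer) {
        Map<String, R> out = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            out.put(e.getKey(), reducer.apply(e.getValue()));
        }
        return out;
    }

    // Hypothetical reducer for "job A": total occurrences per key.
    static Integer sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Hypothetical reducer for "job B": does the key occur more than once?
    static Boolean repeated(List<Integer> values) {
        return values.size() > 1;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a b a", "b c");

        // The map step runs exactly once; its grouped output is the cache.
        Map<String, List<Integer>> cached = mapAndGroup(input);

        // Two different "jobs" reuse the same cached map output.
        Map<String, Integer> totals = reduce(cached, CachedMapOutputSketch::sum);
        Map<String, Boolean> flags = reduce(cached, CachedMapOutputSketch::repeated);

        System.out.println(totals); // {a=2, b=2, c=1}
        System.out.println(flags);  // {a=true, b=true, c=false}
    }
}
```

In a real Hadoop setup the cache would be files in HDFS rather than an in-memory map. Note also that in the `org.apache.hadoop.mapreduce` API the base `Mapper` and `Reducer` classes pass their input through unchanged by default, so a later job could in principle read the stored key/value pairs with a pass-through mapper and run only its own reducer logic. The data is still shuffled again in that case, which is part of why the approach is not elegant.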

Another choice is to implement a MapReduce-like framework on Hadoop (YARN/Hadoop 2, not Hadoop v1) that can skip the map phase. You may take a look at the approaches of HaLoop and DryadInc.

Overall, my feeling is that it is better to use programming models/systems other than plain MapReduce for your workloads.

answered Jan 20, 2015 by Eric Z Ma (44,280 points)
edited Jan 20, 2015 by Eric Z Ma

Hey Eric,
Appreciate your response, will look into other models as you suggested and update this post.

Regards,
Sanjeev

commented Jan 20, 2015 by kanabargi sanjeev (170 points)

Hi Eric,

Can you please give me some materials on how to do the same using "Hadoop 2"?

Regards,
Sanjeev

commented Jan 20, 2015 by kanabargi sanjeev (170 points)
commented Jan 20, 2015 by Eric Z Ma (44,280 points)


Copyright © SysTutorials. User contributions licensed under cc-wiki with attribution required.

...