How to choose the number of mappers and reducers in Hadoop

How to choose the number of mappers and reducers in Hadoop to get good job performance?

asked Jul 23, 2013 by anonymous

1 Answer

Best answer

The Hadoop Wiki discusses this. Some valuable points from it:

About the number of Maps:

The number of maps is usually driven by the number of DFS blocks in
the input files, which leads people to adjust their DFS block size in
order to adjust the number of maps. The right level of parallelism for
maps seems to be around 10-100 maps/node, although we have taken it up
to 300 or so for very CPU-light map tasks. Task setup takes a while, so
it is best if the maps take at least a minute to execute.
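
As a rough illustration of the point above, the following sketch estimates how many map tasks a job would get and how that compares with the 10-100 maps/node guideline. It assumes one map task per DFS block of each input file and ignores any input format or split size settings that would change that. The function names and the sample sizes are hypothetical, chosen only for this example.

```python
def estimate_map_tasks(file_sizes_bytes, block_size_bytes):
    """Estimate map tasks, assuming one map per DFS block of each file."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    total = 0
    for size in file_sizes_bytes:
        if size > 0:
            # Integer ceiling division: a partly filled last block
            # still counts as one block (and so one map).
            total += -(-size // block_size_bytes)
    return total


def maps_per_node(num_maps, num_nodes):
    """Average number of map tasks each node would run."""
    if num_nodes <= 0:
        raise ValueError("node count must be positive")
    return num_maps / num_nodes


MiB = 1024 * 1024

# Hypothetical input: one 1 GiB file and one 200 MiB file.
files = [1024 * MiB, 200 * MiB]

maps_128 = estimate_map_tasks(files, 128 * MiB)  # 8 blocks + 2 blocks
maps_64 = estimate_map_tasks(files, 64 * MiB)    # 16 blocks + 4 blocks

print(maps_128, maps_per_node(maps_128, 5))
print(maps_64, maps_per_node(maps_64, 5))
```

Halving the block size doubles the estimated number of maps here, which is the adjustment the quoted passage describes. Whether the smaller maps still run for at least a minute depends on the job and has to be measured.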

About the number of Reduces:

The right number of reduces seems to be 0.95 or 1.75 * (nodes *
mapred.tasktracker.tasks.maximum). At 0.95 all of the reduces can
launch immediately and start transferring map outputs as the maps
finish. At 1.75 the faster nodes will finish their first round of
reduces and launch a second round of reduces, doing a much better job
of load balancing.
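
The rule of thumb above can be worked through with a small calculation. This is only a sketch: the function name is hypothetical, the cluster figures are made up, and rounding the result down to a whole number of reduces is an assumption of this example, since the quoted text does not say how to round.

```python
import math
from fractions import Fraction


def recommended_reduces(nodes, tasks_maximum, factor):
    """Apply factor * (nodes * mapred.tasktracker.tasks.maximum).

    `factor` is 0.95 (a single wave of reduces) or 1.75 (two waves).
    Rounding down is an assumption made for this sketch.
    """
    if nodes <= 0 or tasks_maximum <= 0:
        raise ValueError("nodes and tasks_maximum must be positive")
    # Fraction avoids binary floating-point surprises for 0.95 and 1.75.
    exact = Fraction(str(factor)) * nodes * tasks_maximum
    return math.floor(exact)


# Hypothetical cluster: 10 nodes, each with a task maximum of 2.
one_wave = recommended_reduces(10, 2, 0.95)
two_waves = recommended_reduces(10, 2, 1.75)

print(one_wave)   # slightly under the 20 available slots
print(two_waves)  # more reduces than slots, so a second round runs
```

With 0.95 the count stays just below the total slot count, so every reduce can start at once. With 1.75 there are more reduces than slots, so nodes that finish early pick up the remainder.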

answered Jul 26, 2013 by anonymous

