Tuning MapReduce: Choosing Mapper and Reducer Counts
Choosing the right number of mappers and reducers directly impacts Hadoop job performance. This isn’t a set-and-forget configuration—it depends on your cluster characteristics, data size, and task complexity.
Mapper Configuration
The number of mappers is primarily determined by the number of HDFS blocks in your input files. By default, each block generates one map task, so 100 blocks yield roughly 100 mappers.
Optimal parallelism: Aim for 10–100 mappers per node as a starting point. Very CPU-light map tasks can go as high as 200–300 per node, but this requires careful monitoring. The key constraint is task overhead: each map task incurs setup cost, so a task should run for at least a minute to be worthwhile.
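As a rough worked example with hypothetical numbers: a 100 GB input stored in 128 MB blocks yields about 800 map tasks (100 × 1024 / 128 = 800). On an 8-node cluster that is 100 mappers per node, right at the top of the recommended range; on an 80-node cluster it is only 10 per node, which is where the techniques below come in.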
If your input has too few blocks relative to your cluster size, increase the number of mappers by:
- Reducing the HDFS block size (via dfs.blocksize in hdfs-site.xml)
- Manually setting mapreduce.job.maps in your job configuration (most input formats treat this value only as a hint)

Conversely, if many small files are generating an excess of short-lived tasks, merge the input files or pack multiple files into each split with CombineTextInputFormat or CombineFileInputFormat, as sketched below.
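A minimal sketch of the small-files fix, assuming the new (mapreduce) API and an existing Job instance named job in your driver:

```java
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;

// Pack many small files into fewer, larger splits; each split is capped
// at 128 MB so one map task processes roughly one block's worth of data.
job.setInputFormatClass(CombineTextInputFormat.class);
CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
```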
Reducer Configuration
The optimal reducer count is less obvious and depends on your cluster size and reducer workload.
Key formula: Set the reducer count to between 0.95 and 1.75 times (number of nodes × maximum concurrent reduce tasks per node). In classic MapReduce the per-node maximum is mapreduce.tasktracker.reduce.tasks.maximum; under YARN it is effectively the number of reduce containers a node can host.
- 0.95 multiplier: All reducers can launch immediately and begin pulling map outputs as mappers finish. This minimizes latency but may underutilize faster nodes.
- 1.75 multiplier: Faster nodes finish their first wave of reduces and launch a second round, providing better load balancing and higher overall throughput. This is generally preferred for longer jobs.
For example, on a 10-node cluster where each node can run 4 reducers maximum:
- Conservative: 10 × 4 × 0.95 = 38 reducers
- Balanced: 10 × 4 × 1.75 = 70 reducers
Set this in your job configuration:

```java
job.setNumReduceTasks(70);
```
Or via command line:

```bash
hadoop jar job.jar -D mapreduce.job.reduces=70
```
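For the -D generic option to take effect, the driver must parse its arguments through ToolRunner. A minimal sketch; the class name MyDriver and the job wiring are hypothetical:

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// ToolRunner parses generic options such as -D mapreduce.job.reduces=70
// and folds them into the Configuration before run() is called.
public class MyDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "tuning-example");
        job.setJarByClass(MyDriver.class);
        // ... set mapper, reducer, input and output paths here ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyDriver(), args));
    }
}
```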
Note: Unlike mappers, reducer count doesn’t scale linearly with input size. A job with 200 reducers won’t necessarily be faster than one with 50—it depends on data skew, network bandwidth, and reducer task complexity.
Edge Cases and Considerations
Zero reducers: Valid for jobs that only need mapping (data filtering, transformation). Set mapreduce.job.reduces=0 to skip the shuffle phase entirely.
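Programmatically, the same thing is a one-liner; assuming an existing Job instance:

```java
// Map-only job: no shuffle, no sort; each mapper writes its output
// directly to HDFS as part-m-* files.
job.setNumReduceTasks(0);
```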
Data skew: If some reducers process significantly more data, increase reducer count to distribute load more evenly. Monitor via the Hadoop UI to identify stragglers.
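When one known hot key causes the skew, raising the count alone may not help, since hashing still sends that key to a single reducer. A sketch of one common workaround, a skew-aware Partitioner; the hot key, the fan-out factor, and the class name are all hypothetical:

```java
import java.util.Random;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical skew workaround: fan one known hot key out over several
// reducers instead of hashing it onto a single one.
public class SkewAwarePartitioner extends Partitioner<Text, IntWritable> {
    private static final String HOT_KEY = "hot-key"; // assumption: known in advance
    private final Random random = new Random();

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (key.toString().equals(HOT_KEY)) {
            // Spread the hot key across up to a quarter of the reducers.
            int fanout = Math.max(1, numPartitions / 4);
            return numPartitions - 1 - random.nextInt(fanout);
        }
        // Everything else: the usual hash partitioning.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

Register it with job.setPartitionerClass(SkewAwarePartitioner.class). Note the trade-off: fanning a key out breaks the one-reducer-per-key grouping, so the partial results for the hot key must be merged in a follow-up step.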
High-latency networks: In geographically distributed clusters, use fewer, longer-running reducers to reduce shuffle overhead. Watch shuffle time in the job UI, and tune shuffle behavior via mapreduce.task.io.sort.mb (the map-side sort buffer) and mapreduce.reduce.shuffle.parallelcopies (parallel fetch threads per reducer).
Memory constraints: Each reducer instance uses heap memory. If reducers crash with OutOfMemoryError, either raise the reducer count so each task handles less data, or increase the container size (mapreduce.reduce.memory.mb) together with the JVM heap (mapreduce.reduce.java.opts).
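A sketch of these knobs set programmatically; the values are illustrative assumptions, not recommendations:

```java
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// Shuffle tuning: larger map-side sort buffer, more parallel fetch threads.
conf.setInt("mapreduce.task.io.sort.mb", 256);
conf.setInt("mapreduce.reduce.shuffle.parallelcopies", 10);
// Memory: raise the container size and the JVM heap together; the heap
// must stay comfortably below the container limit.
conf.setInt("mapreduce.reduce.memory.mb", 4096);
conf.set("mapreduce.reduce.java.opts", "-Xmx3276m");
```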
Profiling Your Setup
Always test with your actual data and workload. Use the Hadoop ResourceManager web UI (port 8088 by default) to observe the following; a programmatic alternative using built-in counters is sketched after this list:
- Task execution times
- Shuffle phase duration
- Reduce task stragglers
- Memory and CPU utilization
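As a programmatic complement to the web UI, built-in job counters expose shuffle pressure after a run; assuming the same Job instance as earlier:

```java
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.TaskCounter;

// After completion, pull built-in counters: heavy spilling or a large
// shuffle volume both point at map/reduce counts worth revisiting.
job.waitForCompletion(true);
Counters counters = job.getCounters();
long spilled = counters.findCounter(TaskCounter.SPILLED_RECORDS).getValue();
long shuffled = counters.findCounter(TaskCounter.REDUCE_SHUFFLE_BYTES).getValue();
System.out.printf("spilled records: %d, shuffle bytes: %d%n", spilled, shuffled);
```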
Iterate from a conservative baseline (the 0.95 multiplier, mappers matching block count) and scale up gradually. Jobs with lightweight reducers tolerate higher parallelism; jobs with heavy aggregations benefit from fewer, longer-running tasks.
