How to choose the number of mappers and reducers in Hadoop

How to choose the number of mappers and reducers in Hadoop to get good job performance? The Hadoop Wiki gives a discussion on this: http://wiki.apache.org/hadoop/HowManyMapsAndReduces Some valuable points: About the number of Maps: The number of maps is usually driven by the number of DFS blocks in the input files. Although that causes people to…
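
As a rough illustration of the "number of maps follows the number of DFS blocks" point, here is a minimal sketch that estimates the map count by ceiling division. The helper name estimate_maps, the single-input-file simplification, and the 128 MiB block size are assumptions made for this example; they are not taken from the post.

```shell
# Hypothetical helper: estimate the number of map tasks as the number of
# DFS blocks needed to hold the input (ceiling division).
# Assumes one input file; with many files, each file rounds up on its own.
estimate_maps() {
    input_bytes=$1
    block_bytes=$2
    echo $(( (input_bytes + block_bytes - 1) / block_bytes ))
}

# Example: 10 GiB of input with an assumed 128 MiB block size.
estimate_maps $((10 * 1024 * 1024 * 1024)) $((128 * 1024 * 1024))   # prints 80
```

The actual number of maps in a job also depends on the input format and split settings, so treat this only as a first estimate.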

Specifying --no-print-directory within the Makefile

The --no-print-directory option of make tells make not to print the messages about entering and leaving the working directory. But how do you specify --no-print-directory inside the Makefile itself? Add this line to the Makefile: MAKEFLAGS += --no-print-directory You can also set MAKEFLAGS in a makefile, to specify additional flags that should also be in…

MySQL at Facebook

Facebook uses lots of MySQL databases. Is there any information about how Facebook scales MySQL? Some information on the Web: MySQL at Facebook’s page https://www.facebook.com/MySQLatFacebook?filter=1 A post by Ryan Thiessen, Database Operations at Facebook on Quora: http://www.quora.com/Facebook-Engineering/How-does-Facebook-structure-MySQL-so-that-it-is-robust-and-scalable And more: http://mashable.com/2011/12/15/facebook-timeline-mysql/ http://gigaom.com/2011/12/06/facebook-shares-some-secrets-on-making-mysql-scale/ http://www.wired.com/wiredenterprise/2011/12/facebook-timeline-anatomy “A lot of people are surprised that for this shiny new thing for Facebook, we’re using…

Force Linux to reboot

How to force Linux to reboot when the reboot command does not work? Enable the magic SysRq option: # echo 1 > /proc/sys/kernel/sysrq Reboot the machine: # echo b > /proc/sysrq-trigger Even if you cannot log on to the system but sshd is working, you can force Linux to reboot by:…

Cache at Facebook

About the caching systems at Facebook. According to: https://www.facebook.com/notes/facebook-engineering/monitoring-cache-with-claspin/10151076705703920 Facebook has two major cache systems: Memcache, which is a simple lookaside cache with most of its smarts in the client, and TAO, a caching graph database that does its own queries to MySQL. The NSDI’13 paper introduces more about Memcache: https://www.usenix.org/conference/nsdi13/scaling-memcache-facebook The USENIX ATC’13 paper introduces…

How to install Scala from the official Scala distribution

How to install Scala from the official Scala distribution? This is needed on a Linux release with an older version of Scala in the repository, e.g. Fedora 12. Use the install-scala.sh script: # wget https://raw2.github.com/zma/usefulscripts/master/script/install-scala.sh # sh ./install-scala.sh VER where VER is the Scala version that you want to install. First step, install and configure the…

How to delete all topics and posts from a user/spammer in MyBB?

This is a tutorial on “how to delete all topics and posts from a user/spammer in MyBB”. Please check out the answer. I use the Goodbye Spammer plugin for MyBB. First, install it after downloading the plugin from its homepage: Upload ./inc/plugins/goodbyespammer.php to ./inc/plugins/ Upload ./inc/languages/english/goodbyespammer.lang.php to ./inc/languages/english/ Go to ACP > Plugins > Activate…

Question2Answer: show excerpt in the RSS feed

There is an “Include full text in feeds:” option in the “RSS feeds” configuration panel, but there is no option to show only an excerpt instead of the full text or nothing in RSS feeds. This requires hacking the Question2Answer source code. Details are in the answer. The changes are based on Question2Answer 1.5.4: diff --git a/qa-include/qa-index.php b/qa-include/qa-index.php…

How to repair a MySQL table?

After a server crash and restart, MyBB reports an SQL error as follows: MyBB SQL Error MyBB has experienced an internal SQL error and cannot continue. SQL Error: 145 - Table './mybb/mybb_sessions' is marked as crashed and should be repaired Query: SELECT * FROM mybb_sessions WHERE sid='40021925bd0494ea31…' AND ip='x.x.x.x' LIMIT 1 The database is MySQL…

Plain text file pipelined to Linux mailx turns to “Content-Type: application/octet-stream” (an attachment)

A plain text file piped to Linux mailx turns into “Content-Type: application/octet-stream”, which is recognized as an attachment by some email clients. The command is like this: $ cat log.txt | mail -s "Updated log file" -r "from@example.com" "to@example.com" I expect it to be: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit But it turns out to be: Content-Type:…

Where are the Linux routing table entries stored on disk?

I know the routing table on Linux is kept in memory after being set. However, where are the routing table entries stored on disk? I mean, where is the routing table persistently stored so that it can be reloaded, like the iptables rules (under /etc/sysconfig/iptables on Fedora/RHEL/CentOS Linuxes)? If the system uses the /etc/rc.d/init.d/network…