How to choose the number of mappers and reducers in Hadoop

How do you choose the number of mappers and reducers in Hadoop to get good job performance? The Hadoop Wiki discusses this: http://wiki.apache.org/hadoop/HowManyMapsAndReduces Some valuable points: About the number of maps: the number of maps is usually driven by the number of DFS blocks in the input files. Although that causes people to
Read more
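
As a rough illustration of the point that the number of maps is usually driven by the number of DFS blocks, the sketch below estimates the map count for a set of input files. It assumes one map task per block and that no split spans two files. The helper name `estimate_num_maps` and the 128 MB default block size are assumptions made for illustration, not values taken from the post.

```python
def estimate_num_maps(file_sizes_bytes, block_size_bytes=128 * 1024 * 1024):
    """Estimate map tasks, assuming one map per DFS block (hypothetical helper)."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    total = 0
    for size in file_sizes_bytes:
        if size > 0:
            # Integer ceiling division: a partially filled last block still counts.
            total += (size + block_size_bytes - 1) // block_size_bytes
    return total


# A single 10 GiB file with 128 MiB blocks occupies 80 blocks.
print(estimate_num_maps([10 * 1024**3]))  # prints 80
```

The real count also depends on the input format and split settings of the job, so this is only a first estimate.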

How to quickly find out failed disks’ SATA port in Linux? (how to map Linux disk names to SATA ports)

One disk has failed on my server, which has several disks installed. I know the disk’s name in Linux (e.g. sda, sdb). However, the mapping from Linux disk names to SATA ports does not follow the same order. Now I want to find the failed disk. How can I quickly find it and which
Read more

Where Does Evolution Save Its Data and Configuration Files on Linux?

Evolution is a great personal information management tool that provides email, address book and calendar tools. It provides many enterprise-friendly features, such as native support for Microsoft Exchange connectivity for email, address books and calendars. Evolution stores its data and configuration in various ways, including plain files and the dconf configuration system. This post will give an introduction to the
Read more

Building and Installing Linux Kernel from the Source Code in an Existing Linux OS

Building the Linux kernel may sound like a complex, geek-only task. However, because the Linux kernel itself depends on far fewer tools and packages than other software packages, it is quite easy to compile, build and install a Linux kernel from the source code in an existing Linux OS. Building the Linux kernel is needed if you need to
Read more

How to Turn GNOME terminal to a Pop-up Terminal

A pop-up terminal is great and handy on Linux and similar OSes. On KDE, Yakuake is great. On GNOME or GTK, I once tried Guake. It is quite good. However, it is not as mature, stable and feature-rich as gnome-terminal. One day, I got this idea: why not use a script/program to manage the
Read more

Hadoop Installation Tutorial (Hadoop 2.x)

Hadoop 2, or YARN, is the new version of Hadoop. It adds the YARN resource manager in addition to the HDFS and MapReduce components. Hadoop MapReduce is a programming model and software framework for writing applications; it is an open-source variant of MapReduce, which was initially designed and implemented by Google for processing and generating large data
Read more

Keyboard Key Mapping for Emacs: Evil Mode and Rearranging Alt, Ctrl and Win Keys

The Ctrl keys are important and possibly the most frequently used keys in Emacs. However, using them is painful on today’s common PC keyboards, since the Ctrl keys are usually in the corners of the keyboard’s main area. Why are the key mappings in Emacs designed like this? After it was designed, Emacs was commonly on the Lisp Machine keyboards
Read more

How to Find Out Failed Disks’ SATA Ports in Linux

Linux disk names (e.g. sda1, hdb3) are not reliable: they may change when the hardware changes, such as when a disk is added or removed. Additionally, the order of the Linux device names is not always the same as the order of the SATA ports. For example, the disk connected to SATA port 0 (first
Read more
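
One way to relate a disk name to a port is through sysfs: on many systems, for disks driven by libata, the resolved path of `/sys/block/<disk>` contains an `ataN` component. The sketch below extracts that number. The helper names and the sample path are illustrative assumptions, not taken from the post, and the `ataN` value is the kernel's own port numbering, which still has to be matched to the physical connector labels on a given board.

```python
import os
import re


def ata_port_from_sysfs_path(path):
    """Return N from the first 'ataN' path component, or None if there is none."""
    match = re.search(r"/ata(\d+)/", path)
    return int(match.group(1)) if match else None


def ata_port_of_disk(disk):
    """Resolve /sys/block/<disk> and extract its ataN number (Linux only)."""
    resolved = os.path.realpath(os.path.join("/sys/block", disk))
    return ata_port_from_sysfs_path(resolved)


# Hypothetical resolved path, shaped like what a libata disk typically shows.
sample = "/sys/devices/pci0000:00/0000:00:1f.2/ata3/host2/target2:0:0/2:0:0:0/block/sda"
print(ata_port_from_sysfs_path(sample))  # prints 3
```

On a live system, `ata_port_of_disk("sda")` would apply the same parsing to the real sysfs entry; it returns `None` for devices whose path has no `ataN` component, such as USB or NVMe disks.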

Improving Font Rendering for Fedora Using Bytecode Interpreter

Fedora’s font rendering isn’t very nice, at least on my laptop with Fedora 12. The Bytecode Interpreter (BCI for short) is disabled by default because of patent issues. As the TrueType bytecode patents have expired, we may enable BCI in Fedora now. FreeType announced that BCI is enabled by default from 2.4. Fedora 12’s FreeType version
Read more

Finding out Linux Network Configuration Information

Linux has various kinds of network configuration information, and many tools can be used to find it. This post introduces how to find this information, using Fedora Linux as the example. IP address, MAC address and netmask: ifconfig will print out all the network interfaces and their information including the IP address
Read more

Hadoop TeraSort Benchmark

TeraSort is one of Hadoop’s widely used benchmarks. Hadoop’s distribution contains both the input generator and the sorting implementation: TeraGen generates the input and TeraSort conducts the sorting. Here, we provide a short tutorial for using the Hadoop TeraSort benchmark. TeraGen generates random data that can be used as input data for a subsequent running
Read more
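
TeraGen is told how many rows to generate rather than how many bytes, and each row it writes is 100 bytes, so a target dataset size has to be converted into a row count first. A minimal sketch of that arithmetic follows; the helper name `teragen_rows_for_size` is hypothetical and not part of Hadoop.

```python
TERAGEN_ROW_BYTES = 100  # each TeraGen row is 100 bytes


def teragen_rows_for_size(target_bytes):
    """Number of whole TeraGen rows that fit in target_bytes (hypothetical helper)."""
    if target_bytes < 0:
        raise ValueError("target size must be non-negative")
    return target_bytes // TERAGEN_ROW_BYTES


# One terabyte (10**12 bytes) corresponds to ten billion rows.
print(teragen_rows_for_size(10**12))  # prints 10000000000
```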

Hadoop Installation Tutorial (Hadoop 1.x)

Update: If you are new to Hadoop and trying to install it, please check the newer version: Hadoop Installation Tutorial (Hadoop 2.x). Hadoop mainly consists of two parts: Hadoop MapReduce and HDFS. Hadoop MapReduce is a programming model and software framework for writing applications; it is an open-source variant of MapReduce that was initially designed
Read more

A Simple Sort Benchmark on Hadoop

After installing Hadoop, we usually run some benchmark programs to test whether the system works well. In the Hadoop installation tutorial post, we show a very simple example that greps strings from a simple set of files. In this post, we introduce the Sort program for testing and benchmarking Hadoop. The Sort program is also
Read more

mrcc – A Distributed C Compiler System on MapReduce

The mrcc project’s homepage is here: mrcc project. Abstract: mrcc is an open source compilation system that uses MapReduce to distribute C code compilation across the servers of a cloud computing platform. mrcc is built to use Hadoop by default, but it is easy to port it to other cloud computing platforms, such as MRlite,
Read more