map

How to get logs of a specific time range on Linux?

The logs I am processing is Hadoop log (log4j). It is in format like: 2014-09-20 21:55:11,855 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 36 2014-09-20 21:55:11,863 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 55 2014-09-20 22:10:11,907 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update cache now 2014-09-20 22:10:11,907 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not doing static UID/GID mapping because ‘/etc/nfs.map’ does not exist. Now, I…

How to set the number of mappers and reducers of Hadoop in command line?

How to set the number of mappers and reducers of Hadoop in command line? Number of mappers and reducers can be set like (5 mappers, 2 reducers): -D mapred.map.tasks=5 -D mapred.reduce.tasks=2 in the command line. In the code, one can configure JobConf variables. job.setNumMapTasks(5); // 5 mappers job.setNumReduceTasks(2); // 2 reducers Note that on Hadoop…

How to choose the number of mappers and reducers in Hadoop

How to choose the number of mappers and reducers in Hadoop to get good job performance? The Hadoop Wiki gives a discussion on this: http://wiki.apache.org/hadoop/HowManyMapsAndReduces Some valuable points: About the number of Maps: The number of maps is usually driven by the number of DFS blocks in the input files. Although that causes people to…

How to quickly find out failed disks’ SATA port in Linux? (how to map Linux disk names to SATA ports)

I find one disk failed on my server which have several ones installed. I know the disk’s name in Linux (e.g. sda, sdb). However, the Linux disk name to SATA port mapping does not follow the same order. Now, I want to find out the failed disks. How to quickly find out them and which…

Mmaping Memory Range Larger Than the Total Size of Physical Memory and Swap

In Linux, mmap() is a system call that is used to map a portion of a file or device into the memory of a process. This can be useful for a variety of purposes, such as memory-mapped I/O, shared memory, and virtual memory management. However, when mapping a large range of memory that is larger…

| | |

Where Does Evolution Save Its Data and Configuration Files on Linux?

Evolution is a great personal information management tool that provides Email, address book and calendar tools. Evolution provides many enterprise friendly feature such as native support to Microsoft Exchange connectivity for Emails, address books and calendars. Evolution uses various ways including plain files and dconf configuration systems. This post will give an introduction to the…

| |

Building and Installing Linux Kernel from the Source Code in an Existing Linux OS

Building Linux kernel may sound a complex and geek-only thing. However, as Linux kernel itself has much less depended tools/packages compared to other software packages, it is quite easy to compile, build and install a Linux kernel from the source code in an existing Linux OS. Building Linux kernel is needed if you need to…

| | | |

Hadoop Installation Tutorial (Hadoop 2.x)

Hadoop 2 or YARN is the new version of Hadoop. It adds the yarn resource manager in addition to the HDFS and MapReduce components. Hadoop MapReduce is a programming model and software framework for writing applications, which is an open-source variant of MapReduce designed and implemented by Google initially for processing and generating large data…

| | |

Keyboard Key Mapping for Emacs: Evil Mode and Rearranging Alt, Ctrl and Win Keys

Ctrl keys are important and possibly most frequently used in Emacs. However, it is painful on today’s common PC keyboards since Ctrl keys are usually in the corner of the keyboard main area. Why the key mappings in Emacs are designed like this? After it was designed, Emacs was commonly on the Lisp Machine keyboards…

Improving Font Rendering for Fedora Using Bytecode Interpreter

Fedora’s font rendering isn’t very nice. At least on my laptop with Fedora 12. Bytecode Interpreter (BCI for short) is disabled by default because of patent issues. As the TrueType bytecode patents have expired. We may enable BCI in Fedora now. TrueType announced that BCI is enabled by default from 2.4. Fedora 12’s TrueType version…

Finding out Linux Network Configuration Information

There is various network configuration information in Linux and lots tools can be used to find out those configuration information. Finding out these network information in Fedora Linux as the example will be introduced. IP address, MAC address and netmask ifconfig will print out all the network interfaces and their information including the IP address…

Hadoop TeraSort Benchmark

TeraSort is one of Hadoop’s widely used benchmarks. Hadoop’s distribution contains both the input generator and sorting implementations: the TeraGen generates the input and TeraSort conducts the sorting. Here, we provide a short tutorial for using the Hadoop TeraSort benchmark. TeraGen generates random data that can be used as input data for a subsequent running…

| |

Hadoop Installation Tutorial (Hadoop 1.x)

Update: If you are new to Hadoop and trying to install one. Please check the newer version: Hadoop Installation Tutorial (Hadoop 2.x). Hadoop mainly consists of two parts: Hadoop MapReduce and HDFS. Hadoop MapReduce is a programming model and software framework for writing applications, which is an open-source variant of MapReduce that is initially designed…

mrcc – A Distributed C Compiler System on MapReduce

The mrcc project’s homepage is here: mrcc project. Abstract mrcc is an open source compilation system that uses MapReduce to distribute C code compilation across the servers of the cloud computing platform. mrcc is built to use Hadoop by default, but it is easy to port it to other could computing platforms, such as MRlite,…