Cluster

Three Methods of Executing Commands on Many Nodes in Parallel via SSH on Linux

It is common to execute commands on many nodes/hosts via SSH for managing a cluster of Linux servers. On Linux, there are many choices for this task. Generally, to run commands on many nodes, there are two modes: serial mode and parallel mode. In serial mode, the command is executed on the node one by…

| |

Lazy Linux Admins Going to Server Rooms Less: Forced Reboot, Auto Reboot after Kernel Panic and Email Notification after Reboot

Having to go the the server room to reset servers is the most headache thing for admins managing a cluster of Linux servers in a remote site. Either you can ping the server but can not ssh to it, or you even can not ping it. There are various reasons that may cause a Linux…

| | | |

Hadoop Installation Tutorial (Hadoop 2.x)

Hadoop 2 or YARN is the new version of Hadoop. It adds the yarn resource manager in addition to the HDFS and MapReduce components. Hadoop MapReduce is a programming model and software framework for writing applications, which is an open-source variant of MapReduce designed and implemented by Google initially for processing and generating large data…

| | |

Improving ssh/scp Performance by Choosing Suitable Ciphers

Update on Oct. 9, 2014: You should be aware of the possible security problems of blowfish and it is suggested not to be used. Instead, you may consider ChaCha20 as suggested by Tony Arcieri. To use this with OpenSSH, you need to specify the Ciphers in your .ssh/config files as chacha20-poly1305@openssh.com possibly with another default…

Linux Cluster Solutions

Solutions to Linux cluster construction and management such as unified account management, NFS home directory, network configurations are summarised in this post. The post is keeping updating while new solutions is added to this site. ===Account and storage management=== [[unified-linux-login-and-home-directory-using-openldap-and-nfsautomount|Unified Linux Login and Home Directory Using OpenLDAP and NFS/automount]] [[backup-linux-home-directory-using-rsync|Backup Linux Home Directory Using rsync]]…

|

Xen DomU’s I/O Performance of LVM and loopback Backed VBDs

This posts list benchmark (using bonnie++) result of I/O performance of Xen LVM and loopback backed VBDs. The configuration of machines Dom0 VCPU: 2 (Intel(R) Xeon(R) CPU E5520  @ 2.27GHz) Memory: 2GB Xen and Linux kernel: Xen 3.4.3 with Xenified 2.6.32.13 kernel DomU VCPU: 2 Memory: 2GB Linux kernel: Fedora (2.6.32.19-163.fc12.x86_64) DomU’s profile: name=”10.0.1.200″ vcps=2…

| |

Hadoop Installation Tutorial (Hadoop 1.x)

Update: If you are new to Hadoop and trying to install one. Please check the newer version: Hadoop Installation Tutorial (Hadoop 2.x). Hadoop mainly consists of two parts: Hadoop MapReduce and HDFS. Hadoop MapReduce is a programming model and software framework for writing applications, which is an open-source variant of MapReduce that is initially designed…

Reading List for Distributed Systems and Cloud Computing

Understanding the literature is usually the first step to do research, which is the same for systems research on cloud computing. A reading list may help a lot to those that just start in cloud computing research. Prof. Lin Gu, my PhD supervisor, compiled a reading list for system research on cloud computing. The reading…

Hadoop Default Ports

Hadoop’s namenode and datanodes expose a bunch of TCP ports used by Hadoop’s daemons to communicate to each other or listen directly to users’ requests. These ports information are needed by both the Hadoop users and cluster administrators to write programs or configure firewalls/gateways accordingly. A post written by Philip Zeyliger from Cloudera’s blog summarizes the…

Pitfalls and Lessons on Configuing and Tuning Hadoop

This post lists pitfalls and lessons learning when configuring and tuning Hadoop. Hadoop with IPv6 Hadoo doesn’t support IPv6 currently (up to 0.20.2 and 0.21.0): Hadoop and IPv6. The performance of the cluster may suffer from turning IPv6 on in clusters: mail archive. One good practice is to disable IPv6 on servers in the Hadoop…

Setting Up Standalone (Local) Hadoop

Hadoop is designed to run on [[hadoop-installation-tutorial|hundreds to thousands of computers]] inside cluster. However, Hadoop is configured to run things in a non-distributed mode as a single Java process by default. This is specially useful for debugging since distributed debugging is really a nightmare. This post introduces how to set up a standalone Hadoop environment….