Handling Sparse Files on Linux

Sparse files are common on Linux/Unix and are also supported on Windows (e.g. NTFS) and macOS (e.g. APFS). Sparse files use storage efficiently when they contain many holes (contiguous ranges of zero-valued bytes): the file system stores only metadata for the holes instead of allocating real disk blocks. They are ...
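
As a rough illustration of the idea in this excerpt, the sketch below creates a file that is mostly one hole and compares its apparent size with the space actually allocated. The helper names `make_sparse_file` and `sizes` are made up for this example, and the 512-byte unit for `st_blocks` is an assumption that holds on Linux. On a file system without sparse-file support, the allocated size will simply be close to the apparent size.

```python
import os
import tempfile


def make_sparse_file(path, size, payload=b"data"):
    """Create a file of `size` bytes whose only written bytes are `payload` at the end."""
    with open(path, "wb") as f:
        # Extending the file without writing leaves a hole on file systems
        # that support sparse files.
        f.truncate(size)
        f.seek(size - len(payload))
        f.write(payload)


def sizes(path):
    """Return (apparent size, allocated size), both in bytes."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units, as it does on Linux.
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "sparse.img")
    make_sparse_file(demo_path, 64 * 1024 * 1024)
    apparent, allocated = sizes(demo_path)
    print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

The apparent size is what `ls -l` shows, while the allocated size corresponds to what `du` reports.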

Killing Running Bash Script Process Itself and All Child Processes In Linux

On Linux, how do you kill a process and all of its child processes? For example, a Bash script A starts B, B starts C, and C calls rsync. I would like to kill A and all of its child processes together. How can this be done? There are possibly many answers to this question. One of the ...
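
One possible approach, not necessarily the one the full post settles on, is to run the whole chain in its own process group and signal the group. The sketch below assumes a launcher written in Python; `start_in_new_group` and `kill_tree` are hypothetical helper names, and the `sh -c` command merely stands in for script A and its children.

```python
import os
import signal
import subprocess
import time


def start_in_new_group(cmd):
    """Start cmd as the leader of a new session and process group."""
    # start_new_session=True makes the child call setsid(); its descendants
    # stay in the same process group unless they change it themselves.
    return subprocess.Popen(cmd, start_new_session=True)


def kill_tree(proc, sig=signal.SIGTERM):
    """Send sig to every process in proc's process group, then reap proc."""
    os.killpg(os.getpgid(proc.pid), sig)
    return proc.wait()


# Stand-in for "A starts B, B starts C": a shell with two background children.
demo = start_in_new_group(["sh", "-c", "sleep 300 & sleep 300 & wait"])
time.sleep(0.5)  # give the shell time to start its children
kill_tree(demo)
print("exit status:", demo.returncode)
```

A child that moves itself into a different session or process group would escape this signal, so this is a sketch for the common case only.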

How to handle spaces in paths with rsync on Linux?

Common rsync commands do not seem to handle spaces in paths well. For example, rsync -avxP file "user@server:/data/my dir" reports: rsync: link_stat "/home/zma/file" failed: No such file or directory (2). How can rsync be made to handle spaces well? You can use the --protect-args option of rsync: $ rsync --protect-args -avxP file "user@server:/data/my dir" What does --protect-args do? -s, ...
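
To see why an unprotected remote path can break, it helps to look at what a shell does with the path text once the local quotes are gone. The sketch below uses Python's `shlex` module as a stand-in for shell word splitting; it illustrates the splitting problem only and is not how rsync itself is implemented.

```python
import shlex

remote_path = "/data/my dir"

# If the path text is re-parsed by a shell, the space splits it in two.
print(shlex.split(remote_path))

# Quoting the path keeps it as a single word after parsing.
quoted = shlex.quote(remote_path)
print(quoted)
print(shlex.split(quoted))
```

With --protect-args, the file names are sent to the remote rsync without being interpreted by the remote shell, so this kind of splitting does not happen.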

How to choose the key used by SSH for a specific host?

How do you choose a key other than the default ~/.ssh/id_rsa when connecting with ssh to a specific remote host? There are at least two ways to choose the key used by ssh. Take ~/.ssh/key1 and user@example.com as the example here. Method one: specify the key on the command line with the -i option of ...

How to exclude directories with certain names from rsync on Linux?

How do you exclude directories with certain names, such as “cache”, from rsync on Linux during a backup? The “cache” directory may appear in many different paths, such as file1/cache/ or file2/cache/, and listing every “cache” directory in the rsync command is not practical. You can use rsync with --exclude=cache/, like rsync -avxP --exclude=cache/ /path/to/src/directory/ /path/to/dst/dir/
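
The effect of a pattern such as cache/ can be modelled in a few lines: the trailing slash restricts the match to directories, and because the pattern has no leading slash it applies at any depth. The sketch below is a simplified model of that behaviour, not rsync's matching code; `walk_excluding` and the sample tree are invented for this example.

```python
import os
import tempfile


def walk_excluding(root, excluded_dir_name="cache"):
    """Yield file paths relative to root, skipping directories with the given name."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d != excluded_dir_name]
        for name in filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)


def touch(root, relative_path):
    """Create an empty file, including any missing parent directories."""
    full = os.path.join(root, relative_path)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    open(full, "w").close()


with tempfile.TemporaryDirectory() as src:
    for rel in ("file1/a.txt", "file1/cache/tmp.bin",
                "file2/b.txt", "file2/sub/cache/deep.bin"):
        touch(src, rel)
    kept = sorted(walk_excluding(src))
    print(kept)
```

Only directories named cache are skipped in this model; a regular file that happens to be called cache is kept.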

SEEK_HOLE and SEEK_DATA: efficiently archive/copy large sparse files

How do you efficiently archive a very large sparse file, say 1TB? The sparse file may contain only a small amount of data, say 32MB.

SEEK_HOLE and SEEK_DATA

The SEEK_HOLE/SEEK_DATA functionality does the trick and lets `tar` and `cp` handle large sparse files very efficiently. `lseek` with `SEEK_HOLE` returns the offset of the start of the ...
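
A minimal sketch of how a tool might use these two `lseek` modes to find the data regions of a file is shown below. It relies on Python's `os.SEEK_DATA` and `os.SEEK_HOLE` constants, which are available on Linux; `data_extents` is a hypothetical helper name, and this is not the actual logic of `tar` or `cp`. On a file system that does not track holes, the whole file is typically reported as a single data extent.

```python
import errno
import os
import tempfile


def data_extents(path):
    """Return a list of (start, end) byte ranges that lseek reports as data."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # no data at or after offset: the rest is a hole
                raise
            # Every file has an implicit hole at its end, so this succeeds.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return extents


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "sparse.img")
    with open(demo_path, "wb") as f:
        f.truncate(16 * 1024 * 1024)   # 16 MiB apparent size
        f.seek(8 * 1024 * 1024)
        f.write(b"payload")            # a little data in the middle
    print(data_extents(demo_path))
```

An archiver can then read and store only the reported extents and record the holes as metadata, instead of reading every zero byte.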

Notes for Beginners of Software Development on Linux

Linux is a great platform for software development targeting servers or backends. In general, working on Linux is very productive. The problem that beginners on Linux face is that the learning curve is steep at the beginning. But believe me, after you get through the initial green, steep learning step shown in the figure below ...

Hadoop Installation Tutorial (Hadoop 2.x)

Hadoop 2, or YARN, is the new version of Hadoop. It adds the YARN resource manager to the HDFS and MapReduce components. Hadoop MapReduce is a programming model and software framework for writing applications; it is an open-source variant of MapReduce, which was initially designed and implemented by Google for processing and generating large data ...

Linux Cluster Solutions

Solutions for Linux cluster construction and management, such as unified account management, NFS home directories, and network configuration, are summarised in this post. The post is updated as new solutions are added to this site. Account and storage management: Unified Linux Login and Home Directory Using OpenLDAP and NFS/automount; Backup Linux Home Directory Using rsync ...

How to Backup Linux Home Directories Using rsync

I need to back up my Linux home directory to one of my portable hard disks. I tried to use git, but failed since git doesn’t support large files (I failed after many tries; I have files larger than 5G). I found rsync, the fast, versatile, remote (and local) file-copying tool, and I am happy with ...