How to merge sam files on Linux?

By Eric Ma, Mar 24, 2018

The samtools merge command merges BAM files, but it does not work on SAM files.

According to the SAM format specification, header lines start with @, while alignment lines do not. So you can merge SAM files efficiently with grep: take the header lines from one file, then append the alignment lines from every file.

Assume the header is from 0.sam, the files to be merged are 0.sam, 1.sam and 2.sam, and the merged output sam file is out.sam.

header=0.sam
files="0.sam 1.sam 2.sam"
output=out.sam

(grep '^@' "$header"; for f in $files; do grep -v '^@' "$f"; done) > "$output"

This should be much faster than the alternative of “convert the SAM files to BAM, merge the BAM files with samtools merge, then convert the merged BAM back to SAM”.
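The same idea can be wrapped in a small shell function so that file names containing spaces are handled safely. This is only a sketch: the function name merge_sam and the two tiny sample records are made up for illustration. It assumes all inputs were aligned against the same reference, because only the header of the first argument is kept, and it does not re-sort the alignments.

```shell
#!/bin/sh
# merge_sam HEADER_FILE OUTPUT INPUT...
# (merge_sam is a hypothetical helper name, not part of samtools.)
merge_sam() {
    header=$1
    output=$2
    shift 2
    {
        # Header lines (starting with @) come only from the chosen header file.
        grep '^@' "$header"
        # Alignment lines (everything else) come from every input, in order.
        for f in "$@"; do
            grep -v '^@' "$f"
        done
    } > "$output"
}

# Demo on two made-up one-record SAM files in a temporary directory.
tmp=$(mktemp -d)
printf '@HD\tVN:1.6\tSO:unsorted\n@SQ\tSN:chr1\tLN:1000\nr0\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n' > "$tmp/0.sam"
printf '@HD\tVN:1.6\tSO:unsorted\n@SQ\tSN:chr1\tLN:1000\nr1\t0\tchr1\t5\t60\t4M\t*\t0\t0\tACGT\tIIII\n' > "$tmp/1.sam"

merge_sam "$tmp/0.sam" "$tmp/out.sam" "$tmp/0.sam" "$tmp/1.sam"
```

The output file must not be one of the inputs, since the redirection truncates it before grep reads anything.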
