How to count the number of reads in each chromosome in a bam file?

How to count the number of reads in each chromosome in a bam file? The bam file is already sorted by the chromosome names.

If the bam file is indexed, you can get these counts quickly from the index with samtools idxstats. Its third column is the number of mapped reads on each reference:

samtools idxstats in.bam | awk '{print $1" "$3}'
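To show what the awk step picks out: samtools idxstats prints one tab-separated line per reference with four columns (sequence name, sequence length, mapped reads, unmapped reads), plus a final line named * for unmapped reads that have no assigned chromosome. The sketch below keeps columns 1 and 3 and also drops the * line. The function name mapped_per_chrom and the sample input are made up for illustration, so the block runs without samtools or a real BAM file.

```shell
#!/bin/sh
# Hypothetical helper: read `samtools idxstats` output on stdin and print
# "<chromosome> <mapped reads>" for each reference, skipping the "*" line
# (unmapped reads that have no chromosome).
mapped_per_chrom() {
    awk -F'\t' '$1 != "*" { print $1, $3 }'
}

# Usage on a real, indexed BAM file (needs samtools and in.bam.bai):
#   samtools idxstats in.bam | mapped_per_chrom

# Made-up idxstats lines: name, length, mapped, unmapped
printf 'chr1\t1000\t5\t0\nchr2\t800\t3\t1\n*\t0\t0\t7\n' | mapped_per_chrom
# prints:
# chr1 5
# chr2 3
```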

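If the bam file is not indexed, you can “count” the reads with uniq -c. This works because the file is sorted by chromosome: all records for one chromosome are adjacent, and uniq only merges adjacent identical lines. Unmapped reads without a chromosome are listed under *.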

samtools view in.bam | awk '{print $3}' | uniq -c

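(If it is a SAM file such as in.sam, samtools view in.sam works the same way. If you use cat in.sam instead of samtools view in.bam, drop the @ header lines first, e.g. with grep -v '^@' in.sam, or they will be counted too.)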
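A small sketch of the unindexed approach applied to SAM text, showing both the header filtering and the reliance on sort order. The function name count_per_chrom and the three-read SAM file are invented for this example. Because samtools view prints no header by default, samtools view in.bam | count_per_chrom should give the same result for a BAM file.

```shell
#!/bin/sh
# Hypothetical helper: read SAM text on stdin, skip "@" header lines, and
# count records per reference name (column 3, RNAME).
# uniq -c only merges adjacent lines, so input must be sorted by chromosome.
count_per_chrom() {
    grep -v '^@' | awk -F'\t' '{ print $3 }' | uniq -c
}

# Made-up SAM file: two header lines, then three reads sorted by chromosome
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n' > example.sam
printf 'r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> example.sam
printf 'r2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> example.sam
printf 'r3\t16\tchr2\t50\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> example.sam

# Expect a count of 2 for chr1 and 1 for chr2
count_per_chrom < example.sam
```

If a file were not sorted by chromosome, a single awk pass with an associative array avoids the sorting requirement, at the cost of an unspecified output order: awk '!/^@/ { n[$3]++ } END { for (c in n) print c, n[c] }' in.sam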

In both cases, samtools does the work of reading the bam file; awk and uniq only pick out and tally columns of its text output.

Eric Ma

Eric is a systems guy. Eric is interested in building high-performance and scalable distributed systems and related technologies. The views or opinions expressed here are solely Eric's own and do not necessarily represent those of any third parties.
