Bowtie2 mapping overview

3 minute read

Published: February 24, 2015

You have mapped reads of genetic code against a genomic reference with bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) and would like to get some overview statistics from your SAM files?

Here, I present a python script that takes one or more SAM files as input and provides the following information in a text file named

the total number of aligned reads
the percentage of reads with a mapping quality below 20
the percentage of unmapped reads
the percentage of discordantly mapped reads
the percentage of non-unique mappings

The script uses fields that are specific to SAM files:

Detection of discordantly mapped reads is based on the field ‘YT:Z:DP’
Detection of non-unique mappings is based on the field ‘XS:i’

It was not tested on other sam files.

Download the script

You can download the script Bowtie2MappingOverview.py from:

https://gist.github.com/alj1983/80a0861cf9a033cdba42/download

How to use it?

Extract the python script and save it to the folder where one or several sam files are located. Open a terminal and navigate to this folder.

To apply the script on one sam file, type:

python Bowtie2MappingOverview.py Samfile.sam

Change Samfile.sam to the name of your own sam file.

To apply it to several sam files, type the name of each of them after the first sam file name, like:

python Bowtie2MappingOverview.py Samfile1.sam Samfile2.sam Samfile3.sam

In the above command, 3 sam files are analyzed. The created out.txt file will contain information in the following format:

sequence	total	mapq20	unmapped	discordant	nonunique
Samfile1.sam	12369010	0.199497	0.797805	0.005362	0.001147
Samfile2.sam	12369010	0.747813	0.150322	0.072331	0.055457
Samfile3.sam	11253608	0.200873	0.796430	0.004038	0.001412

To apply it to all sam files in a folder, type:

python Bowtie2MappingOverview.py *.sam

Code

The script contains the following python code

import re # to use regular expressions import sys # to read user arguments

Define the search patterns:

HeaderPattern= re.compile(‘^@’) DiscordantPattern= re.compile(‘YT:Z:DP’) # This field is specific to mappings that were done with Bowtie2 NonUniquePattern= re.compile(‘XS:i’) # This field is specific for mappings that mapped more than once outfile=open(‘out.txt’,’w’) headline= [‘sequence’,’total’,’mapq20’,’unmapped’,’discordant’,’nonunique’] headlineout=’\t’.join(headline) outfile.write(headlineout) outfile.write(‘\n’)

def MappingOverview(samfile): ‘’’ Counts the total number of aligned reads; the percentage of reads with a mapping quality below 20; the percentage of unmapped reads; the percentage of discordantly mapped reads; and the percentage of non-unique mappings. ‘’’ sam=open(samfile) total=0 mapq20=0 unmapped=0 discordant=0 nonunique=0 for line in sam: line=line.strip() if not HeaderPattern.search(line): # if there is no @ at the beginning of the line total=total+1 splitted=line.split(‘\t’) # Split the lines by tab separators if int(splitted[4])>19: # The fifth column – which is the mapping quality – shall be over 19 mapq20=mapq20+1 if splitted[2] is ‘*’: # column 3 in the SAM file shows “Name of reference sequence where alignment occurs” or a * for unmapped reads. unmapped=unmapped+1 if DiscordantPattern.search(line): # Look for the field indicating discordant mappings discordant=discordant+1 if NonUniquePattern.search(line): # Look for the field indicating concordant mappings nonunique=nonunique+1 outfile.write(‘%s\t%12i\t%7.6f\t%7.6f\t%7.6f\t%7.6f\n’ % (samfile, total, float(mapq20)/float(total), float(unmapped)/float(total), float(discordant)/float(total), float(nonunique)/float(total)))

for s in sys.argv[1:]: MappingOverview(str(s))

Share on

Twitter Facebook LinkedIn

Alexander Jueterbock

Bowtie2 mapping overview

Download the script

How to use it?

Code

Define the search patterns:

Share on

You May Also Enjoy

Selecting MethylRAD tags with less degenerate adaptor ends

In silico digestion with type IIB restricrion enzymes

R package ‘MaxentVariableSelection’ released

Blog Post number 4