Filter Bowtie2 alignments

3 minute read

Published:

Sequence alignments generally need to be filtered for certain criteria. The Python script Bowtie2Filtering.py enables you to filter out reads from Bowtie2 alignments (SAM files) that are

  • of low mapping quality (<20)
  • unmapped
  • discordantly mapped
  • ambiguous (mapped to several places)

The program makes use of flags that are specific for Bowtie2 alignment files and thus will not work with SAM files that were created with different alignment softwares.

Download

You can download the script Bowtie2Filtering.py from https://gist.github.com/alj1983/7a26d34a019aa0329bbd/download

How to use it?

usage: Bowtie2Filtering.py [-h] [-mq] [-u] [-d] [-a] [-s SAM]

optional arguments: -h, –help show this help message and exit -mq, –MappingQuality Filter out reads with a mapping quality below 20 -u, –unmapped Filter out unmapped reads -d, –discordant Filter out discordantly mapped reads -a, –ambiguous Filter out ambiguous (non-unique) mappings -s SAM, –sam SAM sam file name

Code

#!/usr/bin/python

import re # to use regular expressions #import sys # to read user arguments import argparse # To allow users setting flags in order to define what to filter for.

parser = argparse.ArgumentParser(description=”Filters bowtie2 alignments (.sam files)”)

parser.add_argument(“-mq”, “–MappingQuality”, help=”Filter out reads with a mapping quality below 20”, action=”store_true”) parser.add_argument(“-u”, “–unmapped”, help=”Filter out unmapped reads”, action=”store_true”) parser.add_argument(“-d”, “–discordant”, help=”Filter out discordantly mapped reads”, action=”store_true”) parser.add_argument(“-a”, “–ambiguous”, help=”Filter out ambiguous (non-unique) mappings”, action=”store_true”) parser.add_argument(“-s”,”–sam”, help=”sam file name”) args=parser.parse_args()

Define the search patterns:

HeaderPattern= re.compile(‘^@’) DiscordantPattern= re.compile(‘YT:Z:DP’) # This field is specific to mappings that were done with Bowtie2 NonUniquePattern= re.compile(‘XS:i’) # This field is specific for mappings that mapped more than once SamFileNamePattern = re.compile(‘(.*).sam’)

def MappingFilter(samfile): ‘’’ Filters the percentage of reads with a mapping quality below 20; the percentage of unmapped reads; the percentage of discordantly mapped reads; and the percentage of non-unique mappings. ‘’’ sam=open(samfile) OutfilePrefix1 =SamFileNamePattern.match(samfile) # define outfile OutfilePrefix=OutfilePrefix1.group(1)
outfile=open(OutfilePrefix + ‘filtered.sam’,’w’) total=0 filtered=0 for line in sam: line=line.strip() if HeaderPattern.search(line): # If this is a header line just write it to the outfile outfile.write(line+’\n’) else: # if there is no @ at the beginning of the line tag=’keep’ # Keep the line by default, and only throw it out if any of the conditions below are not met total=total+1 # counting the total number of reads splitted=line.split(‘\t’) # Split the lines by tab separators if args.MappingQuality: # if the user wants to filter out reads below a mapping quality of 20 if int(splitted[4])>19: # The fifth column – which is the mapping quality – shall be over 19 tag=’keep’ else: tag=’discard’ if tag is ‘keep’: # Continue only if the alignment is not yet filtered out if args.unmapped: # if the unmapped reads shall be filtered out if splitted[2] is not ‘*’: # column 3 in the SAM file shows “Name of reference sequence where alignment occurs” or a * for unmapped reads. tag=’keep’ else: tag=’discard’ if tag is ‘keep’: # Continue only if the alignment is not yet filtered out if args.discordant: # if the discordant reads shall be filtered out if DiscordantPattern.search(line): # Look for the field indicating discordant mappings tag=’discard’ else: tag=’keep’ if tag is ‘keep’: # Continue only if the alignment is not yet filtered out if args.ambiguous: # if the ambiguous reads shall be filtered out if NonUniquePattern.search(line): # Look for the field indicating discordant mappings tag=’discard’ else: tag=’keep’ if tag is ‘keep’: # Write the alignment only if it was not filtered out before filtered = filtered+1 # count the number of reads remaining after filtering outfile.write(line+’\n’) print samfile, ‘ - unfiltered reads:’, ‘%13i’ % total print samfile, ‘ - filtered reads:’, ‘%13i’ % filtered

MappingFilter(str(args.sam))