If you want to use the PoPoolation pipeline to analyze pooled next generation sequencing data, you need a pileup file as input format (created with samtools mpileup from a bam file). However, if you have already a set of trusted SNPs (Single Nucleotide Polymorphisms) you might want to run the PoPoolation analysis only on this set and not on all bases in the alignment. This blog post introduces a python script that allows you to extract such set of trusted SNPs from a pileup file.
You can download the script SNPsFromPileup.py
from
https://gist.github.com/alj1983/0334a46ec4a917cbe7ab/download
usage: SNPsFromPileup.py [-h] [-s SNPS] [-p PILEUP] Reduces pileup files to specified SNP locations optional arguments: -h, --help show this help message and exit -s SNPS, --snps SNPS text file containing SNP locations. Each line contains the chromosome specification and the TAB-separated position of a SNP -p PILEUP, --pileup PILEUP pileup file, created with samtools mpileup
Here an example text file with SNP locations to clarify its format:
Chromosome Position gi|612419702|gb|AZHG01039810.1| 489 gi|612419702|gb|AZHG01039810.1| 495 gi|612419702|gb|AZHG01039810.1| 541 gi|612419702|gb|AZHG01039810.1| 563 gi|612419730|gb|AZHG01039796.1| 295 gi|612419730|gb|AZHG01039796.1| 312 gi|612419730|gb|AZHG01039796.1| 393
import re # to use regular expressions import argparse # To allow users setting flags in order to define what to filter for. parser = argparse.ArgumentParser(description="Reduces pileup files to specified SNP locations") parser.add_argument("-s","--snps", help="text file containing SNP locations. Each line contains the chromosome specification and the TAB-separated position of a SNP") parser.add_argument("-p","--pileup", help="pileup file, created with samtools mpileup") args=parser.parse_args() def ExtractingSNPs(snpfile): ''' Stores the chromosome specification and the position of the SNPs into a table; ''' snplocations=open(snpfile) #chrompos_table = [] chrompos_list = [] for line in snplocations: line=line.strip() columns = line.split('\t') #chrompos_table.append([columns[0],columns[1]]) chrompos_list.append(str(columns[0]+columns[1])) return chrompos_list SNPlist = ExtractingSNPs(str(args.snps)) # Extract the SNP locations from the pileup file outfile=open(str(args.pileup)+'_ReducedToSNPs.pileup','w') def SNPsInPileup(pileupfile): ''' Reads the pileup file and extracts only those lines that contain SNPs ''' pileups=open(pileupfile) for line in pileups: line=line.strip() columns = line.split('\t') chrompos= str(columns[0]+columns[1]) if chrompos in SNPlist: lines = '\t'.join(columns) outfile.write(lines+'\n') SNPsInPileup(str(args.pileup)) outfile.close()