marinetics blog


Bioinformatics Solutions in Ecology and Evolution

26 Apr 2015

Extract SNPs from a pileup file

Tagged: python NGS pileup SNP






Related Posts

If you want to use the PoPoolation pipeline to analyze pooled next generation sequencing data, you need a pileup file as input format (created with samtools mpileup from a bam file). However, if you have already a set of trusted SNPs (Single Nucleotide Polymorphisms) you might want to run the PoPoolation analysis only on this set and not on all bases in the alignment. This blog post introduces a python script that allows you to extract such set of trusted SNPs from a pileup file.

Download

You can download the script SNPsFromPileup.py from https://gist.github.com/alj1983/0334a46ec4a917cbe7ab/download

How to use it?

usage: SNPsFromPileup.py [-h] [-s SNPS] [-p PILEUP]

Reduces pileup files to specified SNP locations

optional arguments:
  -h, --help            show this help message and exit
  -s SNPS, --snps SNPS  text file containing SNP locations. Each line contains
                        the chromosome specification and the TAB-separated
                        position of a SNP
  -p PILEUP, --pileup PILEUP
                        pileup file, created with samtools mpileup

Here an example text file with SNP locations to clarify its format:

Chromosome      Position
gi|612419702|gb|AZHG01039810.1| 489
gi|612419702|gb|AZHG01039810.1| 495
gi|612419702|gb|AZHG01039810.1| 541
gi|612419702|gb|AZHG01039810.1| 563
gi|612419730|gb|AZHG01039796.1| 295
gi|612419730|gb|AZHG01039796.1| 312
gi|612419730|gb|AZHG01039796.1| 393

Code

import re # to use regular expressions
import argparse # To allow users setting flags in order to define what to filter for.

parser = argparse.ArgumentParser(description="Reduces pileup files to specified SNP locations")
parser.add_argument("-s","--snps", help="text file containing SNP locations. Each line contains the chromosome specification and the TAB-separated position of a SNP")
parser.add_argument("-p","--pileup", help="pileup file, created with samtools mpileup")
args=parser.parse_args()


def ExtractingSNPs(snpfile):
    '''
    Stores the chromosome specification and the position of the SNPs into a table;
    '''

    snplocations=open(snpfile)
    #chrompos_table = []
    chrompos_list = []
    for line in snplocations:
        line=line.strip()
        columns = line.split('\t')
        #chrompos_table.append([columns[0],columns[1]])
        chrompos_list.append(str(columns[0]+columns[1]))

    return chrompos_list

SNPlist = ExtractingSNPs(str(args.snps))

# Extract the SNP locations from the pileup file
outfile=open(str(args.pileup)+'_ReducedToSNPs.pileup','w')

def SNPsInPileup(pileupfile):
      '''
      Reads the pileup file and extracts only those lines that contain SNPs
      '''
      pileups=open(pileupfile)
      for line in pileups:
           line=line.strip()
           columns = line.split('\t')
           chrompos= str(columns[0]+columns[1])
           if chrompos in SNPlist:
                lines = '\t'.join(columns)
                outfile.write(lines+'\n')

SNPsInPileup(str(args.pileup))

outfile.close()