Blog posts

Selecting MethylRAD tags with less degenerate adaptor ends

3 minute read

Published: May 05, 2017

The Python script RTR.py takes fragments that result from DNA digestion with a type IIb restriction enzyme (see this post) and selects only those that fit adaptor ends with specific bases. This makes it possible to adjust the number of digested fragments that are sequenced. See the paper by Wang et al., which presents the MethylRAD method and the idea of using less degenerate adaptor ends to reduce MethylRAD tag representation. Read more

In silico digestion with type IIb restriction enzymes

3 minute read

Published: April 11, 2017

The Python script I present in this blog post simulates digestion with type IIb restriction enzymes. Read more

R package ‘MaxentVariableSelection’ released

less than 1 minute read

Published: September 23, 2015

How do you find a set of variables that are relevant for predicting a species' distribution? Identifying the set of most important variables that are not auto-correlated can be very tedious. My new R package ‘MaxentVariableSelection’ automates the process for niche modelling with the program Maxent. Read more

Blog Post number 4

less than 1 minute read

Published: August 14, 2015

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool. Read more

Extract target windows from a fasta file

10 minute read

Published: July 19, 2015

Have you found a set of interesting SNPs that are putatively under selection? Here, I provide two Python scripts that will help you to extract windows of putatively adaptive regions around these SNPs. Our working group, for example, uses such fasta files to enrich putatively adaptive regions in targeted re-sequencing approaches. Read more

High throughput sequencing of non-model organisms

less than 1 minute read

Published: May 26, 2015

Next week, I will provide bioinformatics training within the PhD course High throughput sequencing of non-model organisms. This course is now running for the second year at the University of Nordland in Norway and is accompanied by a webpage that provides access to all course material. Interested? Here is the link. Read more

Unix cheat sheet

4 minute read

Published: May 14, 2015

Being familiar with the Unix command line is essential for bioinformatics data analysis, but it might seem complex or tedious at the beginning. However, you will already get quite far with only a few commands. Here, I provide a cheat sheet that gives a quick overview of the most essential commands, which will drastically increase your efficiency in data analysis and file handling - promised! You will find more details on most of these commands in my previous blog posts that were marked with the ‘Commandline’ tag. The cheat sheet can also be downloaded as a PDF file (click here). Read more

Count realigned reads in SAM file

1 minute read

Published: May 03, 2015

Reads that span InDels (Insertion and Deletion variants) are often misaligned and can result in false positive SNPs (Single Nucleotide Polymorphisms). A popular tool that can re-align these reads is GATK's IndelRealigner. Once the job is done, it's good to know how many of the reads were actually re-aligned. Read more
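As a rough illustration of the counting idea (not the script from the post itself), the sketch below assumes that IndelRealigner was run with its default settings, in which case re-aligned reads carry an `OC:Z:` tag holding their original CIGAR string. The function name `count_realigned` and the example records are made up for this illustration.

```python
def count_realigned(sam_lines):
    """Return (realigned, total) read counts for an iterable of SAM lines.

    Assumption: a read counts as re-aligned if it carries an OC:Z: tag
    (original CIGAR), which IndelRealigner writes by default.
    """
    total = 0
    realigned = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        total += 1
        # Optional tags start at the 12th column (index 11).
        if any(tag.startswith("OC:Z:") for tag in fields[11:]):
            realigned += 1
    return realigned, total


# Hypothetical example records (tab-separated SAM fields).
example_sam = [
    "@HD\tVN:1.4",
    "r1\t0\tchr1\t100\t60\t5M1D5M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tOC:Z:10M",
    "r2\t0\tchr1\t200\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:0",
]
realigned, total = count_realigned(example_sam)
print(f"{realigned} of {total} reads were re-aligned")
```

For a real file, the same function can be fed an open file handle instead of the example list.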

Extract SNPs from a pileup file

1 minute read

Published: April 26, 2015

If you want to use the PoPoolation pipeline to analyze pooled next generation sequencing data, you need a pileup file as input format (created with samtools mpileup from a bam file). However, if you already have a set of trusted SNPs (Single Nucleotide Polymorphisms), you might want to run the PoPoolation analysis only on this set and not on all bases in the alignment. This blog post introduces a Python script that allows you to extract such a set of trusted SNPs from a pileup file. Read more
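The filtering step can be sketched in a few lines. This is only a minimal illustration, not the script from the post: it assumes the standard pileup layout (chromosome in column 1, 1-based position in column 2) and that the trusted SNPs are available as a set of `(chromosome, position)` pairs. The name `extract_snp_lines` and the example data are hypothetical.

```python
def extract_snp_lines(pileup_lines, trusted_snps):
    """Yield only those pileup lines whose (chromosome, position) is trusted.

    trusted_snps: set of (chromosome, 1-based position) tuples (assumed format).
    """
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:  # skip empty or malformed lines
            continue
        if (fields[0], int(fields[1])) in trusted_snps:
            yield line


# Hypothetical pileup records: chrom, pos, ref base, depth, bases, qualities.
example_pileup = [
    "chr1\t10\tA\t3\t...\tIII",
    "chr1\t11\tC\t3\t.T.\tIII",
    "chr2\t5\tG\t2\t.A\tII",
]
trusted = {("chr1", 11), ("chr2", 5)}
kept = list(extract_snp_lines(example_pileup, trusted))
print("\n".join(kept))
```

Using a set for the trusted positions keeps each lookup constant-time, which matters because pileup files have one line per covered base.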

Determine read-coverage thresholds

5 minute read

Published: April 21, 2015

If you want to identify sequence variants like SNPs (Single Nucleotide Polymorphisms) or InDels (Insertions and Deletions) in your sequencing data, you want to avoid regions where coverage is too low. The lower the coverage, the more difficult it becomes to discriminate sequencing errors from real sequence variants. Read more

Count bases in Fasta file

less than 1 minute read

Published: April 02, 2015

The script in this blog post allows you to count the number of bases or amino acids in a fasta file. This can be useful, for example, to identify the size of a genome or an assembly. Read more
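To give an idea of what such a count involves, here is a minimal sketch (not the script from the post): it skips header lines starting with `>` and tallies every remaining character. The function name `count_fasta_residues` and the example records are made up for illustration.

```python
from collections import Counter


def count_fasta_residues(lines):
    """Count each residue (base or amino acid) over all sequence lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):  # skip blanks and headers
            continue
        counts.update(line.upper())  # treat soft-masked bases like upper case
    return counts


# Hypothetical two-record fasta content.
example_fasta = [">seq1", "ACGTAC", "GT", ">seq2", "NNAC"]
counts = count_fasta_residues(example_fasta)
total = sum(counts.values())
print(total, dict(counts))
```

Passing an open file handle instead of the list processes a real fasta file line by line, so even large assemblies do not have to fit into memory.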

Mapped fragment lengths

3 minute read

Published: March 28, 2015

How much do your paired-end reads overlap? Let's assume your forward and reverse reads each cover 300 bp (base pairs). Now, do they overlap fully and cover only a range of 300 bp in the genome? Or do they span across 1500 or so bp of your genome with an uncovered gap of 900 bp between the forward and reverse read? Read more
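The arithmetic behind these two scenarios can be written down directly. The sketch below is only an illustration under a simplifying assumption: both reads have the same length, and the gap between them (here called `inner_distance`, a name chosen for this example) is negative when the reads overlap.

```python
def fragment_span(read_length, inner_distance):
    """Genomic range covered by a read pair, gap included.

    inner_distance: bases between the two reads; negative if they overlap.
    """
    return 2 * read_length + inner_distance


def overlap(read_length, inner_distance):
    """Number of bases covered by both reads of the pair."""
    return max(0, min(read_length, -inner_distance))


# The two scenarios from the text, with 300 bp reads:
print(fragment_span(300, -300), overlap(300, -300))  # fully overlapping reads
print(fragment_span(300, 900), overlap(300, 900))    # 900 bp gap between reads
```

In mapped data, the corresponding observed value is reported per read in the template length field (column 9) of a SAM file.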

Filter Bowtie2 alignments

3 minute read

Published: March 03, 2015

Sequence alignments generally need to be filtered for certain criteria. The Python script Bowtie2Filtering.py enables you to filter out reads from Bowtie2 alignments (SAM files) that are Read more

Bowtie2 mapping overview

3 minute read

Published: February 24, 2015

You have mapped reads of genetic code against a genomic reference with bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) and would like to get some overview statistics from your SAM files? Read more