Posts by Tags

Commandline

Unix cheat sheet

4 minute read

Published:

Being familiar with the Unix command line is essential for bioinformatics data analysis, but it might seem complex or tedious at the beginning. However, you will already get quite far with only a few commands. Here, I provide a cheat sheet that gives a quick overview of the most essential commands, which will drastically increase your efficiency in data analysis and file handling - promised! You will find more details on most of these commands in my previous blog posts that are marked with the ‘Commandline’ tag. The cheat sheet can also be downloaded as a PDF file (click here). Read more

View and extract data

11 minute read

Published:

Got some sequencing data? Many powerful tools to analyse them are based on the command line, and this is part of a series of short but essential posts that help you get started. I assume that you are working on a UNIX-based operating system (‘Mac’ or ‘Linux’ computer). Read more

Parallel data analysis

6 minute read

Published:

Got some sequencing data? Many powerful tools to analyse them are based on the command line, and this is part of a series of short but essential posts that help you get started. I assume that you are working on a UNIX-based operating system (‘Mac’ or ‘Linux’ computer). Read more

Navigating to files and directories

3 minute read

Published:

Got some sequencing data? Many powerful tools to analyse them are based on the command line, and this is part of a series of short but essential posts that help you get started. I assume that you are working on a UNIX-based operating system (‘Mac’ or ‘Linux’ computer). Read more

File Transfer

2 minute read

Published:

Got some sequencing data? Many powerful tools to analyse them are based on the command line, and this is part of a series of short but essential posts that help you get started. I assume that you are working on a UNIX-based operating system (‘Mac’ or ‘Linux’ computer). Read more

Remote connection

2 minute read

Published:

Got some sequencing data? Many powerful tools to analyse them are based on the command line, and this is part of a series of short but essential posts that help you get started. I assume that you are working on a UNIX-based operating system (‘Mac’ or ‘Linux’ computer). Read more

Epigenetics

Selecting MethylRAD tags with less degenerate adaptor ends

3 minute read

Published:

The Python script RTR.py takes fragments that result from DNA digestion with a type IIB restriction enzyme (see this post) and selects only those that fit adaptor ends with some specific bases. This allows you to adjust the number of digested fragments that are sequenced. See the paper by Wang et al. presenting the MethylRAD method and the idea of using less degenerate adaptor ends to reduce MethylRAD tag representation. Read more

Fasta

Selecting MethylRAD tags with less degenerate adaptor ends

3 minute read

Published:

The Python script RTR.py takes fragments that result from DNA digestion with a type IIB restriction enzyme (see this post) and selects only those that fit adaptor ends with some specific bases. This allows you to adjust the number of digested fragments that are sequenced. See the paper by Wang et al. presenting the MethylRAD method and the idea of using less degenerate adaptor ends to reduce MethylRAD tag representation. Read more

Extract target windows from a fasta file

10 minute read

Published:

You found a set of interesting SNPs that are putatively under selection? Here, I provide two Python scripts that will help you to extract windows of putatively adaptive regions around these SNPs. Our working group, for example, uses such fasta files to enrich putatively adaptive regions in targeted re-sequencing approaches. Read more
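To give a rough idea of what "extracting a window around a SNP" means, here is a minimal sketch. It is not one of the two scripts from the post; the function name `window_around`, the 1-based position convention and the `flank` parameter are assumptions made for illustration only.

```python
def window_around(sequence, snp_position, flank):
    """Return the window of `flank` bases on each side of a SNP.

    `snp_position` is assumed to be 1-based, as in most SNP tables.
    The window is clipped at the ends of the sequence.
    """
    start = max(0, snp_position - 1 - flank)      # 0-based, inclusive
    end = min(len(sequence), snp_position + flank)  # 0-based, exclusive
    return sequence[start:end]


contig = "ACGTACGTAC"
middle_window = window_around(contig, snp_position=5, flank=2)
edge_window = window_around(contig, snp_position=1, flank=2)
print(middle_window)  # GTACG
print(edge_window)    # ACG
```

A SNP close to the start or end of a contig simply yields a shorter window here; whether that is acceptable depends on the downstream application.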

Count bases in Fasta file

less than 1 minute read

Published:

The script in this blog post allows you to count the number of bases or amino acids in a fasta file. This can be useful, for example, to identify the size of a genome or an assembly. Read more
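The idea can be sketched in a few lines of Python. This is not the script from the post, just a minimal illustration; the function name `count_residues` and the in-memory example data are assumptions.

```python
import io
from collections import Counter


def count_residues(handle):
    """Count residue characters in a fasta stream, skipping header lines."""
    counts = Counter()
    for line in handle:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip empty lines and sequence headers
        counts.update(line.upper())  # treat soft-masked bases like upper case
    return counts


# Small in-memory fasta example; a real file would be opened with open(path).
example = io.StringIO(">seq1\nACGT\nacgn\n>seq2\nGGC\n")
counts = count_residues(example)
total = sum(counts.values())
print(total)        # 11
print(counts["G"])  # 4
```

The total over all characters gives the size of the genome or assembly; the per-character counts show, for example, how many ambiguous `N` bases it contains.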

NGS

Selecting MethylRAD tags with less degenerate adaptor ends

3 minute read

Published:

The Python script RTR.py takes fragments that result from DNA digestion with a type IIB restriction enzyme (see this post) and selects only those that fit adaptor ends with some specific bases. This allows you to adjust the number of digested fragments that are sequenced. See the paper by Wang et al. presenting the MethylRAD method and the idea of using less degenerate adaptor ends to reduce MethylRAD tag representation. Read more

Extract target windows from a fasta file

10 minute read

Published:

You found a set of interesting SNPs that are putatively under selection? Here, I provide two Python scripts that will help you to extract windows of putatively adaptive regions around these SNPs. Our working group, for example, uses such fasta files to enrich putatively adaptive regions in targeted re-sequencing approaches. Read more

High throughput sequencing of non-model organisms

less than 1 minute read

Published:

Next week, I will provide bioinformatics training within the PhD course High throughput sequencing of non-model organisms. This course is now running for the second year at the University of Nordland in Norway and is accompanied by a webpage that provides access to all course material. Interested? Here is the link. Read more

Count realigned reads in SAM file

1 minute read

Published:

Reads that span InDels (insertion and deletion variants) are often misaligned and can result in false-positive SNPs (Single Nucleotide Polymorphisms). A popular tool that can re-align these reads is GATK's IndelRealigner. Once the job is done, it's good to know how many of the reads were actually re-aligned. Read more
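One possible way to count them, sketched below, relies on the assumption that IndelRealigner stores the original CIGAR string of every read it changes in an `OC` tag (this is its default behaviour as far as I know, but it can be switched off, so check your own output first). The function name `count_realigned` and the toy SAM records are hypothetical and not taken from the post.

```python
import io


def count_realigned(handle):
    """Return (realigned, total) read counts for a SAM stream.

    Assumption: a read counts as realigned if it carries an OC:Z: tag.
    """
    realigned = total = 0
    for line in handle:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        total += 1
        # Optional tags start at column 12 (index 11).
        if any(field.startswith("OC:Z:") for field in fields[11:]):
            realigned += 1
    return realigned, total


def sam_line(name, tags=()):
    """Build a toy SAM record with the 11 mandatory columns plus tags."""
    core = [name, "0", "chr1", "100", "60", "10M", "*", "0", "0",
            "ACGTACGTAC", "IIIIIIIIII"]
    return "\t".join(core + list(tags)) + "\n"


sam = io.StringIO(
    "@HD\tVN:1.4\n"
    + sam_line("read1")
    + sam_line("read2", ["OC:Z:10M"])
    + sam_line("read3", ["NM:i:0"])
)
realigned, total = count_realigned(sam)
print(realigned, total)  # 1 3
```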

Extract SNPs from a pileup file

1 minute read

Published:

If you want to use the PoPoolation pipeline to analyze pooled next-generation sequencing data, you need a pileup file as input (created with samtools mpileup from a BAM file). However, if you already have a set of trusted SNPs (Single Nucleotide Polymorphisms), you might want to run the PoPoolation analysis only on this set and not on all bases in the alignment. This blog post introduces a Python script that allows you to extract such a set of trusted SNPs from a pileup file. Read more
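The filtering step can be illustrated with a short sketch. It is not the script from the post: the function name `filter_pileup` and the representation of the trusted SNPs as a set of `(chromosome, position)` pairs are assumptions. It only relies on the first two tab-separated pileup columns being the chromosome and the 1-based position.

```python
import io


def filter_pileup(pileup, trusted):
    """Yield pileup lines whose (chromosome, position) is in `trusted`."""
    for line in pileup:
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed or empty lines
        if (fields[0], int(fields[1])) in trusted:
            yield line


# Hypothetical trusted SNP set and a toy three-line pileup.
trusted_snps = {("chr1", 3), ("chr2", 7)}
pileup = io.StringIO(
    "chr1\t2\tA\t3\t...\tIII\n"
    "chr1\t3\tC\t4\t..T,\tIIII\n"
    "chr2\t7\tG\t2\t.a\tII\n"
)
kept = list(filter_pileup(pileup, trusted_snps))
print(len(kept))  # 2
```

Using a set for the trusted positions keeps each lookup fast, which matters because pileup files contain one line per covered base.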

Determine read-coverage thresholds

5 minute read

Published:

If you want to identify sequence variants like SNPs (Single Nucleotide Polymorphisms) or InDels (Insertions and Deletions) in your sequencing data, you want to avoid regions with too low coverage. The lower the coverage, the more difficult it becomes to discriminate sequencing errors from real sequence variants. Read more

Count bases in Fasta file

less than 1 minute read

Published:

The script in this blog post allows you to count the number of bases or amino acids in a fasta file. This can be useful, for example, to identify the size of a genome or an assembly. Read more

Mapped fragment lengths

3 minute read

Published:

How much do your paired-end reads overlap? Let's assume your forward and reverse reads each cover 300 bp (base pairs). Now, do they overlap fully and cover only a range of 300 bp in the genome? Or do they span across 1500 or so bp of your genome with an uncovered gap of 900 bp between the forward and reverse read? Read more
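The arithmetic behind these two scenarios is simple enough to write down. The sketch below is only an illustration under the assumption that both reads have the same length; the function name `inner_distance` is hypothetical.

```python
def inner_distance(fragment_span, read_length):
    """Distance between the two mates of a read pair.

    Positive values are an uncovered gap between forward and reverse read,
    negative values are the number of overlapping bases.
    """
    return fragment_span - 2 * read_length


full_overlap = inner_distance(fragment_span=300, read_length=300)
wide_gap = inner_distance(fragment_span=1500, read_length=300)
print(full_overlap)  # -300, i.e. the reads overlap over all 300 bp
print(wide_gap)      # 900, i.e. 900 bp between the reads stay uncovered
```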

Filter Bowtie2 alignments

3 minute read

Published:

Sequence alignments generally need to be filtered for certain criteria. The Python script Bowtie2Filtering.py enables you to filter out reads from Bowtie2 alignments (SAM files) that are Read more

NicheModeling

PERL

Count bases in Fasta file

less than 1 minute read

Published:

The script in this blog post allows you to count the number of bases or amino acids in a fasta file. This can be useful, for example, to identify the size of a genome or an assembly. Read more

R

SNP

Extract SNPs from a pileup file

1 minute read

Published:

If you want to use the PoPoolation pipeline to analyze pooled next-generation sequencing data, you need a pileup file as input (created with samtools mpileup from a BAM file). However, if you already have a set of trusted SNPs (Single Nucleotide Polymorphisms), you might want to run the PoPoolation analysis only on this set and not on all bases in the alignment. This blog post introduces a Python script that allows you to extract such a set of trusted SNPs from a pileup file. Read more

Determine read-coverage thresholds

5 minute read

Published:

If you want to identify sequence variants like SNPs (Single Nucleotide Polymorphisms) or InDels (Insertions and Deletions) in your sequencing data, you want to avoid regions with too low coverage. The lower the coverage, the more difficult it becomes to discriminate sequencing errors from real sequence variants. Read more

mapping

Count realigned reads in SAM file

1 minute read

Published:

Reads that span InDels (insertion and deletion variants) are often misaligned and can result in false-positive SNPs (Single Nucleotide Polymorphisms). A popular tool that can re-align these reads is GATK's IndelRealigner. Once the job is done, it's good to know how many of the reads were actually re-aligned. Read more

Mapped fragment lengths

3 minute read

Published:

How much do your paired-end reads overlap? Let's assume your forward and reverse reads each cover 300 bp (base pairs). Now, do they overlap fully and cover only a range of 300 bp in the genome? Or do they span across 1500 or so bp of your genome with an uncovered gap of 900 bp between the forward and reverse read? Read more

Filter Bowtie2 alignments

3 minute read

Published:

Sequence alignments generally need to be filtered for certain criteria. The Python script Bowtie2Filtering.py enables you to filter out reads from Bowtie2 alignments (SAM files) that are Read more

pileup

Extract SNPs from a pileup file

1 minute read

Published:

If you want to use the PoPoolation pipeline to analyze pooled next-generation sequencing data, you need a pileup file as input (created with samtools mpileup from a BAM file). However, if you already have a set of trusted SNPs (Single Nucleotide Polymorphisms), you might want to run the PoPoolation analysis only on this set and not on all bases in the alignment. This blog post introduces a Python script that allows you to extract such a set of trusted SNPs from a pileup file. Read more

Determine read-coverage thresholds

5 minute read

Published:

If you want to identify sequence variants like SNPs (Single Nucleotide Polymorphisms) or InDels (Insertions and Deletions) in your sequencing data, you want to avoid regions with too low coverage. The lower the coverage, the more difficult it becomes to discriminate sequencing errors from real sequence variants. Read more

python

Selecting MethylRAD tags with less degenerate adaptor ends

3 minute read

Published:

The Python script RTR.py takes fragments that result from DNA digestion with a type IIB restriction enzyme (see this post) and selects only those that fit adaptor ends with some specific bases. This allows you to adjust the number of digested fragments that are sequenced. See the paper by Wang et al. presenting the MethylRAD method and the idea of using less degenerate adaptor ends to reduce MethylRAD tag representation. Read more

Extract target windows from a fasta file

10 minute read

Published:

You found a set of interesting SNPs that are putatively under selection? Here, I provide two Python scripts that will help you to extract windows of putatively adaptive regions around these SNPs. Our working group, for example, uses such fasta files to enrich putatively adaptive regions in targeted re-sequencing approaches. Read more

Count realigned reads in SAM file

1 minute read

Published:

Reads that span InDels (insertion and deletion variants) are often misaligned and can result in false-positive SNPs (Single Nucleotide Polymorphisms). A popular tool that can re-align these reads is GATK's IndelRealigner. Once the job is done, it's good to know how many of the reads were actually re-aligned. Read more

Extract SNPs from a pileup file

1 minute read

Published:

If you want to use the PoPoolation pipeline to analyze pooled next-generation sequencing data, you need a pileup file as input (created with samtools mpileup from a BAM file). However, if you already have a set of trusted SNPs (Single Nucleotide Polymorphisms), you might want to run the PoPoolation analysis only on this set and not on all bases in the alignment. This blog post introduces a Python script that allows you to extract such a set of trusted SNPs from a pileup file. Read more

Determine read-coverage thresholds

5 minute read

Published:

If you want to identify sequence variants like SNPs (Single Nucleotide Polymorphisms) or InDels (Insertions and Deletions) in your sequencing data, you want to avoid regions with too low coverage. The lower the coverage, the more difficult it becomes to discriminate sequencing errors from real sequence variants. Read more

Mapped fragment lengths

3 minute read

Published:

How much do your paired-end reads overlap? Let's assume your forward and reverse reads each cover 300 bp (base pairs). Now, do they overlap fully and cover only a range of 300 bp in the genome? Or do they span across 1500 or so bp of your genome with an uncovered gap of 900 bp between the forward and reverse read? Read more

Filter Bowtie2 alignments

3 minute read

Published:

Sequence alignments generally need to be filtered for certain criteria. The Python script Bowtie2Filtering.py enables you to filter out reads from Bowtie2 alignments (SAM files) that are Read more

samfile

Count realigned reads in SAM file

1 minute read

Published:

Reads that span InDels (insertion and deletion variants) are often misaligned and can result in false-positive SNPs (Single Nucleotide Polymorphisms). A popular tool that can re-align these reads is GATK's IndelRealigner. Once the job is done, it's good to know how many of the reads were actually re-aligned. Read more