marinetics blog


Bioinformatics Solutions in Ecology and Evolution

05 May 2017

Selecting MethylRAD tags with less degenerate adaptor ends

Tagged: python NGS Epigenetics Fasta






Related Posts

The python script RTR.py uses fragments that result from DNA digestion with a IIb restriction enzyme (see this post) and selects only those that fit to adaptor ends with some specific bases. This allows to adjust the number of digested fragments that are sequenced. See the paper of Wang et. al presenting the MethylRAD method and the idea of using less degenerative adaptor ends to reduce MethylRAD tag representation.

For example, digestion of fully methylated restriction sites with the FspEI enzyme creates degenarative sticky adaptor ends with four ‘random’ bases (NNNN) that will form hydrogen bonds with complementary adaptor ends. Theoretically, a single fixed base targets 1/16th of all FspEI sites. A reduction of one quarter (NNNR overhang on both adaptors) to 1/256th of all sites (NNGG overhangs on both adaptors) is possible.

Example scheme for reduced tag representation:

Fraction of sites targeted 5’ adaptor oligo 3’ adaptor oligo
1 NN NN
1/4 NR NR
1/8 NC NG
1/16 NC NC
1/32 RG YG
1/64 RG RG

How to use it?

usage: RTR.py [-h] [-f FASTA] [-b1 BASE1] [-b2 BASE2] [-l1 LOCATION1]
              [-l2 LOCATION2]

Selecting MethylRAD tags with less degenerate adaptor ends

optional arguments:
  -h, --help            show this help message and exit
  -f FASTA, --fasta FASTA
                        fasta file with MethylRAD tags (output from the
                        InSilicoTypeIIbDigestion.py script). The tag endings
                        must be the complementary bases to adaptor 1 or 2.
  -b1 BASE1, --base1 BASE1
                        specific base of adaptor 1 that is complementary to
                        the cutting site overhang. The script converts it to
                        the complementary base on the 5 prime end of the
                        MethylRAD tag
  -b2 BASE2, --base2 BASE2
                        specific base of adaptor 2 that is complementary to
                        the cutting site overhang. The script converts it to
                        the complementary base on the 5 prime end of the
                        MethylRAD tag
  -l1 LOCATION1, --location1 LOCATION1
                        location of specific base 1 in the cutting site
                        overhang. Location=1 refers to the first outermost
                        base on either site of the MethylRAD tag
  -l2 LOCATION2, --location2 LOCATION2
                        location (in bp) of specific base 2 in the cutting
                        site overhang. Location=1 refers to the first
                        outermost base on either site of the MethylRAD tag

For example, to find fragments that would be complementary to adaptor 1 with the ending NNNT on one end and to adaptor 2 with the ending NNNC on the other and, we can use:

python RTR.py \
-f Digested.fasta \ 
-b1 'T' \
-b2 'C' \
-l1 4 \
-l2 4

Digested.fasta is the in silico digested genome, which is the output file of the script InSilicoTypeIIbDigestion.py.

The script reports how many tags fit to the adaptor endings. For example, when digesting the seagrass genome with the FspEI digestion enzyme and applying the adatper endings NNNT and NNNC, the output is:

73708 of 628255 (11.7321788127%) tags fit to these adaptor endings

That fits with the expected reduction to 1/8th of al digested fragments. The script creates two fasta files:

  1. ...RTRMarked.fasta, containing all digested fragments but only the ones fitting to the adaptor endings marked as ‘RTR.tag’. Thos that don’t fit to the adaptor endings are marked as ‘Not.included’.
  2. ...RTRExtracted.fasta, containing only the fragments fitting to the less degenrative adaptor endings.