The python script RTR.py
uses fragments that result from DNA
digestion with a IIb restriction enzyme (see this post) and selects
only those that fit to adaptor ends with some specific bases. This
allows to adjust the number of digested fragments that are sequenced.
See the paper of Wang et. al presenting the MethylRAD method and the
idea of using less degenerative adaptor ends to reduce MethylRAD tag
representation.
For example, digestion of fully methylated restriction sites with the FspEI enzyme creates degenarative sticky adaptor ends with four ‘random’ bases (NNNN) that will form hydrogen bonds with complementary adaptor ends. Theoretically, a single fixed base targets 1/16th of all FspEI sites. A reduction of one quarter (NNNR overhang on both adaptors) to 1/256th of all sites (NNGG overhangs on both adaptors) is possible.
Example scheme for reduced tag representation:
Fraction of sites targeted | 5’ adaptor oligo | 3’ adaptor oligo |
---|---|---|
1 | NN | NN |
1/4 | NR | NR |
1/8 | NC | NG |
1/16 | NC | NC |
1/32 | RG | YG |
1/64 | RG | RG |
You can download the script RTR.py
from
https://gist.github.com/alj1983/a1bcb950013d8ffac9a7fb9ceb70b1c0/archive/2bec1913e2454a58074deafd3f7687612f3b9048.zip
usage: RTR.py [-h] [-f FASTA] [-b1 BASE1] [-b2 BASE2] [-l1 LOCATION1] [-l2 LOCATION2] Selecting MethylRAD tags with less degenerate adaptor ends optional arguments: -h, --help show this help message and exit -f FASTA, --fasta FASTA fasta file with MethylRAD tags (output from the InSilicoTypeIIbDigestion.py script). The tag endings must be the complementary bases to adaptor 1 or 2. -b1 BASE1, --base1 BASE1 specific base of adaptor 1 that is complementary to the cutting site overhang. The script converts it to the complementary base on the 5 prime end of the MethylRAD tag -b2 BASE2, --base2 BASE2 specific base of adaptor 2 that is complementary to the cutting site overhang. The script converts it to the complementary base on the 5 prime end of the MethylRAD tag -l1 LOCATION1, --location1 LOCATION1 location of specific base 1 in the cutting site overhang. Location=1 refers to the first outermost base on either site of the MethylRAD tag -l2 LOCATION2, --location2 LOCATION2 location (in bp) of specific base 2 in the cutting site overhang. Location=1 refers to the first outermost base on either site of the MethylRAD tag
For example, to find fragments that would be complementary to adaptor 1 with the ending NNNT on one end and to adaptor 2 with the ending NNNC on the other and, we can use:
python RTR.py \ -f Digested.fasta \ -b1 'T' \ -b2 'C' \ -l1 4 \ -l2 4
Digested.fasta
is the in silico digested genome, which is the output
file of the script InSilicoTypeIIbDigestion.py
.
The script reports how many tags fit to the adaptor endings. For example, when digesting the seagrass genome with the FspEI digestion enzyme and applying the adatper endings NNNT and NNNC, the output is:
73708 of 628255 (11.7321788127%) tags fit to these adaptor endings
That fits with the expected reduction to 1/8th of al digested fragments. The script creates two fasta files:
...RTRMarked.fasta
, containing all digested fragments but only
the ones fitting to the adaptor endings marked as ‘RTR.tag’. Thos
that don’t fit to the adaptor endings are marked as ‘Not.included’....RTRExtracted.fasta
, containing only the fragments fitting to
the less degenrative adaptor endings.