Help - How to use T-DistinctiEnz

What is T-DistinctiEnz
Input Sequences
Sequence Options
Enzyme Selection
Output and Analysis Options
Analysis Output

^^ What is T-DistinctiEnz

T-DistinctiEnz is a web tool for performing a virtual T-RFLP (terminal restriction fragment length polymorphism) analysis with a set of restriction enzymes on a set of DNA sequences. The name T-DistinctiEnz comes from DISTINCTION. The tool was produced by Mohammadreza Rezailashkajani and Delnaz Roshandel, Department of Bioinformatics, Research Center for Gastroenterology and Liver Diseases, Shaheed Beheshti University of Medical Sciences. We wish to express our gratitude to Caroline Reiff, who first suggested the idea of producing such a tool (when we were all taking the Biocomputing module at the University of Manchester), and for her valuable comments on completing the tool by adding new features a biologist may need.
T-DistinctiEnz estimates the DISTINCTION (Resolution) power of a set of enzymes for a set of sequences in a t-RFLP analysis. In other words, it helps the user learn with which enzyme or set of enzymes she can best DISTINGUISH among a set of sequences using t-RFLP. To see a brief description of t-RFLP, please click here.
T-DistinctiEnz performs a restriction analysis on each sequence and examines all tagged fragments to see how unique the cut fragments from each sequence are when compared to the fragment sets produced from all other sequences in the uploaded set. Which fragments are examined depends on the type of t-RFLP analysis: in a 3' t-RFLP or 5' t-RFLP, only the 3'-tagged or 5'-tagged fragment is analyzed, while in a dual t-RFLP both are considered. Finally, a Resolution Power (Ratio) is calculated by simply dividing the number of unique-fragment sets by the number of submitted sequences. A Resolution Power of 100% means that the set of enzymes can fully DISTINGUISH among the submitted sequences. However, if only ONE pattern is produced by all sequences, the Resolution Power is set to 0 (i.e. it is not calculated from the formula).

Notice: For very long sequences, if fragments longer than 100,000 base pairs are produced, the Resolution Power may be falsely high, because fragments over 100,000 base pairs (whose base-10 logarithm is 5 or more) may produce zero or negative migration distances (see below). In other words, the application still counts such fragments as different, whereas a real RFLP gel would give lower resolution.

Resolution power = number of unique-fragment sets / number of the submitted sequences
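
As a rough illustration of the ratio above, the following Python sketch counts distinct fragment patterns and divides by the number of sequences. It is not the tool's source code: the function name resolution_power, the representation of a pattern as a tuple of fragment lengths, and the reading of "unique-fragment sets" as "distinct patterns" are all assumptions made for this example.

```python
def resolution_power(patterns):
    """Return the resolution power for one fragment pattern per sequence.

    `patterns` is a list with one hashable entry per submitted sequence,
    for example a tuple of terminal fragment lengths (assumed layout).
    """
    if not patterns:
        raise ValueError("at least one sequence is required")
    distinct = len(set(patterns))
    if distinct == 1:
        # Special case from the help text: one shared pattern gives 0.
        return 0.0
    return distinct / len(patterns)


if __name__ == "__main__":
    # Three sequences, two of which share the same dual pattern.
    print(resolution_power([(56, 100), (56, 100), (30, 100)]))
```

Under these assumptions, a value of 1.0 corresponds to a Resolution Power of 100%.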

The program allows the user to:

Upload a set of sequences in FASTA format all in ONE FASTA file
Choose one or more restriction enzymes out of a list of more than 500 type II restriction enzymes
Decide whether you wish to test each enzyme separately (no graphical output) or to perform a multiple digest using all selected enzymes together.
Select the type of t-RFLP analysis: 3', 5' or dual t-RFLP (considering both 3' and 5' terminal fragments).
Decide whether you wish to include abundance values for sequences.
Select Maximum vertical value for abundances.
Select height of graphical output in pixels.

Alternatively, T-DistinctiEnz can be used simply to see the virtual T-RFLP Gel view of any single sequence, by uploading a file that contains only that sequence.

^^ Input Sequences

Use the Browse button to upload a file containing ALL your sequences in FASTA format.

Browse to upload a FASTA file containing ALL your sequences.

An example of the content of such a file is shown below. The file is saved as plain text with a .FASTA extension.

>Sequence_a
CACACATTTAGGATTTTTATTCCGCTCCGGAATTCCCCGGCCCATATGAGCGCT
TATACTTTTTTTTTTTCCGCGCGCTACGTAAGCGCTTCGCCCAGC
TTTTTATTCCTTTATTATAAACCGGAACCTCCGGCAGGAAA

>Sequence_b
TTCCACACATTTAGGAATTCCGCTCCGGAATTCCCCGGCCCATATGAGCGCTTATACGCGAATTCGAGCGCTTC
CGCGCGCTACGTAAGCGCTTCGCCCAGCGGAATTCCTTTATTATAAACCGGAACCTCCGGCAGGAAATTTTTTTTTT
TTTTTTTTTTTTATACGCGAATTCTTTTTTTTTGAGCGCTTCCGCGCGCTACGTAAGCGCTTCGCCCAGCGGAATTCCTTTAT
TATAAACCGGAACCTCCGGCAGGAA

>Sequence_c
TTTTTTTCCCTTTTTTTTATTCCGCTCCGGAATTCCCCGGCCCATATGAGCGCT
TATACGCGAATTCGAGCGCTTCCGCGCGCTACGTAAGCGCTTCGCCCAGC GGAATTCCTTTATTATAAACCGGAACCTCCGG
CAGGAAA

Click here to download the above sequences in one FASTA file in Zip format. Please unzip the file and test the sequences in T-DistinctiEnz with EcoNI and EcoRI and a dilation factor of 30.

You can also include an abundance value for each sequence, representing the abundance of that sequence in the sample. To do so, place the abundance value between pipes at the end of the sequence name (for example: Sequence_a|20|). Remember not to put any space characters between this value and the sequence name. Here is the same sequence set as above but with abundance values: 20 for Sequence_a, 40 for Sequence_b, and 15 for Sequence_c:

>Sequence_a|20|
CACACATTTAGGATTTTTATTCCGCTCCGGAATTCCCCGGCCCATATGAGCGCT
TATACTTTTTTTTTTTCCGCGCGCTACGTAAGCGCTTCGCCCAGC
TTTTTATTCCTTTATTATAAACCGGAACCTCCGGCAGGAAA

>Sequence_b|40|
TTCCACACATTTAGGAATTCCGCTCCGGAATTCCCCGGCCCATATGAGCGCTTATACGCGAATTCGAGCGCTTC
CGCGCGCTACGTAAGCGCTTCGCCCAGCGGAATTCCTTTATTATAAACCGGAACCTCCGGCAGGAAATTTTTTTTTT
TTTTTTTTTTTTATACGCGAATTCTTTTTTTTTGAGCGCTTCCGCGCGCTACGTAAGCGCTTCGCCCAGCGGAATTCCTTTAT
TATAAACCGGAACCTCCGGCAGGAA

>Sequence_c|15|
TTTTTTTCCCTTTTTTTTATTCCGCTCCGGAATTCCCCGGCCCATATGAGCGCT
TATACGCGAATTCGAGCGCTTCCGCGCGCTACGTAAGCGCTTCGCCCAGC GGAATTCCTTTATTATAAACCGGAACCTCCGG
CAGGAAA
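
To make the header convention concrete, here is a minimal Python sketch that reads FASTA text and splits each header into a name and an optional abundance. It only illustrates the |value| notation described above and is not the tool's parser. The names HEADER and read_fasta, and the default abundance of 1.0 for headers without a value, are assumptions.

```python
import re

# Header line: ">name" optionally followed by "|number|" with no space before it.
HEADER = re.compile(r"^>(?P<name>.*?)(?:\|(?P<abundance>[0-9]+(?:\.[0-9]+)?)\|)?\s*$")


def read_fasta(text, default_abundance=1.0):
    """Return a list of (name, abundance, sequence) tuples from FASTA text."""
    records = []
    name, abundance, chunks = None, default_abundance, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines separate records in the examples above
        if line.startswith(">"):
            if name is not None:
                records.append((name, abundance, "".join(chunks)))
            match = HEADER.match(line)
            name = match.group("name")
            value = match.group("abundance")
            abundance = float(value) if value else default_abundance
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence lines are joined as-is
    if name is not None:
        records.append((name, abundance, "".join(chunks)))
    return records
```

Sequence lines are joined without further cleaning, so stray characters inside a sequence are kept by this sketch.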

^^ Sequence Options

If ALL of your sequences are circular, please check the corresponding box shown below. If your sequences are linear, leave this box unchecked. It is not possible to submit circular and linear sequences together in one file: all sequences must be either circular OR linear. If the box is unchecked, ALL the sequences in the uploaded file will be considered linear.
If your sequences contain non-atcg characters such as numbers, tabs, etc., check the Turn on PowerCleaner! box, which will remove all characters except a, t, c, and g.

All uploaded sequences are circular.
Turn on PowerCleaner! (Cleans all non-atcg characters from the above-pasted/uploaded sequence.)
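
The effect of PowerCleaner can be pictured with a one-line filter. This Python sketch is only an approximation of the behaviour described above. The function name power_clean is hypothetical, and keeping both upper-case and lower-case letters is an assumption.

```python
import re


def power_clean(sequence):
    """Drop every character that is not a, t, c, or g (either case)."""
    return re.sub(r"[^acgtACGT]", "", sequence)


if __name__ == "__main__":
    # Spaces, digits, tabs, and line breaks are all removed.
    print(power_clean("CGCCCAGC GGAATT\t12cc\n"))
```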

^^ Enzyme Selection

You can select one or more restriction enzymes to perform a multiple digest. Use Ctrl+click to select more than one enzyme.
For example, the sequences shown in the Input Sequences section above can be cut by the EcoNI and EcoRI enzymes. You can test this yourself with either enzyme or with both simultaneously. Click here to download the above sequences in one FASTA file in Zip format. Please unzip the file before testing and cut the sequences with EcoNI and EcoRI. In particular, test these sequences with and without checking the "Sequences are Circular" checkbox explained above.

Select one or more enzymes (Ctrl+click):

^^ Output and Analysis Options

When you choose the Test each enzyme option, a single digest is performed for each selected enzyme and a Resolution Profile is output for each; there will be no graphical output. If this option is NOT selected, a multiple digest with all selected enzymes is performed and a graphical t-RFLP output is produced.

Test each enzyme (May take a long time).

A t-RFLP analysis can be performed in 3 modes, considering the terminal 3'-tagged fragment, the terminal 5'-tagged fragment, or both. In the graphical output, all 3' fragments appear in the upper half while all 5' fragments appear in the lower half of the image.
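
The three modes can be sketched for a single enzyme on a linear sequence as follows. This is an illustrative Python example, not the tool's algorithm. It assumes EcoRI (recognition site GAATTC, cutting after the first G), measures lengths on the given strand only, and reports the full sequence length when the enzyme does not cut. The function names and the mode strings "5", "3", and "dual" are hypothetical.

```python
ECORI_SITE = "GAATTC"   # EcoRI recognition sequence
ECORI_CUT_OFFSET = 1    # EcoRI cuts between G and AATTC


def cut_positions(seq, site=ECORI_SITE, offset=ECORI_CUT_OFFSET):
    """Return cut positions (0-based, top strand) in a linear sequence."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def terminal_fragments(seq, mode="dual"):
    """Return the terminal fragment length(s) for mode "5", "3", or "dual"."""
    if mode not in ("5", "3", "dual"):
        raise ValueError("mode must be '5', '3', or 'dual'")
    cuts = cut_positions(seq)
    # Assumption: an uncut sequence yields one fragment of full length.
    five = cuts[0] if cuts else len(seq)
    three = len(seq) - cuts[-1] if cuts else len(seq)
    result = {}
    if mode in ("5", "dual"):
        result["5'"] = five
    if mode in ("3", "dual"):
        result["3'"] = three
    return result
```

In a dual analysis both lengths together would form the pattern that is compared between sequences.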
If you wish to introduce abundance values in your analysis, check the Consider abundances option. If you do, you must include abundance values in each sequence name as described in the Input Sequences section above.

Select type of t-RFLP analysis:

Consider abundances.

The following 2 options matter if you wish to get an image of a particular height. The first option sets the maximum vertical value shown on the Y axis, which is important when your abundance values may accumulate to high values.

Select Maximum vertical value for abundances.

Select height of graphical output in pixels. (Picture width will be adjusted to maximum cut fragment.)

^^ Analysis Output

The analysis gives a Resolution Profile and a graphical t-RFLP Gel output. The Resolution Profile calculation method is explained in the What is T-DistinctiEnz section.
The application automatically adds a sq_n_ prefix to your sequence names, so that sequences appear in the order in which they occurred in the uploaded FASTA file. The sequence names will therefore look like: sq_1_The name of 1st seq, sq_2_The name of 2nd seq, etc. A table is produced showing the terminal fragment size(s) from each sequence. When a dual t-RFLP is performed, 3' and 5' fragments are represented by, for example, T56 and F100. Here T56 denotes a 56 base pair 3'-tagged fragment and F100 denotes a 100 base pair 5'-tagged fragment.
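
The naming and labelling conventions above can be mimicked in a few lines of Python. This sketch only reproduces the sq_n_ prefix and the T/F notation as described here. The function names and the space used to join the two labels are assumptions, not the tool's actual output format.

```python
def label_sequences(names):
    """Prefix each name with sq_n_, numbering from 1 in upload order."""
    return [f"sq_{i}_{name}" for i, name in enumerate(names, start=1)]


def fragment_labels(three_prime=None, five_prime=None):
    """Build labels such as 'T56 F100' (T = 3'-tagged, F = 5'-tagged)."""
    labels = []
    if three_prime is not None:
        labels.append(f"T{three_prime}")
    if five_prime is not None:
        labels.append(f"F{five_prime}")
    return " ".join(labels)


if __name__ == "__main__":
    print(label_sequences(["Sequence_a", "Sequence_b"]))
    print(fragment_labels(three_prime=56, five_prime=100))
```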
In the graphical output, the X axis shows fragment length in base pairs and the Y axis shows abundance values. All 3' fragments appear in the upper half while all 5' fragments appear in the lower half of the image, which enhances the visual information the image can provide. Images are produced in PNG format, which is supported by most browsers. Below is an image of the analysis of the sequence file described above, cut by EcoNI and EcoRI.
In the real output, you can move your mouse over the image to get the exact size of any fragment in base pairs.