JaMBW Chapter 3.1.1
Window Composition
Aim
Given a sequence of nucleic acids or amino acids and a compositional
pattern, this program computes the running average percentage
composition of that pattern over a window of a chosen size. This
analysis allows the visualization of composition-specific
patterns (e.g. A/T, C/G, etc.) and thus helps to formulate hypotheses
about the function of the macromolecule under study. For example,
one can identify potentially active regions of chromatin by observing
A/T richness, or locate potentially hydrophobic
regions by visualizing the compositional richness in the amino acids
Leu, Ile, Val, Met, Phe, Tyr and Trp.
Mode of operation
The program accepts the following parameters:
- Sequence
- Symbols used
Either paste or type the sequence of interest into the Sequence area.
For protein sequences, use one-letter notation.
- removal of header information
Only the sequence itself may be placed in the top window: heading
comments must be removed.
- long sequences and small window
So that users with small screens can still use this program, each
window has been made rather small. Use the scroll bars to move
around in the input and output windows. The suggested strategy is to
double-click in the relevant area and then copy/paste from/to
the text editor of choice or across different applications.
- Pattern
Either paste or type the symbol(s) whose windowed composition
is requested. Please note that the program performs a logical
"or" over the individual symbols: a position matches if it holds
any one of them. Example:
ATGCCCTTCGGAAGGTTCGCTAGCGA   input sequence
AT                           pattern
**    **   **  **   **   *   matches
12345678901234567890123456   base position
         1         2
The above example shows that the compositional pattern AT
is found at positions 1, 2, 7, 8, 12, 13, 16, 17, 21, 22 and 26,
although there is only a single occurrence of the dinucleotide AT.
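The logical "or" matching described above can be sketched in a few lines of Python. This is only an illustration, not the applet's own code: the function name match_positions is hypothetical, and case-insensitive comparison is an assumption.

```python
def match_positions(sequence, pattern):
    """Return the 1-based positions whose symbol is any one of the pattern symbols."""
    # Assumption: matching ignores case, so "at" and "AT" behave the same.
    symbols = set(pattern.upper())
    return [i for i, ch in enumerate(sequence.upper(), start=1) if ch in symbols]


positions = match_positions("ATGCCCTTCGGAAGGTTCGCTAGCGA", "AT")
print(positions)  # [1, 2, 7, 8, 12, 13, 16, 17, 21, 22, 26]
```

Each symbol of the pattern is tested on its own, so the order of the symbols in the pattern does not matter.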
- Window
The "window" parameter indicates the size of the averaging
window over which the percentage composition for the specified
pattern is computed. Continuing the above example, given a
window of size 5, the following values will be used for
visualization, each assigned to the central position of its window:
base position | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 |
%             | 40 | 20 | 20 | 40 | 40 | 40 | 40 | 40 |
Therefore, the effect of a large window size is to smooth differences.
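The values in the example can be reproduced with a short Python sketch. It is a minimal illustration under stated assumptions, not the applet's implementation: the name window_composition is hypothetical, only complete windows are evaluated (how the applet treats the sequence ends is not documented here), and each value is keyed by the central position of its window, which assumes an odd window size.

```python
def window_composition(sequence, pattern, window):
    """Map each window's 1-based centre position to its percentage of pattern symbols."""
    symbols = set(pattern.upper())
    hits = [ch in symbols for ch in sequence.upper()]
    half = window // 2
    result = {}
    # Assumption: only complete windows are used, so the sequence ends get no value.
    for start in range(len(hits) - window + 1):
        centre = start + half + 1  # 1-based centre of the current window
        result[centre] = 100.0 * sum(hits[start:start + window]) / window
    return result


values = window_composition("ATGCCCTTCGGAAGGTTCGCTAGCGA", "AT", 5)
print([values[p] for p in range(3, 11)])
# [40.0, 20.0, 20.0, 40.0, 40.0, 40.0, 40.0, 40.0]
```

With a window of 5 each match changes the value by 20 points; with a larger window a single match weighs less, which is the smoothing effect noted above.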
- Step
The "step" parameter indicates how far to advance along the sequence
between computations. A step of 1 (used in the above examples)
computes a value for each position along the sequence, while a value
greater than 1 introduces "jumps" across the sequence.
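The effect of the step can be illustrated with the following Python sketch. The function name stepped_composition is hypothetical, and it is an assumption that the first window starts at the first symbol and that only complete windows are evaluated; the applet may handle these details differently.

```python
def stepped_composition(sequence, pattern, window, step=1):
    """Return (centre position, percentage) pairs, advancing the window by `step` symbols."""
    symbols = set(pattern.upper())
    seq = sequence.upper()
    points = []
    # Assumption: windows start at the first symbol and must fit entirely in the sequence.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        percent = 100.0 * sum(ch in symbols for ch in chunk) / window
        points.append((start + window // 2 + 1, percent))
    return points


sequence = "ATGCCCTTCGGAAGGTTCGCTAGCGA"
print(len(stepped_composition(sequence, "AT", 5, step=1)))  # 22 points
print(stepped_composition(sequence, "AT", 5, step=2)[:4])
# [(3, 40.0), (5, 20.0), (7, 40.0), (9, 40.0)]
```

A step of 2 halves the number of plotted points here (11 instead of 22), at the cost of skipping every other position.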
- Locking parameters
The program also offers fine-grained control over which parameters
are shared with other applets present on the same page, through the
following buttons:
- Sequence
- Pattern
- Window
- Step
- Horizontal scroll
- Vertical scroll
Clicking on one of the above 6 buttons lets your modifications to
that parameter carry over to the other applets present on the same
page. This mode of operation is extremely useful, since it allows
you to see how a certain pattern is distributed in, for instance,
different sequences, and then to move along one sequence and see how
the other sequences compare. Another useful application of these
"locks" is to load the same sequence into several copies of the
program on the same page and compare how the graphs differ for
different parameters, for example to assess how acidic, hydrophilic
and hydrophobic regions correlate along the same sequence. A
typical application of this program would be as a "viewer"
spawned from network-based applications (as done by SRS5 or by
BIOCCELERATOR Services).
The following table lists some commonly used parameter combinations
for visualizing biologically relevant features of sequences on the
basis of compositional richness.
aim                                          | pattern    | size of averaging window |
DNA, identify A/T rich regions               | at         | 5 |
DNA, identify C/G rich regions               | cg         | 5 |
PROTEINS, identify basic regions             | krh        | 5 |
PROTEINS, identify acidic regions            | andcqegpsy | 5 |
PROTEINS, identify hydrophilic regions       | qnedbzhkr  | 5 |
PROTEINS, identify hydrophobic regions       | livmfyw    | 5 |
PROTEINS, identify aromatic regions          | fyw        | 5 |
PROTEINS, identify neutral regions           | pagst      | 5 |
PROTEINS, identify crosslink-forming regions | c          | 5 |
- Compute
Once the parameters have been chosen, pressing the "COMPUTE" button
searches for the pattern and displays the running average
in a scrollable window.
How to understand its output
The output consists of the running average percentage composition
of a defined pattern in the given sequence. It is presented
to the user as a line chart that can be scrolled horizontally
and vertically with the provided scroll bars. Peaks along the
sequence indicate abundance of the specified
pattern, while a line at the base level suggests its absence.
References
Doelz, R. (1990) BioCompanion, Biocomputing Essentials series, ISBN
3-905 434-00-8
Author: Luca I.G. TOLDO
Edition date: 28 February 1997