JaMBW Chapter 3.1.1

Window Composition

Aim

Given a sequence of nucleic acids or amino acids and a compositional pattern, this program computes the running average percentage composition of that pattern over a window of a chosen size. This analysis makes composition-specific patterns (e.g. A/T, C/G, etc.) visible and thus helps to form hypotheses about the function of the macromolecule under consideration. For example, one can identify potentially active regions of chromatin by observing their richness in A/T, or potentially hydrophobic regions by visualizing the compositional richness in the amino acids Leu, Ile, Val, Met, Phe, Tyr and Trp.

Mode of operation

The program accepts the following parameters:

Sequence
- Symbols used
  Either paste or type the sequence of interest into the Sequence area. For protein sequences, use one-letter notation.
- Removal of header information
  Only the sequence itself may be placed in the top window: heading comments must be removed.
- Long sequences and small window
  So that users with small screens can still use this program, each window has been made rather small. Use the scroll bars to move around in the input and output windows. The suggested strategy is to double-click in the relevant area and then copy and paste from or to the text editor of choice, or across different applications.
Pattern
Either paste or type the symbol(s) whose windowed composition is requested. Please note that the program performs a logical "or" over the individual symbols: a position matches if it holds any one of them. Example:
```
ATGCCCTTCGGAAGGTTCGCTAGCGA  input sequence
AT                          pattern
**    **   **  **   **   *  matches
12345678901234567890123456  base position
         1         2
```
The above example shows that the compositional pattern AT matches at positions 1, 2, 7, 8, 12, 13, 16, 17, 21, 22 and 26, although there is only a single occurrence of the dinucleotide AT.
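The logical "or" can be illustrated with a minimal Python sketch. It is not the applet's own code: the function name `match_positions` is hypothetical, and treating upper and lower case as equivalent is an assumption.

```python
def match_positions(sequence, pattern):
    """Return the 1-based positions holding any one of the pattern symbols."""
    # Assumption: matching ignores case.
    symbols = set(pattern.upper())
    return [i for i, s in enumerate(sequence.upper(), start=1) if s in symbols]


print(match_positions("ATGCCCTTCGGAAGGTTCGCTAGCGA", "AT"))
# [1, 2, 7, 8, 12, 13, 16, 17, 21, 22, 26]
```

Each symbol of the pattern is tested on its own, so the order of the symbols in the pattern does not matter: `"TA"` gives the same positions as `"AT"`.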
Window
The "window" parameter indicates the size of the averaging window over which the percentage composition for the specified pattern is computed. Continuing the above example with a window of size 5, the following values are used for visualization:

base position	3	4	5	6	7	8	9	10
%	40	20	20	40	40	40	40	40

A larger window averages over more positions, so its effect is to smooth out local differences.
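The averaging can be sketched in a few lines of Python. This is an illustration, not the applet's code: the name `window_composition` is hypothetical, and reporting each value at the center of its window is an assumption inferred from the example values, where a window of size 5 yields its first value at base position 3.

```python
def window_composition(sequence, pattern, window):
    """Return (center position, percentage) for every full window."""
    symbols = set(pattern.upper())
    hits = [1 if s in symbols else 0 for s in sequence.upper()]
    values = []
    for start in range(len(hits) - window + 1):
        center = start + window // 2 + 1  # 1-based; assumed centered window
        percent = 100.0 * sum(hits[start:start + window]) / window
        values.append((center, percent))
    return values


seq = "ATGCCCTTCGGAAGGTTCGCTAGCGA"
for center, percent in window_composition(seq, "AT", 5)[:8]:
    print(center, percent)
```

For base positions 3 to 10 this prints 40, 20, 20, 40, 40, 40, 40 and 40 percent, matching the example values.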
Step
The "step" parameter indicates how far to advance along the sequence between computations. A step of 1 (used in the above examples) computes a value for each position along the sequence, while a value greater than 1 introduces "jumps" across the sequence.
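How a step greater than 1 thins out the computed values can be sketched as follows. The sketch assumes that the step is the distance between the starts of consecutive windows and that each value is reported at the window center; the text does not spell out either detail, and `stepped_composition` is a hypothetical name.

```python
def stepped_composition(sequence, pattern, window, step=1):
    """Return (center position, percentage), advancing `step` symbols each time."""
    symbols = set(pattern.upper())
    hits = [s in symbols for s in sequence.upper()]
    values = []
    # Assumption: the step moves the window start; only full windows are used.
    for start in range(0, len(hits) - window + 1, step):
        center = start + window // 2 + 1  # 1-based center of the window
        values.append((center, 100.0 * sum(hits[start:start + window]) / window))
    return values


seq = "ATGCCCTTCGGAAGGTTCGCTAGCGA"
print(stepped_composition(seq, "AT", window=5, step=2)[:4])
# [(3, 40.0), (5, 20.0), (7, 40.0), (9, 40.0)]
```

With a step of 2 every second position is skipped, so the 22 values obtained with a step of 1 shrink to 11.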

Locking parameters
The program also offers detailed control over which parameters are shared with other applets present on the same page, through the following buttons:

Sequence
Pattern
Window
Step
Horizontal scroll
Vertical scroll

Clicking on one of the above 6 buttons lets you apply modifications to the other applets present on the same page. This mode of operation is extremely useful, since it lets you see how a certain pattern is distributed in, for instance, different sequences, and then move along one sequence and see how the other sequences compare. Another useful application of these "locks" is to place the same sequence in several copies of the program on the same page and compare how the graphs differ for different parameters, for example to assess how acidic, hydrophilic and hydrophobic regions correlate along the same sequence. A typical application of this program would be as a "viewer" spawned from network-based applications (as done by SRS5 or by BIOCCELERATOR Services).
The following table reports some commonly used combinations of parameters for visualizing biologically relevant features of sequences based on compositional richness.

aim	pattern	size of averaging window
DNA, identify A/T-rich regions	at	5
DNA, identify C/G-rich regions	cg	5
PROTEINS, identify basic regions	krh	5
PROTEINS, identify acidic regions	andcqegpsy	5
PROTEINS, identify hydrophilic regions	qnedbzhkr	5
PROTEINS, identify hydrophobic regions	livmfyw	5
PROTEINS, identify aromatic regions	fyw	5
PROTEINS, identify neutral regions	pagst	5
PROTEINS, identify crosslink-forming regions	c	5
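Several of these patterns can be tried on the same sequence side by side, much as several copies of the program on one page would show. The Python sketch below is only an illustration: the peptide is a made-up example, `PRESETS` and `profile` are hypothetical names, and only the three pattern strings are taken from the table.

```python
# Pattern strings copied from the table; everything else is illustrative.
PRESETS = {
    "basic": "krh",
    "hydrophobic": "livmfyw",
    "aromatic": "fyw",
}


def profile(sequence, pattern, window=5):
    """Percentage of pattern symbols in each full window, left to right."""
    symbols = set(pattern.lower())
    hits = [s in symbols for s in sequence.lower()]
    return [100.0 * sum(hits[i:i + window]) / window
            for i in range(len(hits) - window + 1)]


peptide = "MKRHLLVIFWDE"  # hypothetical sequence, not from the source
for aim, pattern in PRESETS.items():
    print(f"{aim:12s}", profile(peptide, pattern))
# basic        [60.0, 60.0, 40.0, 20.0, 0.0, 0.0, 0.0, 0.0]
# hydrophobic  [40.0, 40.0, 60.0, 80.0, 100.0, 100.0, 80.0, 60.0]
# aromatic     [0.0, 0.0, 0.0, 0.0, 20.0, 40.0, 40.0, 40.0]
```

In this made-up peptide the basic profile falls where the hydrophobic profile rises, which is the kind of comparison the locked parameters are meant to support.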

Compute
Once the parameters have been chosen, pressing the "COMPUTE" button searches for the pattern and displays the running average in a scrollable window.

[Figure: the two applet windows that a Java-enabled browser displays in this place.]

How to understand its output

The output consists of the running average percentage composition of the defined pattern in the given sequence. It is presented as a line chart that can be scrolled horizontally and vertically with the provided scrollbars. Peaks along the sequence indicate abundance of the specified pattern, while a line at the base level indicates its absence.

References

Doelz, R. (1990) BioCompanion, Biocomputing Essentials series, ISBN 3-905 434-00-8

Author: Luca I.G. TOLDO, Edition date: 28 February 1997