### Alphabets

There are 20 (or, by some counts, 21) amino acids, so the standard protein alphabet has 20 letters. For certain uses, however, it is better to use a reduced alphabet with fewer letters, where each letter represents a physico-chemical group of several residue types. When requested, PeCoP uses the following seven-letter alphabet:

| Representative letter | Physico-chemical property | Included residue types |
|---|---|---|
| F | Hydrophobic | A, V, L, I, M, C |
| R | Aromatic | F, W, Y, H |
| O | Polar | S, T, N, Q |
| T | Positive | R, K |
| N | Negative | E, D |
| P | Proline | P |
| G | Glycine | G |
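The grouping in the table can be expressed as a simple lookup from each of the 20 residues to its representative letter. The sketch below is a minimal Python illustration, not PeCoP's own code; the names `REDUCED_GROUPS`, `RESIDUE_TO_GROUP` and `to_reduced` are hypothetical, and keeping gaps while rejecting unknown symbols such as `X` is an assumption.

```python
# Seven-letter reduced alphabet from the table above:
# each key is a representative letter, each value the residues it covers.
REDUCED_GROUPS = {
    "F": "AVLIMC",  # hydrophobic
    "R": "FWYH",    # aromatic
    "O": "STNQ",    # polar
    "T": "RK",      # positive
    "N": "ED",      # negative
    "P": "P",       # proline
    "G": "G",       # glycine
}

# Invert the groups into a residue -> representative-letter lookup.
RESIDUE_TO_GROUP = {
    residue: letter
    for letter, residues in REDUCED_GROUPS.items()
    for residue in residues
}


def to_reduced(sequence: str, gap: str = "-") -> str:
    """Translate a 20-letter sequence into the 7-letter alphabet.

    Gaps are kept as-is; unknown symbols (e.g. X) raise KeyError.
    """
    return "".join(
        ch if ch == gap else RESIDUE_TO_GROUP[ch]
        for ch in sequence.upper()
    )


print(to_reduced("MKTAYIAKQR"))  # prints FTOFRFFTOT
```

Note that the letter F means "hydrophobic" in the reduced alphabet but phenylalanine in the 20-letter alphabet, so it helps to keep track of which alphabet a string is written in.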

### Calculation of conservation based on different alphabets

The information content can be calculated using any of the alphabetical representations of the protein described above (the standard 20-letter alphabet or the reduced seven-letter alphabet).
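To illustrate how the choice of alphabet changes the result, the sketch below scores one alignment column with a common definition of information content, log2(K) − H, where K is the alphabet size and H is the Shannon entropy of the column. This formula, the gap handling and the names `GROUPS`, `TO_GROUP` and `column_information` are assumptions for illustration; the exact formula PeCoP uses is not given here.

```python
import math
from collections import Counter

# Hypothetical 7-letter mapping (same groups as the alphabet table).
GROUPS = {"F": "AVLIMC", "R": "FWYH", "O": "STNQ", "T": "RK",
          "N": "ED", "P": "P", "G": "G"}
TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def column_information(column: str, alphabet_size: int) -> float:
    """Information content (bits) of one alignment column: log2(K) - H.

    Gaps ('-') are ignored; this is one common definition, assumed here.
    """
    counts = Counter(ch for ch in column if ch != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


# A toy column: five different residues in the 20-letter alphabet,
# but all of them fall in the hydrophobic group F.
column = "ILVMA"
ic20 = column_information(column, 20)
ic7 = column_information("".join(TO_GROUP[aa] for aa in column), 7)
print(f"20-letter: {ic20:.2f} bits, 7-letter: {ic7:.2f} bits")
# prints: 20-letter: 2.00 bits, 7-letter: 2.81 bits
```

In the reduced alphabet this column is fully conserved (entropy 0), whereas in the 20-letter alphabet it looks only partly conserved.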

### Display the protein using different alphabetical representations

It is possible to calculate the positional information content with one alphabet (e.g. the 7-letter alphabet) and display it using another (e.g. the 20-letter alphabet).
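Because the score is attached to a position rather than to a letter, the two choices are independent. The minimal sketch below computes per-position scores on the 7-letter alphabet and then prints the query's original 20-letter residues next to them. The toy alignment, the log2(7) − H formula and all names are hypothetical illustrations, not PeCoP's implementation.

```python
import math
from collections import Counter

# Hypothetical 7-letter mapping (groups from the alphabet table).
GROUPS = {"F": "AVLIMC", "R": "FWYH", "O": "STNQ", "T": "RK",
          "N": "ED", "P": "P", "G": "G"}
TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduced_information(column: str) -> float:
    """log2(7) - H for one column after mapping residues to 7 letters."""
    counts = Counter(TO_GROUP[aa] for aa in column if aa != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(7) - entropy


# Toy alignment; the first row is treated as the query.
alignment = ["MKIL", "MRVL", "LKAD"]
columns = ["".join(col) for col in zip(*alignment)]
scores = [reduced_information(col) for col in columns]

# Display: 20-letter query residue next to its 7-letter-based score.
for position, (residue, score) in enumerate(zip(alignment[0], scores), start=1):
    print(f"{position:>3} {residue} {score:.2f}")
```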

### Priors

Whether to use priors in information content calculations is a tricky issue that depends on the question you are asking. If the question is "How conserved is amino acid X at position j?", then you should not use priors. If, however, the question is "How conserved is amino acid X at position j, given its background distribution?" (in other words, "How surprised are we to see X at position j?"), then priors should be used.
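One common way to bring a background distribution into the calculation is relative entropy, the sum over residues x of p(x) log2(p(x)/q(x)), where p is the observed column distribution and q the background. The sketch below contrasts it with log2(K) − H, which uses no background. This is an illustrative choice, not necessarily PeCoP's formula, and the background values in `BACKGROUND` are hypothetical, not measured frequencies.

```python
import math
from collections import Counter


def column_frequencies(column: str) -> dict[str, float]:
    """Observed residue frequencies in one column, ignoring gaps."""
    counts = Counter(ch for ch in column if ch != "-")
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def information_without_priors(column: str, alphabet_size: int = 20) -> float:
    """log2(K) - H: depends only on the column itself."""
    freqs = column_frequencies(column)
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    return math.log2(alphabet_size) - entropy


def information_with_priors(column: str, background: dict[str, float]) -> float:
    """Relative entropy sum_x p(x) * log2(p(x) / q(x)) against background q."""
    freqs = column_frequencies(column)
    return sum(p * math.log2(p / background[aa]) for aa, p in freqs.items())


# Illustrative background frequencies (hypothetical values).
BACKGROUND = {"L": 0.10, "W": 0.01}

for column in ("LLLL", "WWWW"):
    print(column,
          round(information_without_priors(column), 2),
          round(information_with_priors(column, BACKGROUND), 2))
# LLLL 4.32 3.32
# WWWW 4.32 6.64
```

Without priors, both fully conserved columns score the same; with priors, the column of a rare residue (W here) is more "surprising" and scores higher. With a uniform background (q = 1/K for every residue) the relative entropy reduces exactly to log2(K) − H.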

### First & last vs. Plurality

This option sets the method used in the final consensus to decide whether a position is conserved.

- First & Last: a position is marked as conserved if it is conserved in both the first and the last PSI-BLAST iterations.
- Plurality: a position is marked as conserved by vote. If at least a user-defined number of PSI-BLAST iterations mark it as conserved, then it is conserved. This number is set in the next field.
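The two rules can be sketched as small functions over the per-iteration calls for one position. This is a hypothetical illustration, not PeCoP's code; `min_votes` stands in for the value entered in the next field, and it assumes the vote succeeds when at least that many iterations agree.

```python
def first_and_last(calls: list[bool]) -> bool:
    """Conserved if the first and the last iterations both call it conserved."""
    return bool(calls) and calls[0] and calls[-1]


def plurality(calls: list[bool], min_votes: int) -> bool:
    """Conserved if at least `min_votes` iterations call it conserved."""
    return sum(calls) >= min_votes


# Hypothetical per-position calls: calls[j][i] is True if PSI-BLAST
# iteration i marked position j as conserved.
calls = [
    [True, True, True, True],    # conserved in every iteration
    [True, False, False, True],  # first and last only
    [False, True, True, True],   # not in the first iteration
]
for j, position_calls in enumerate(calls, start=1):
    print(j, first_and_last(position_calls), plurality(position_calls, min_votes=3))
```

The second and third positions show where the rules disagree: First & Last ignores the middle iterations entirely, while Plurality ignores which iterations cast the votes.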

### E-value

The e-value ("Expect-value") is the number of hits with a score at least as good as the observed one that one can "expect" to see purely by chance when searching a database of a particular size. In essence, the e-value describes the random background noise that exists for matches between sequences. It is a convenient way to set a significance threshold for reporting results: hits whose e-value exceeds the threshold are not reported.
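Under the Karlin-Altschul statistics used by BLAST, the e-value of a raw score S is E = K·m·n·e^(−λS), where m and n are the query and database lengths; expressed with a normalized bit score S', this becomes E = m·n·2^(−S'). The minimal sketch below applies the bit-score form and a reporting threshold. It ignores the effective-length corrections BLAST applies, and all lengths, scores and the threshold are hypothetical.

```python
def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected chance hits scoring >= bit_score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


QUERY_LEN = 300          # hypothetical query length (residues)
DB_LEN = 1_000_000_000   # hypothetical database size (residues)
THRESHOLD = 1e-3         # hypothetical reporting threshold

hits = {"hit_a": 50.0, "hit_b": 30.0}  # hypothetical bit scores
for name, bits in hits.items():
    e = evalue_from_bits(bits, QUERY_LEN, DB_LEN)
    # Report only hits at or below the threshold.
    status = "reported" if e <= THRESHOLD else "discarded"
    print(f"{name}: bits={bits:.0f} E={e:.2e} {status}")

# The same score against a database twice as large has twice the e-value.
ratio = evalue_from_bits(50.0, QUERY_LEN, 2 * DB_LEN) / evalue_from_bits(50.0, QUERY_LEN, DB_LEN)
print(ratio)
```

The last line shows why the e-value depends on database size: the larger the search space, the more high-scoring hits are expected by chance alone.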