[ssml] Unusual amino-acid composition ?

Mensur Dlakic mdlakic at montana.edu
Thu Jun 16 12:37:17 EDT 2005


Hi Gerard,

Below are the background amino-acid frequencies that the BLAST and HMMER 
suites of programs used the last time I checked (probably 5 years ago). I 
doubt they have changed much in the meantime, and you can find out for sure 
by digging through the source code of the two packages.

HMMER background amino-acid frequencies
0.075520        # A
0.016973        # C
0.053029        # D
0.063204        # E
0.040762        # F
0.068448        # G
0.022406        # H
0.057284        # I
0.059398        # K
0.093399        # L
0.023569        # M
0.045293        # N
0.049262        # P
0.040231        # Q
0.051573        # R
0.072214        # S
0.057454        # T
0.065252        # V
0.012513        # W
0.031985        # Y

BLAST background amino-acid frequencies
0.075218        # A
0.016797        # C
0.05034         # D
0.059838        # E
0.035457        # F
0.070384        # G
0.019362        # H
0.047907        # I
0.05414         # K
0.08773         # L
0.019904        # M
0.041418        # N
0.04884         # P
0.039238        # Q
0.048101        # R
0.067798        # S
0.054987        # T
0.061265        # V
0.0114          # W
0.029223        # Y
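
As a rough illustration of how a single protein could be set against a background like the ones above, here is a minimal Python sketch. It takes the four fractions from the original question (K 1.8%, I 2.7%, P 7.5%, F 6.5%) and the approximate length of ~750 residues, and compares them with the BLAST values. The function name is hypothetical, and the z-score assumes every position is an independent draw from the background (a binomial approximation), which real sequences do not satisfy, so treat it as a crude guide and not a proper significance test.

```python
import math

# BLAST background frequencies quoted above (only the four residues in question).
BLAST_BG = {"K": 0.05414, "I": 0.047907, "P": 0.04884, "F": 0.035457}


def compare_to_background(observed, background, n_residues):
    """Return {residue: (fold_ratio, z_score)} for observed vs. background fractions.

    Assumption: each position is an independent draw from the background
    (binomial approximation). This is a simplification for illustration only.
    """
    result = {}
    for aa, obs in observed.items():
        p = background[aa]
        # Standard deviation of the observed fraction under the binomial model.
        sd = math.sqrt(p * (1.0 - p) / n_residues)
        result[aa] = (obs / p, (obs - p) / sd)
    return result


# Fractions and approximate length taken from the original question.
observed = {"K": 0.018, "I": 0.027, "P": 0.075, "F": 0.065}
stats = compare_to_background(observed, BLAST_BG, 750)

for aa, (fold, z) in stats.items():
    print(f"{aa}: {fold:.2f}x background, z = {z:+.1f}")
```

A fold ratio below 1 means the residue is depleted relative to the background, above 1 that it is enriched. Swapping in the HMMER or uniref50 numbers only requires a different dictionary.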

For comparison, the current AA frequencies from the uniref50 database ( 
http://www.pir.uniprot.org/ ) are below, and they don't seem to be much 
different (I have a program that counts these, for Windows and Linux, if 
there is interest). Finally, when you say that there is no compositional 
bias, I assume you used SEG or something similar. It is worth trying CAST ( 
http://www.ebi.ac.uk/research/cgg/services/cast/ ), which also delineates 
biased regions but in a conceptually different way from SEG ( 
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=11120681&query_hl=1 
).

Cheers,

Mensur


Total residues in uniref50.fasta: 286009290
A = 8.06% (23043066)
C = 1.54% (4400242)
D = 5.26% (15043876)
E = 6.26% (17902613)
F = 3.90% (11163384)
G = 6.71% (19182732)
H = 2.30% (6588442)
I = 5.54% (15847258)
K = 5.41% (15481893)
L = 9.71% (27766187)
M = 2.19% (6255004)
N = 4.40% (12598140)
P = 5.17% (14782885)
Q = 4.04% (11556514)
R = 5.74% (16423417)
S = 7.63% (21834321)
T = 5.51% (15761081)
V = 6.37% (18231066)
W = 1.24% (3539328)
Y = 2.96% (8466613)
X = 0.05% (128845)

Negative (DE)                   = 11.52% (32946489)
Positive (HKR)                  = 13.46% (38493752)
Aliphatic (ILV)                 = 21.62% (61844511)
Aromatic (FHWY)                 = 10.40% (29757767)
Charged (DEHKR)                 = 24.98% (71440241)
Polar (CDEHKNQRST)              = 48.11% (137590539)
Hydrophobic (ACFGHILMTVWY)      = 56.03% (160244403)
Small (ACDGNPSTV)               = 50.65% (144877409)
Big (EFHIKLMQRWY)               = 49.30% (140990653)
Other (X)                       = 0.05% (128845)
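
A tally like the one above could be produced with a short script along these lines. This is only a sketch in Python and not the counting program mentioned in the message. The function names are hypothetical, and it assumes that any character outside the 20 standard residues is lumped under X, which may not match how the original program treated ambiguity codes. Only a few of the groups are included to keep it short.

```python
from collections import Counter

STANDARD = set("ACDEFGHIKLMNPQRSTVWY")

# A subset of the residue groups used in the table above.
GROUPS = {
    "Negative (DE)": "DE",
    "Positive (HKR)": "HKR",
    "Aliphatic (ILV)": "ILV",
    "Aromatic (FHWY)": "FHWY",
}


def count_residues(lines):
    """Count residues in FASTA-formatted lines, skipping '>' header lines.

    Assumption: anything that is not one of the 20 standard residues
    is counted as 'X'.
    """
    counts = Counter()
    for line in lines:
        line = line.strip().upper()
        if not line or line.startswith(">"):
            continue
        for ch in line:
            counts[ch if ch in STANDARD else "X"] += 1
    return counts


def group_totals(counts, groups=GROUPS):
    """Sum the per-residue counts for each named group."""
    return {name: sum(counts[aa] for aa in members)
            for name, members in groups.items()}


def format_report(counts):
    """Return report lines in the 'A = 8.06% (23043066)' style, X listed last."""
    total = sum(counts.values())
    out = [f"Total residues: {total}"]
    for aa in sorted(counts, key=lambda a: (a == "X", a)):
        out.append(f"{aa} = {100.0 * counts[aa] / total:.2f}% ({counts[aa]})")
    for name, n in group_totals(counts).items():
        out.append(f"{name} = {100.0 * n / total:.2f}% ({n})")
    return out


# Small in-memory demo; a real run would pass an open FASTA file instead.
demo_counts = count_residues([">demo", "MKVLAAGIX", "dewy"])
print("\n".join(format_report(demo_counts)))
```

Because `count_residues` only iterates over lines, an open file handle (for example `open("uniref50.fasta")`) can be passed directly, so a large database is never loaded into memory at once.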



At 09:02 AM 6/16/2005, you wrote:

>hi all,
>
>we are writing up the structure determination of a dimeric human enzyme. 
>while going through the model (~750 residues per monomer), i noticed that 
>the protein contains rather few lysines (1.8%) and isoleucines (2.7%), and 
>rather many prolines (7.5%) and phenylalanines (6.5%). (if i remember 
>correctly, there are no low-complexity regions in the sequence.)
>
>i would be grateful for any clues or literature references that might tell 
>us if this is statistically to be expected or unusual and -if the latter- 
>what could explain it, and whether or not it might have any significance. 
>also, a pointer to a table of the average amino-acid composition of 
>soluble human proteins (or enzymes) would be useful.
>
>thanks in advance for any input !
>
>--gerard

==========================================================================
| Mensur Dlakic, PhD                | Tel: (406) 994-6576                |
| Department of Microbiology        | Fax: (406) 994-4926                |
| Montana State University          |                                    |
| 109 Lewis Hall, P.O. Box 173520   | http://myprofile.cos.com/mensur    |
| Bozeman, MT 59717-3520            | E-mail: mdlakic at montana.edu        |
==========================================================================


