[ssml] Unusual amino-acid composition ?

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Thu Jun 16 11:56:15 EDT 2005

On Thu, 16 Jun 2005, Gerard DVD Kleywegt wrote:

>hi all,
>we are writing up the structure determination of a dimeric human enzyme. while 
>going through the model (~750 residues per monomer), i noticed that the 
>protein contains rather few lysines (1.8%) and isoleucines (2.7%), and rather 
>many prolines (7.5%) and phenylalanines (6.5%). (if i remember correctly, 
>there are no low-complexity regions in the sequence.)
>i would be grateful for any clues or literature references that might tell us 
>if this is statistically to be expected or unusual and -if the latter- what 
>could explain it, and whether or not it might have any significance. also, a 
>pointer to a table of the average amino-acid composition of soluble human 
>proteins (or enzymes) would be useful.
>thanks in advance for any input !

I don't know of any tables or literature offhand (I am sure there are
plenty), but you can quite easily generate the statistics from a
non-redundant set of sequences (for example UniParc).

Use this 'background' set to generate your 'expected' frequency for
each amino acid, then compare this to the 'observed' frequency from
your protein.
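
As a rough sketch of that step (Python is my choice here, and the
function name and toy sequences are made up for illustration, they are
not real proteins), you could count residues over the background set
and over your protein like this:

```python
from collections import Counter

# The 20 standard amino acids; anything else (X, B, Z, gaps) is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(sequences):
    """Return the fraction of each standard amino acid across the sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

# Toy input only: in practice the background would be read from a
# FASTA file of your non-redundant set, and the observed sequence
# would be your protein.
background = composition(["MKKLIPFAG", "MKIILPFGA"])
observed = composition(["MPPFFAGLK"])
```

Each call returns a dictionary keyed by one-letter code, so the
'expected' and 'observed' values for, say, lysine are background["K"]
and observed["K"].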

The stats are simply a case of comparing the observed and expected
frequencies to get some measure of 'unusual' (along with a significance).

Often people quote log(likelihood), coming from the log odds ratio.
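
To make that concrete, here is a minimal sketch of a log-odds score
plus one simple significance measure. The significance part is my
assumption, not the only way to do it: it treats residues as
independent draws and uses a normal approximation to the binomial. The
function names are made up, and the background lysine frequency of
0.06 is a placeholder, not a measured value; substitute the number
from your own background set.

```python
import math

def log_odds(observed_freq, expected_freq):
    """Log2 ratio of observed to expected frequency (negative = depleted)."""
    return math.log2(observed_freq / expected_freq)

def binomial_z(count, length, expected_freq):
    """Z-score of a residue count under a binomial null model.

    Assumes each of the `length` positions independently carries the
    residue with probability `expected_freq` (normal approximation).
    """
    mean = length * expected_freq
    sd = math.sqrt(length * expected_freq * (1.0 - expected_freq))
    return (count - mean) / sd

def two_sided_p(z):
    """Two-sided p-value for a z-score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Numbers from the question: ~750 residues, 1.8% lysine.
length = 750
lys_observed = 0.018
lys_expected = 0.06  # PLACEHOLDER background value, replace with your own

score = log_odds(lys_observed, lys_expected)
z = binomial_z(lys_observed * length, length, lys_expected)
p = two_sided_p(z)
```

If you test all 20 amino acids this way, remember to correct for
multiple testing before calling any single one significant.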

It gets rapidly more complicated (technically) when you try to consider
different 'populations' of amino acids, for example surface amino acids
(which are known to have a different distribution from core amino
acids). However, the basic idea is the same.


>                         Gerard J.  Kleywegt
>     [Research Fellow of the Royal  Swedish Academy of Sciences]
>Dept. of Cell & Molecular Biology  University of Uppsala
>                 Biomedical Centre  Box 596
>                 SE-751 24 Uppsala  SWEDEN
>     http://xray.bmc.uu.se/gerard/  mailto:gerard at xray.bmc.uu.se
>    The opinions in this message are fictional.  Any similarity
>    to actual opinions, living or dead, is purely coincidental.
>ssml-general mailing list
>ssml-general at bioinformatics.org
