On Thu, 16 Jun 2005, Gerard DVD Kleywegt wrote:
>
>hi all,
>
>we are writing up the structure determination of a dimeric human enzyme. while
>going through the model (~750 residues per monomer), i noticed that the
>protein contains rather few lysines (1.8%) and isoleucines (2.7%), and rather
>many prolines (7.5%) and phenylalanines (6.5%). (if i remember correctly,
>there are no low-complexity regions in the sequence.)
>
>i would be grateful for any clues or literature references that might tell us
>if this is statistically to be expected or unusual and -if the latter- what
>could explain it, and whether or not it might have any significance. also, a
>pointer to a table of the average amino-acid composition of soluble human
>proteins (or enzymes) would be useful.
>
>thanks in advance for any input !

I don't know of any tables or literature offhand (I am sure there are
plenty), but you can quite easily generate the statistics from a
non-redundant set of sequences (for example UniParc). Use this 'background'
set to generate your 'expected' frequency for each amino acid, then compare
this to the 'observed' frequency from your protein.

The stats are simply a case of comparing the observed and expected
frequencies to get some measure of 'unusual' (along with a significance).
People often quote a log-likelihood score, which comes from the log-odds
ratio, log(observed/expected).

It rapidly gets more complicated (technically) when you try to consider
different 'populations' of amino acids, for example surface amino acids
(which are known to have a different distribution from core amino acids).
However, the basic idea is the same.

Dan.

>--gerard
>
>******************************************************************
>                        Gerard J. Kleywegt
>   [Research Fellow of the Royal Swedish Academy of Sciences]
>Dept. of Cell & Molecular Biology  University of Uppsala
>                Biomedical Centre  Box 596
>                SE-751 24 Uppsala  SWEDEN
>
>    http://xray.bmc.uu.se/gerard/  mailto:gerard at xray.bmc.uu.se
>******************************************************************
>   The opinions in this message are fictional. Any similarity
>   to actual opinions, living or dead, is purely coincidental.
>******************************************************************
>
>_______________________________________________
>ssml-general mailing list
>ssml-general at bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/ssml-general
>
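
P.S. The observed-versus-expected comparison described in the reply can be
sketched roughly as follows. This is only a minimal illustration, not a
vetted analysis: the function names (composition, compare_to_background)
are made up for this example, the background sequences in the usage part
are toy strings rather than real proteins, and the significance is my own
choice of a simple normal approximation to the binomial, which also assumes
residues are independent draws. In practice you would build the background
from a real non-redundant set.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences, pseudocount=1.0):
    """Return per-residue frequencies over a collection of sequences.

    The pseudocount keeps every frequency non-zero, so that the
    log-odds ratio below is always defined for the background.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    if total == 0:
        raise ValueError("no standard residues in the background set")
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def compare_to_background(query, background_freqs):
    """Compare one sequence's composition with background frequencies.

    Returns {aa: (observed, expected, log2_odds, z_score, p_value)}.
    The p-value is two-sided and comes from a normal approximation
    to the binomial distribution (an assumption of this sketch).
    """
    residues = [ch for ch in query.upper() if ch in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        raise ValueError("query contains no standard residues")
    counts = Counter(residues)
    results = {}
    for aa in AMINO_ACIDS:
        expected = background_freqs[aa]
        observed = counts[aa] / n
        # The log-odds is undefined for a residue that never occurs;
        # report it as -inf rather than failing.
        if observed > 0:
            log_odds = math.log2(observed / expected)
        else:
            log_odds = float("-inf")
        # Binomial mean and standard deviation for n residues.
        sd = math.sqrt(n * expected * (1.0 - expected))
        z = (counts[aa] - n * expected) / sd
        p_value = math.erfc(abs(z) / math.sqrt(2.0))
        results[aa] = (observed, expected, log_odds, z, p_value)
    return results


if __name__ == "__main__":
    # Toy placeholder sequences, NOT real proteins.
    background = ["MKTAYIAKQR", "GSHMLEDPPF", "ACDEFGHIKLMNPQRSTVWY"]
    query = "PPFPPFAGPK"
    freqs = composition(background)
    table = compare_to_background(query, freqs)
    for aa, (obs, exp, lod, z, p) in sorted(table.items()):
        print(f"{aa}  obs={obs:.3f}  exp={exp:.3f}  "
              f"log2odds={lod:+.2f}  z={z:+.2f}  p={p:.3g}")
```

With 20 residue types tested at once, some correction for multiple
comparisons would be sensible before calling any single residue 'unusual'.
To handle surface and core residues separately, the same functions could be
run once per population, each with its own background.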