[ssml] Unusual amino-acid composition?

Thu Jun 16 13:07:30 EDT 2005

I have some Dirichlet mixtures, trained on the composition of proteins in
a reduced-redundancy database.  There are many proteins with
compositions a long way from the background.

One can look at the statistics for
	log P(counts | Dirichlet mixture)
but there is a strong length dependence (roughly linear), so
	log P(counts | Dirichlet mixture) / length
is probably the statistic to look at.
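
A minimal sketch of how the length-normalized statistic could be computed, not
necessarily how it was computed here.  It assumes the mixture is given as a
list of component weights and a matching list of Dirichlet parameter vectors,
and that P(counts | component) is the usual Dirichlet-multinomial probability
of the count vector, multinomial coefficient included.  All function and
parameter names are hypothetical.

```python
import math


def log_dirichlet_multinomial(counts, alpha):
    """Log P(counts | one Dirichlet component), as a Dirichlet-multinomial.

    Assumption: counts are treated as an unordered count vector, so the
    multinomial coefficient N! / prod(n_i!) is included.
    """
    n_total = sum(counts)
    a_total = sum(alpha)
    logp = (math.lgamma(n_total + 1)
            + math.lgamma(a_total)
            - math.lgamma(n_total + a_total))
    for n, a in zip(counts, alpha):
        logp += math.lgamma(n + a) - math.lgamma(a) - math.lgamma(n + 1)
    return logp


def log_mixture_prob(counts, weights, alphas):
    """Log P(counts | Dirichlet mixture), summed over components in log space."""
    terms = [math.log(q) + log_dirichlet_multinomial(counts, a)
             for q, a in zip(weights, alphas)]
    biggest = max(terms)
    # log-sum-exp keeps long proteins from underflowing to zero
    return biggest + math.log(sum(math.exp(t - biggest) for t in terms))


def per_residue_log_prob(counts, weights, alphas):
    """The length-normalized statistic: log P(counts | mixture) / length."""
    length = sum(counts)
    if length == 0:
        raise ValueError("empty sequence has no per-residue score")
    return log_mixture_prob(counts, weights, alphas) / length
```

For real use, counts would be a 20-element vector of amino-acid counts and each
alpha a 20-element parameter vector from the trained mixture.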

One can compute this for a large number of proteins, then see where the
value for the specific protein falls in that distribution, to judge how
unusual it is.
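
One simple way to make that comparison, sketched under the assumption that
the per-residue scores for the reference proteins are already in a list, is
an empirical lower-tail fraction.  The function name and arguments are
hypothetical.

```python
from bisect import bisect_right


def empirical_lower_tail(reference_scores, score):
    """Fraction of reference proteins scoring at or below `score`.

    A small fraction means the query protein's composition is less probable
    per residue than that of almost every protein in the reference set.
    """
    if not reference_scores:
        raise ValueError("need at least one reference score")
    ordered = sorted(reference_scores)
    return bisect_right(ordered, score) / len(ordered)
```

The resolution of this fraction is limited by the size of the reference set,
so a fraction of 0 only says the protein is more extreme than everything
that was sampled.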