[ssml] Unusual amino-acid composition ?
Kevin Karplus
karplus at soe.ucsc.edu
Thu Jun 16 13:07:30 EDT 2005
I have some Dirichlet mixtures, trained on the composition of proteins in
a reduced-redundancy database. There are many proteins with
compositions a long way from the background.
One can look at the statistic
    log P(counts | Dirichlet mixture)
but it has a strong (roughly linear) dependence on sequence length, so
    log P(counts | Dirichlet mixture) / length
is probably the statistic to look at.
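
A minimal sketch of the length-normalized statistic, under stated assumptions. The function names, and the representation of the mixture as a list of weights plus a list of alpha vectors, are hypothetical and not taken from any particular tool. It also assumes that P(counts | component) is the Dirichlet-multinomial probability of the count vector, including the multinomial coefficient; dropping that coefficient would give the probability of one specific ordering instead.

```python
import math

def log_dirichlet_multinomial(counts, alpha):
    """log P(counts | one Dirichlet component with parameters alpha)."""
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient: n! / prod(c_i!)
    log_p = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    # Gamma(|alpha|) / Gamma(n + |alpha|)
    log_p += math.lgamma(a) - math.lgamma(n + a)
    # prod Gamma(c_i + alpha_i) / Gamma(alpha_i)
    log_p += sum(math.lgamma(c + x) - math.lgamma(x)
                 for c, x in zip(counts, alpha))
    return log_p

def log_mixture_prob(counts, weights, alphas):
    """log P(counts | Dirichlet mixture), combined with log-sum-exp."""
    terms = [math.log(w) + log_dirichlet_multinomial(counts, al)
             for w, al in zip(weights, alphas)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def per_residue_score(counts, weights, alphas):
    """log P(counts | Dirichlet mixture) / length."""
    return log_mixture_prob(counts, weights, alphas) / sum(counts)
```

For a real protein, counts would be the 20 amino-acid counts and the weights and alpha vectors would come from the trained mixture.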
One can compute this for a large number of proteins, then compare with
the value for the specific protein, to see how unusual it is.
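
The comparison could be sketched as an empirical lower-tail fraction: the share of reference proteins whose per-length score is no higher than that of the protein in question. The function name and the choice of a simple rank-based fraction are assumptions for illustration; a small fraction would suggest an unusual composition.

```python
from bisect import bisect_right

def lower_tail_fraction(reference_scores, score):
    """Fraction of reference scores that are <= the given score."""
    ordered = sorted(reference_scores)
    # bisect_right counts how many reference values are <= score
    return bisect_right(ordered, score) / len(ordered)
```

Here reference_scores would hold the per-length log-probabilities of the large set of proteins, and score the value for the specific protein.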