I have some Dirichlet mixtures trained on the composition of proteins in a reduced-redundancy database. Many proteins have compositions a long way from the background. One can look at the statistic log P(counts | Dirichlet mixture), but it has a strong, roughly linear dependence on length, so log P(counts | Dirichlet mixture) / length is probably the better statistic to look at. One can compute this for a large number of proteins, then see where the value for the specific protein falls in that distribution to judge how unusual it is.
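
As a rough illustration of the length-normalized statistic, here is a minimal Python sketch. It assumes that P(counts | Dirichlet mixture) is the Dirichlet-multinomial likelihood of the count vector, including the multinomial coefficient, summed over mixture components. That may differ from how the mixtures were actually scored. The function names, and the representation of the mixture as a list of weights plus a list of alpha vectors, are hypothetical choices made for this example.

```python
import math

def log_dirichlet_multinomial(counts, alpha):
    """log P(counts | alpha) for one Dirichlet component.

    Assumption: includes the multinomial coefficient, i.e. the probability
    of the unordered count vector rather than of one particular sequence.
    """
    n = sum(counts)
    a = sum(alpha)
    logp = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for c, al in zip(counts, alpha):
        logp += math.lgamma(c + al) - math.lgamma(al) - math.lgamma(c + 1)
    return logp

def log_mixture_prob(counts, weights, alphas):
    """log sum_k weights[k] * P(counts | alphas[k]), via log-sum-exp."""
    terms = [math.log(w) + log_dirichlet_multinomial(counts, al)
             for w, al in zip(weights, alphas)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def per_residue_score(counts, weights, alphas):
    """log P(counts | mixture) / length; requires a non-empty protein."""
    return log_mixture_prob(counts, weights, alphas) / sum(counts)

def lower_tail_fraction(score, reference_scores):
    """Fraction of reference proteins scoring at or below the given score."""
    return sum(1 for s in reference_scores if s <= score) / len(reference_scores)
```

To use it, one would compute per_residue_score for every protein in the reference set and pass that list to lower_tail_fraction along with the score of the protein of interest. A small fraction would mean the composition is poorly explained by the mixture compared with most proteins.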