[ssml] HMM weighting?

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Fri Jan 16 06:05:52 EST 2004


Hello,

Now that I know a (tiny) bit more about HMM sequence weighting during model
building, I would like to ask a question.

Weighting is performed to remove potential bias caused by 'over-represented' sub
families in a family of proteins. The particular sequence peculiarities of such a sub
family could lead to 'over fitting' of the model, and so to a loss of generality
across the whole family (including all sub families).

I would like to ask if the objective (direct or indirect) of weighting during model
building is to make the model score every true hit equally?

I.e. with 50 sequences from sub family A, and only 1 from sub family B, the A's
would bias the model and tend to lower the score of B. Is what we want for every
sequence to score the same?
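
To make the 50-versus-1 case concrete, here is a minimal sketch of one common
weighting scheme, the position-based weights of Henikoff and Henikoff. It is only
an illustration: the function name position_based_weights and the toy sequences
are made up for this example, gaps are not treated specially, and a given HMM
package may well use a different scheme by default.

```python
from collections import Counter

def position_based_weights(alignment):
    """Henikoff-style position-based weights, normalized to sum to 1.

    `alignment` is a list of equal-length aligned sequences (hypothetical input
    format chosen for this sketch). In each column, a sequence receives
    1 / (k * n), where k is the number of distinct residues in the column and
    n is the number of sequences sharing this sequence's residue there.
    """
    weights = [0.0] * len(alignment)
    for column in zip(*alignment):
        counts = Counter(column)   # residue -> number of sequences with it
        k = len(counts)            # distinct residues in this column
        for i, residue in enumerate(column):
            weights[i] += 1.0 / (k * counts[residue])
    total = sum(weights)
    return [w / total for w in weights]

# Toy data (invented): 50 identical sequences from sub family A, 1 from B.
alignment = ["ACDEF"] * 50 + ["AGHKF"]
weights = position_based_weights(alignment)

weight_a_single = weights[0]     # one member of sub family A
weight_a_total = sum(weights[:50])
weight_b = weights[50]           # the lone member of sub family B
```

In this toy case the single B sequence gets a far larger weight than any one A
sequence, although the 50 A's together still carry more total weight than B. Note
that the scheme evens out each sequence's contribution to the model; it does not by
itself guarantee that every sequence ends up with the same score.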

Is the above flattening suggestion again too biased?

Are these problems addressed in PSI-BLAST?

Thanks for any feedback,
Dan.





