"Dan Bolser" <dmb at mrc-dunn.cam.ac.uk> asked

> I would like to ask if the objective (direct or indirect) of
> weighting during model building is to make the model score every
> true hit equally?

Anders Krogh tried that approach. It did not work very well, because it amplified the noise too much: one incorrect or misaligned sequence does a lot of damage, since it needs a huge weight to score as well as the rest, and the model is then grossly distorted.

I believe that PSI-BLAST does do some sequence weighting, and the SAM T99 and T2K scripts certainly do. In fact, if you don't do some sort of sequence weighting when you use Dirichlet mixtures, you get very bad results, because almost all training sets contain many similar sequences.

The main problem with Dirichlet mixtures or pseudocounts is setting the total weight of the data (how much you believe the data rather than the prior). How you allocate that weight to the individual sequences is much less important, though methods that allocate somewhat more weight to the outliers generally generalize better than flat weighting.

There isn't a clean mathematical optimization here, since we are dealing with noisy data with unknown but strong sampling biases. About all we can do is experiment with weighting schemes and see what works---and what works well in one application (like fold recognition) might not work well in another (like identifying subfamilies), since different levels of generalization are needed in different applications.

Kevin Karplus
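[A minimal sketch of the "total weight of the data" knob discussed above, added for illustration. It is not SAM's or PSI-BLAST's actual implementation: the toy alphabet, alignment column, sequence weights, and the single-component Dirichlet prior are all made-up values, and a real Dirichlet mixture would combine several components by their posterior probabilities.]

```python
ALPHABET = "ACDE"  # toy 4-letter alphabet standing in for the 20 amino acids

# One alignment column: the residue each sequence shows at this position.
column = ["A", "A", "A", "C", "D"]

# Per-sequence weights; an outlier-favoring scheme would give the lone
# "D" sequence more weight than flat weighting does, as it does here.
weights = [0.5, 0.5, 0.5, 1.0, 1.5]

# Single-component Dirichlet prior over the toy alphabet.
alpha = {"A": 0.4, "C": 0.2, "D": 0.2, "E": 0.2}

def column_probs(column, weights, alpha, total_data_weight):
    """Estimate residue probabilities for one alignment column.

    total_data_weight rescales the summed sequence weights: it is the
    'how much do you believe the data rather than the prior' setting.
    How that total is split among sequences is the weighting scheme.
    """
    # Weighted observed counts for this column.
    raw = {a: 0.0 for a in ALPHABET}
    for res, w in zip(column, weights):
        raw[res] += w
    observed = sum(raw.values())
    # Rescale so the data contributes exactly total_data_weight counts,
    # then add the Dirichlet pseudocounts and normalize.
    scale = total_data_weight / observed if observed > 0 else 0.0
    counts = {a: raw[a] * scale + alpha[a] for a in ALPHABET}
    total = sum(counts.values())
    return {a: counts[a] / total for a in ALPHABET}

# With a small total data weight the estimate stays near the prior;
# with a large one it approaches the weighted observed frequencies.
near_prior = column_probs(column, weights, alpha, total_data_weight=0.5)
near_data = column_probs(column, weights, alpha, total_data_weight=50.0)
```

Playing with total_data_weight shows the trade-off directly: at 0.5 the estimate for "A" stays close to the prior's 0.4, while at 50.0 it approaches the weighted observed frequency of 1.5/4.0.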