I don't use similarity matrices for multiple alignment; I use Dirichlet mixture priors. Similarity matrices are great for sequence-sequence alignment, since they do a very good job of estimating a probability distribution from a single sample. They're terrible, though, at estimating probability distributions from samples with more than one amino acid in them.

I do iterative search to build my multiple alignments, starting with a seed (usually a single sequence, but it can be a hand-generated alignment) and using gradually looser thresholds on the search. This is similar in spirit to the PSI-BLAST iteration, though independently developed at about the same time. My iterations cycle through

    multiple alignment -> HMM
    search for similar sequences
    thin resulting alignment
    retrain HMM on thinned set of sequences
    realign all found sequences using HMM

The method is fairly robust to changes in parameters, as long as the search threshold is never set so loose as to let in unrelated sequences. Since I usually use this method in a fully automated way (I've run it on at least 15,000 seeds), I can't rely on eyeballing the results to decide when contamination happens, so I've set the default values fairly strictly. If you set thresholds too tight, though, you miss some homologs, so I have occasionally experimented with loosening them on specific cases that I was working on by hand.

Since I usually start with a single sequence, the question "How does the quality of the initial multiple alignment affect the later development of the HMM on a given database?" is not one I can easily answer. Obviously a bad seed alignment is going to cause some problems. A good seed alignment can help, but we (and others who have tried) have not gotten better fold recognition by starting with a structural alignment (FSSP with Z score >= 7) as a seed for HMMs. It seems best to have multiple HMMs, each of which is somewhat more specific.

Kevin Karplus
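
For readers who want to see the shape of the iteration cycle described above, here is a rough toy sketch in Python. It is not the actual method or software: the "HMM" is replaced by an ungapped per-column log-odds profile with a flat pseudocount (not a Dirichlet mixture), all sequences are assumed to have equal length, and every name in it (build_profile, score, thin, iterative_search, the threshold values) is made up for illustration. Because the toy is ungapped, the realignment step is a no-op, so the model retrained on the thinned set is carried straight into the next search.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(seqs, pseudocount=1.0):
    """Toy stand-in for HMM training: per-column log-odds against a
    uniform background, smoothed with a flat pseudocount."""
    total = len(seqs) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(len(seqs[0])):
        counts = Counter(s[col] for s in seqs)
        profile.append({
            a: math.log((counts[a] + pseudocount) / total * len(ALPHABET))
            for a in ALPHABET
        })
    return profile


def score(profile, seq):
    """Sum of per-column log-odds for an ungapped, equal-length sequence."""
    return sum(column[a] for column, a in zip(profile, seq))


def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def thin(seqs, max_identity=0.8):
    """Greedy thinning: keep a sequence only if it is at most
    max_identity identical to every sequence already kept."""
    kept = []
    for s in seqs:
        if all(identity(s, k) <= max_identity for k in kept):
            kept.append(s)
    return kept


def iterative_search(seed, database, thresholds):
    """Run one search/thin/retrain cycle per threshold, strictest first."""
    model = build_profile([seed])  # seed -> initial model
    found = [seed]
    for threshold in thresholds:
        # Search: keep the seed plus every database sequence that scores
        # at or above the current threshold.
        found = [seed] + [s for s in database
                          if s != seed and score(model, s) >= threshold]
        # Thin the hits, then retrain on the thinned set. In this ungapped
        # toy, "realigning all found sequences" changes nothing.
        model = build_profile(thin(found))
    return found


database = ["ACDEF", "ACDEY", "ACDWY", "ACWWY", "WWWWW"]
print(iterative_search("ACDEF", database, [2.5]))
print(iterative_search("ACDEF", database, [2.5, 2.0]))
```

With these made-up sequences, the first (strict) pass picks up only the closest relative of the seed, and the second (looser) pass, using the retrained model, reaches more distant ones while still excluding the unrelated sequence. The threshold numbers are arbitrary and have no connection to the defaults mentioned above.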