If you are trying to match the performance of PSI-BLAST, you need to read one of the more recent papers, not just the original. They made several refinements in 2001:

@article{improved-psiblast-2001,
  author={Sch\"affer, Alejandro A. and Aravind, L. and Madden, Thomas L. and Shavirin, Sergei and Spouge, John L. and Wolf, Yuri I. and Koonin, Eugene and Altschul, Stephen F.},
  title ="Improving the accuracy of {PSI-BLAST} protein database searches with composition-based statistics and other refinements",
  journal="Nucleic Acids Research",
  volume=29,
  number=14,
  year=2001,
  pages="2994-3005"
}

If you are just trying to come up with a decent PSSM procedure, rather than a duplicate of PSI-BLAST, I'd recommend using the Henikoffs' relative weighting for sequences (for ease of implementation) and Dirichlet mixtures for setting the probabilities.

Using Dirichlet mixtures is not trivial: not only do you have to get the computation of the probabilities right, you also have to adjust the total sequence weight to get the level of generality you want. This is the method used in the SAM tool suite for HMMs, and it has been implemented in C and C++, but not in Perl (Perl is not a great language for numeric computations that involve iteration).

------------------------------------------------------------
Kevin Karplus    karplus at soe.ucsc.edu    http://www.soe.ucsc.edu/~karplus
Professor of Biomolecular Engineering, University of California, Santa Cruz
Undergraduate and Graduate Director, Bioinformatics
(Senior member, IEEE) (Board of Directors, ISCB)
life member (LAB, Adventure Cycling, American Youth Hostels)
Effective Cycling Instructor #218-ck (lapsed)
Affiliations for identification only.
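The recipe recommended above (Henikoff sequence weights plus Dirichlet-mixture probabilities) can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not PSI-BLAST's procedure and not SAM's code. It uses the position-based form of the Henikoffs' weighting (in each column, one unit of weight is split evenly among the distinct symbols, then evenly among the sequences sharing each symbol), and it treats gaps as an ordinary symbol when computing weights but drops them from the column counts; real implementations handle gaps in various ways. The function names, the `total_weight` knob, and `TOY_MIXTURE` are all hypothetical: a real PSSM would use a Dirichlet mixture estimated from curated alignments, not the two made-up components here.

```python
import math
from collections import Counter

# The 20 standard amino acids; gaps ('-') and other symbols are not counted
# when building per-column residue counts.
AMINO = "ACDEFGHIKLMNPQRSTVWY"


def henikoff_weights(alignment):
    """Position-based Henikoff-style sequence weights, normalized to sum to 1.

    In a column with r distinct symbols, a sequence whose symbol occurs k times
    gets 1 / (r * k). Gaps are treated as an ordinary symbol (an assumption).
    """
    weights = [0.0] * len(alignment)
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        counts = Counter(column)
        r = len(counts)
        for i, sym in enumerate(column):
            weights[i] += 1.0 / (r * counts[sym])
    total = sum(weights)
    return [w / total for w in weights]


def weighted_counts(alignment, weights, col, total_weight):
    """Weighted amino-acid counts for one column.

    total_weight (hypothetical name) rescales the normalized weights; smaller
    values let the prior dominate, giving a more general profile.
    """
    counts = [0.0] * len(AMINO)
    for seq, w in zip(alignment, weights):
        if seq[col] in AMINO:
            counts[AMINO.index(seq[col])] += w * total_weight
    return counts


def log_marginal(counts, alpha):
    """log B(n + alpha) - log B(alpha), the alpha-dependent part of log P(n | alpha).

    The multinomial coefficient is omitted: it is identical for every component
    and cancels when the component posteriors are normalized.
    """
    sn, sa = sum(counts), sum(alpha)
    val = math.lgamma(sa) - math.lgamma(sn + sa)
    for n, a in zip(counts, alpha):
        val += math.lgamma(n + a) - math.lgamma(a)
    return val


def mixture_probs(counts, mixture):
    """Posterior-mean residue probabilities under a Dirichlet mixture.

    mixture is a list of (q_j, alpha_j) pairs: the mixture coefficient and the
    Dirichlet parameter vector of component j.
    """
    logs = [math.log(q) + log_marginal(counts, alpha) for q, alpha in mixture]
    top = max(logs)  # subtract the max before exponentiating, for stability
    post = [math.exp(x - top) for x in logs]
    z = sum(post)
    sn = sum(counts)
    probs = [0.0] * len(counts)
    for p, (_, alpha) in zip(post, mixture):
        sa = sum(alpha)
        for i, (n, a) in enumerate(zip(counts, alpha)):
            probs[i] += (p / z) * (n + a) / (sn + sa)
    return probs


# Purely illustrative two-component mixture (NOT a trained mixture): a sparse
# component (small alphas, favors conserved columns) and a flat component.
TOY_MIXTURE = [(0.5, [0.05] * 20), (0.5, [1.0] * 20)]
```

For example, `henikoff_weights(["ACDE", "ACDF", "GCDE"])` gives the first sequence weight 7/24 and the other two 17/48 each: the first sequence shares its residue with another sequence in both variable columns, while the other two each carry a unique residue. Calling `mixture_probs(weighted_counts(aln, w, col, total_weight), TOY_MIXTURE)` with a smaller `total_weight` pulls the column toward the prior, which is the generality adjustment mentioned above.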