In fact, I do not want to re-implement a fully functional PSI-BLAST in Perl; Perl is not good at this, and PSI-BLAST is a big program. My goal is simpler: I just want to regenerate a PSSM file from PSI-BLAST output, such as the plain-text result produced by the command "blastpgp -i test.seq -d swissprot -j 3 -m 6 -o test.blast", where test.blast is the output in multiple-alignment format. My question is whether test.blast contains enough information to regenerate a PSSM file similar to the one produced by "blastpgp -Q".

In some cases I need to filter the sequences in the alignment of the BLAST output by hand, so being able to regenerate a PSSM file is important for me.

Thanks.

Best wishes,
Jarod

On Sunday 09 January 2005 01:08, you wrote:
> If you are trying to match the performance of psi-blast, you need to
> read one of the more recent papers, not just the original. They made
> several refinements in 2001:
>
> @article{improved-psiblast-2001,
>   author={Sch\"affer, Alejandro A. and Aravind, L.
>           and Madden, Thomas L. and Shavirin, Sergei
>           and Spouge, John L. and Wolf, Yuri I.
>           and Koonin, Eugene and Altschul, Stephen F.},
>   title="Improving the accuracy of {PSI-BLAST} protein database
>          searches with composition-based statistics and other refinements",
>   journal="Nucleic Acids Research",
>   volume=29, number=14,
>   year=2001,
>   pages="2994-3005"
> }
>
> If you are just trying to come up with a decent PSSM procedure, not a
> duplicate of PSI-BLAST, I'd recommend using the Henikoffs' relative
> weighting for sequences (for ease of implementation) and Dirichlet
> mixtures for setting the probabilities. Using the Dirichlet mixtures
> is not trivial, as you not only have to get the computation of the
> probabilities right, you have to adjust the total sequence weight to
> get the level of generality you want. This is the method used in the
> SAM tool suite for HMMs, and it has been implemented in C and C++, but
> not in perl (perl is not a great language for numeric computations
> that involve iteration).
>
> ------------------------------------------------------------
> Kevin Karplus   karplus at soe.ucsc.edu   http://www.soe.ucsc.edu/~karplus
> Professor of Biomolecular Engineering, University of California, Santa Cruz
> Undergraduate and Graduate Director, Bioinformatics
> (Senior member, IEEE) (Board of Directors, ISCB)
> life member (LAB, Adventure Cycling, American Youth Hostels)
> Effective Cycling Instructor #218-ck (lapsed)
> Affiliations for identification only.
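Karplus's recipe (Henikoff sequence weights plus a prior on the column probabilities) can be sketched roughly in Python, as a starting point for rebuilding a PSSM from a hand-filtered alignment. This is a minimal sketch under stated assumptions. It is not PSI-BLAST's procedure and not SAM's. It assumes the kept alignment rows have already been extracted from the blastpgp output as equal-length, gapped strings; parsing the -m 6 format is not shown. In place of Dirichlet mixtures it uses a single background-proportional pseudocount, which is a much cruder prior. The function names `henikoff_weights` and `build_pssm`, the parameters `total_weight` and `pseudocount` with their default values, the uniform background, and the half-bit score units are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

# The 20 standard amino acids; any other symbol (gap '-', X, B, Z, ...) is
# ignored when counting residues in build_pssm.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def henikoff_weights(alignment):
    """Position-based sequence weights in the style of Henikoff & Henikoff.

    In each column a sequence earns 1 / (r * s), where r is the number of
    distinct symbols in the column and s is the number of sequences sharing
    this sequence's symbol. Gaps count as an ordinary symbol here, which is
    an assumption of this sketch. Weights are normalized to sum to 1.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned rows must have the same length")
    weights = [0.0] * len(alignment)
    for col in range(length):
        column = [seq[col].upper() for seq in alignment]
        counts = Counter(column)
        distinct = len(counts)
        for i, symbol in enumerate(column):
            weights[i] += 1.0 / (distinct * counts[symbol])
    total = sum(weights)
    return [w / total for w in weights]


def build_pssm(alignment, background=None, total_weight=10.0, pseudocount=5.0):
    """Half-bit log-odds PSSM (one dict per column) from aligned rows.

    Simplification: a single background-proportional pseudocount replaces
    Dirichlet mixtures; this is not PSI-BLAST's procedure.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive to avoid log(0)")
    if background is None:
        # Hypothetical placeholder: uniform background. Substitute real
        # amino-acid background frequencies for actual use.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    weights = henikoff_weights(alignment)
    pssm = []
    for col in range(len(alignment[0])):
        # Weighted residue counts, scaled so all sequences together
        # contribute at most total_weight to this column.
        counts = {aa: 0.0 for aa in AMINO_ACIDS}
        for weight, seq in zip(weights, alignment):
            residue = seq[col].upper()
            if residue in counts:
                counts[residue] += total_weight * weight
        column_total = sum(counts.values())
        row = {}
        for aa in AMINO_ACIDS:
            prob = (counts[aa] + pseudocount * background[aa]) / (
                column_total + pseudocount
            )
            # Log-odds against background, in half-bits, rounded to integers.
            row[aa] = round(2.0 * math.log2(prob / background[aa]))
        pssm.append(row)
    return pssm


if __name__ == "__main__":
    # Toy rows standing in for a hand-filtered, query-anchored alignment.
    toy = ["ACDE-", "ACDEF", "SCDEF"]
    print(henikoff_weights(toy))
    for position, row in enumerate(build_pssm(toy), start=1):
        print(position, row["A"], row["C"], row["F"])
```

In this sketch `total_weight` plays the role Karplus describes for the total sequence weight. Raising it relative to `pseudocount` makes the profile follow the alignment more closely, and lowering it makes the profile more general. The scores are not claimed to match the scaling or values in a "blastpgp -Q" file.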