[ssml] How to generate a position-specific score matrix (PSSM) the way PSI-BLAST does?

Jarod jarod at nlbmol.ibp.ac.cn
Wed Jan 12 07:21:16 EST 2005

In fact, I do not want to re-implement a fully functional PSI-BLAST in Perl; 
Perl is not well suited to this, and PSI-BLAST is a big program. My goal is 
simple: I just want to regenerate a PSSM file from PSI-BLAST output, such as 
the plain-text result produced by the command "blastpgp -i test.seq -d 
swissprot -j 3 -m 6 -o test.blast", where test.blast is the output in 
multiple-alignment format. My question is whether the output file test.blast 
provides enough information to regenerate a PSSM file like the one produced 
by "blastpgp -Q". In some cases I need to manually filter the sequences in 
the alignment of the BLAST output, so regenerating a PSSM file is important 
to me.
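
To make the goal concrete, here is a minimal sketch of turning a filtered alignment into a log-odds matrix. It is not PSI-BLAST's procedure: it assumes the alignment has already been parsed into equal-length rows with "-" for gaps (parsing the "-m 6" output is not attempted), and it uses unweighted counts, a single fixed pseudocount, and a uniform background. The function name simple_pssm and all of its parameters are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def simple_pssm(rows, pseudocount=1.0, background=None):
    """Return one {residue: log2-odds score} dict per alignment column.

    Assumptions (not PSI-BLAST's method): unweighted sequences, one fixed
    pseudocount per column, uniform background unless one is supplied.
    """
    if not rows:
        raise ValueError("need at least one aligned row")
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned rows must have the same length")
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

    pssm = []
    for col in range(length):
        counts = {aa: 0.0 for aa in AMINO_ACIDS}
        for row in rows:
            aa = row[col].upper()
            if aa in counts:  # gaps and non-standard letters are skipped
                counts[aa] += 1.0
        # The pseudocount mass is spread over residues by background frequency.
        total = sum(counts.values()) + pseudocount
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount * background[aa]) / total
            scores[aa] = math.log2(freq / background[aa])
        pssm.append(scores)
    return pssm
```

Rows removed by manual filtering are simply left out of the list passed in, so the matrix reflects only the sequences that were kept.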


Best wishes.


On Sunday 09 January 2005 01:08, you wrote:
> If you are trying to match the performance of psi-blast, you need to
> read one of the more recent papers, not just the original.  They made
> several refinements in 2001:
> @article{improved-psiblast-2001,
>  author={Sch\"affer, Alejandro A.  and Aravind, L.
>   and Madden, Thomas L. and Shavirin, Sergei
>   and Spouge, John L. and Wolf, Yuri I.
>   and Koonin, Eugene and Altschul, Stephen F.},
>  title ="Improving the accuracy of {PSI-BLAST} protein database
>  searches with composition-based statistics and other refinements",
>  journal="Nucleic Acids Research",
>  volume=29, number=14,
>  year=2001,
>  pages="2994-3005"
>  }
> If you are just trying to come up with a decent PSSM procedure, not a
> duplicate of PSI-BLAST, I'd recommend using the Henikoffs' relative
> weighting for sequences (for ease of implementation) and Dirichlet
> mixtures for setting the probabilities.  Using the Dirichlet mixtures
> is not trivial, as you not only have to get the computation of the
> probabilities right, you have to adjust the total sequence weight to
> get the level of generality you want.  This is the method used in the
> SAM tool suite for HMMs, and it has been implemented in C and C++, but
> not in Perl (Perl is not a great language for numeric computations
> that involve iteration).
> ------------------------------------------------------------
> Kevin Karplus 	karplus at soe.ucsc.edu	http://www.soe.ucsc.edu/~karplus
> Professor of Biomolecular Engineering, University of California, Santa Cruz
> Undergraduate and Graduate Director, Bioinformatics
> (Senior member, IEEE)	(Board of Directors, ISCB)
> life member (LAB, Adventure Cycling, American Youth Hostels)
> Effective Cycling Instructor #218-ck (lapsed)
> Affiliations for identification only.
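
The Henikoffs' weighting recommended in the quoted message can be sketched as follows. This covers only the sequence-weighting step, not the Dirichlet mixtures or the adjustment of total sequence weight. It assumes the usual position-based formulation: in each column, a sequence receives 1/(k*n), where k is the number of distinct residue types in that column and n is the number of sequences sharing its residue. Skipping gap characters and normalizing the weights to sum to 1 are choices made for this sketch, and the function name henikoff_weights is hypothetical.

```python
from collections import Counter


def henikoff_weights(rows):
    """Position-based sequence weights for equal-length aligned rows.

    Assumptions for this sketch: "-" marks a gap and contributes nothing,
    and the returned weights are normalized to sum to 1.
    """
    if not rows:
        raise ValueError("need at least one aligned row")
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned rows must have the same length")

    weights = [0.0] * len(rows)
    for col in range(length):
        column = [row[col] for row in rows]
        counts = Counter(c for c in column if c != "-")
        k = len(counts)  # number of distinct residue types in this column
        if k == 0:
            continue  # all-gap column carries no information
        for i, c in enumerate(column):
            if c != "-":
                weights[i] += 1.0 / (k * counts[c])

    total = sum(weights)
    if total == 0:
        raise ValueError("alignment contains only gaps")
    return [w / total for w in weights]
```

With two identical rows and one divergent row, for example henikoff_weights(["AAAA", "AAAA", "CCCC"]), the divergent row receives half of the total weight and the two duplicates share the rest equally.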
