[ssml] How to generate a position-specific score matrix (PSSM) the way PSI-BLAST does?
Kevin Karplus
karplus at soe.ucsc.edu
Sat Jan 8 12:08:22 EST 2005
If you are trying to match the performance of PSI-BLAST, you need to
read one of the more recent papers, not just the original. The authors
published several refinements in 2001:
@article{improved-psiblast-2001,
author={Sch\"affer, Alejandro A. and Aravind, L.
and Madden, Thomas L. and Shavirin, Sergei
and Spouge, John L. and Wolf, Yuri I.
and Koonin, Eugene and Altschul, Stephen F.},
title ="Improving the accuracy of {PSI-BLAST} protein database
searches with composition-based statistics and other refinements",
journal="Nucleic Acids Research",
volume=29, number=14,
year=2001,
pages="2994-3005"
}
If you are just trying to come up with a decent PSSM procedure, not a
duplicate of PSI-BLAST, I'd recommend using the Henikoffs' relative
weighting for sequences (for ease of implementation) and Dirichlet
mixtures for estimating the residue probabilities. Using Dirichlet
mixtures is not trivial: not only do you have to get the computation of
the probabilities right, you also have to adjust the total sequence
weight to get the level of generality you want. This is the method used
in the SAM tool suite for HMMs, and it has been implemented in C and C++,
but not in Perl (Perl is not a great language for numeric computations
that involve iteration).
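For readers who want to see how the pieces fit together, here is a minimal Python sketch (not SAM's code and not the PSI-BLAST procedure). It assumes an ungapped-length alignment given as equal-length strings, skips gap characters when counting, uses a uniform background, and uses a made-up two-component `TOY_MIXTURE` in place of a trained Dirichlet mixture. The names `henikoff_weights`, `dirichlet_mixture_probs`, `build_pssm` and the `total_weight` parameter are hypothetical; `total_weight` is held fixed here, whereas the procedure described above would tune it to get the desired generality.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids
GAP = "-"
Mixture = List[Tuple[float, List[float]]]  # (mixture coefficient q_j, alpha_j)

# Hypothetical toy mixture, NOT a trained one: one sparse and one flat component.
TOY_MIXTURE: Mixture = [
    (0.5, [0.1] * 20),  # low concentration: favors conserved columns
    (0.5, [2.0] * 20),  # high concentration: favors variable columns
]

def henikoff_weights(msa: Sequence[str]) -> List[float]:
    """Position-based weights: in each column a sequence with residue a gets
    1/(r * n_a), r = distinct residues in the column, n_a = count of a.
    Gaps are skipped (an assumption of this sketch). Normalized to sum to 1."""
    weights = [0.0] * len(msa)
    for col in range(len(msa[0])):
        counts: Dict[str, int] = {}
        for seq in msa:
            if seq[col] != GAP:
                counts[seq[col]] = counts.get(seq[col], 0) + 1
        r = len(counts)
        for i, seq in enumerate(msa):
            if seq[col] != GAP:
                weights[i] += 1.0 / (r * counts[seq[col]])
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else [1.0 / len(msa)] * len(msa)

def dirichlet_mixture_probs(counts: Sequence[float], mixture: Mixture) -> List[float]:
    """Posterior-mean probabilities p_i = sum_j P(j|n) (n_i + a_ji) / (|n| + |a_j|).
    The multinomial coefficient is the same for every component, so it cancels
    in P(j|n) and is omitted. Work in log space to avoid overflow."""
    n_total = sum(counts)
    log_post = []
    for q, alpha in mixture:
        a_total = sum(alpha)
        lp = math.log(q) + math.lgamma(a_total) - math.lgamma(n_total + a_total)
        lp += sum(math.lgamma(n + a) - math.lgamma(a) for n, a in zip(counts, alpha))
        log_post.append(lp)
    m = max(log_post)
    post = [math.exp(lp - m) for lp in log_post]
    z = sum(post)
    probs = [0.0] * len(counts)
    for w, (_, alpha) in zip(post, mixture):
        a_total = sum(alpha)
        for i, (n, a) in enumerate(zip(counts, alpha)):
            probs[i] += (w / z) * (n + a) / (n_total + a_total)
    return probs

def build_pssm(msa: Sequence[str], mixture: Mixture, total_weight: float = 5.0,
               background: Optional[List[float]] = None) -> List[List[float]]:
    """Log-odds scores in bits, one row of 20 scores per alignment column."""
    background = background or [1.0 / len(ALPHABET)] * len(ALPHABET)
    weights = henikoff_weights(msa)
    index = {a: k for k, a in enumerate(ALPHABET)}
    pssm = []
    for col in range(len(msa[0])):
        counts = [0.0] * len(ALPHABET)
        for w, seq in zip(weights, msa):
            k = index.get(seq[col])
            if k is not None:  # skip gaps and non-standard letters
                counts[k] += w
        s = sum(counts)
        if s > 0:  # rescale so the column's total sequence weight is total_weight
            counts = [c * total_weight / s for c in counts]
        probs = dirichlet_mixture_probs(counts, mixture)
        pssm.append([math.log2(p / b) for p, b in zip(probs, background)])
    return pssm

if __name__ == "__main__":
    msa = ["ACDK", "ACEK", "GCDR", "AC-K"]
    pssm = build_pssm(msa, TOY_MIXTURE)
    print("".join(ALPHABET[max(range(20), key=lambda k: row[k])] for row in pssm))
```

Raising `total_weight` lets the observed counts dominate the mixture prior (a more specific profile); lowering it lets the prior dominate (a more general one). That trade-off is the knob the paragraph above says must be adjusted.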
------------------------------------------------------------
Kevin Karplus karplus at soe.ucsc.edu http://www.soe.ucsc.edu/~karplus
Professor of Biomolecular Engineering, University of California, Santa Cruz
Undergraduate and Graduate Director, Bioinformatics
(Senior member, IEEE) (Board of Directors, ISCB)
life member (LAB, Adventure Cycling, American Youth Hostels)
Effective Cycling Instructor #218-ck (lapsed)
Affiliations for identification only.