[ssml] How to generate a position-specific score matrix (PSSM) the way PSI-BLAST does?

Kevin Karplus karplus at soe.ucsc.edu
Sat Jan 8 12:08:22 EST 2005


If you are trying to match the performance of psi-blast, you need to
read one of the more recent papers, not just the original.  They made
several refinements in 2001:

@article{improved-psiblast-2001,
	author={Sch\"affer, Alejandro A.  and Aravind, L. 
		and Madden, Thomas L. and Shavirin, Sergei
		and Spouge, John L. and Wolf, Yuri I.
		and Koonin, Eugene and Altschul, Stephen F.},
	title ="Improving the accuracy of {PSI-BLAST} protein database
	searches with composition-based statistics and other refinements",
	journal="Nucleic Acids Research",
	volume=29, number=14,
	year=2001,
	pages="2994-3005"
	}


If you are just trying to come up with a decent PSSM procedure, not a
duplicate of PSI-BLAST, I'd recommend using the Henikoffs' relative
weighting for sequences (for ease of implementation) and Dirichlet
mixtures for setting the probabilities.  Using the Dirichlet mixtures
is not trivial, as you not only have to get the computation of the
probabilities right, you have to adjust the total sequence weight to
get the level of generality you want.  This is the method used in the
SAM tool suite for HMMs, and it has been implemented in C and C++, but
not in Perl (Perl is not a great language for numeric computations
that involve iteration).
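
As a rough illustration of the two pieces above (this is not the SAM
code), here is a minimal Python sketch of Henikoff position-based
sequence weights followed by a Dirichlet-mixture posterior-mean
estimate for one column.  Everything named here is an assumption made
for the example: the function names, the choice to skip gap
characters, the two-letter alphabet "AC", and above all TOY_MIXTURE,
whose coefficients and parameters are invented.  A real mixture would
cover all 20 amino acids, with parameters estimated from alignments.

```python
from collections import Counter
from math import exp, lgamma, log


def henikoff_weights(alignment, gap="-"):
    """Position-based sequence weights, normalized to sum to 1.

    In each column a sequence gains 1 / (k * n), where k is the number of
    distinct residues in the column and n is the number of sequences that
    share this sequence's residue.  Gaps are skipped (an assumption).
    """
    weights = [0.0] * len(alignment)
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)
        k = len(counts)
        for i, res in enumerate(column):
            if res != gap:
                weights[i] += 1.0 / (k * counts[res])
    total = sum(weights)
    return [w / total for w in weights]


def weighted_counts(column, weights, alphabet, total_weight, gap="-"):
    """Residue counts for one column, with sequence weights rescaled so
    that all sequences together count as total_weight observations."""
    counts = [0.0] * len(alphabet)
    for res, w in zip(column, weights):
        if res != gap:
            counts[alphabet.index(res)] += total_weight * w
    return counts


def log_prob_counts(counts, alpha):
    """log P(counts | one Dirichlet component); counts may be fractional."""
    n, a = sum(counts), sum(alpha)
    lp = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    for c, al in zip(counts, alpha):
        lp += lgamma(c + al) - lgamma(c + 1) - lgamma(al)
    return lp


def dirichlet_mixture_probs(counts, mixture):
    """Posterior-mean residue probabilities under a Dirichlet mixture.

    mixture is a list of (coefficient, alpha_vector) pairs.
    """
    logs = [log(q) + log_prob_counts(counts, alpha) for q, alpha in mixture]
    top = max(logs)                      # subtract max for numerical stability
    post = [exp(x - top) for x in logs]
    norm = sum(post)
    post = [p / norm for p in post]      # posterior weight of each component
    n = sum(counts)
    probs = [0.0] * len(counts)
    for pj, (_, alpha) in zip(post, mixture):
        a = sum(alpha)
        for i, (c, al) in enumerate(zip(counts, alpha)):
            probs[i] += pj * (c + al) / (n + a)
    return probs


# Toy data only: a 2-letter alphabet and made-up mixture parameters.
ALPHABET = "AC"
TOY_MIXTURE = [(0.5, [1.0, 1.0]), (0.5, [3.0, 1.0])]

alignment = ["AAA", "AAA", "CCC"]
weights = henikoff_weights(alignment)
column = [seq[0] for seq in alignment]
for total_weight in (1.0, 10.0):
    counts = weighted_counts(column, weights, ALPHABET, total_weight)
    print(total_weight, dirichlet_mixture_probs(counts, TOY_MIXTURE))
```

The total_weight argument is the knob mentioned above: a small value
keeps the estimates close to the mixture's prior (more general), and a
large value pulls them toward the observed weighted frequencies (more
specific).  How to choose it is not shown here.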



------------------------------------------------------------
Kevin Karplus 	karplus at soe.ucsc.edu	http://www.soe.ucsc.edu/~karplus
Professor of Biomolecular Engineering, University of California, Santa Cruz
Undergraduate and Graduate Director, Bioinformatics
(Senior member, IEEE)	(Board of Directors, ISCB)
life member (LAB, Adventure Cycling, American Youth Hostels)
Effective Cycling Instructor #218-ck (lapsed)
Affiliations for identification only.

