[ssml] How to generate a position-specific score matrix (PSSM)
the way PSI-BLAST does?
Jarod
jarod at nlbmol.ibp.ac.cn
Wed Jan 12 07:21:16 EST 2005
In fact, I do not want to re-implement a fully functional PSI-BLAST in Perl:
Perl is not well suited to this, and PSI-BLAST is a big program. My goal is
simple. I just want to regenerate a PSSM file from PSI-BLAST output, such as
the plain-text result produced by the command "blastpgp -i
test.seq -d swissprot -j 3 -m 6 -o test.blast", where test.blast is the
output in multiple-alignment format. My question is whether test.blast
provides enough information to regenerate a PSSM file similar to the one
produced by "blastpgp -Q". In some cases I need to manually filter the
sequences in the alignment of the BLAST output, so being able to regenerate
a PSSM file is important to me.
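
To make the goal concrete, here is a minimal Python sketch of turning an
already-filtered alignment into per-column log-odds scores. It is only an
illustration under stated assumptions: the aligned rows are taken to be
extracted from test.blast already (no parsing is shown), every sequence gets
equal weight, the background is uniform unless supplied, and a simple
background-proportional pseudocount is used. This is not the PSI-BLAST
procedure, and the function name column_log_odds is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_log_odds(alignment, background=None, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score.

    Assumptions (not taken from PSI-BLAST): equal sequence weights, gaps and
    unknown characters ignored, and `pseudocount` pseudo-observations per
    column distributed according to the background frequencies.
    """
    if background is None:
        # Uniform background as a placeholder; real frequencies would be better.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    pssm = []
    for col in range(length):
        # Count only the standard residues observed in this column.
        counts = Counter(seq[col] for seq in alignment if seq[col] in background)
        total = sum(counts.values()) + pseudocount
        row = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount * background[aa]) / total
            row[aa] = math.log2(freq / background[aa])
        pssm.append(row)
    return pssm


if __name__ == "__main__":
    rows = ["ACD", "ACE", "A-E"]  # toy alignment, '-' marks a gap
    for position, scores in enumerate(column_log_odds(rows), start=1):
        best = max(scores, key=scores.get)
        print(position, best, round(scores[best], 2))
```

Removing a row from the list before calling the function corresponds to the
manual filtering step described above.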
Thanks.
Best wishes.
Jarod
On Sunday 09 January 2005 01:08, you wrote:
> If you are trying to match the performance of psi-blast, you need to
> read one of the more recent papers, not just the original. They made
> several refinements in 2001:
>
> @article{improved-psiblast-2001,
> author={Sch\"affer, Alejandro A. and Aravind, L.
> and Madden, Thomas L. and Shavirin, Sergei
> and Spouge, John L. and Wolf, Yuri I.
> and Koonin, Eugene and Altschul, Stephen F.},
> title ="Improving the accuracy of {PSI-BLAST} protein database
> searches with composition-based statistics and other refinements",
> journal="Nucleic Acids Research",
> volume=29, number=14,
> year=2001,
> pages="2994-3005"
> }
>
>
> If you are just trying to come up with a decent PSSM procedure, not a
> duplicate of PSI-BLAST, I'd recommend using the Henikoffs' relative
> weighting for sequences (for ease of implementation) and Dirichlet
> mixtures for setting the probabilities. Using the Dirichlet mixtures
> is not trivial, as you not only have to get the computation of the
> probabilities right, you have to adjust the total sequence weight to
> get the level of generality you want. This is the method used in the
> SAM tool suite for HMMs, and it has been implemented in C and C++, but
> not in Perl (Perl is not a great language for numeric computations
> that involve iteration).
>
>
>
> ------------------------------------------------------------
> Kevin Karplus karplus at soe.ucsc.edu http://www.soe.ucsc.edu/~karplus
> Professor of Biomolecular Engineering, University of California, Santa Cruz
> Undergraduate and Graduate Director, Bioinformatics
> (Senior member, IEEE) (Board of Directors, ISCB)
> life member (LAB, Adventure Cycling, American Youth Hostels)
> Effective Cycling Instructor #218-ck (lapsed)
> Affiliations for identification only.
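
The Henikoff sequence weighting recommended in the quoted reply can be
sketched briefly. The following is a minimal Python illustration of
position-based weights, not the SAM implementation: in each column a sequence
earns 1 / (k * n), where k is the number of distinct residue types in the
column and n is the number of sequences sharing that sequence's residue.
Skipping gap characters and normalizing the weights to sum to 1 are
assumptions made here, and the function name henikoff_weights is
hypothetical. The Dirichlet-mixture step and the adjustment of the total
sequence weight are not covered.

```python
from collections import Counter


def henikoff_weights(alignment, gap_chars="-."):
    """Return position-based sequence weights, normalized to sum to 1.

    Assumption: gap characters contribute nothing to any weight.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    weights = [0.0] * len(alignment)
    for col in range(length):
        # How often each residue type occurs in this column (gaps excluded).
        counts = Counter(seq[col] for seq in alignment if seq[col] not in gap_chars)
        distinct = len(counts)
        if distinct == 0:
            continue  # column contains only gaps
        for i, seq in enumerate(alignment):
            residue = seq[col]
            if residue not in gap_chars:
                weights[i] += 1.0 / (distinct * counts[residue])

    total = sum(weights)
    if total == 0:
        raise ValueError("alignment contains no residues")
    return [w / total for w in weights]


if __name__ == "__main__":
    # Two identical sequences share weight; the divergent one counts for more.
    print(henikoff_weights(["AAA", "AAA", "CCC"]))
```

In this toy alignment the two identical rows each receive 0.25 and the
divergent row receives 0.5, which is the down-weighting of redundant
sequences that the scheme is meant to provide.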