On Sat, 8 Jan 2005, Jarod wrote:

>Hi,
>
>I try to write a short program to generate position specific score matrix
>(PSSM) from a multi-alignment of sequences in Perl language, I know there are
>methods to do this, but I want to do this like PSI-blast. Unfortunately, the
>original article about PSI-blast does not make me clear, and the NCBI source
>code is too difficult to read.
>
>Anyone who can tell me the its principle and how psi-blast works?

If you are finding the original papers confusing (I know I did), the best
thing to do next is probably to read a textbook on sequence analysis. You can
probably also find lots of online tutorials and descriptions of PSSMs. If you
find any good ones, please post them to the list :)

There is a short section specifically about PSSMs in the book 'Biological
sequence analysis' by Durbin, Eddy, Krogh and Mitchison (section 5.1), which
gives a very clear description of a simple PSSM scoring system. While the
focus of that book is profile HMMs, it might help your understanding to know
that a PSSM is exactly like a profile HMM except without insert or delete
*states*. It is just a string of match states.

The two main issues (as I understand it) are:

1) calculating the probabilities of finding a particular amino acid at a
   particular position, and
2) translating the score a particular PSSM gives to a particular sequence
   into a measure of significance.

The suggested book has lots of background reading on different methods for
doing both of the above.

I really like the idea of a Perl implementation of psi-blast (for educational
purposes). If you are really ambitious, perhaps your code could be the first
contribution to a new open source project at bioinformatics.org?

All the best,
Dan.

>
>Thanks.
>
> Jarod
>_______________________________________________
>ssml-general mailing list
>ssml-general at bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/ssml-general
>
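
As a rough illustration of issue 1) and of how a PSSM then assigns a score to
a sequence, here is a minimal sketch. It is written in Python rather than
Perl, and it is not PSI-BLAST's actual method: PSI-BLAST also weights the
aligned sequences and uses more refined pseudocounts, both of which are left
out here. The function names (build_pssm, score_window, best_hit), the flat
pseudocount of 1 per residue, and the uniform background frequencies are all
assumptions made for the example. Issue 2), turning a score into a measure of
significance, is not covered.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, alphabet=AMINO_ACIDS, background=None, pseudocount=1.0):
    """Build a log-odds PSSM from an ungapped-column multiple alignment.

    Returns one dict per alignment column, mapping residue -> log2-odds score.
    Assumptions (not PSI-BLAST's scheme): flat pseudocounts, no sequence
    weighting, uniform background unless one is supplied.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    if background is None:
        background = {a: 1.0 / len(alphabet) for a in alphabet}

    pssm = []
    for col in range(length):
        # Count residues in this column; gap or unknown characters are skipped.
        counts = Counter(seq[col] for seq in alignment if seq[col] in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        column = {}
        for a in alphabet:
            prob = (counts[a] + pseudocount) / total  # estimated P(a | column)
            column[a] = math.log2(prob / background[a])
        pssm.append(column)
    return pssm


def score_window(pssm, segment):
    """Sum the per-column scores for a segment as long as the PSSM.

    Residues outside the alphabet used to build the PSSM raise KeyError.
    """
    if len(segment) != len(pssm):
        raise ValueError("segment length must equal PSSM length")
    return sum(column[res] for column, res in zip(pssm, segment))


def best_hit(pssm, sequence):
    """Slide the PSSM along the sequence; return (best score, start index)."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the PSSM")
    return max(
        (score_window(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )
```

For example, build_pssm(["ACD", "ACE", "ACD"]) gives three columns, and
best_hit on the result with "WWACDWW" reports start index 2, where the "ACD"
segment sits. Because there are no insert or delete states, scoring is just a
sum over columns, which is the "string of match states" view mentioned above.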