[BiO BB] Re: [ssml] Substitution matrices vs HMM
karplus at soe.ucsc.edu
Fri Oct 29 15:10:50 EDT 2004
Manisha Goel asked
> I was trying to develop an algorithm for describing/predicting a
> pattern (e.g. transmembrane region, signal peptide etc) in protein
> sequences. I want to derive this pattern from the multiple sequence
> alignments. But I was wondering if I should use substitution matrices
> or HMMs to describe/represent these patterns. Are there any definite
> advantages of using one over the other? Does the choice depend on what
> I am trying to define? Can somebody please direct me to relevant
> literature or suggest something from personal experience?
HMMs are currently the best method for representing patterns of the
type you have in mind. Profile HMMs are the most popular, and are
supported by two main packages, HMMer and SAM. Both packages are free
to academics, non-profits, and government researchers, but the HMMer
package is open-source and SAM is not (at least not yet; we're
thinking of making it open-source but have not had the time or
resources to clean up the source code enough to do that reasonably).
SAM and HMMer models are slightly different, but similar enough to be
interconvertible with only fairly small losses (SAM models are
slightly more general than HMMer models, and use a different
calibration method). Conversion programs, sam2hmmer and hmmer2sam, are
available on the web.
For developing new profile HMMs, SAM is a better choice, because there
has been more development on the model-building code. (HMMer is more
popular than SAM, largely because of the prebuilt PFAM resource, which
is a very valuable database.)
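To make the contrast with a substitution matrix concrete: a substitution matrix assigns one score per residue pair regardless of position, whereas a profile learns separate residue probabilities for each alignment column. The sketch below is a simplified illustration, not SAM's or HMMer's model-building code. It estimates only match-state emission probabilities from an ungapped alignment (no insert or delete states, no transition probabilities), uses a flat pseudocount where real packages use more elaborate priors such as Dirichlet mixtures, and assumes a uniform background distribution. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def match_emissions(alignment, pseudocount=1.0):
    """Estimate per-column residue probabilities from aligned sequences.

    Simplified: gaps ('-') are skipped and a flat pseudocount is added
    to each of the 20 standard amino acids in every column.
    """
    columns = []
    for j in range(len(alignment[0])):
        counts = Counter(seq[j] for seq in alignment if seq[j] != "-")
        total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        columns.append({a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS})
    return columns


def log_odds_score(columns, segment, background=None):
    """Sum of per-column log2(P(residue | column) / P(residue | background))."""
    if background is None:
        # Assumption: uniform background frequencies.
        background = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    if len(segment) != len(columns):
        raise ValueError("segment length must equal the number of columns")
    return sum(math.log2(col[a] / background[a]) for col, a in zip(columns, segment))


if __name__ == "__main__":
    alignment = ["LIVLA", "LLVIA", "IIVLG", "LVVLA"]  # toy, hypothetical data
    columns = match_emissions(alignment)
    print(log_odds_score(columns, "LIVLA"))  # resembles the columns: positive
    print(log_odds_score(columns, "DEKRN"))  # unrelated residues: negative
```

A segment that matches the per-column preferences scores above zero and an unrelated one below zero. A full profile HMM adds insert and delete states with position-specific gap costs, which is much of what makes it more expressive than position-specific scores alone.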
See http://stash.mrc-lmb.cam.ac.uk/HMMER-SAM/ for information about a
test comparing HMMER and SAM, carried out by people who were not on
either development team and were trying to decide which to use.
If you want to do non-profile HMMs (such as the transmembrane models
of TMHMM), then you may have to build your own code. I've not
seen general-purpose HMM code that was a good utility kit for building
new HMMs. Of course, I haven't been looking for one, so I may have
missed some major developments.
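If you do end up writing your own non-profile HMM code, the core decoding step is usually the Viterbi algorithm, which finds the most probable state path for a sequence. The sketch below uses a hypothetical two-state toy model ("out" for non-membrane, "mem" for membrane) over a two-letter alphabet of hydrophobic versus polar residues. The residue grouping and all probabilities are made-up illustrative values, and the model is far simpler than TMHMM's actual architecture. It works in log space to avoid underflow on long sequences.

```python
import math

# Hypothetical toy model; none of these values come from TMHMM.
HYDROPHOBIC = set("AILMFVWC")
START = {"out": 0.9, "mem": 0.1}
TRANS = {"out": {"out": 0.95, "mem": 0.05},
         "mem": {"out": 0.05, "mem": 0.95}}
EMIT = {"out": {"h": 0.3, "p": 0.7},   # h = hydrophobic, p = polar/other
        "mem": {"h": 0.9, "p": 0.1}}


def viterbi(sequence):
    """Return the most probable state path as a string ('M' = membrane, 'o' = other)."""
    if not sequence:
        return ""
    obs = ["h" if r in HYDROPHOBIC else "p" for r in sequence.upper()]
    states = list(START)
    # score[s]: best log-probability of any path that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in states}
    backpointers = []
    for o in obs[1:]:
        prev_best, new_score = {}, {}
        for s in states:
            best = max(states, key=lambda p: score[p] + math.log(TRANS[p][s]))
            prev_best[s] = best
            new_score[s] = score[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][o])
        backpointers.append(prev_best)
        score = new_score
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return "".join("M" if s == "mem" else "o" for s in path)


if __name__ == "__main__":
    seq = "KDERSTNQ" + "LIVLLIVAFLLIVLMIVL" + "KDERSTNQ"  # toy sequence
    print(viterbi(seq))
```

On the toy sequence, the 18-residue hydrophobic stretch should be labelled as membrane and the flanking polar stretches as non-membrane. A realistic model would use full 20-letter emission tables, separate inside and outside loop states, and length-controlling chains of helix states, all trained from labelled data.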
Kevin Karplus karplus at soe.ucsc.edu http://www.soe.ucsc.edu/~karplus
Professor of Biomolecular Engineering, University of California, Santa Cruz
Undergraduate and Graduate Director, Bioinformatics
Senior member, IEEE Board of Directors, ISCB (starting Jan 2005)
life member (LAB, Adventure Cycling, American Youth Hostels)
Effective Cycling Instructor #218-ck (lapsed)
Affiliations for identification only.