[ssml] Re: Sequence defined domains?

Kevin Karplus karplus at soe.ucsc.edu
Thu Nov 27 13:56:04 EST 2003


The SAM T99 and T2K scripts use an A2M alignment as a seed.  That
alignment is often a single sequence, but it can be any multiple alignment.
Upper and lower case letters are used for identifying the alignment
columns and insertions respectively ("-" is used for a gap in an
alignment column, and "." may be used to pad insertions).
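A minimal Python sketch of how the A2M conventions described above can be read back out of one aligned sequence. The function name `parse_a2m` is hypothetical (it is not part of SAM). It handles only the cases listed: uppercase letters and "-" occupy alignment columns, lowercase letters are insertions, and "." is padding that is discarded.

```python
from typing import List, Tuple


def parse_a2m(aligned: str) -> Tuple[str, List[Tuple[int, str]]]:
    """Split one A2M-aligned sequence into alignment columns and insertions.

    Hypothetical helper. Uppercase letters and '-' occupy alignment columns;
    lowercase letters are insertions between columns; '.' is padding and is
    dropped. Each insertion is recorded as (number of alignment columns that
    precede it, inserted residues).
    """
    columns: List[str] = []
    insertions: List[Tuple[int, str]] = []
    pending: List[str] = []  # lowercase residues not yet attached to a column boundary

    for ch in aligned:
        if ch.isupper() or ch == "-":
            # An alignment column closes any insertion run before it.
            if pending:
                insertions.append((len(columns), "".join(pending)))
                pending = []
            columns.append(ch)
        elif ch.islower():
            pending.append(ch)
        elif ch == ".":
            continue  # padding only; carries no residue
        else:
            raise ValueError(f"unexpected character {ch!r} in A2M sequence")

    # An insertion may also trail the last alignment column.
    if pending:
        insertions.append((len(columns), "".join(pending)))
    return "".join(columns), insertions
```

For example, `parse_a2m("AC-Dxyz..wEF")` yields the columns `AC-DEF` and a single insertion `xyzw` after the fourth column. Because alignment columns are shared by all sequences in a multiple alignment, every sequence in the same A2M alignment should yield the same number of columns.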

If you provide as a seed a sequence that has a large insertion in the
middle, the constructed HMM will favor having insertions at that
point, and the target99 or target2k script will (usually) construct a
pretty good HMM for recognizing the split domain, without any further
guidance.  Occasionally, the target99 or target2k script will have
"model drift" in which the HMM no longer aligns the initial sequence
the way we would want.  This causes problems in about 1% of the HMMs
we build, but may be a bigger problem in split-domain HMMs.  The next
version of SAM will, I hope, have some new features to fix this
problem (I have some ideas on how to fix it, but am waiting for
Richard Hughey to implement them, or for a grad student to volunteer
to implement them).
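One way to prepare such a split-domain seed is to write the inserted segment in lowercase, so that it is treated as an insertion rather than as alignment columns. The helper below is a hypothetical sketch: it assumes you already know the 0-based, end-exclusive coordinates of the insertion. It only prepares the sequence text and does not invoke target99, target2k, or any other SAM program.

```python
def make_split_domain_seed(sequence: str, insert_start: int, insert_end: int) -> str:
    """Build a single-sequence A2M seed with [insert_start, insert_end) as an insertion.

    Hypothetical helper. Residues outside the interval become alignment
    columns (uppercase); residues inside it become insertions (lowercase).
    Coordinates are 0-based and end-exclusive.
    """
    if not 0 <= insert_start <= insert_end <= len(sequence):
        raise ValueError("insertion interval lies outside the sequence")
    seq = sequence.upper()
    return seq[:insert_start] + seq[insert_start:insert_end].lower() + seq[insert_end:]
```

For example, `make_split_domain_seed("mkvlaGGGGGwerty", 5, 10)` returns `"MKVLAgggggWERTY"`: the two flanking pieces of the split domain become alignment columns, and the five middle residues become the insertion.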
