[ssml] Re: pdb-l: secondary structure prediction

Kevin Karplus karplus at soe.ucsc.edu
Tue Aug 9 15:49:00 EDT 2005

Manisha asked

> It is probably widely accepted that the center of the helix or strand
> is predicted more accurately than the boundaries of these elements
> (start and end positions). A number of studies focusing on improving
> the prediction of alpha-helical ends have been reported but I haven't
> come across any similar work for the prediction of beta-strand
> ends. Is it because beta-strand ends are already predicted with much
> higher accuracy than the helix ends? I haven't been able to find
> reference to any such comparison or observation either.

There are at least two causes for the difficulty in predicting the
ends of alpha helices:  1) the helix/non-helix decision is a bit
arbitrary at the ends of helices (witness the differences in helix
labels from DSSP and Stride), and 2) different homologs may end their
helices in different places, while most prediction methods base their
predictions on a multiple alignment of similar sequences.  If the
sequences don't have identical structures at some position, the
prediction will necessarily be wrong for some of them.
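
To make the first point concrete, here is a minimal Python sketch that
compares two helix labelings of the same chain and counts how many of
their disagreements fall next to a helix boundary.  The function names
and the two label strings are made up for illustration; they are not
real DSSP or Stride output, and "H" as the only helix letter is an
assumption.

```python
def helix_mask(labels, helix_letters="H"):
    """True where a residue is labeled as helix."""
    return [c in helix_letters for c in labels]


def boundary_positions(mask):
    """Indices on either side of each helix/non-helix transition."""
    near = set()
    for i in range(len(mask) - 1):
        if mask[i] != mask[i + 1]:
            near.update((i, i + 1))
    return near


def disagreement_by_region(labels_a, labels_b):
    """Return (disagreements at a boundary, disagreements elsewhere)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same residues")
    mask_a, mask_b = helix_mask(labels_a), helix_mask(labels_b)
    edge = boundary_positions(mask_a) | boundary_positions(mask_b)
    disagree = [i for i in range(len(mask_a)) if mask_a[i] != mask_b[i]]
    at_edge = sum(1 for i in disagree if i in edge)
    return at_edge, len(disagree) - at_edge


# Toy labelings (invented): the same helix, shifted by one residue.
toy_a = "CCHHHHHHHHCC"
toy_b = "CCCHHHHHHHHC"
print(disagreement_by_region(toy_a, toy_b))
```

On these toy strings both disagreements sit at helix ends, which is the
pattern described above; real labelings would need to be checked
separately.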

Beta strands have a bit less ambiguity in definition (though still
some, particularly in low-quality models), so that source of error is
reduced.  Beta strands also tend to be shorter than helices, so the end
points are a greater fraction of the training set, making them less
subject to being overwhelmed in the training.  Still, I certainly see
lower confidence at the edges of beta strands than in the middles of
them---particularly for amphipathic anti-parallel strands.  
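
The point about end points being a greater fraction of the training
set is simple arithmetic: a segment of n residues has at most 2 end
residues, so shorter segments give ends a larger share.  A small Python
sketch follows; the segment lengths used (5 and 12) are illustrative
assumptions, not measured averages.

```python
def end_fraction(segment_lengths):
    """Fraction of residues in the given segments that are end residues.

    Each segment contributes its first and last residue; a segment of
    length 1 contributes only one.
    """
    total = sum(segment_lengths)
    if total == 0:
        raise ValueError("need at least one residue")
    ends = sum(min(2, n) for n in segment_lengths)
    return ends / total


# Illustrative lengths only (assumed, not taken from any data set).
print(end_fraction([5]))   # a 5-residue strand: 2 of 5 residues are ends
print(end_fraction([12]))  # a 12-residue helix: 2 of 12 residues are ends
```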

Although I have worked on improving the accuracy of my local-structure
predictors in various ways, I have not found it valuable to focus on
the ends of helices and strands.  

Instead I have increased the alphabet size to get a finer division of
local structures.  For example, the "str" alphabet subdivides DSSP's
beta strand letter into 6 letters: A anti-parallel middle strand, Z
anti-parallel edge strand, P parallel middle strand, Q parallel edge
strand, M mixed middle strand, E other (identified by DSSP as a
strand, but not easily classified by the hbonds of itself and
neighbors).  The str2 alphabet further subdivides Z into Y and Z
according to whether the residue or its neighbor has the hydrogen
bonds.  This finer subdivision of local structure has helped with fold
recognition and alignment.
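
As a rough illustration of how the finer alphabet relates to DSSP, the
strand letters listed above can be written as a lookup table and
collapsed back to DSSP's single strand letter E.  This is only a
sketch: the names are hypothetical, and passing all non-strand letters
through unchanged is an assumption, since the rest of the str alphabet
is not described here.

```python
# Strand letters of the "str" alphabet, as described in the text.
STR_STRAND_LETTERS = {
    "A": "anti-parallel middle strand",
    "Z": "anti-parallel edge strand",
    "P": "parallel middle strand",
    "Q": "parallel edge strand",
    "M": "mixed middle strand",
    "E": "other strand (not easily classified)",
}

# str2 splits Z into Y and Z; which hydrogen-bond case each letter
# denotes is not stated in the text, so it is left unspecified here.
STR2_STRAND_LETTERS = dict(STR_STRAND_LETTERS)
STR2_STRAND_LETTERS["Y"] = "anti-parallel edge strand (str2 split of Z)"


def collapse_to_dssp_strand(sequence, strand_letters=STR2_STRAND_LETTERS):
    """Map every str/str2 strand letter to DSSP's 'E'.

    Non-strand letters are copied unchanged (an assumption).
    """
    return "".join("E" if c in strand_letters else c for c in sequence)


# Toy string, invented for illustration.
print(collapse_to_dssp_strand("HHHTAAZZTPQME"))
```

Going the other way (from E to the six letters) needs the hydrogen-bond
pattern of the residue and its neighbors, which a lookup table like
this cannot supply.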

One could use this approach by labeling the ends of helices or strands
with different letters (perhaps like the I-sites classes for turns).
I have not tried precisely that, because I am not convinced that the
ends of helices and strands are well enough conserved to justify such
a labeling.  There are undoubtedly many such enriched alphabets that
could be tried, and we have only looked at a few of them.  Rachel
Karchin and I developed a protocol for evaluating new local-structure
alphabets, which seems to work fairly well:

@string{prosfg= "Proteins: Structure, Function, and Genetics"}
  author =       {Rachel Karchin and Melissa Cline and 
  		  Yael Mandel-Gutfreund and Kevin Karplus}, 
  title =        {Hidden {Markov} models that use predicted local structure for fold recognition: alphabets of backbone geometry},
  year =      {2003},	
  journal = prosfg,
  volume =    {51}, number={4},
  pages =     {504--514}

    author = "Rachel Karchin and Melissa Cline and Kevin Karplus",
    title= "Evaluation of local structure alphabets based on residue burial",
    journal = prosfg,
    year = "2004",	month = "5 "#mar,
    volume = "55", 	number="3",
    pages = "508--518",
    note = "Online: http://www3.interscience.wiley.com/cgi-bin/abstract/107632554/ABSTRACT"

Kevin Karplus 	karplus at soe.ucsc.edu	http://www.soe.ucsc.edu/~karplus
Professor of Biomolecular Engineering, University of California, Santa Cruz
Undergraduate and Graduate Director, Bioinformatics
(Senior member, IEEE)	(Board of Directors, ISCB)
life member (LAB, Adventure Cycling, American Youth Hostels)
Effective Cycling Instructor #218-ck (lapsed)
Affiliations for identification only.
