[ssml] Assess sequence alignment quality without reference alignments?

Fri Jul 27 08:28:09 EDT 2007

On 27/07/07, Steven Platt <Steven.Platt at hpa.org.uk> wrote:
> Is this list still active? I've not heard anything here since the end of
> last year, but hey… maybe we all know what we need and have kept quiet.

Funny... I saw the [ssml] archive in a Google query earlier today and
then here it is again!

Perhaps we need to get more users? I asked Sean Eddy to point HMMER
users at SSML, because HMMER has no dedicated mailing list. From
http://hmmer.janelia.org/:

Support (lack thereof)
HMMER is not intended to be commercial software. In particular, it
comes with no promise of support whatsoever. Because I receive more
email than is humanly possible to answer, unfortunately I am not able
to respond individually to requests for help with the package, beyond
the help already provided in HMMER's general documentation. I do
respond gratefully to useful suggestions for improved documentation,
and to bug reports. If you find that you need more help, you may wish
to consider one of the several commercial versions of HMMER; see the
"Commercial versions" links on the left of this page.

But he never replied.

> Serious question:
>
> Is there an algorithm / method / program or server that can assess the
> quality of a nucleotide multiple sequence alignment (generated by a
> progressive clustering algorithm e.g. Clustal), without the need for a
> reference alignment?
>
> I've come across several papers & servers via Google & PubMed but all need
> reference alignments or are for protein sequences where structural
> information is available. The closest so far is by Ahola et al
> (http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1687212),
> which looks great until you reach the end of the alignment quality section
> of the methods and find the sentence 'By comparing the ConsAAs calculated
> from the test and reference alignments…'.
>
> The ability to bootstrap trees generated from alignments is already well
> established and I'd like to get something that examines the alignment more
> directly.
>
> What I'm really looking for is a method that I can apply directly to the
> alignments generated by our users that is better than a simple set of column
> conservation scores across the alignment … but I'm starting to think that
> such a thing does not exist.
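
For reference, the "simple set of column conservation scores" mentioned
above might look something like this minimal Python sketch. It assumes
Shannon entropy over non-gap residues as the conservation measure (one
common choice among several) and treats '-' and '.' as gaps; the
function name column_entropies is hypothetical.

```python
import math
from collections import Counter

def column_entropies(alignment, gaps="-."):
    """Per-column Shannon entropy (in bits) of non-gap residues, plus gap fraction.

    alignment: list of equal-length aligned sequences (strings).
    Returns one (entropy, gap_fraction) tuple per column; lower entropy
    means a more conserved column. An all-gap column gets entropy 0.0,
    so check gap_fraction before reading low entropy as "conserved".
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        column = [seq[col].upper() for seq in alignment]
        residues = [c for c in column if c not in gaps]
        gap_fraction = 1 - len(residues) / len(column)
        entropy = 0.0
        for n in Counter(residues).values():
            p = n / len(residues)
            entropy -= p * math.log2(p)
        scores.append((entropy, gap_fraction))
    return scores

# Example: a tiny Clustal-style nucleotide alignment
aln = ["ACGT-", "ACGTA", "ACCTA", "ACGAA"]
for i, (h, g) in enumerate(column_entropies(aln)):
    print(f"column {i}: entropy={h:.3f} bits, gaps={g:.2f}")
```

Scores like these describe how variable each column is, not whether the
residues in it are truly homologous, which is why they fall short of
what is being asked for here.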

I am not sure either; it's a good question, though, and an important
issue. People generally use a ROC curve to benchmark remote homology
detection algorithms, with something like SCOP as the reference.
However, assessing the quality of two different (multiple) alignments
is harder. The difference is that the former methods only try to
score the match of a sequence to a family, while the latter try to
match equivalent positions. This latter task again subdivides into the
different goals of obtaining the longest or the most highly scoring
alignment.
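
To make "matching equivalent positions" concrete: two alignments of the
same sequences can be compared by the fraction of aligned residue pairs
they share. This is the sum-of-pairs idea that reference-based
benchmarks use; without a reference it only says how much two
alignments (e.g. from different programs or parameter settings) agree,
not which one is right. A minimal Python sketch, assuming both
alignments list the same sequences in the same order; the function
names are hypothetical.

```python
from itertools import combinations

def _ungapped(alignment, gaps):
    return ["".join(c for c in seq if c not in gaps).upper() for seq in alignment]

def aligned_pairs(alignment, gaps="-."):
    """Set of residue pairs placed in the same column by an alignment.

    Each pair is (seq_i, residue_i, seq_j, residue_j) with seq_i < seq_j;
    residue indices count non-gap characters only, so the same pair can be
    recognised in two different alignments of the same sequences.
    """
    next_residue = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        present = []  # (sequence number, residue index) of non-gap cells
        for s, seq in enumerate(alignment):
            if seq[col] not in gaps:
                present.append((s, next_residue[s]))
                next_residue[s] += 1
        for (si, ri), (sj, rj) in combinations(present, 2):
            pairs.add((si, ri, sj, rj))
    return pairs

def pair_agreement(test, other, gaps="-."):
    """Fraction of the residue pairs aligned in `other` that `test` also aligns."""
    if _ungapped(test, gaps) != _ungapped(other, gaps):
        raise ValueError("alignments must contain the same sequences in the same order")
    other_pairs = aligned_pairs(other, gaps)
    if not other_pairs:
        return 0.0
    return len(aligned_pairs(test, gaps) & other_pairs) / len(other_pairs)

x = ["AC-GT", "ACAGT"]
y = ["ACG-T", "ACAGT"]
print(pair_agreement(x, y))  # 3 of y's 4 pairs are also in x -> 0.75
```

The measure is asymmetric; running it both ways round gives a fuller
picture. Low agreement flags regions worth inspecting, but high
agreement does not prove either alignment is correct.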

This search may give some hints:

http://www.google.com/search?&q=site%3Ahttps%3A%2F%2Flists.sdsc.edu%2Fpipermail%2Fpdb-l%2F%20alignment%20quality

I also just turned up this, in case you haven't seen it yet:

http://www.biomedcentral.com/1471-2105/7/471

One idea would be to compare the effectiveness of profile searches
based on the different alignments... I guess this raises the question
'what does it mean to say alignment X is better than Y?'. If you can
answer that question clearly (i.e. what do you want to use your
alignment for?), then you can probably work out how to test the
effectiveness of the two different alignments.
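
As a very rough illustration of the profile-search idea, one could use
leave-one-out scoring: build a simple log-odds profile from all
sequences but one, score the held-out sequence against it, and average.
This is only a toy stand-in for a real profile search (for example,
building HMMER models from each alignment and searching a database
containing known family members). It assumes a nucleotide alphabet, a
uniform background, a pseudocount of 1 and a 50% gap threshold for
profile columns; all names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: nucleotide alignments

def build_pssm(alignment, columns, pseudocount=1.0, gaps="-."):
    """Log-odds (bits) scores per profile column, against a uniform background."""
    background = 1.0 / len(ALPHABET)
    pssm = []
    for col in columns:
        counts = Counter(seq[col].upper() for seq in alignment if seq[col] not in gaps)
        total = sum(counts[a] for a in ALPHABET) + pseudocount * len(ALPHABET)
        pssm.append({a: math.log2((counts[a] + pseudocount) / total / background)
                     for a in ALPHABET})
    return pssm

def leave_one_out_score(alignment, max_gap_fraction=0.5, gaps="-."):
    """Mean score of each sequence against a profile built from all the others.

    Profile columns are those with at most `max_gap_fraction` gaps in the full
    alignment. A held-out sequence scores 0 where it has a gap or a non-ACGT
    character (a simplification; real profile HMMs model insertions and
    deletions explicitly).
    """
    n = len(alignment)
    if n < 2:
        raise ValueError("need at least two sequences")
    columns = [c for c in range(len(alignment[0]))
               if sum(seq[c] in gaps for seq in alignment) / n <= max_gap_fraction]
    total = 0.0
    for i, held_out in enumerate(alignment):
        rest = alignment[:i] + alignment[i + 1:]
        pssm = build_pssm(rest, columns, gaps=gaps)
        for row, col in zip(pssm, columns):
            total += row.get(held_out[col].upper(), 0.0)
    return total / n

# Same three sequences, aligned two ways (the second deliberately poorly).
good = ["ACGTAC", "ACGTAC", "ACGTAC"]
bad = ["ACGTAC--", "-ACGTAC-", "--ACGTAC"]
print(leave_one_out_score(good), leave_one_out_score(bad))
```

Here the shifted alignment scores well below the correct one. Scores
for two alignments are only loosely comparable when they keep different
numbers of profile columns, and whether such a proxy tracks what you
care about is exactly the "better for what?" question above.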

> Any advice would be helpful.

Does a 'true' multiple alignment really exist? And if it does, would it
perform any better than the alignment you already have, given what you
want to do with it?

Dan.

> Steve
>
> Bioinformatics Unit: Statistics, Modelling & Bioinformatics Department
> Center for Infections
> Health Protection Agency
> London
> UK
> http://www.hpa.org.uk/cfi/bioinformatics/index.htm
>
> _______________________________________________
> ssml-general mailing list
> ssml-general at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/ssml-general