[Biodevelopers] Could banded search help a general purposeSmith-Waterman based algorithm?

Mon Apr 9 18:40:40 EDT 2007

Hi Theodore,

Doing a banded Smith-Waterman can help the speed of many searches,
depending on the application. Cross-match is banded. FASTA and SSAHA
use this method to do the final alignment. TimeLogic offers it as an
option on their DeCypher products. So, it is not a new idea, but could
always be improved.

Cheers,
Marty

On 4/6/07, Theodore H. Smith <delete at elfdata.com> wrote:
> Hi people,
>
> As a hobbyist (although hopefully I'll make something real people
> will use in the real world), I am working on some bio-informatics
> code involving Smith-Waterman. Sort of like SSEARCH, but faster.
>
> Now, I am trying to make my algorithm fast AND accurate for general
> purpose use. I do not want to lose any accuracy over the full Smith-
> Waterman implementation, so any bandedness might do, will have to be
> banded in such a way that it only ignores cells that would never
> contribute to the final answer anyhow.
>
> That is tricky to implement, making the bandedness accurate. I've got
> some potential solutions which are based around making a second
> matrix, which would be a "score recovery matrix". That would contain
> the most amount of score that any cell could recover, assuming
> everything went perfect!
>
> Or I could just skip bandedness altogether, which would save me a lot
> of brain time and theoretical complication in inventing solutions to
> minimise CPU time taken to process the vast amount of data needed to
> process.
>
> ...
>
> So my question is... is banding helpful to a general purpose Smith-
> Waterman based searcher?
>
> I can imagine a few situations. But I'm not sure if I am imagining
> them properly. Maybe someone can correct on this? This is what I
> think right now:
>
> 1) Searching for a short protein sequence (30-ish?) within a database
> of long sequences (about 300?). I don't think bandedness will help me
> more than 3% speed increase here overall.
>
> 2) Searching for a short protein sequence (30-ish) within a database
> of short sequences (30-ish). Bandedness might help here. Not sure how
> much. Maybe 3x speed increase.
>
> 3) Searching for a large sequence in a database of short sequences,
> allowing for loose matches... I think bandnedness wouldn't help,
> maybe 3% gain at best, possibly the overhead would hurt.
>
> 4) Searching for large sequences in a database of large sequences,
> and we want tight matches. Bandedness would DEFINITELY help here! We
> may get 20x or more speed up.
>
> I guess from my guesswork... it seems like bandedness would help me
> sometimes a lot. And othertimes hurt a bit or just be useless.
>
> How often do bio-informatists do large sequence searching against
> databases containing many large sequences?
>
> My leaning is towards implementing the bandedness... After all it's
> supposed to be general purpose.
>
>
> --
> http://elfdata.com/plugin/
> "String processing, done right"
>
>
> _______________________________________________
> Biodevelopers mailing list
> Biodevelopers at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/biodevelopers
>

-- 
-- 
Martin Gollery
Associate Director
Center For Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS334
775-784-7042
-----------