Hi Theodore, Doing a banded Smith-Waterman can help the speed of many searches, depending on the application. Cross-match is banded. FASTA and SSAHA use this method to do the final alignment. TimeLogic offers it as an option on their DeCypher products. So, it is not a new idea, but could always be improved. Cheers, Marty On 4/6/07, Theodore H. Smith <delete at elfdata.com> wrote: > Hi people, > > As a hobbyist (although hopefully I'll make something real people > will use in the real world), I am working on some bio-informatics > code involving Smith-Waterman. Sort of like SSEARCH, but faster. > > Now, I am trying to make my algorithm fast AND accurate for general > purpose use. I do not want to lose any accuracy over the full Smith- > Waterman implementation, so any bandedness might do, will have to be > banded in such a way that it only ignores cells that would never > contribute to the final answer anyhow. > > That is tricky to implement, making the bandedness accurate. I've got > some potential solutions which are based around making a second > matrix, which would be a "score recovery matrix". That would contain > the most amount of score that any cell could recover, assuming > everything went perfect! > > Or I could just skip bandedness altogether, which would save me a lot > of brain time and theoretical complication in inventing solutions to > minimise CPU time taken to process the vast amount of data needed to > process. > > ... > > So my question is... is banding helpful to a general purpose Smith- > Waterman based searcher? > > I can imagine a few situations. But I'm not sure if I am imagining > them properly. Maybe someone can correct on this? This is what I > think right now: > > 1) Searching for a short protein sequence (30-ish?) within a database > of long sequences (about 300?). I don't think bandedness will help me > more than 3% speed increase here overall. > > 2) Searching for a short protein sequence (30-ish) within a database > of short sequences (30-ish). Bandedness might help here. Not sure how > much. Maybe 3x speed increase. > > 3) Searching for a large sequence in a database of short sequences, > allowing for loose matches... I think bandnedness wouldn't help, > maybe 3% gain at best, possibly the overhead would hurt. > > 4) Searching for large sequences in a database of large sequences, > and we want tight matches. Bandedness would DEFINITELY help here! We > may get 20x or more speed up. > > I guess from my guesswork... it seems like bandedness would help me > sometimes a lot. And othertimes hurt a bit or just be useless. > > How often do bio-informatists do large sequence searching against > databases containing many large sequences? > > My leaning is towards implementing the bandedness... After all it's > supposed to be general purpose. > > > -- > http://elfdata.com/plugin/ > "String processing, done right" > > > _______________________________________________ > Biodevelopers mailing list > Biodevelopers at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biodevelopers > -- -- Martin Gollery Associate Director Center For Bioinformatics University of Nevada at Reno Dept. of Biochemistry / MS334 775-784-7042 -----------