Hi people, As a hobbyist (although hopefully I'll make something real people will use in the real world), I am working on some bio-informatics code involving Smith-Waterman. Sort of like SSEARCH, but faster. Now, I am trying to make my algorithm fast AND accurate for general purpose use. I do not want to lose any accuracy over the full Smith- Waterman implementation, so any bandedness might do, will have to be banded in such a way that it only ignores cells that would never contribute to the final answer anyhow. That is tricky to implement, making the bandedness accurate. I've got some potential solutions which are based around making a second matrix, which would be a "score recovery matrix". That would contain the most amount of score that any cell could recover, assuming everything went perfect! Or I could just skip bandedness altogether, which would save me a lot of brain time and theoretical complication in inventing solutions to minimise CPU time taken to process the vast amount of data needed to process. ... So my question is... is banding helpful to a general purpose Smith- Waterman based searcher? I can imagine a few situations. But I'm not sure if I am imagining them properly. Maybe someone can correct on this? This is what I think right now: 1) Searching for a short protein sequence (30-ish?) within a database of long sequences (about 300?). I don't think bandedness will help me more than 3% speed increase here overall. 2) Searching for a short protein sequence (30-ish) within a database of short sequences (30-ish). Bandedness might help here. Not sure how much. Maybe 3x speed increase. 3) Searching for a large sequence in a database of short sequences, allowing for loose matches... I think bandnedness wouldn't help, maybe 3% gain at best, possibly the overhead would hurt. 4) Searching for large sequences in a database of large sequences, and we want tight matches. Bandedness would DEFINITELY help here! We may get 20x or more speed up. I guess from my guesswork... it seems like bandedness would help me sometimes a lot. And othertimes hurt a bit or just be useless. How often do bio-informatists do large sequence searching against databases containing many large sequences? My leaning is towards implementing the bandedness... After all it's supposed to be general purpose. -- http://elfdata.com/plugin/ "String processing, done right"