[BiO BB] Testing a Smith-Waterman algorithm?
Theodore H. Smith
delete at elfdata.com
Sat Mar 11 09:35:21 EST 2006
I've successfully designed, written and compiled a program that uses
the Smith-Waterman algorithm.
Nothing new there, but it's for an interesting project, and before
the project is complete, perhaps asking you bioinformaticians a few
questions can help bring me up to your level.
The next stage after compiling is testing my algorithm, so I now must
write some tests for my code.
This is where I am seeing that I'm unsure if I even understand Smith-
Waterman properly! I understand Levenshtein OK (similar to Needleman-
Wunsch), but Smith-Waterman I'm a bit unclear on.
Mostly I'm wondering exactly how local matching helps us over global
matching. I got a layperson's description of why it helps, but I'm
more interested in getting an exact feel for it.
Does it make sense to use English words as an example here, instead
of protein sequences? That would help me understand this a bit
better, as I have a better feel for English than proteins (unlike
many of you).
Would the main advantage then be in searching for short sequences
within long ones, without being unfairly penalised by the
non-matching ends of the long sequence?
For example: "extrapolate" could match "extra" far better with
Smith-Waterman than with Levenshtein, because we aren't being
penalised so badly by the "polate" part.
Or perhaps: "specialisation" would match "lisation" far better using
local than global, because we aren't being penalised by the "specia"
part so much.
Or even: "disestablishmentarianism" would match "establishment" far
better using local than global, because we aren't being penalised by
"dis" or "arianism".
Is that how local searches like Smith-Waterman benefit us?
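To make the comparison concrete, here is a minimal Python sketch (not my actual program). The scoring of +2 for a match and -1 for a mismatch or gap is purely an assumption for illustration, as are the function names levenshtein and smith_waterman_score. It prints the global edit distance and the best local score for the three word pairs above.

```python
def levenshtein(a, b):
    """Global edit distance: every unmatched character costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score (assumed scoring, linear gap cost).

    Every cell is floored at 0, so a poorly matching prefix or suffix
    can never drag the score of a good local region down.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


pairs = [("extrapolate", "extra"),
         ("specialisation", "lisation"),
         ("disestablishmentarianism", "establishment")]
for long_word, short_word in pairs:
    print(long_word, short_word,
          "distance:", levenshtein(long_word, short_word),
          "local score:", smith_waterman_score(long_word, short_word))
```

With that assumed scoring, the edit distances come out as 6, 6 and 11 (one per unmatched letter of the long word), while the local scores are 10, 16 and 26, i.e. a full 2 points per letter of the short word with no penalty for the overhanging ends.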
What about when we are comparing two long sequences of which only
a small part will match?
Let's say "disestablishmentarianism" against
A local alignment should be able to figure out that "establishment"
aligns well in this case.
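As a rough sketch of what I mean, the Python below fills the full Smith-Waterman matrix and traces back from the highest-scoring cell to recover the aligned region. The second word, "cozyestablishmentfuzz", is a made-up stand-in chosen only so that its ends share nothing with the first word; the function name smith_waterman_align and the +2/-1/-1 scoring are likewise assumptions, not anything standard.

```python
def smith_waterman_align(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned part of a, aligned part of b).

    Illustrative only: linear gap cost, and on ties the first
    best-scoring cell found is the one used for the traceback.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to 0.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:                    # match or mismatch
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:     # gap in b
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:                                  # gap in a
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# Hypothetical second word, invented for this example.
score, part_a, part_b = smith_waterman_align(
    "disestablishmentarianism", "cozyestablishmentfuzz")
print(score, part_a, part_b)
```

For this pair the traceback stops exactly at the edges of "establishment" in both words, with a score of 26, because the floor at 0 discards the non-matching ends on either side.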
Is that basically how Smith-Waterman helps us?