[BiO BB] Testing a smith-waterman algorithm?

Theodore H. Smith delete at elfdata.com
Sat Mar 11 09:35:21 EST 2006


Hi people,

I've successfully designed, written and compiled a program that uses
the Smith-Waterman algorithm.

Nothing new there, but it's for an interesting project, and before  
the project is complete, perhaps some questions asked to  
bioinformaticians can help bring me up to your level.

The next stage after compiling is testing my algorithm, so I now need
to write some tests for my code.

This is where I'm finding that I'm not sure I even understand
Smith-Waterman properly! I understand Levenshtein OK (similar to
Needleman-Wunsch), but Smith-Waterman I'm a bit unclear on.

Mostly I'm wondering exactly how local matching helps us over
global matching. I got a layperson's description of why it helps,
but I'm more interested in getting an exact feel for it.

Does it make sense to use English words as an example here, instead  
of protein sequences? That would help me understand this a bit  
better, as I have a better feel for English than proteins (unlike  
many of you).

Would the main advantage then be in searching for short sequences
within long ones, without being unfairly penalised by the
non-matching ends of the long sequence?

For example: "extrapolate" could match "extra" far better in
Smith-Waterman than it could using Levenshtein, because we aren't being
penalised so badly by the "polate" part.

Or perhaps: "specialisation" would match "lisation" far better using  
local than global, because we aren't being penalised by the "specia"  
part so much.

Or even: "disestablishmentarianism" would match "establishment" far  
better using local than global, because we aren't being penalised by  
"dis" or "arianism".

Is that how local searches like Smith-Waterman benefit us?
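To get a more exact feel for the three word pairs above, here is a minimal
Python sketch that computes both a Levenshtein distance and a Smith-Waterman
score for each pair. The function names and the scoring values (match=2,
mismatch=-1, gap=-1, linear gap penalty) are my own assumptions for
illustration, not values from any particular tool.

```python
def levenshtein(a, b):
    """Global edit distance: every unmatched character costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def sw_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score (assumed scoring values, linear gaps)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # The floor at 0 is what makes the alignment local: a bad
            # stretch is dropped instead of being carried along.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


pairs = [("extrapolate", "extra"),
         ("specialisation", "lisation"),
         ("disestablishmentarianism", "establishment")]
for long_word, short_word in pairs:
    print(long_word, short_word,
          levenshtein(long_word, short_word),
          sw_score(long_word, short_word))
# extrapolate extra 6 10
# specialisation lisation 6 16
# disestablishmentarianism establishment 11 26
```

With these assumed scores, the local score for each pair equals the score of
the short word aligned against itself (2 per letter), so the unmatched ends
cost nothing, while the Levenshtein distance grows by one for every
unmatched letter.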


What about when we are comparing two long sequences of which only
a small part will match?

Let's say "disestablishmentarianism" against
"reestablishmentSomeNonMatchingPart".

A local alignment should be able to figure out that "establishment"  
aligns well in this case.

Is that basically how Smith-Waterman helps us?
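To check the two-long-sequences case, here is a small Python sketch that keeps
the full scoring matrix and traces back from the highest-scoring cell to
recover the aligned region. The function name, the scoring values (match=2,
mismatch=-1, gap=-1) and the traceback tie-breaking order are assumptions of
mine, and "disestablishmentarianism" is spelled out in full.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("disestablishmentarianism",
                                    "reestablishmentSomeNonMatchingPart")
print(score)   # 26
print(top)     # establishment
print(bottom)  # establishment
```

Under these assumptions the unrelated flanks ("dis", "arianism", "re",
"SomeNonMatchingPart") never enter the result, because extending the
alignment into them would only lower the score.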

--
http://elfdata.com/plugin/





