# [BiO BB] Understanding Smith-Waterman scoring

Theodore H. Smith delete at elfdata.com
Fri Feb 10 09:13:29 EST 2006

```
Hi people,

I'm trying to learn about Smith-Waterman. There is one thing I
haven't seen answered in explanations of the Smith-Waterman algorithm.

How does it score alignments that come in sections? Does it give a
penalty if a sequence must be split up?

For example, let's say I had the protein AAAABBBB, and I wanted to
see how this scored against the protein BBBBAAAA. Let's ignore the
fact that it can be reversed, for the moment, just so I can
understand how Smith-Waterman should work.

Now, what would the match score be? Let's assume that A to A has a
score of 1 and B to B also has a score of 1. It's a really simple
example. So matching AAAABBBB to itself would give a SW score of 8.

What would matching BBBBAAAA to AAAABBBB give?

I'd expect it to generate two "sections", like this:

AAAA
::::
AAAA

BBBB
::::
BBBB

But what should the overall score be? Is it still 8? Or should we
give a penalty because we've had to split this up? Is it normal for
alignment tools to give penalties to segmented sequences? Also, is
there some kind of "minimum length" that a Smith-Waterman based
aligner would allow? Would it say that you can't have sections below
a certain length? Are there any tools which let you specify such a
minimum section length?

If you don't like that example above of AAAABBBB (as it can be
reversed), then try this example. Assume all the residues get a score
of 1 against themselves. If I did a Smith-Waterman score comparison
of the protein ABCDEFGH against DCHABGEF, would the score still be
8? After all, all the residues are there, just in a different order.

I would expect this to get a score of zero or below.

It's a really basic question, sorry about that!

```
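
One way to explore the examples in the post is to run them through a small scoring routine. The following is a minimal sketch of the textbook Smith-Waterman recurrence, not any particular tool's implementation. The post only fixes the match score at 1; the mismatch score of -1, the linear gap penalty of -1, and the function name `smith_waterman_score` are assumptions made for illustration.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best local alignment score of a and b.

    Assumed scoring (only match=1 comes from the post):
    mismatch=-1 and a linear gap penalty of -1 per gapped position.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the diagonal with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamp at 0 so a local alignment can restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("AAAABBBB", "AAAABBBB"))
print(smith_waterman_score("AAAABBBB", "BBBBAAAA"))
print(smith_waterman_score("ABCDEFGH", "DCHABGEF"))
```

Under these assumed parameters the recurrence reports a single best-scoring local alignment, in which both sequences are read left to right, so the two swapped blocks in the second example cannot both contribute to one alignment. Because every cell is clamped at 0, the score never drops below zero.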