[BiO BB] A question on Smith-Waterman algorithm

pmr at ebi.ac.uk
Fri Sep 15 05:09:58 EDT 2006


Dear WoA

> In the SW algo. mismatches are given negative scores.
> Does this mean I cannot use an Identity Scoring
> Matrix (1 for match and 0 for mismatch) for aligning
> DNA sequences? Does the term "Mismatch" apply to
> protein scoring matrices like PAM and BLOSUM?

No, you cannot use a matrix with only 1 and 0. Well, you can - but it will
not work.

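This is because of the way the Smith-Waterman algorithm works. It
calculates scores for all pairwise matches, allows for gap penalties,
finds the highest score anywhere in the matrix and works back from there
until the running score would become negative (in the standard
formulation any cell that would drop below zero is reset to zero, and the
traceback stops at the first zero).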
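The steps above can be sketched in a few lines of Python. This is only a
minimal illustration, not the code of any particular program: the function
name smith_waterman, the scores (10 for a match, -9 for a mismatch) and the
single linear gap penalty of -12 are all assumptions chosen for the example.

```python
def smith_waterman(a, b, match=10, mismatch=-9, gap=-12):
    """Local alignment of a and b with a linear gap penalty (illustrative values)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; any score that would go negative is reset to zero.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # match or mismatch
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Work back from the highest score until the score reaches zero.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("TTTTACGTACGTCCCC", "GGGGACGTACGTAAAA"))
# (80, 'ACGTACGT', 'ACGTACGT')
```

With a negative mismatch score the flanking T/G and C/A columns pull the
score down, so only the shared ACGTACGT core is reported.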

It is the "becomes negative" that catches you. With no negative scores in
the matrix you will in effect get a global (Needleman-Wunsch) alignment
instead, starting at one of the terminating edges of the matrix (because
scores will never go down as the alignment is extended along a diagonal)
and ending at one of the starting edges.
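A small experiment makes the edge effect visible. The sketch below fills
the score matrix under two schemes and checks whether the top score also
occurs in the last row or last column. The helper names sw_matrix and
best_on_edge, the test sequences and the gap penalties are assumptions made
up for this illustration.

```python
def sw_matrix(a, b, match, mismatch, gap):
    """Smith-Waterman score matrix with a linear gap penalty."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
    return H


def best_on_edge(H):
    """True if the highest score also appears in the last row or last column."""
    best = max(max(row) for row in H)
    edge = max(max(H[-1]), max(row[-1] for row in H))
    return best == edge


SEQ_A = "TTTTACGTACGTCCCC"
SEQ_B = "GGGGACGTACGTAAAA"

# 1/0 identity matrix: extending along a diagonal never lowers the score,
# so the maximum can always be found on a terminating edge.
print(best_on_edge(sw_matrix(SEQ_A, SEQ_B, match=1, mismatch=0, gap=-1)))      # True

# Negative mismatch: the maximum sits at the end of the shared ACGTACGT core.
print(best_on_edge(sw_matrix(SEQ_A, SEQ_B, match=10, mismatch=-9, gap=-12)))   # False
```

The first result holds for any pair of sequences when no substitution score
is negative, since each cell is at least as large as its diagonal
predecessor.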

For nucleotides, a mismatch is simply a mismatch, and all mismatches
usually get the same score (you can adjust for G:U base pairing in RNA) -
there is not the same concept of partial matches that you have with
protein matrices.

So, pick a reasonable identity score (it doesn't have to be 1; you can try
10 to avoid a +1 and -1 matrix) and something negative for everything
else.
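As a rough sketch of that recipe, a nucleotide matrix can be built as a
dictionary keyed by base pairs. The function name identity_matrix and the
mismatch value of -9 are assumptions for illustration only; the advice above
fixes nothing beyond "positive for identity, negative for everything else".

```python
from itertools import product


def identity_matrix(alphabet="ACGT", match=10, mismatch=-9):
    """Return {(x, y): score} with one match score and one negative mismatch score."""
    if match <= 0 or mismatch >= 0:
        # A non-negative mismatch score would defeat local alignment.
        raise ValueError("match must be positive and mismatch must be negative")
    return {(x, y): (match if x == y else mismatch)
            for x, y in product(alphabet, repeat=2)}


scores = identity_matrix()
print(scores[("A", "A")], scores[("A", "G")])   # 10 -9
```

Individual entries could then be overwritten by hand if some pairs deserve
a different score.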

Hope that helps,

Peter Rice



