[BiO BB] Understanding Smith-Waterman scoring

Sat Feb 11 01:25:50 EST 2006

Theodore,

The Smith-Waterman score matrix contains all of the local alignments,
not just the best one.  Remember, a mismatch must have a negative
score.  Once the running score of an aligned region drops to 0, the end
of that alignment is reached.

A second aligned area is found by looking at the matrix of scores it
generated and locating the next highest score that is not part of the
first alignment. You can then trace back from that cell (along the
diagonal, as long as there are no gaps) until you reach zero, at which
point you have reached the other end of that alignment.
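
As a rough illustration of the matrix and the trace back, here is a
minimal Python sketch. The scores (match +1, mismatch -1, linear gap
-1) and the function names are assumptions made for this example only;
real tools use a substitution matrix such as BLOSUM62 and usually
affine gap penalties.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Fill the local-alignment score matrix H for sequences a and b.

    The scoring values are illustrative assumptions, not defaults of
    any particular tool.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # A cell never goes below 0: that is what makes it local.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
    return H


def traceback(H, a, b, i, j, match=1, mismatch=-1, gap=-1):
    """Walk back from cell (i, j) until the score reaches 0."""
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:       # diagonal step
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:       # step up: gap in b
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:                                    # step left: gap in a
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))
```

With these assumed scores, BBBBAAAA against AAAABBBB gives a matrix
with two cells scoring 4, H[4][8] and H[8][4]. Tracing back from the
first yields BBBB/BBBB and from the second AAAA/AAAA.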

Ryan

-----Original Message-----
From: bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org
[mailto:bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org]
On Behalf Of Theodore H. Smith
Sent: Friday, February 10, 2006 9:53 AM
To: The general forum at Bioinformatics.Org
Subject: Re: [BiO BB] Understanding Smith-Waterman scoring

On 10 Feb 2006, at 14:22, Peter Rice wrote:

> Theodore H. Smith wrote:
>
>> How does it score alignments that come in sections? Does it give a
>> penalty if a sequence must be split up?
>
> You get one alignment.
>
> If more than one "section" aligns ... with the parts in the same
> order in both proteins ... you can have a misaligned region and/or  
> gaps in the sequences. There are penalty scores for the  
> misalignments and the gaps.

OK. I understand. The most popular tools in use today only find the
best (or at least one) locally aligned section, but not all of them.

Is this a problem in general? Or is it that multiple aligned sections
are quite rare in the kind of queries that biologists do today?

> There is also a Smith-Waterman-Eggert variation of the algorithm
> that finds a second, third, fourth ... alignment that excludes all
> those already reported.

Am I right in seeing that this isn't talked about as much as
Smith-Waterman, though? It sounds promising for the line of work I am
doing, so thanks very much for telling me about Smith-Waterman-Eggert.
It looks like a good lead.

>> What would matching BBBBAAAA to AAAABBBB give?
>
> AAAA matching AAAA or BBBB matching BBBB (unless A has a positive
> score to match B, then other results are possible)

Which would I get? Does it depend on the tool? Do I get the first  
alignment, the last, or the best?

>> I'd expect it to generate two "sections", like this:
>
> No, but you will get the second section from the Smith-Waterman-
> Eggert algorithm. Each will have its own local alignment score.

Thanks. Sounds very interesting.
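
To make the idea of "a second alignment that excludes those already
reported" concrete, here is a simplified Python sketch in the spirit of
Smith-Waterman-Eggert. It is not the published algorithm as such: it
recomputes the whole matrix after each alignment (the real method only
recomputes the affected part), and the scores (match +1, mismatch -1,
gap -1) and all function names are assumptions for illustration.

```python
def fill(a, b, used, match=1, mismatch=-1, gap=-1):
    """Score matrix in which cells of earlier alignments are held at 0."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if (i, j) in used:
                continue  # cell already reported: it stays 0
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
    return H


def trace(H, a, b, i, j, match=1, mismatch=-1, gap=-1):
    """Return the matrix cells on the path back from (i, j) to a zero."""
    cells = []
    while i > 0 and j > 0 and H[i][j] > 0:
        cells.append((i, j))
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    cells.reverse()
    return cells


def top_alignments(a, b, k=2, match=1, mismatch=-1, gap=-1):
    """Report up to k local alignments that share no matrix cells."""
    used, found = set(), []
    for _ in range(k):
        H = fill(a, b, used, match, mismatch, gap)
        score, i, j = max((H[r][c], r, c)
                          for r in range(len(a) + 1)
                          for c in range(len(b) + 1))
        if score <= 0:
            break  # nothing left worth reporting
        cells = trace(H, a, b, i, j, match, mismatch, gap)
        used.update(cells)
        start_i, start_j = cells[0]
        # Report the score and the aligned segment of each sequence.
        found.append((score, a[start_i - 1:i], b[start_j - 1:j]))
    return found
```

Under these assumptions, top_alignments("BBBBAAAA", "AAAABBBB")
reports two separate sections, AAAA/AAAA and BBBB/BBBB, each with its
own local score of 4. The scores are not added together.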

>> But what should the overall score be? Is it still 8? Or should we
>> give a penalty because we've had to split this up? Is it normal for
>> alignment tools to give penalties to segmented sequences? Also, is
>> there some kind of "minimum length" that a Smith-Waterman based
>> aligner would allow? Would it say that you can't have sections below
>> a certain length? Are there any tools which let you specify such a
>> minimum section length?
>
>> If you don't like that example above of AAAABBBB (as it can be
>> reversed), then try this example. Assume all the proteins get a
>> score of 1 against themselves. The protein: ABCDEFGH, if I did a
>> Smith-Waterman score comparison against DCHABGEF, would the score
>> still be 8? After all, all the proteins are there, just in a
>> different order.
>> I would expect this to get a score of zero or below.
>
> Be careful not to confuse protein (the whole sequence) with amino
> acid or residue (one character).

You might not be surprised to find out that I come from a software  
developer background. I won't make that mistake again.

> You will get at least 1 residue matching. Maybe more as some of the
> mismatches will have a positive score.
>
> Hope that helps. It is complicated :-)

Yes it's been of great help. And yes it is complicated :)
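
For the ABCDEFGH against DCHABGEF question, a small score-only Python
sketch shows why the result is neither 8 nor zero. The scoring here
(match +1, mismatch -1, gap -1) is an assumption chosen to match the
"score of 1 against themselves" in the example. With a real
substitution matrix some mismatches score positively, so the number
would differ.

```python
def best_local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best Smith-Waterman local score, keeping only two matrix rows.

    The scoring values are illustrative assumptions.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


shuffled = best_local_score("ABCDEFGH", "DCHABGEF")
identical = best_local_score("ABCDEFGH", "ABCDEFGH")
```

Here identical is 8 and shuffled is 2: the best local alignment is a
short run such as AB against AB. Residues that are present but in a
different order do not add to the score, because an alignment must keep
the same order in both sequences.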

_______________________________________________
Bioinformatics.Org general forum  -
BiO_Bulletin_Board at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bio_bulletin_board