Hi people, I have a question on gap penalties and substitution scoring. The answer to this question could make a factor of around 1.5x difference the speed of a blast-like algorithm I'm working on. And double the complexity of one small component used in the algorithm. So, I'm using a Smith-Waterman approach combined with some other approaches to speed up Smith-Waterman by a few order of magnitude. So my problem area is totally Smith-Waterman based and anyone who knows Smith-waterman algorithm might be able to help me. Is there ever a case where two inserts, could be less costly than a substituion? In fact it's not even just "two inserts", it's an insert followed by a delete. Basically, a case where going right, then down, is less costly than going diagonally? So if we had a matrix like this: ... , 5, 4, ... ... , 4, 3, ... Let's say in this case, the replacement from the "5" cell, to the "3" cell would have taken 7 points, leaving us with -2 (probably would be raised to zero as how S-W works). But going right from 5 to 4 only took 1 point away, and then going down from 4 to 3 took 1 point. This would require that an insert followed by a deletion, is cheaper than a replacement. Can this be true? I've inspected the BLOSUM62 matrix (-4 is the worst penalty for replacement), and the default gap penalties (-11,-1), and it seems like using BLOSUM62, it's impossible to get "insert+delete" instead of one replacement. So this would be a good case for me, as it would simplify my algorithm. I'm happier if insert+delete is always worse than replacement. But does this pattern hold across all biological uses? Can there be a case where an insert followed by a delete, is better than a replacement? If so, would it be something so contrived and unlikely and not really useful, that I can just ignore it and tell my users that my software has a design restriction that insert+delete must always score worse than a substitution? Sorry for this basic question! I know I am asking a basic question, but then it really does help to have your basics correct before forming some super-fast algorithm. I can't have it super-fast and wrong now can I :) Basics always come first. Also, if an insert is followed by a delete, does it count as -11-11 (-22) or -11-1 (-12)? Thanks to all who can help! My algorithm is really close to finishing. I can see it almost in my hands. I will be very proud to finish it. Whether or not the algorithm fast enough to be is of use, is another matter, but it will function correctly at least! -- http://elfdata.com/plugin/ "String processing, done right"