[Biodevelopers] Question on gaps vs substitution

Michael Nuhn nuhn at rhrk.uni-kl.de
Thu Jul 12 05:25:15 EDT 2007


Hello, Theodore!

> Is there ever a case where two inserts could be less costly than a
> substitution? In fact it's not even just "two inserts", it's an insert
> followed by a delete.

I don't know of any such case, but it would be possible. BLAST allows the
user to specify the gap penalties on the command line (though I have never
seen anyone do that)...

> This would require that an insert followed by a deletion, is cheaper
> than a replacement.
>
> Can this be true?
>
> I've inspected the BLOSUM62 matrix (-4 is the worst penalty for
> replacement), and the default gap penalties (-11,-1), and it seems
> like using BLOSUM62, it's impossible to get "insert+delete" instead
> of one replacement.

...so a user could specify gap penalties small enough that an insert plus
a delete costs less than the worst substitution (-4 in BLOSUM62).

It is not wise to restrict your algorithm to one specific matrix, since
users will claim that they want to use other matrices. (Even if they won't.)

The BLAST algorithm would not even find this sort of thing.

> So this would be a good case for me, as it would simplify my
> algorithm. I'm happier if insert+delete is always worse than
> replacement.
>
> But does this pattern hold across all biological uses? Can there be a
> case where an insert followed by a delete, is better than a
> replacement? If so, would it be something so contrived and unlikely
> and not really useful, that I can just ignore it and tell my users
> that my software has a design restriction that insert+delete must
> always score worse than a substitution?

If you tell your users that, they will not understand the special case the
restriction applies to. So they will only remember "has some sort of
restriction" and not use it out of fear of losing relevant alignments.

To handle your special case, it would suffice to scan the matrix (which is
small) and check whether any substitution carries a greater penalty than
insert+delete. If so, your program terminates and blames the user or the
matrix for it.
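
A minimal sketch of that check in Python. The nested-dict matrix layout,
the function names and the toy values are all assumptions made for
illustration (the toy matrix is not BLOSUM62), and insert+delete is assumed
to cost two gap openings:

```python
def worst_substitution(matrix):
    """Return the lowest (most negative) score found in the matrix."""
    return min(score for row in matrix.values() for score in row.values())


def check_indel_assumption(matrix, gap_open):
    """Fail if any substitution scores worse than insert+delete.

    Assumes insert+delete costs two gap openings (2 * gap_open) and that
    penalties are given as negative scores.
    """
    indel_pair = 2 * gap_open
    worst = worst_substitution(matrix)
    if worst < indel_pair:
        raise ValueError(
            f"substitution score {worst} is worse than insert+delete "
            f"({indel_pair}); choose another matrix or gap penalty"
        )
    return worst, indel_pair


# Hypothetical 3-letter toy matrix whose worst substitution is -4.
toy_matrix = {
    "A": {"A": 4, "C": 0, "W": -3},
    "C": {"A": 0, "C": 9, "W": -4},
    "W": {"A": -3, "C": -4, "W": 11},
}

print(check_indel_assumption(toy_matrix, gap_open=-11))
```

With a gap opening penalty of -11 the check passes; with -1 it raises,
because insert+delete (-2) would then beat the worst substitution (-4).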

> Also, if an insert is followed by a delete, does it count as -11-11
> (-22) or -11-1 (-12)?

It is <cost for opening a gap> + <cost for deletion>, and the deletion
opens a gap of its own in the other sequence.

The -1 is the penalty for extending an existing gap. That is not the case
here.
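
As a small worked example of that arithmetic in Python: the function names
and the extend_first flag are hypothetical. The default follows the reading
above, where a one-residue gap pays only the opening cost; some tools also
charge the extension penalty for the first gapped position, which the flag
switches on.

```python
def gap_score(length, gap_open=-11, gap_extend=-1, extend_first=False):
    """Affine score of a single gap of the given length (in residues)."""
    if length < 1:
        raise ValueError("a gap must cover at least one residue")
    # Number of positions that are charged the extension penalty.
    extensions = length if extend_first else length - 1
    return gap_open + extensions * gap_extend


def insert_then_delete_score(gap_open=-11, gap_extend=-1, extend_first=False):
    """An insert followed by a delete: two separate one-residue gaps."""
    return 2 * gap_score(1, gap_open, gap_extend, extend_first)


print(gap_score(2))                # one gap of length two: -11 - 1
print(insert_then_delete_score())  # two gaps of length one: -11 - 11
```

So -11-1 is what a single gap of two residues scores, while insert+delete
opens two gaps.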

> Thanks to all who can help! My algorithm is really close to
> finishing. I can see it almost in my hands. I will be very proud to
> finish it. Whether or not the algorithm is fast enough to be of use
> is another matter, but it will function correctly at least!

Announce it here when you are done!

Greetings,
Michael.

--
-----------------------------------------------------------
Dipl.-Inform. Michael Nuhn
Bioinformatik
Zentrum für Nanostrukturtechnologie und
Molekularbiologische Technologie

+49 (0)631 - 205 4334
nuhn at rhrk.uni-kl.de
http://nbc3.biologie.uni-kl.de/
-----------------------------------------------------------


