[Biodevelopers] Re: BLAST asymmetrical

Fri Jan 19 05:19:29 EST 2007

Hi,

>BLAST: my understanding is as follows: BLAST takes the
>query and builds for DNA an index of all the words of a
>given length w (parameter w).

Correct.

>Those words do not overlap.

They do overlap. My source is the O'Reilly book "BLAST", Chapter 5: "Blast",
subsection "Seeding". The example given is the sequence:

MGQLV which produces the words
MGQ
GQL and
QLV.
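
To make the example concrete, here is a minimal sketch of how I read the seeding step. The function name overlapping_words is my own invention and not anything from the BLAST source; it only illustrates that the window advances by one position, not by w.

```python
def overlapping_words(seq, w):
    """Return every length-w word of seq, advancing one position at a time."""
    if w <= 0:
        raise ValueError("w must be positive")
    # A sequence of length n has n - w + 1 overlapping words (none if n < w).
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


print(overlapping_words("MGQLV", 3))  # ['MGQ', 'GQL', 'QLV']
```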

>For any two of the genomes, the words will be different, and if

I agree.

>there is a shift in what you would like to match,

There is no shift, because the words overlap (see above).
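
A small sketch of why the overlap matters here. The sequence, the word length and both helper names are made up for illustration. If the words did not overlap, shifting the query by one base would change every word; with overlapping words, the shifted query still produces words that the original query produces too.

```python
def overlapping_words(seq, w):
    # Window advances by 1, as in the seeding example from the book.
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def disjoint_words(seq, w):
    # Hypothetical non-overlapping segmentation: window advances by w.
    return [seq[i:i + w] for i in range(0, len(seq) - w + 1, w)]


query = "ACGTACGGTCA"   # made-up DNA sequence
shifted = query[1:]     # same sequence, shifted by one base
w = 4

shared_overlapping = set(overlapping_words(query, w)) & set(overlapping_words(shifted, w))
shared_disjoint = set(disjoint_words(query, w)) & set(disjoint_words(shifted, w))

print(sorted(shared_overlapping))  # every word of the shifted query is still found
print(sorted(shared_disjoint))     # empty: the segmentation frame has moved
```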

>So, basically, your starting segmentation of the query defines
>partly your outcome.

>And then during extension by DP between two hits, some
>statistical effects possibly come into play.

The statistics of the two-hit algorithm must be hell. The O'Reilly book
skips them. Luckily, the two-hit algorithm applies only to protein
searches. When DNA is compared to DNA, as in my case of comparing genomes,
the two-hit algorithm is not applied.

>You also have a filter, and I am not sure if repeat filter is
>applied to both query and database, or just to the
>database. This might influence the outcome as well.

I had deactivated the filter for my search.

>I am more of an algorithms person than a practitioner, so
>you may want to probe someone with more hands-on skills.
>David Leader may help, if you ask him. He teaches that stuff.

Thanks for the pointer. I will take a close look at his program.

Thanks also for your ideas on BLAST!

Cheers,
Michael.