BLAST results (was RE: [Biodevelopers] Blast not symmetrical?)
Christopher Dwan
cdwan at bioteam.net
Fri Jan 19 09:42:23 EST 2007
BLAST e-values are (as mentioned in another response) statistical
hell. They vary with both the size of the query and the size of the
target set.
An e-value of "50" might best be read as "You expect to find about 50
alignments of this quality in the dataset that you searched."
Searching against a larger dataset will increase that number. Insert
an analogy of balls and urns here.
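To put rough numbers on that scaling, here is a minimal sketch using
the textbook Karlin-Altschul relation E = m * n * 2^(-S'), where S' is
the bit score, m the query length and n the total database length.
The function name and all of the numbers below are made up for
illustration; they are not the sizes of the real rat or all-genomes
databases, and BLAST itself uses "effective" lengths that this
ignores.

```python
def expected_hits(bit_score, query_len, db_len):
    """Approximate e-value from the Karlin-Altschul relation.

    E = m * n * 2**(-S'), with S' the bit score, m the query length
    and n the total length of the database. Real BLAST corrects m and
    n to "effective" lengths; that correction is skipped here.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical values, chosen only to show the scaling.
bit_score = 30.0
query_len = 300
small_db = 1_000_000      # residues in the smaller target set
large_db = 7_200_000      # a target set 7.2 times larger

e_small = expected_hits(bit_score, query_len, small_db)
e_large = expected_hits(bit_score, query_len, large_db)

# Same alignment, same bit score: the e-value grows in direct
# proportion to the size of the search space.
print(e_small, e_large, e_large / e_small)
```

The point is only that the same alignment, with the same score, gets a
proportionally worse e-value when the target set grows.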
In theory, at least, the alignment scores (as opposed to the
e-values) should be the same, once you dig the hit of interest out of
all the other junk that will surround it in the larger dataset. Even
more theoretically, the alignment itself ought to be the same.
Another tricky bit is that, particularly at such low levels of
similarity, the many layers of heuristic approximation that BLAST
uses to make searches *faster* mean that you may anchor or extend at
a different point. In other words, since you're using a heuristic
rather than a complete algorithm, you need to accept that you're not
going to see all the answers that are, theoretically, out there.
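For contrast, here is a rough sketch of what a "complete" algorithm
looks like: a bare-bones Smith-Waterman scorer (the method Marty
suggests further down the thread). This is not what BLAST does
internally. The scoring values are arbitrary, and the linear gap
penalty is a simplification I am assuming for brevity, where BLAST
uses affine gaps. Because it fills in the whole dynamic programming
table, the best local score comes out the same whichever sequence you
call the query.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (score only, no traceback).

    Exhaustive dynamic programming with a linear gap penalty. The
    scoring parameters are illustrative defaults, not BLAST's.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical toy sequences.
q = "ACGTTGCATGCA"
s = "TTACGTAGCATGG"

# Swapping query and subject does not change the optimal score.
print(smith_waterman_score(q, s), smith_waterman_score(s, q))
```

Note that only the optimal score is guaranteed to match. When several
alignments tie, which one a traceback reports can still depend on the
order of the inputs.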
My experience is that any BLAST hit with an e-value worse (that is,
larger) than around 10e-6 or so is probably "in the noise" of the big
target sets.
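If you want to apply a cutoff like that mechanically, something along
these lines works on BLAST's tab-delimited output, where the e-value
is the 11th column of the standard 12-column layout. The function
name and the sample lines are invented for illustration, and the
cutoff is left as an argument because the right value is a judgment
call for your own data.

```python
def significant_hits(lines, max_evalue):
    """Keep tabular BLAST hits whose e-value is at or below max_evalue.

    Assumes the standard 12-column tab-delimited layout, in which the
    e-value is column 11 (index 10) and the bit score is column 12.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            kept.append(fields)
    return kept


# Made-up hits, one strong and two in the noise.
sample = [
    "q1\tsubjA\t98.5\t200\t3\t0\t1\t200\t11\t210\t2e-50\t190",
    "q1\tsubjB\t31.0\t45\t28\t2\t5\t49\t100\t142\t50\t22.3",
    "q1\tsubjC\t35.2\t60\t35\t1\t10\t69\t7\t66\t0.003\t38.1",
]

for hit in significant_hits(sample, max_evalue=1e-6):
    print(hit[1], hit[10])
```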
-Chris Dwan
On Jan 18, 2007, at 5:30 PM, Angulo, David wrote:
> Marty (or anyone),
>
> Perhaps you can explain this to me. I searched an AA string
> against the rat genome. I got a poor hit (e-value about 50), so I
> decided to see if there was a homologous gene, and I searched
> against all genomes. Instead of returning the same hit (or
> better), the best hit I received was about 360! Why is this? I'm
> confounded.
>
> Thanks for your help.
>
> Dave
>
>
> -----Original Message-----
> From: biodevelopers-bounces+dangulo=cti.depaul.edu at bioinformatics.org on behalf of Martin Gollery
> Sent: Wed 1/17/2007 5:23 PM
> To: Development in Bioinformatics
> Subject: Re: [Biodevelopers] Blast not symmetrical?
>
> This is correct, BLAST is not symmetrical. Some assume that it is and
> make some pretty serious mistakes. Switch to Smith-Waterman and you
> will eliminate this problem.
>
> Marty
>
> On 1/17/07, Michael Nuhn <nuhn at rhrk.uni-kl.de> wrote:
>> Hello, Everybody!
>>
>> While I was trying to track down a "bug" in my program I found out
>> that the blast program (Blastn v2.2.11) is not symmetrical, that is:
>>
>> If I blast a query sequence Q against a database S (1 sequence), I
>> get a result set B(S,Q).
>>
>> If I do the blast the other way around, that is, I use S as query
>> sequence and blast it against the database Q, I get a result B(Q,S).
>>
>> And the problem is: B(S,Q) and B(Q,S) are not equal. Each blast
>> set has some blast hits that the other does not have and also some
>> blast hits that have one common coordinate but end at another.
>>
>> Both blasts were made with the blast defaults, no filter was used.
>> The two sequences are large (~2Mb each, the sequences are genomes).
>> According to the statistics used in blast (at least the part I
>> understand), it should not play a role which sequence is the query
>> and which is the subject.
>>
>> Does anyone have an explanation for this? Since I don't really
>> have a clue at where to start, hints and wild guesses are also
>> appreciated.
>>
>> Thanks in advance,
>> Michael.
>>
>> _______________________________________________
>> Biodevelopers mailing list
>> Biodevelopers at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/biodevelopers
>>
>
>
> --
> Martin Gollery
> Associate Director
> Center For Bioinformatics
> University of Nevada at Reno
> Dept. of Biochemistry / MS334
> 775-784-7042
> -----------