BLAST results (was RE: [Biodevelopers] Blast not symmetrical?)

Christopher Dwan cdwan at bioteam.net
Fri Jan 19 09:42:23 EST 2007


BLAST e-values are (as mentioned in another response) statistical  
hell.  They vary with both the size of the query and the size of the  
target set.

An e-value of "50" might best be read as "You expect to find about 50
alignments of this quality, by chance alone, in the dataset that you
searched."  Searching against a larger dataset will increase that
number.  Insert an analogy of balls and urns here.
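
A rough sketch of that scaling, assuming the standard Karlin-Altschul
relation E = m * n * 2^(-S'), where S' is the bit score and m and n are
the query and target lengths. Real BLAST uses edge-corrected "effective"
lengths, so its reported numbers will differ somewhat. The lengths and
the bit score below are made up for illustration; they are not taken
from the searches discussed in this thread.

```python
def expected_hits(bit_score, query_len, db_len):
    """Approximate E-value: E = m * n * 2**(-S'), using raw lengths.

    This ignores BLAST's effective-length correction, so treat the
    result as an order-of-magnitude estimate only.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


query_len = 300                # hypothetical query length (residues)
small_db = 1_000_000_000       # hypothetical single-genome target size
large_db = 7 * small_db        # hypothetical combined target, 7x larger
bit_score = 30.0               # hypothetical bit score of one weak hit

e_small = expected_hits(bit_score, query_len, small_db)
e_large = expected_hits(bit_score, query_len, large_db)

print(f"small target: E = {e_small:.1f}")
print(f"large target: E = {e_large:.1f}")
print(f"ratio of E-values: {e_large / e_small:.1f}")
```

Under this approximation the same alignment, with the same bit score,
gets an e-value that grows in direct proportion to the size of the
target set.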

In theory, at least, the alignment scores should be the same, once  
you dig the hit of interest out of all the other junk that will  
surround it, from the larger dataset.  Even more theoretically, the  
alignment ought to be the same.

Another tricky bit is that, particularly at such low levels of  
similarity, the many layers of heuristic approximation that BLAST  
uses to make searches *faster* mean that you may anchor or extend at  
a different point.  In other words, since you're using a heuristic  
rather than a complete algorithm, you need to accept that you're not  
going to see all the answers that are, theoretically, out there.
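
A toy illustration of the seeding problem, not BLAST's actual code: it
assumes a search that only looks at places where query and subject share
an exact word of length k. BLAST's real seeding is more elaborate, and
the sequences and word sizes here are invented for the example.

```python
def shared_words(query, subject, k):
    """Return the set of exact k-letter words present in both sequences."""
    q_words = {query[i:i + k] for i in range(len(query) - k + 1)}
    s_words = {subject[i:i + k] for i in range(len(subject) - k + 1)}
    return q_words & s_words


# Hypothetical pair: same length, one mismatch in every block of four.
query = "ACGTTGCAAGCTTAGGCTAA"
subject = "ACGATGCTAGCATAGCCTAT"

# Fraction of identical positions in the ungapped, full-length alignment.
identity = sum(a == b for a, b in zip(query, subject)) / len(query)
print(f"identity: {identity:.0%}")

for k in (3, 5):
    seeds = shared_words(query, subject, k)
    print(f"word size {k}: {len(seeds)} shared word(s)")
```

With the larger word size there is nothing to anchor on, so a
seed-and-extend search would never examine this alignment, while an
exhaustive method such as Smith-Waterman would still score it.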

My experience is that any BLAST hit worse than around 10e-6 or so is  
probably "in the noise" of the big target sets.
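
To get a feel for what such a cutoff demands, here is a small sketch
that inverts the approximate relation E = m * n * 2^(-S') to get the bit
score needed for a given e-value. The cutoff is written exactly as in
the message above (substitute 1e-6 if that is the reading you prefer);
the query and target lengths are hypothetical, and raw rather than
effective lengths are used.

```python
import math


def min_bit_score(e_cutoff, query_len, db_len):
    """Bit score S' at which m * n * 2**(-S') equals e_cutoff (raw lengths)."""
    return math.log2(query_len * db_len / e_cutoff)


e_cutoff = 10e-6      # cutoff as written in the message
query_len = 300       # hypothetical query length

for db_len in (10**8, 10**9, 10**10):   # hypothetical target sizes
    score = min_bit_score(e_cutoff, query_len, db_len)
    print(f"target of {db_len:.0e} letters: need about {score:.1f} bits")
```

Each tenfold increase in target size raises the required bit score by
log2(10), a little over 3 bits.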

-Chris Dwan

On Jan 18, 2007, at 5:30 PM, Angulo, David wrote:

> Marty (or anyone),
>
> Perhaps you can explain this to me.  I searched an AA string  
> against the rat genome.  I got a poor hit (e-value about 50), so I  
> decided to see if there was a homologous gene, and I searched  
> against all genomes.  Instead of returning the same hit (or  
> better), the best hit I received was about 360!  Why is this?  I'm  
> confounded.
>
> Thanks for your help.
>
> Dave
>
>
> -----Original Message-----
> From: biodevelopers-bounces+dangulo=cti.depaul.edu at bioinformatics.org on behalf of Martin Gollery
> Sent: Wed 1/17/2007 5:23 PM
> To: Development in Bioinformatics
> Subject: Re: [Biodevelopers] Blast not symmetrical?
>
> This is correct, BLAST is not symmetrical. Some assume that it is and
> make some pretty serious mistakes. Switch to Smith-Waterman and you
> will eliminate this problem.
>
> Marty
>
> On 1/17/07, Michael Nuhn <nuhn at rhrk.uni-kl.de> wrote:
>> Hello, Everybody!
>>
>> While I was trying to track down a "bug" in my program I found out that the
>> blast program (Blastn v2.2.11) is not symmetrical, that is:
>>
>> If I blast a query sequence Q against a database S (1 sequence), I get a
>> result set B(S,Q).
>>
>> If I do the blast the other way around, that is, I use S as query sequence
>> and blast it against the database Q, I get a result B(Q,S).
>>
>> And the problem is: B(S,Q) and B(Q,S) are not equal. Each blast set has some
>> blast hits that the other does not have and also some blast hits that have
>> one common coordinate but end at another.
>>
>> Both blasts were made with the blast defaults, no filter was used. The two
>> sequences are large (~2Mb each, the sequences are genomes). According to the
>> statistics used in blast (at least the part I understand), it should not
>> play a role which sequence is the query and which is the subject.
>>
>> Does anyone have an explanation for this? Since I don't really have a clue
>> at where to start, hints and wild guesses are also appreciated.
>>
>> Thanks in advance,
>> Michael.
>>
>> _______________________________________________
>> Biodevelopers mailing list
>> Biodevelopers at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/biodevelopers
>>
>
>
> -- 
> Martin Gollery
> Associate Director
> Center For Bioinformatics
> University of Nevada at Reno
> Dept. of Biochemistry / MS334
> 775-784-7042
> -----------


