BLAST results (was RE: [Biodevelopers] Blast not symmetrical?)

Iddo Friedberg idoerg at burnham.org
Fri Jan 19 12:35:03 EST 2007


Angulo, David wrote:
> Marty (or anyone),
> 
> Perhaps you can explain this to me.  I searched an AA string against the rat genome.
> I got a poor hit (E-value about 50), so I decided to see if there was a
> homologous gene, and I searched against all genomes.
> Instead of returning the same hit (or better), the best hit I received
> was about 360!  Why is this?  I'm confounded.


The E-value depends on the size of your search space: for a given
alignment score, it grows in proportion to the size of the database.
The reason you received a worse E-value when searching a larger database
is simply that a larger database offers more opportunities for chance
matches, i.e. more noise. E-values of that magnitude (50 or 360) are
essentially the same as far as (non)significance is concerned.
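
To make the scaling concrete, here is a rough sketch in Python using the
textbook Karlin-Altschul relation E = m * n * 2^(-S'), where S' is the bit
score, m the query length and n the database length. The lengths and the
bit score below are made-up numbers for illustration only, not the values
from Dave's search, and real BLAST uses edge-corrected "effective" lengths,
so its reported values will differ somewhat.

```python
def expected_hits(bit_score, query_len, db_len):
    """Textbook E-value from a bit score: E = m * n * 2**(-S').

    Simplification: uses raw lengths, not BLAST's effective lengths.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical numbers, chosen only for illustration.
query_len = 300        # residues in the query
bit_score = 25.0       # the same alignment scores the same in both searches
small_db = 1_000_000   # residues in a single-genome database
large_db = 7_200_000   # residues in a combined database, 7.2 times larger

e_small = expected_hits(bit_score, query_len, small_db)
e_large = expected_hits(bit_score, query_len, large_db)

# The alignment itself did not change; only the search space did.
print(f"E (small db) = {e_small:.3g}")
print(f"E (large db) = {e_large:.3g}")
print(f"ratio        = {e_large / e_small:.3g}")
```

The ratio of the two E-values equals the ratio of the database sizes, so
the same weak hit looks "worse" in the larger search without being any
different biologically.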



> 
> Thanks for your help.
> 
> Dave
> 
> 
> -----Original Message-----
> From: biodevelopers-bounces+dangulo=cti.depaul.edu at bioinformatics.org on behalf of Martin Gollery
> Sent: Wed 1/17/2007 5:23 PM
> To: Development in Bioinformatics
> Subject: Re: [Biodevelopers] Blast not symmetrical?
>  
> This is correct, BLAST is not symmetrical. Some assume that it is and
> make some pretty serious mistakes. Switch to Smith-Waterman and you
> will eliminate this problem.
> 
> Marty
> 
> On 1/17/07, Michael Nuhn <nuhn at rhrk.uni-kl.de> wrote:
>> Hello, Everybody!
>>
>> While I was trying to track down a "bug" in my program I found out that the
>> blast program (Blastn v2.2.11) is not symmetrical, that is:
>>
>> If I blast a query sequence Q against a database S (1 sequence), I get a
>> result set B(S,Q).
>>
>> If I do the blast the other way around, that is, I use S as query sequence
>> and blast it against the database Q, I get a result B(Q,S).
>>
>> And the problem is: B(S,Q) and B(Q,S) are not equal. Each blast set has some
>> blast hits that the other does not have and also some blast hits that have
>> one common coordinate but end at another.
>>
>> Both blasts were made with the blast defaults, no filter was used. The two
>> sequences are large (~2Mb each, the sequences are genomes). According to the
>> statistics used in blast (at least the part I understand), it should not
>> play a role which sequence is the query and which is the subject.
>>
>> Does anyone have an explanation for this? Since I don't really have a clue
>> at where to start, hints and wild guesses are also appreciated.
>>
>> Thanks in advance,
>> Michael.
>>
>> _______________________________________________
>> Biodevelopers mailing list
>> Biodevelopers at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/biodevelopers
>>
> 
> 
> 


-- 
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037, USA
T: +1 858 646 3100 x3516
http://iddo-friedberg.org
http://BioFunctionPrediction.org

