Michael Nuhn wrote:
> Hello, Everybody!
>
> While I was trying to track down a "bug" in my program I found out that the
> blast program (Blastn v2.2.11) is not symmetrical, that is:
>
> If I blast a query sequence Q against a database S (1 sequence), I get a
> result set B(S,Q).
>
> If I do the blast the other way around, that is, I use S as query sequence
> and blast it against the database Q, I get a result B(Q,S).
>
> And the problem is: B(S,Q) and B(Q,S) are not equal. Each blast set has some
> blast hits that the other does not have and also some blast hits that have
> one common coordinate but end at another.
>
> Both blasts were made with the blast defaults, no filter was used. The two
> sequences are large (~2Mb each, the sequences are genomes). According to the
> statistics used in blast (at least the part I understand), it should not
> play a role which sequence is the query and which is the subject.
>
> Does anyone have an explanation for this? Since I don't really have a clue
> at where to start, hints and wild guesses are also appreciated.
>
> Thanks in advance,
> Michael.
>
> _______________________________________________
> Biodevelopers mailing list
> Biodevelopers at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/biodevelopers

This is a very well-known phenomenon. You can read more about it here:

http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html

It seems counter-intuitive because people confuse the BLAST score (a bit
score) with the raw alignment score. The raw alignment score is the same
for a given alignment, no matter which sequence is the query. The BLAST bit
score, however, is calculated from the raw alignment score by taking into
account the background distribution of the amino acids of the query
sequence. Because the two query sequences differ in composition, the bit
scores will differ, and so will the E-values, which are calculated from the
bit scores.

I guess that's an over-simplified explanation.
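To make the distinction between raw score and bit score concrete, here is a rough sketch of the standard Karlin-Altschul conversion described in the tutorial linked above: S' = (lambda * S - ln K) / ln 2 and E = m * n * 2^(-S'). The lambda and K values below are made-up numbers for illustration only, not taken from any real BLAST run, and the function names are my own. The sketch assumes, as the explanation above does, that the statistical parameters differ between the two search directions.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a bit score: (lam*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, m, n):
    """Expected number of chance hits: m * n * 2^(-bits).

    m and n are the (effective) query and database lengths.
    """
    return m * n * 2.0 ** (-bits)


# Same raw alignment score in both directions.
raw = 200

# Hypothetical parameter sets for the two directions (illustrative only).
lam_q, k_q = 0.267, 0.041   # assumed values when Q is the query
lam_s, k_s = 0.255, 0.035   # assumed values when S is the query

bits_q = bit_score(raw, lam_q, k_q)
bits_s = bit_score(raw, lam_s, k_s)

# Both sequences are about 2 Mb, as in the original question.
m = n = 2_000_000

print("Q as query: bits = %.1f, E = %.3g" % (bits_q, e_value(bits_q, m, n)))
print("S as query: bits = %.1f, E = %.3g" % (bits_s, e_value(bits_s, m, n)))
```

With identical raw scores, any difference in lambda or K shifts the bit score and therefore the E-value, which can move a borderline hit above or below the reporting threshold in one direction only.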
-- Malay K Basu www.malaybasu.net