[Bioclusters] blast 2.2.9 issues with expect value (-e)

bioclusters@bioinformatics.org
Fri, 25 Jun 2004 15:19:06 -0500


Hi Fernan,

Thanks for pointing me in the right direction.  The problem was indeed with
formatdb.  But it wasn't a problem with the version of formatdb; it was a
problem with how formatdb handled one sequence ID in my blastable database.

I noticed that the erroneous results occurred for only one of the sequences
in my blastable database, so I started to investigate that sequence's ID.

The sequence ID that led to the erroneous results was ">8BP_ 5714596" (an
underscore followed by a space).

When I changed it to ">8BP_5714596" (just the underscore) or ">8BP 5714596"
(just the space), the weird results disappeared.

So formatdb seems to have a problem with an underscore followed by a space
("_<space>") in the ID line.  I also found that the same thing happens with
2.2.6.  The discovery just happened to coincide with my update to 2.2.9, so I
assumed the update was the cause.
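
A quick way to check a whole FASTA file for the same pattern before running
formatdb might look like the Python sketch below.  It only flags deflines
that contain "_ " (the pattern isolated above) and proposes the
">8BP_5714596" form that worked.  It does not explain why formatdb mishandles
the pattern.  The function names and the command-line usage are hypothetical.

```python
import sys
from typing import Iterable, Iterator, Tuple


def find_underscore_space(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line number, header) for FASTA deflines containing '_ '."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith(">") and "_ " in line:
            yield lineno, line.rstrip("\n")


def fix_defline(header: str) -> str:
    """Drop the space after the underscore, e.g. '>8BP_ 5714596' -> '>8BP_5714596'."""
    return header.replace("_ ", "_")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_deflines.py mydb.fasta
    with open(sys.argv[1]) as fasta:
        for lineno, header in find_underscore_space(fasta):
            print(f"line {lineno}: {header!r} -> {fix_defline(header)!r}")
```

Only header lines (those starting with ">") are inspected, so sequence lines
are never rewritten.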

Kind of a weird one!

Thanks to everyone for the help.

-Bonnie




                                                                                                                                      
Fernan Aguero <fernan@iib.unsam.edu.ar>
Sent by: bioclusters-admin@bioinformatics.org
06/25/2004 04:15 PM
Please respond to bioclusters

To:       BHurwitz@twt.com
cc:       biocluster <bioclusters@bioinformatics.org>
Subject:  Re: [Bioclusters] blast 2.2.9 issues with expect value (-e)




+----[ BHurwitz@twt.com <BHurwitz@twt.com> (25.Jun.2004 15:56):
|
| Sequences are appearing to have the same expect value
| although the matches in the hsp are very different.

The question you should be asking is why the alignments
below have the same score. If they have the same score,
they'll have the same E value. For a given query and
database, the E value is (by definition) the number of
alignments with a score >= S (24.3 bits in this case) that
you expect to occur by chance. It depends on the score and
the size of the search, not on what the alignment looks like.
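
To make the point about equal scores concrete, here is a minimal sketch of
the standard Karlin-Altschul relation in bit-score form, E = m * n * 2^(-S').
Here m and n are the effective query and database lengths.  The lengths below
are hypothetical, not taken from Bonnie's run.  Within one search every hit
shares the same m and n, so in this simplified form two hits with the same
bit score get the same E value regardless of subject length.

```python
def evalue_from_bits(bit_score: float, eff_query_len: float, eff_db_len: float) -> float:
    """Expected number of chance hits scoring >= bit_score: E = m * n * 2**(-S')."""
    return eff_query_len * eff_db_len * 2.0 ** (-bit_score)


def implied_search_space(evalue: float, bit_score: float) -> float:
    """Invert the relation above to recover the effective search space m * n."""
    return evalue * 2.0 ** bit_score


# Hypothetical effective lengths (NOT from the thread).  Both hits come from the
# same search, so they share m and n; subject lengths 177 and 9401 do not appear.
m, n = 20, 1_000_000
for subject in ("5846704", "5714596"):
    print(subject, evalue_from_bits(24.3, m, n))

# Search space implied by the reported, rounded values (24.3 bits, E = 0.16).
print(implied_search_space(0.16, 24.3))
```

Because the reported bit score and E value are rounded, the implied search
space is only approximate.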

Have you indexed the databases using the 2.2.9 formatdb?
Perhaps you're using databases indexed with an older
formatdb.  I've seen strange results myself in the past,
and the problem turned out to be that indices created by
older versions of formatdb were not read properly by the
newer BLAST executables.

Hope this helps,

Fernan

| From blast:
|
| > 5846704
|           Length = 177
|
|  Score = 24.3 bits (12), Expect = 0.16
|  Identities = 18/20 (90%)
|  Strand = Plus / Minus
|
| Query: 13  ctccagacattgggcgggtt 32
|            |||||| ||||| |||||||
| Sbjct: 127 ctccaggcattgagcgggtt 108
|
|
| > 5714596
|           Length = 9401
|
|  Score = 24.3 bits (12), Expect = 0.16
|  Identities = 8/20 (40%)
|  Strand = Plus / Minus
|
|
| Query: 13  ctccagacattgggcgggtt 32
|              | ||| |     |||
| Sbjct: 202 tccaagaaaggacccggtcg 183
|
| -Bonnie
|
+----]
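
Recomputing the raw scores from the two posted alignments makes the
inconsistency visible.  This minimal sketch assumes blastn's ungapped scoring
of +1 per identity and -3 per mismatch, and treats both 20-base HSPs as
gap-free.  The scoring values are an assumption about the settings used,
since the thread does not state them.  Under that assumption the first hit
reproduces the reported raw score of 12.  The second hit's 8 identities and
12 mismatches would give a negative score, which suggests that its reported
score did not come from that alignment.

```python
from typing import Tuple


def ungapped_score(query: str, subject: str, reward: int = 1, penalty: int = -3) -> Tuple[int, int]:
    """Return (identities, raw score) for a gap-free nucleotide alignment.

    reward/penalty default to +1/-3, ASSUMED to match the blastn run in the thread.
    """
    if len(query) != len(subject):
        raise ValueError("a gap-free alignment needs equal-length strings")
    identities = sum(q == s for q, s in zip(query.lower(), subject.lower()))
    mismatches = len(query) - identities
    return identities, identities * reward + mismatches * penalty


# The two HSPs exactly as they appear in the quoted BLAST output.
hits = {
    "5846704": ("ctccagacattgggcgggtt", "ctccaggcattgagcgggtt"),
    "5714596": ("ctccagacattgggcgggtt", "tccaagaaaggacccggtcg"),
}
for name, (q, s) in hits.items():
    ident, raw = ungapped_score(q, s)
    print(f"{name}: identities={ident}/{len(q)} raw score={raw}")
```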

--
F e r n a n   A g u e r o
http://genoma.unsam.edu.ar/~fernan
_______________________________________________
Bioclusters maillist  -  Bioclusters@bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters