[Bioclusters] formatdb -o T/blast problem

Joe Landman bioclusters@bioinformatics.org
Thu, 11 Dec 2003 19:54:31 -0500


Hi Samir:

   Old memory (likely incorrect), but I seem to recall that blast uses a 
fixed number of characters from the identifier when generating the db 
hash index.  If you subset your database down to, say, 4 sequences, do you 
still see the error?  If so, can you change the identifiers to something 
short and unique, and check whether the error persists?
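
A minimal sketch of that test in Python, in case it saves some typing. It
assumes a plain FASTA file in which every record starts with a ">" defline.
The function name subset_and_rename and the "s1", "s2", ... naming scheme
are made up for illustration; nothing here comes from the NCBI toolkit.

```python
def subset_and_rename(fasta_text, n=4, prefix="s"):
    """Keep the first n FASTA records and give each a short, unique ID.

    Returns (new_fasta_text, mapping), where mapping maps each new ID
    back to the original defline (without the leading '>').
    The prefix and numbering scheme are arbitrary choices for this test.
    """
    out_lines = []
    mapping = {}
    count = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            count += 1
            if count > n:
                break  # stop once the first n records are collected
            new_id = f"{prefix}{count}"
            mapping[new_id] = line[1:].strip()
            out_lines.append(f">{new_id}")
        elif count >= 1 and line.strip():
            # Sequence line belonging to a record we are keeping
            out_lines.append(line.strip())
    new_text = "\n".join(out_lines) + ("\n" if out_lines else "")
    return new_text, mapping
```

Write the returned text to a new file, rebuild the database from it with
formatdb (something like "formatdb -i subset.fa -p F -o T", where subset.fa
is a placeholder name), and rerun the same blastn search. The mapping lets
you translate the short IDs back to the original deflines afterwards.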

Joe

Samir Pandurangi wrote:

>I'm blasting against databases I've created myself with formatdb. Each one
>of the deflines contains a one-word unique identifier.
>When I run formatdb with the -o T option, I get the following errors from
>blastn:
>
>[blastallnew] ERROR: ncbiapi [000.000]  Jf_2959984_fasta.screen.Contig45:
>SeqPortNew: lcl|NM_031858 start(2490) >= len(2209)
>[blastallnew] ERROR: ncbiapi [000.000]  Jf_2959984_fasta.screen.Contig45:
>SeqPortNew: lcl|NM_031858 start(2490) >= len(2209)
>[blastallnew] ERROR: ncbiapi [000.000]  Jf_2959984_fasta.screen.Contig45:
>SeqPortNew: lcl|NM_005899 start(2490) >= len(1833)
>[blastallnew] ERROR: ncbiapi [000.000]  Jf_2959984_fasta.screen.Contig45:
>SeqPortNew: lcl|NM_005899 start(2490) >= len(1833)
>
>However, when I remove the -o T option, these errors disappear, but I run
>into problems with duplicate target hits (where the HSPs are split under
>multiple hits with the same identifier). An example:
>
>Blastn run without -o T, exhibiting duplicate target problem:
>*******************************************
>  
>
>>NM_033178
>          Length = 2560
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 19/19 (100%)
> Strand = Plus / Minus
>
>
>Query: 14026 gccagccagccagccagcc 14044
>             |||||||||||||||||||
>Sbjct: 1265  gccagccagccagccagcc 1247
>
>
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 19/19 (100%)
> Strand = Plus / Plus
>
>
>Query: 46274 ggctggctggctggctggc 46292
>             |||||||||||||||||||
>Sbjct: 1247  ggctggctggctggctggc 1265
>
>
>  
>
>>NM_033178
>          Length = 2560
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 19/19 (100%)
> Strand = Plus / Minus
>
>
>Query: 14026 gccagccagccagccagcc 14044
>             |||||||||||||||||||
>Sbjct: 1265  gccagccagccagccagcc 1247
>
>
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 19/19 (100%)
> Strand = Plus / Plus
>
>
>Query: 46274 ggctggctggctggctggc 46292
>             |||||||||||||||||||
>Sbjct: 1247  ggctggctggctggctggc 1265
>**************************************
>
>Blastn run with the same database, except with the -o T formatdb option (no
>duplicate seqIds):
>****************************************
>
>  
>
>>NM_033178
>          Length = 2558
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 0/19 (0%)
> Strand = Plus / Minus
>
>
>Query: 14026 gccagccagccagccagcc 14044
>
>Sbjct: 1265  cagccagccagccagccag 1247
>
>
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 0/19 (0%)
> Strand = Plus / Plus
>
>
>Query: 46274 ggctggctggctggctggc 46292
>
>Sbjct: 1247  ctggctggctggctggctg 1265
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 0/19 (0%)
> Strand = Plus / Minus
>
>
>Query: 14026 gccagccagccagccagcc 14044
>
>Sbjct: 1265  cagccagccagccagccag 1247
>
>
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 0/19 (0%)
> Strand = Plus / Plus
>
>
>Query: 46274 ggctggctggctggctggc 46292
>
>Sbjct: 1247  ctggctggctggctggctg 1265
>
>
>************************************
>Does anyone know what is happening here?
>--
>Samir
>
>_______________________________________________
>Bioclusters maillist  -  Bioclusters@bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/bioclusters
>  
>

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615