[Bioclusters] formatdb -o T/blast problem

Samir Pandurangi bioclusters@bioinformatics.org
Thu, 11 Dec 2003 15:34:54 -0800


I'm blasting against databases I've created myself with formatdb. Each one of
the deflines contains a one-word unique identifier.
When I run formatdb with the -o T option, I get the following errors from
blastn:

[blastallnew] ERROR: ncbiapi [000.000]  Jf_2959984_fasta.screen.Contig45:
SeqPortNew: lcl|NM_031858 start(2490) >= len(2209)
[blastallnew] ERROR: ncbiapi [000.000]  Jf_2959984_fasta.screen.Contig45:
SeqPortNew: lcl|NM_031858 start(2490) >= len(2209)
[blastallnew] ERROR: ncbiapi [000.000]  Jf_2959984_fasta.screen.Contig45:
SeqPortNew: lcl|NM_005899 start(2490) >= len(1833)
[blastallnew] ERROR: ncbiapi [000.000]  Jf_2959984_fasta.screen.Contig45:
SeqPortNew: lcl|NM_005899 start(2490) >= len(1833)
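
In each of these errors, the requested start offset is larger than the length of the sequence retrieved for that identifier. When the log is long, a short Python sketch like the one below collapses the repeated lines into one entry per identifier. It is a minimal sketch: the regex is written only against the message format shown above, and the function name is hypothetical.

```python
import re

# Matches the second line of each error, e.g.
#   SeqPortNew: lcl|NM_031858 start(2490) >= len(2209)
SEQPORT_ERR = re.compile(r"SeqPortNew: (\S+) start\((\d+)\) >= len\((\d+)\)")


def seqport_errors(text: str) -> list[tuple[str, int, int]]:
    """Return sorted, de-duplicated (seqid, start, length) tuples from the log."""
    found = {
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in SEQPORT_ERR.finditer(text)
    }
    return sorted(found)
```

For the four errors above, this returns two entries, one for lcl|NM_005899 and one for lcl|NM_031858, both with start 2490.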

However, when I remove the -o T option, these errors disappear, but I run
into problems with duplicate target hits (the HSPs end up listed under
multiple hits with the same identifier). An example:

Blastn run without -o T, showing the duplicate target problem:
*******************************************
>NM_033178
          Length = 2560

 Score = 38.2 bits (19), Expect = 8.9
 Identities = 19/19 (100%)
 Strand = Plus / Minus


Query: 14026 gccagccagccagccagcc 14044
             |||||||||||||||||||
Sbjct: 1265  gccagccagccagccagcc 1247



 Score = 38.2 bits (19), Expect = 8.9
 Identities = 19/19 (100%)
 Strand = Plus / Plus


Query: 46274 ggctggctggctggctggc 46292
             |||||||||||||||||||
Sbjct: 1247  ggctggctggctggctggc 1265


>NM_033178
          Length = 2560

 Score = 38.2 bits (19), Expect = 8.9
 Identities = 19/19 (100%)
 Strand = Plus / Minus


Query: 14026 gccagccagccagccagcc 14044
             |||||||||||||||||||
Sbjct: 1265  gccagccagccagccagcc 1247



 Score = 38.2 bits (19), Expect = 8.9
 Identities = 19/19 (100%)
 Strand = Plus / Plus


Query: 46274 ggctggctggctggctggc 46292
             |||||||||||||||||||
Sbjct: 1247  ggctggctggctggctggc 1265
**************************************
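
NM_033178 is listed as two separate hits above. One thing worth ruling out is whether that identifier appears more than once in the FASTA file given to formatdb, since -o T makes formatdb parse the SeqIds and build indexes on them. The following is a minimal Python sketch of that check. It assumes a plain FASTA file in which the identifier is the first whitespace-delimited word after '>'. The helper names and the script name check_ids.py are hypothetical.

```python
import sys
from collections import Counter
from typing import Iterable, Iterator


def defline_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the first word after '>' on each FASTA defline."""
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            # An empty defline yields "" so it is still counted.
            yield words[0] if words else ""


def duplicate_ids(lines: Iterable[str]) -> dict[str, int]:
    """Return {identifier: count} for identifiers seen more than once."""
    counts = Counter(defline_ids(lines))
    return {ident: n for ident, n in counts.items() if n > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python check_ids.py mydb.fa
    with open(sys.argv[1]) as fasta:
        for ident, n in sorted(duplicate_ids(fasta).items()):
            print(f"{ident}\t{n}")
```

An empty result means every defline identifier is unique, as intended. Any identifier that is printed occurs in more than one record.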

Blastn run against the same database, except formatted with the -o T
option (no duplicate SeqIds):
****************************************

>NM_033178
          Length = 2558

 Score = 38.2 bits (19), Expect = 8.9
 Identities = 0/19 (0%)
 Strand = Plus / Minus


Query: 14026 gccagccagccagccagcc 14044

Sbjct: 1265  cagccagccagccagccag 1247



 Score = 38.2 bits (19), Expect = 8.9
 Identities = 0/19 (0%)
 Strand = Plus / Plus


Query: 46274 ggctggctggctggctggc 46292

Sbjct: 1247  ctggctggctggctggctg 1265

 Score = 38.2 bits (19), Expect = 8.9
 Identities = 0/19 (0%)
 Strand = Plus / Minus


Query: 14026 gccagccagccagccagcc 14044

Sbjct: 1265  cagccagccagccagccag 1247



 Score = 38.2 bits (19), Expect = 8.9
 Identities = 0/19 (0%)
 Strand = Plus / Plus


Query: 46274 ggctggctggctggctggc 46292

Sbjct: 1247  ctggctggctggctggctg 1265


************************************
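
In the -o T output, the Sbjct strings contain the same repeat as the query but appear to be out of register by two bases. The reported length is also different: 2558 here versus 2560 without -o T. The following Python sketch checks the offset. It assumes ungapped alignments and uses the strings copied from the output above. The function name is hypothetical.

```python
def identities(query: str, sbjct: str) -> int:
    """Count positions where two equal-length, ungapped aligned strings agree."""
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have equal length")
    return sum(q == s for q, s in zip(query, sbjct))


# (Query, Sbjct) pairs copied from the -o T output above.
pairs = [
    ("gccagccagccagccagcc", "cagccagccagccagccag"),
    ("ggctggctggctggctggc", "ctggctggctggctggctg"),
]

for query, sbjct in pairs:
    as_printed = identities(query, sbjct)
    # Drop 2 bases from the front of the query and the end of the Sbjct
    # to test whether the Sbjct is the query shifted by two positions.
    shifted = identities(query[2:], sbjct[:-2])
    print(as_printed, shifted)  # prints "0 17" for both pairs
```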
Does anyone know what is happening here?
--
Samir