Hi Samir:

  Old memory (likely incorrect), but I seem to recall that blast uses a
fixed number of characters from the identifier for generating the db
hash index. If you subset your database down to, say, 4 sequences, do you
see the same error? If so, can you change the identifiers to something
short and unique, and see if you get the same error?

Joe

Samir Pandurangi wrote:

>I'm blasting against databases I've created myself with formatdb. Each one
>of the deflines contains a one-word unique identifier.
>When I run formatdb with the -o T option, I get the following errors from
>blastn:
>
>[blastallnew] ERROR: ncbiapi [000.000] Jf_2959984_fasta.screen.Contig45:
>SeqPortNew: lcl|NM_031858 start(2490) >= len(2209)
>[blastallnew] ERROR: ncbiapi [000.000] Jf_2959984_fasta.screen.Contig45:
>SeqPortNew: lcl|NM_031858 start(2490) >= len(2209)
>[blastallnew] ERROR: ncbiapi [000.000] Jf_2959984_fasta.screen.Contig45:
>SeqPortNew: lcl|NM_005899 start(2490) >= len(1833)
>[blastallnew] ERROR: ncbiapi [000.000] Jf_2959984_fasta.screen.Contig45:
>SeqPortNew: lcl|NM_005899 start(2490) >= len(1833)
>
>However, when I remove the -o T option, these errors disappear, but I run
>into problems with duplicate target hits (where the HSPs are split under
>multiple hits with the same identifier).
>An example:
>
>Blastn run without -o T, exhibiting duplicate target problem:
>*******************************************
>
>>NM_033178
>
>          Length = 2560
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 19/19 (100%)
> Strand = Plus / Minus
>
>Query: 14026 gccagccagccagccagcc 14044
>             |||||||||||||||||||
>Sbjct: 1265  gccagccagccagccagcc 1247
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 19/19 (100%)
> Strand = Plus / Plus
>
>Query: 46274 ggctggctggctggctggc 46292
>             |||||||||||||||||||
>Sbjct: 1247  ggctggctggctggctggc 1265
>
>>NM_033178
>
>          Length = 2560
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 19/19 (100%)
> Strand = Plus / Minus
>
>Query: 14026 gccagccagccagccagcc 14044
>             |||||||||||||||||||
>Sbjct: 1265  gccagccagccagccagcc 1247
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 19/19 (100%)
> Strand = Plus / Plus
>
>Query: 46274 ggctggctggctggctggc 46292
>             |||||||||||||||||||
>Sbjct: 1247  ggctggctggctggctggc 1265
>**************************************
>
>Blastn run with same database except with -o T formatdb option (no
>duplicate seqIds):
>****************************************
>
>>NM_033178
>
>          Length = 2558
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 0/19 (0%)
> Strand = Plus / Minus
>
>Query: 14026 gccagccagccagccagcc 14044
>
>Sbjct: 1265  cagccagccagccagccag 1247
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 0/19 (0%)
> Strand = Plus / Plus
>
>Query: 46274 ggctggctggctggctggc 46292
>
>Sbjct: 1247  ctggctggctggctggctg 1265
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 0/19 (0%)
> Strand = Plus / Minus
>
>Query: 14026 gccagccagccagccagcc 14044
>
>Sbjct: 1265  cagccagccagccagccag 1247
>
> Score = 38.2 bits (19), Expect = 8.9
> Identities = 0/19 (0%)
> Strand = Plus / Plus
>
>Query: 46274 ggctggctggctggctggc 46292
>
>Sbjct: 1247  ctggctggctggctggctg 1265
>
>************************************
>Does anyone know what is happening here?
>--
>Samir
>
>_______________________________________________
>Bioclusters maillist - Bioclusters@bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/bioclusters
>

--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615
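
P.S. A rough sketch of the test suggested above (cut the database down to about 4 sequences and swap in short, unique identifiers), in case it saves some typing. This is only an illustration: the function name `subset_and_rename`, the `s` prefix, and the toy sequences in the demo are made up here, and it assumes a plain FASTA file where every record starts with a ">" defline.

```python
def subset_and_rename(fasta_text, n=4, prefix="s"):
    """Keep the first n FASTA records and give each a short unique ID.

    Returns (new_fasta_text, mapping), where mapping maps each new
    short ID back to the original defline (without the leading '>').
    """
    out_lines = []
    mapping = {}
    count = 0
    keep = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # New record: keep it only while we are within the first n.
            count += 1
            keep = count <= n
            if keep:
                short_id = f"{prefix}{count}"
                mapping[short_id] = line[1:].strip()
                out_lines.append(f">{short_id}")
        elif keep and line.strip():
            # Sequence line belonging to a kept record.
            out_lines.append(line.strip())
    return "\n".join(out_lines) + "\n", mapping


# Toy input: the identifiers come from the error messages in this thread,
# the sequences are placeholders.
demo = ">NM_031858\nACGTACGT\n>NM_005899\nGGCCGGCC\n"
new_fasta, id_map = subset_and_rename(demo, n=1)
print(new_fasta, end="")
print(id_map)
```

Write the returned text to a new file, rebuild the database from it with formatdb -o T, and rerun the same blastn query. If the SeqPortNew errors go away with the short identifiers, that points at the identifiers rather than the sequences; the mapping lets you translate hits back to the original names.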