Hello Tim,

There are exact examples of the aberrant blast output in my 16 August
post to the list. Yes, I'll definitely submit a bug report to the NCBI.

To our knowledge, the -A F option obviates the need for the other
workarounds.

Someone told me that there was a flaming thread about this on the emboss
lists (the bug breaking lots of stuff in emboss), but I haven't checked
that list.

Take care,

Nathan

Tim Harsch wrote:
> I would like to get some clarification from you because I have some
> processes that do not use the -A F parameter, but do not use the ASN.1
> deflines.  I'm worried this issue may be causing problems I'm not yet aware
> of.  Can you summarize what the exact symptoms are and include
> blast-help@ncbi.nlm.nih.gov in your reply so that they might have a chance
> to fix the problem in future releases?
>
> Also, setting -A F obviates the need for the workarounds you talked about,
> right?
>
> ----- Original Message -----
> From: "Nathan O. Siemers" <Nathan.Siemers@bms.com>
> To: <bioclusters@bioinformatics.org>
> Sent: Wednesday, August 27, 2003 5:48 AM
> Subject: SOLVED Re: [Bioclusters] Opteron Perl64 segfault issues
>
>>
>>Sorry for the Opteron spam, but I hope this will help folks doing this
>>in the future ;)
>>
>>We now believe that the aberrant behavior in NCBI blast in some
>>configurations can be completely traced to a single character change in
>>the source code...
>>
>>In recent releases of the ncbi toolkit, the formatdb option to create
>>ASN.1 structured deflines (-A) has been turned on by default, a
>>divergence from previous behavior.  Unpredictable (and wrong!) things
>>happen when sequences are input to formatdb that do not follow the
>>arcane NCBI fasta naming terminology (foo|bar|etc|blah) when this option
>>is selected.  In our case, we were using very simple naming conventions:
>>
>> >name1
>> >name2
>> >name3
>>
>>(ncbi would have demanded something like >lcl|name1 )
>>
>>etc.
This is not compatible with the new default behavior of formatdb.
>>
>>Solution: if you do not follow the NCBI fasta naming structure exactly,
>>use the -A F option of formatdb and/or change the default in formatdb.c.
>>
>>NCBI toolkit versions somewhere after 2.2.1 have this problem.
>>
>>Classic NCBI.
>>
>>Nathan
>>
>>
>>Nathan O. Siemers wrote:
>>
>>>All:
>>>
>>>   Joe Landman from Scalable Informatics, Lawrence Hannon from IBM, and
>>>I have been working on issues running blast on the AMD Opteron platform.
>>>I've summarized my results (with much help from Joe and Lawrence) in
>>>validating the blastall and formatdb code.  There are quirks with the
>>>latest versions of the NCBI toolkit, producing corrupt blast results in
>>>some situations.  They only appear with some (large) databases but we
>>>are not sure what exactly causes this behavior at the present time.  We
>>>have tentative workarounds, listed below.
>>>
>>>
>>>Thanks to everyone who has helped me over the past few weeks - the
>>>bottom line is that *none* of the problems I have seen over the past
>>>weeks could actually be traced to problems with Opteron hardware (other
>>>than a RAM chip) or Linux OS.  This is great news for Opteron.
>>>
>>>
>>>SUMMARY
>>>
>>>Builds of formatdb and blastall from the NCBI Toolkit version 2.2.6
>>>can produce corrupted output when used with some formatdb parameters
>>>in all builds so far tested on the AMD Opteron 64-bit platform.
>>>Symptoms include failure to produce a correctly named .nal or .pal
>>>file when databases are split up into volumes.  Pointer errors produce
>>>incorrect results and alignments with some large databases.  NCBI
>>>Toolkit 2.2.1 does not show this behavior.  Some of these errors have
>>>been reproduced by us on SGI MIPS IRIX platforms with SGI compilers,
>>>suggesting that the errors are neither Opteron nor compiler specific.
>>>
>>>
>>>Current workarounds are to:
>>>
>>>   1.
explicitly name the formatdb output database with the -n option
>>>
>>>   2. use the '-o T' option in formatdb to alter the way blast indices
>>>      are created.
>>>
>>>   Alternatively:
>>>
>>>   3. Use the 2.2.1 version of the blastall tools.
>>>
>>>
>>>_______________________________________
>>>
>>>TESTS
>>>
>>>Machine, OS, libs:
>>>
>>>2 CPU AMD Opteron (Penguin), 6G RAM, SUSE Linux 8, 2.4.19 SMP Linux
>>>Kernel.
>>>
>>>Current configuration:
>>>
>>>opt:/gcgblast # gcc -v
>>>Reading specs from /usr/lib64/gcc-lib/x86_64-suse-linux/3.2.2/specs
>>>Configured with: ../configure --enable-threads=posix --prefix=/usr
>>>--with-local-prefix=/usr/local --infodir=/usr/share/info
>>>--mandir=/usr/share/man --libdir=/usr/lib64
>>>--enable-languages=c,c++,f77,objc,java,ada --enable-libgcj
>>>--with-gxx-include-dir=/usr/include/g++ --with-slibdir=/lib
>>>--with-system-zlib --enable-shared --enable-__cxa_atexit
>>>x86_64-suse-linux
>>>Thread model: posix
>>>gcc version 3.2.2 (SuSE Linux)
>>>
>>>(gcc-3.2.2-26.x86_64.rpm)
>>>(glibc-2.2.5-184.x86_64.rpm)
>>>
>>>ldd /usr/local/bin/blastall:
>>>
>>>   libm.so.6 => /lib64/libm.so.6 (0x0000002a9566d000)
>>>   libpthread.so.0 => /lib64/libpthread.so.0 (0x0000002a957c6000)
>>>   libc.so.6 => /lib64/libc.so.6 (0x0000002a958e2000)
>>>   /lib64/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x0000002a95556000)
>>>
>>>
>>>_______________________________________
>>>
>>>
>>>Databases:
>>>
>>>ncbi: Human genome scaffold broken into 100KB pieces, 50KB overlap
>>>(5.9G)
>>>
>>>sncbi: same as above but long sequence names converted to shorter form
>>>(some names were very long and I wanted to make sure this was not a
>>>name indexing problem)
>>>
>>>htg: 20 August download of NCBI htg sequence file (11G uncompressed)
>>>
>>>_______________________________________
>>>
>>>Formatdb options:
>>>
>>>o: using '-o T' option for indexing
>>>
>>>no_o: no -o option
>>>
>>>Other formatdb options used: '-p F -n <name> -i <fasta_file>'
>>>
>>>_______________________________________
>>>
>>>blastall options: '-p tblastn -v 3 -b 3 -a 2 -d <db> -i <input_file>'
>>>
>>>_______________________________________
>>>
>>>Input file: 12 protein sequences from fly refseq:
>>> >BMSPROT:NP_478140
>>> >BMSPROT:NP_523807
>>> >BMSPROT:NP_609725
>>> >BMSPROT:NP_524716
>>> >BMSPROT:NP_524665
>>> >BMSPROT:NP_524468
>>> >BMSPROT:NP_523392
>>> >BMSPROT:NP_572997
>>> >BMSPROT:NP_524671
>>> >BMSPROT:NP_608480
>>> >BMSPROT:NP_524763
>>> >BMSPROT:NP_524817
>>>
>>>(I've checked, the 'BMSPROT:' prefix doesn't seem to affect the
>>>analysis).
>>>
>>>_______________________________________
>>>
>>>R E S U L T S
>>>____________________________________________________________________
>>>
>>>NCBI Toolkit  ncbi-o  ncbi-no_o  sncbi-o  sncbi-no_o  htg-o  htg-no_o
>>>
>>>2.2.1         pass    pass       pass     pass        pass   pass
>>>
>>>2.2.6         pass    FAIL*      pass     FAIL*       pass   pass
>>>
>>>____________________________________________________________________
>>>
>>>
>>>* - FAIL symptoms include error messages such as
>>>'[blastall] ERROR: ncbiapi [000.000] BMSPROT:NP_478140: ObjMgrChoice:
>>>pointer [0] type [1] not found', missing sequence names for db hits
>>>in the BLAST summary, and sporadic nonsense alignments.
>>>
>>>CONFIGURATION
>>>
>>>IBM,Siemers Opteron linux.ncbi.mk directives for 2.2.6 (April 2003),
>>>SUSE 8.1 opteron Linux
>>>
>>>NCBI_DEFAULT_LCL = lnx
>>>NCBI_MAKE_SHELL = /bin/sh
>>>NCBI_CC = gcc -pipe -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -O3
>>>-DOS_UNIX_PPCLINUX -I../include -I/usr/X11R6/include -L/usr/X11R6/lib64
>>>-DWIN_MOTIF
>>># should probably be /usr/X11R6/lib64 above on SUSE 8.1
>>>NCBI_CFLAGS1 = -c
>>>NCBI_LDFLAGS1 =
>>>NCBI_OPTFLAG =
>>>
>>>Opteron linux.ncbi.mk directives for 2.2.1 NCBI Toolkit:
>>>
>>>
>>>NCBI_DEFAULT_LCL = lnx
>>>NCBI_MAKE_SHELL = /bin/sh
>>>NCBI_CC = gcc -pipe -D__USE_FILE_OFFSET64 -D__USE_LARGEFILE64
>>>NCBI_CFLAGS1 = -c -DOS_UNIX_PPCLINUX
>>>NCBI_LDFLAGS1 = -O2
>>>NCBI_OPTFLAG = -O2
>>>
>>>_______________________________________________
>>>Bioclusters maillist  -  Bioclusters@bioinformatics.org
>>>https://bioinformatics.org/mailman/listinfo/bioclusters
>>
>>--
>>Nathan Siemers|Associate Director|Applied Genomics|Bristol-Myers Squibb
>>Pharmaceutical Research Institute|HW3-0.07|P.O. Box 5400|Princeton, NJ
>>08543-5400|(609)818-6568|nathan.siemers@bms.com

--
Nathan Siemers|Associate Director|Applied Genomics|Bristol-Myers Squibb
Pharmaceutical Research Institute|HW3-0.07|P.O. Box 5400|Princeton, NJ
08543-5400|(609)818-6568|nathan.siemers@bms.com
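
For anyone who wants to check their own databases, here is a rough sketch
(not part of the NCBI toolkit) of how one might scan a FASTA file for
deflines that do not look like NCBI-style identifiers, and add -A F to the
formatdb call when any are found. The regular expression is only my guess
at "looks like db|id" (e.g. lcl|name1); it is not NCBI's actual parsing
rule. The function names are made up for illustration; the formatdb flags
(-p F, -n, -i, -A F) are the ones mentioned in this thread.

```python
import re

# Assumption: a defline is treated as "NCBI-style" if it starts with
# letters, a pipe, and a non-space token (for example lcl|name1).
# This is a rough heuristic, not NCBI's real defline parser.
NCBI_ID = re.compile(r"^[A-Za-z]+\|\S+")


def non_ncbi_deflines(lines):
    """Return FASTA deflines (without '>') that do not look like db|id."""
    bad = []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if not NCBI_ID.match(header):
                bad.append(header)
    return bad


def formatdb_args(fasta_path, db_name, lines):
    """Build a formatdb argument list for a nucleotide database.

    Adds '-A F' when any defline fails the heuristic above.
    """
    args = ["formatdb", "-p", "F", "-n", db_name, "-i", fasta_path]
    if non_ncbi_deflines(lines):
        args += ["-A", "F"]
    return args
```

The returned list could be handed to subprocess.run() once you have read
the file's lines; I would still spot-check the blast output afterwards
rather than trust the heuristic.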