All:

Joe Landman from Scalable Informatics, Lawrence Hannon from IBM, and I have been working on issues running blast on the AMD Opteron platform. I've summarized my results (with much help from Joe and Lawrence) in validating the blastall and formatdb code. There are quirks in the latest versions of the NCBI toolkit that produce corrupt blast results in some situations. They appear only with some (large) databases, and we are not yet sure exactly what causes this behavior. We have tentative workarounds, listed below.

Thanks to everyone who has helped me over the past few weeks. The bottom line is that *none* of the problems I have seen over the past weeks could actually be traced to problems with the Opteron hardware (other than a RAM chip) or the Linux OS. This is great news for Opteron.

SUMMARY

Builds of formatdb and blastall from NCBI Toolkit version 2.2.6 can produce corrupted output when used with some formatdb parameters, in all builds so far tested on the AMD Opteron 64-bit platform. Symptoms include failure to produce a correctly named .nal or .pal file when databases are split into volumes. Pointer errors produce incorrect results and alignments with some large databases. NCBI Toolkit 2.2.1 does not show this behavior. We have reproduced some of these errors on SGI MIPS IRIX platforms with SGI compilers, suggesting that the errors are neither Opteron-specific nor compiler-specific.

Current workarounds are to:

1. explicitly name the formatdb output database with the -n option
2. use the '-o T' option in formatdb to alter the way blast indices are created.

Alternatively:

3. Use the 2.2.1 version of the blastall tools.

_______________________________________

TESTS

Machine, OS, libs: 2 CPU AMD Opteron (Penguin), 6G RAM, SUSE Linux 8, 2.4.19 SMP Linux Kernel.
Current configuration:

opt:/gcgblast # gcc -v
Reading specs from /usr/lib64/gcc-lib/x86_64-suse-linux/3.2.2/specs
Configured with: ../configure --enable-threads=posix --prefix=/usr --with-local-prefix=/usr/local --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --enable-languages=c,c++,f77,objc,java,ada --enable-libgcj --with-gxx-include-dir=/usr/include/g++ --with-slibdir=/lib --with-system-zlib --enable-shared --enable-__cxa_atexit x86_64-suse-linux
Thread model: posix
gcc version 3.2.2 (SuSE Linux)

(gcc-3.2.2-26.x86_64.rpm)
(glibc-2.2.5-184.x86_64.rpm)

ldd /usr/local/bin/blastall:
        libm.so.6 => /lib64/libm.so.6 (0x0000002a9566d000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x0000002a957c6000)
        libc.so.6 => /lib64/libc.so.6 (0x0000002a958e2000)
        /lib64/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x0000002a95556000)

_______________________________________

Databases:

ncbi:  Human genome scaffold broken into 100KB pieces, 50KB overlap (5.9G)
sncbi: same as above, but with long sequence names converted to a shorter form (some names were very long and I wanted to make sure this was not a name indexing problem)
htg:   20 August download of NCBI htg sequence file (11G uncompressed)

_______________________________________

Formatdb options:

o:    using '-o T' option for indexing
no_o: no -o option

Other formatdb options used: '-p F -n <name> -i <fasta_file>'

_______________________________________

blastall options: '-p tblastn -v 3 -b 3 -a 2 -d <db> -i <input_file>'

_______________________________________

Input file: 12 protein sequences from fly refseq:

>BMSPROT:NP_478140
>BMSPROT:NP_523807
>BMSPROT:NP_609725
>BMSPROT:NP_524716
>BMSPROT:NP_524665
>BMSPROT:NP_524468
>BMSPROT:NP_523392
>BMSPROT:NP_572997
>BMSPROT:NP_524671
>BMSPROT:NP_608480
>BMSPROT:NP_524763
>BMSPROT:NP_524817

(I've checked, the 'BMSPROT:' prefix doesn't seem to affect the analysis).
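For anyone who wants to repeat the tests, the option sets listed above combine into command lines roughly like the ones below. This is only a sketch: the helper names and the file names (htg.fa, query.fa) are placeholders and were not part of the original runs, and the functions just print each command instead of running it.

```shell
#!/bin/sh
# Print (do not run) the formatdb command for one test case.
# $1 = FASTA file, $2 = database name, $3 = "o" or "no_o"
# File and database names are hypothetical placeholders.
formatdb_cmd() {
    if [ "$3" = "o" ]; then
        # "o" case: index with '-o T'
        echo "formatdb -p F -o T -n $2 -i $1"
    else
        # "no_o" case: no -o option at all
        echo "formatdb -p F -n $2 -i $1"
    fi
}

# Print the blastall command line shared by all tests.
# $1 = database name, $2 = query file
blastall_cmd() {
    echo "blastall -p tblastn -v 3 -b 3 -a 2 -d $1 -i $2"
}

formatdb_cmd htg.fa htg-o o
formatdb_cmd htg.fa htg-no_o no_o
blastall_cmd htg-o query.fa
```

Piping the printed lines to sh would run them for real, assuming formatdb and blastall are on the PATH.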
_______________________________________

R E S U L T S

____________________________________________________________________
NCBI Toolkit   ncbi-o   ncbi-no_o   sncbi-o   sncbi-no_o   htg-o   htg-no_o
2.2.1          pass     pass        pass      pass         pass    pass
2.2.6          pass     FAIL*       pass      FAIL*        pass    pass
____________________________________________________________________

* - FAIL symptoms include error messages such as '[blastall] ERROR: ncbiapi [000.000] BMSPROT:NP_478140: ObjMgrChoice: pointer [0] type [1] not found', missing sequence names for db hits in the BLAST summary, and sporadic nonsense alignments.

CONFIGURATION

IBM,Siemers Opteron linux.ncbi.mk directives for 2.2.6 (April 2003), SUSE 8.1 Opteron Linux:

NCBI_DEFAULT_LCL = lnx
NCBI_MAKE_SHELL = /bin/sh
NCBI_CC = gcc -pipe -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -O3 -DOS_UNIX_PPCLINUX -I../include -I/usr/X11R6/include -L/usr/X11R6/lib64 -DWIN_MOTIF
# should probably be /usr/X11R6/lib64 above on SUSE 8.1
NCBI_CFLAGS1 = -c
NCBI_LDFLAGS1 =
NCBI_OPTFLAG =

Opteron linux.ncbi.mk directives for 2.2.1 NCBI Toolkit:

NCBI_DEFAULT_LCL = lnx
NCBI_MAKE_SHELL = /bin/sh
NCBI_CC = gcc -pipe -D__USE_FILE_OFFSET64 -D__USE_LARGEFILE64
NCBI_CFLAGS1 = -c -DOS_UNIX_PPCLINUX
NCBI_LDFLAGS1 = -O2
NCBI_OPTFLAG = -O2
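
The error message quoted in the RESULTS footnote is easy to screen for when re-running the tests. A minimal sketch follows, assuming blastall's stdout and stderr were saved together in one file; the function name and the log file name are hypothetical. It catches only the ObjMgrChoice message, not the missing hit names or the nonsense alignments, which still need checking by eye.

```shell
#!/bin/sh
# Succeed (exit status 0) if the given file contains the 2.2.6 failure
# message from the results footnote; fail (status 1) otherwise.
# $1 = file holding the saved blastall output (hypothetical name below)
has_objmgr_error() {
    # -F: match as a fixed string so the [ ] are taken literally
    grep -F -q 'ObjMgrChoice: pointer [0] type [1] not found' "$1"
}

# Usage: has_objmgr_error run.log && echo "FAIL"
```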