[Bioclusters] Opteron Perl64 segfault issues

Thu, 21 Aug 2003 16:46:10 -0400

All:

	Joe Landman from Scalable Informatics, Lawrence Hannon from IBM, and I 
have been working on issues running BLAST on the AMD Opteron platform.
I've summarized the results of validating the blastall and formatdb code,
with much help from Joe and Lawrence.  The latest versions of the NCBI
Toolkit have quirks that produce corrupt BLAST results in some
situations.  These appear only with some (large) databases, and we do not
yet know exactly what causes them.  We have tentative workarounds, listed
below.

Thanks to everyone who has helped me over the past few weeks.  The
bottom line is that *none* of the problems I have seen could actually be
traced to the Opteron hardware (other than one RAM chip) or to the Linux
OS.  This is great news for Opteron.

SUMMARY

Builds of formatdb and blastall from NCBI Toolkit version 2.2.6 can
produce corrupted output when formatdb is run with certain parameters.
This happens in every build tested so far on the 64-bit AMD Opteron
platform.  Symptoms include failure to produce a correctly named .nal or
.pal alias file when a database is split into volumes, and pointer errors
that produce incorrect results and alignments with some large databases.
NCBI Toolkit 2.2.1 does not show this behavior.  We have reproduced some
of these errors on SGI MIPS IRIX platforms with SGI compilers, which
suggests that the errors are neither Opteron- nor compiler-specific.
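One of the symptoms above, a missing or misnamed alias file after formatdb splits a database into volumes, can be checked mechanically after each formatdb run. The sketch below assumes the volume files are named <name>.NN.nin (<name>.NN.pin for protein) and that the alias file should be <name>.nal (or <name>.pal); check_alias_file is a hypothetical helper, not part of the NCBI Toolkit.

```python
import re
from pathlib import Path


def check_alias_file(db_dir, name, protein=False):
    """Return a list of problems with the alias file for database `name`.

    ASSUMPTION: multi-volume formatdb output is named <name>.NN.nin
    (<name>.NN.pin for protein) and the alias file is <name>.nal
    (<name>.pal for protein).
    """
    db_dir = Path(db_dir)
    index_ext, alias_ext = ("pin", "pal") if protein else ("nin", "nal")
    # Match only volume index files for this exact database name.
    volume_re = re.compile(re.escape(name) + r"\.\d{2}\." + index_ext + r"$")
    volumes = [p for p in db_dir.iterdir() if volume_re.match(p.name)]
    expected = db_dir / f"{name}.{alias_ext}"
    problems = []
    if volumes and not expected.exists():
        problems.append(f"{len(volumes)} volumes found but {expected.name} is missing")
    return problems
```

An empty list means either the database was not split or the expected alias file is present.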

Current workarounds are to:

	1.  explicitly name the formatdb output database with the -n option

	2.  use the '-o T' option in formatdb to alter the way blast indices
	    are created.

	Alternatively:

	3.  Use the 2.2.1 version of the blastall tools.
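Workarounds 1 and 2 can be applied together by always building the formatdb command the same way.  This is a minimal sketch using only the options quoted in this post; formatdb_workaround_cmd is a hypothetical helper, and deriving the default name from the FASTA file stem is a choice made for this sketch, not formatdb's own behavior.

```python
import shlex
from pathlib import Path


def formatdb_workaround_cmd(fasta_file, db_name=None):
    """Return a formatdb argv that applies workarounds 1 and 2.

    -n is always passed explicitly (workaround 1); if db_name is not given,
    this helper uses the FASTA file's stem. '-o T' is workaround 2, and
    '-p F' marks the input as nucleotide, as in the test runs.
    """
    if db_name is None:
        db_name = Path(fasta_file).stem
    return ["formatdb", "-p", "F", "-o", "T", "-n", db_name, "-i", fasta_file]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(cmd, check=True) to actually execute formatdb.
    print(shlex.join(formatdb_workaround_cmd("htg.fa")))
```

Run as a script, this prints `formatdb -p F -o T -n htg -i htg.fa`.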

_______________________________________

TESTS

Machine, OS, libs:

2-CPU AMD Opteron (Penguin), 6 GB RAM, SUSE Linux 8, Linux 2.4.19 SMP
kernel.

Current configuration:

opt:/gcgblast # gcc -v
Reading specs from /usr/lib64/gcc-lib/x86_64-suse-linux/3.2.2/specs
Configured with: ../configure --enable-threads=posix --prefix=/usr 
--with-local-prefix=/usr/local --infodir=/usr/share/info 
--mandir=/usr/share/man --libdir=/usr/lib64 
--enable-languages=c,c++,f77,objc,java,ada --enable-libgcj 
--with-gxx-include-dir=/usr/include/g++ --with-slibdir=/lib 
--with-system-zlib --enable-shared --enable-__cxa_atexit x86_64-suse-linux
Thread model: posix
gcc version 3.2.2 (SuSE Linux)

(gcc-3.2.2-26.x86_64.rpm)
(glibc-2.2.5-184.x86_64.rpm)

ldd /usr/local/bin/blastall:

         libm.so.6 => /lib64/libm.so.6 (0x0000002a9566d000)
         libpthread.so.0 => /lib64/libpthread.so.0 (0x0000002a957c6000)
         libc.so.6 => /lib64/libc.so.6 (0x0000002a958e2000)
         /lib64/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x0000002a95556000)
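The ldd output shows blastall linked against the 64-bit libraries in /lib64.  As a further check that a given blastall or formatdb binary really is a 64-bit x86-64 build, its ELF header can be read directly (the standard `file` utility reports the same information).  A minimal sketch; the script and function names are hypothetical.

```python
import struct
import sys

EM_X86_64 = 62  # ELF e_machine value for AMD x86-64


def elf_arch(path):
    """Return (word_size_bits, e_machine) from the ELF header of `path`."""
    with open(path, "rb") as f:
        header = f.read(20)  # e_ident (16 bytes) + e_type (2) + e_machine (2)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    bits = {1: 32, 2: 64}.get(header[4])  # EI_CLASS byte
    if bits is None:
        raise ValueError(f"{path}: unknown ELF class {header[4]}")
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    (machine,) = struct.unpack(endian + "H", header[18:20])
    return bits, machine


def is_x86_64_binary(path):
    return elf_arch(path) == (64, EM_X86_64)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, "x86-64 (64-bit)" if is_x86_64_binary(path) else elf_arch(path))
```

For example, `python elfcheck.py /usr/local/bin/blastall` should report x86-64 for the builds described here.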

_______________________________________

Databases:

ncbi:  Human genome scaffold broken into 100KB pieces with 50KB overlap
(5.9G)

sncbi:  same as above, but with long sequence names converted to a
shorter form (some names were very long and I wanted to make sure this
was not a name indexing problem)

htg:  20 August download of NCBI htg sequence file (11G uncompressed)
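The ncbi and sncbi databases involve two simple preprocessing steps: tiling a long scaffold into overlapping 100KB pieces (50KB overlap) and replacing long sequence names with short ones.  The post does not say how either step was actually done, so the sketch below is hypothetical: split_scaffold uses a fixed 100 kb window advanced by 50 kb, and shorten_names assigns sequential IDs while keeping a map back to the original names.

```python
def split_scaffold(seq, window=100_000, step=50_000):
    """Yield (start, end, piece) tiles of `seq`, 0-based and end-exclusive.

    With window=100 kb and step=50 kb, neighbouring pieces overlap by
    window - step = 50 kb. The last piece may be shorter than `window`.
    """
    if step <= 0 or step > window:
        raise ValueError("need 0 < step <= window")
    start = 0
    while True:
        end = min(start + window, len(seq))
        yield start, end, seq[start:end]
        if end == len(seq):
            break  # the tail of the scaffold is covered
        start += step


def shorten_names(deflines, prefix="seq"):
    """Replace FASTA deflines with short unique IDs ('seq000001', ...).

    Returns (short_ids, mapping) where mapping[short_id] is the original
    defline, so BLAST hits can be translated back afterwards.
    """
    short_ids = [f"{prefix}{i:06d}" for i in range(1, len(deflines) + 1)]
    return short_ids, dict(zip(short_ids, deflines))
```

Keeping the mapping lets the hits from sncbi be compared name for name with the hits from ncbi.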

_______________________________________

Formatdb options:

o:  using '-o T' option for indexing

no_o:	 no -o option

Other formatdb options used:  '-p F -n <name> -i <fasta_file>'

_______________________________________

blastall options:  '-p tblastn -v 3 -b 3 -a 2 -d <db> -i <input_file>'
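The test matrix is every database crossed with the two formatdb variants, each followed by the same blastall run.  A minimal sketch that generates those command lines using only the options quoted above; the FASTA file names and the query file name are hypothetical, and build_matrix is not part of any NCBI tool.

```python
from itertools import product

# Hypothetical FASTA file names; the post does not give paths.
DATABASES = {"ncbi": "ncbi.fa", "sncbi": "sncbi.fa", "htg": "htg.fa"}
FORMATDB_VARIANTS = {"o": ["-o", "T"], "no_o": []}


def build_matrix(query="fly_refseq.fa"):
    """Yield (label, formatdb_argv, blastall_argv) for every database x option cell.

    Each cell gets its own database name (e.g. 'ncbi-o', 'ncbi-no_o'), always
    passed with -n, so the two index variants never overwrite each other.
    """
    for (db, fasta), (opt, extra) in product(DATABASES.items(), FORMATDB_VARIANTS.items()):
        label = f"{db}-{opt}"
        formatdb = ["formatdb", "-p", "F", *extra, "-n", label, "-i", fasta]
        blastall = ["blastall", "-p", "tblastn", "-v", "3", "-b", "3", "-a", "2",
                    "-d", label, "-i", query]
        yield label, formatdb, blastall


if __name__ == "__main__":
    for label, formatdb, blastall in build_matrix():
        print(label)
        print("  " + " ".join(formatdb))
        print("  " + " ".join(blastall))
```

The labels match the column names used in the results table, which makes it easy to line up each run with its outcome.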

_______________________________________

Input file:  12 protein sequences from fly refseq:
 >BMSPROT:NP_478140
 >BMSPROT:NP_523807
 >BMSPROT:NP_609725
 >BMSPROT:NP_524716
 >BMSPROT:NP_524665
 >BMSPROT:NP_524468
 >BMSPROT:NP_523392
 >BMSPROT:NP_572997
 >BMSPROT:NP_524671
 >BMSPROT:NP_608480
 >BMSPROT:NP_524763
 >BMSPROT:NP_524817

(I've checked: the 'BMSPROT:' prefix doesn't seem to affect the analysis.)
_______________________________________

R E S U L T S
____________________________________________________________________

NCBI Toolkit  ncbi-o  ncbi-no_o  sncbi-o  sncbi-no_o  htg-o  htg-no_o

2.2.1         pass    pass       pass     pass        pass   pass

2.2.6         pass    FAIL*      pass     FAIL*       pass   pass

____________________________________________________________________

* - FAIL symptoms include the error message

      [blastall] ERROR: ncbiapi [000.000] BMSPROT:NP_478140: ObjMgrChoice: pointer [0] type [1] not found

    as well as missing sequence names for database hits in the BLAST
    summary and sporadic nonsense alignments.
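When many runs are compared, the ObjMgrChoice error quoted above can be picked out of blastall output automatically.  A minimal sketch; it assumes the message always has the quoted shape, and that the query ID and the two bracketed numbers may vary between runs.  scan_blast_log is a hypothetical helper, and it does not detect the missing-name or nonsense-alignment symptoms.

```python
import re

# Pattern for the ObjMgrChoice error. \s+ between tokens lets the match
# survive line wrapping; the query ID and both bracketed numbers are captured.
OBJMGR_ERROR = re.compile(
    r"\[blastall\]\s+ERROR:\s+ncbiapi\s+\[[\d.]+\]\s+(\S+):\s+"
    r"ObjMgrChoice:\s+pointer\s+\[(\d+)\]\s+type\s+\[(\d+)\]\s+not\s+found"
)


def scan_blast_log(text):
    """Return (query_id, pointer, type) for each ObjMgrChoice error in `text`."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in OBJMGR_ERROR.finditer(text)]


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path) as f:
            for query, pointer, kind in scan_blast_log(f.read()):
                print(f"{path}: {query}: pointer [{pointer}] type [{kind}] not found")
```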

CONFIGURATION

IBM, Siemers Opteron linux.ncbi.mk directives for 2.2.6 (April 2003),
SUSE 8.1 Opteron Linux:

NCBI_DEFAULT_LCL = lnx
NCBI_MAKE_SHELL = /bin/sh
NCBI_CC = gcc -pipe -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -O3 \
    -DOS_UNIX_PPCLINUX -I../include -I/usr/X11R6/include -L/usr/X11R6/lib64 \
    -DWIN_MOTIF
# should probably be /usr/X11R6/lib64 above on SUSE 8.1
NCBI_CFLAGS1 = -c
NCBI_LDFLAGS1 =
NCBI_OPTFLAG =

Opteron linux.ncbi.mk directives for 2.2.1 NCBI Toolkit:

NCBI_DEFAULT_LCL = lnx
NCBI_MAKE_SHELL = /bin/sh
NCBI_CC = gcc -pipe -D__USE_FILE_OFFSET64 -D__USE_LARGEFILE64
NCBI_CFLAGS1 = -c -DOS_UNIX_PPCLINUX
NCBI_LDFLAGS1 = -O2
NCBI_OPTFLAG = -O2
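Since 2.2.1 passes and 2.2.6 fails on the same machine, it can help to list the differences between the two sets of build directives side by side.  A minimal sketch that handles only simple NAME = value lines, comment lines, and backslash continuations; other make syntax is ignored.  It only lists differences and says nothing about which setting, if any, causes the failures.  parse_mk and diff_mk are hypothetical helpers.

```python
import re

ASSIGN = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*?)\s*$")


def parse_mk(text):
    """Parse 'NAME = value' lines from a linux.ncbi.mk fragment into a dict.

    Backslash-newline continuations are joined first, roughly as make does;
    comment lines are skipped.
    """
    text = re.sub(r"[ \t]*\\\n[ \t]*", " ", text)
    settings = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        m = ASSIGN.match(line)
        if m:
            settings[m.group(1)] = m.group(2)
    return settings


def diff_mk(old_text, new_text):
    """Return {name: (old_value, new_value)} for every setting that differs."""
    old, new = parse_mk(old_text), parse_mk(new_text)
    return {k: (old.get(k), new.get(k))
            for k in sorted(old.keys() | new.keys())
            if old.get(k) != new.get(k)}
```

Applied to the two fragments above, this would show, among other things, the -O3 versus -O2 optimization levels and the different large-file macros.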