[Bioclusters] ncbi blast

Wed, 23 Jun 2004 15:53:36 +0100

Joe,

Regarding SMP, uname -a on the Redhat9 system gives
Linux prada.local 2.4.20-8bigmem #1 SMP Thu Mar 13 17:32:29 EST 2003 i686
i686 i386 GNU/Linux

Redhat7.1 system gives
Linux versace 2.4.2-2smp #1 SMP Sun Apr 8 20:21:34 EDT 2001 i686 unknown

I've just noticed that the RH9 dual xeon system actually has 4.25 GB of RAM,
not 4 GB as previously advised. The relevant output from dmesg is:

Memory: 4251160k/4456448k available (1485k kernel code, 69544k reserved,
1094k data, 156k init, 3407808k highmem)

I may pull out the extra 256 MB and redo the test tomorrow if I get time.

Curiously the rh7.1 dual athlon system, whilst it has 4 GB of RAM installed
physically, reported the following:

Memory: 3669292k/3735040k available (1500k kernel code, 65296k reserved,
103k data, 252k init, 2817472k highmem)

which reminded me that there was some funny issue with the motherboards we
used on the athlons (Tyan S2468) which means they reserve a portion of Ram
for the PCI devices. According to the tyan website the system will see
between 3.5 and 3.8G depending on devices installed.

So actually I guess both systems have slightly unusual memory
configurations.
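
For anyone who wants to redo the arithmetic on those two dmesg lines, here
is a rough Python sketch. The function name is made up for illustration,
and it assumes the "k" in the dmesg output means KiB; I have not checked
exactly what the second figure counts on these kernels.

```python
import re

KIB_PER_GIB = 1024 * 1024


def parse_dmesg_memory(line):
    """Return (available_kib, second_figure_kib) from a 'Memory:' boot line."""
    match = re.search(r"Memory:\s*(\d+)k/(\d+)k available", line)
    if match is None:
        raise ValueError("no 'Memory: Nk/Nk available' field found")
    return int(match.group(1)), int(match.group(2))


# The two lines quoted above, rejoined onto one line each.
DMESG_LINES = {
    "rh9 dual xeon": (
        "Memory: 4251160k/4456448k available (1485k kernel code, "
        "69544k reserved, 1094k data, 156k init, 3407808k highmem)"
    ),
    "rh7.1 dual athlon": (
        "Memory: 3669292k/3735040k available (1500k kernel code, "
        "65296k reserved, 103k data, 252k init, 2817472k highmem)"
    ),
}

for name, line in DMESG_LINES.items():
    available, second = parse_dmesg_memory(line)
    print(
        f"{name}: {available / KIB_PER_GIB:.2f} GiB available, "
        f"second figure {second / KIB_PER_GIB:.2f} GiB"
    )
```

On that assumption 4456448k works out to exactly 4.25 GiB, and 3735040k
to about 3.56 GiB, which sits inside the 3.5 to 3.8G range Tyan quote.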

[root@prada log]# ldd /usr/local/bin/blastall
	libm.so.6 => /lib/tls/libm.so.6 (0x4002c000)
	libpthread.so.0 => /lib/tls/libpthread.so.0 (0x4004e000)
	libc.so.6 => /lib/tls/libc.so.6 (0x42000000)
	/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

[root@versace /root]# ldd /usr/local/bin/blastall
	libm.so.6 => /lib/i686/libm.so.6 (0x40027000)
	libpthread.so.0 => /lib/i686/libpthread.so.0 (0x4004b000)
	libc.so.6 => /lib/i686/libc.so.6 (0x40060000)
	/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
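
The "not found" check on ldd output can be scripted rather than eyeballed.
A rough Python sketch follows; the function names are my own invention and
it only parses ldd text that has already been captured, it does not run
ldd itself. It also lists which directories the libraries resolved from,
which makes the /lib/tls versus /lib/i686 difference between the two
boxes easy to see.

```python
import posixpath


def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=>" in line and "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def resolved_dirs(ldd_output):
    """Return the set of directories that libraries were resolved from."""
    dirs = set()
    for line in ldd_output.splitlines():
        if "=>" not in line or "not found" in line:
            continue
        fields = line.split("=>", 1)[1].split()
        # Skip entries with no path after the arrow.
        if fields and fields[0].startswith("/"):
            dirs.add(posixpath.dirname(fields[0]))
    return dirs


PRADA = """\
libm.so.6 => /lib/tls/libm.so.6 (0x4002c000)
libpthread.so.0 => /lib/tls/libpthread.so.0 (0x4004e000)
libc.so.6 => /lib/tls/libc.so.6 (0x42000000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
"""

VERSACE = """\
libm.so.6 => /lib/i686/libm.so.6 (0x40027000)
libpthread.so.0 => /lib/i686/libpthread.so.0 (0x4004b000)
libc.so.6 => /lib/i686/libc.so.6 (0x40060000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
"""

for host, text in (("prada", PRADA), ("versace", VERSACE)):
    print(host, "missing:", missing_libraries(text))
    print(host, "dirs:", sorted(resolved_dirs(text)))
```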

The query file is:

>abc123|my random label
actacgactagcatcagctacgctagatgactacgatcagctacgactagcatcgactacg

or if you want to check it for funny characters:

[root@prada blasttest]# xxd tempdna
0000000: 3e61 6263 3132 337c 6d79 2072 616e 646f  >abc123|my rando
0000010: 6d20 6c61 6265 6c0a 6163 7461 6367 6163  m label.actacgac
0000020: 7461 6763 6174 6361 6763 7461 6367 6374  tagcatcagctacgct
0000030: 6167 6174 6761 6374 6163 6761 7463 6167  agatgactacgatcag
0000040: 6374 6163 6761 6374 6167 6361 7463 6761  ctacgactagcatcga
0000050: 6374 6163 670a                           ctacg.
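
The same check for funny characters can be done without reading hex by
eye. A minimal Python sketch, assuming "funny" means anything other than
printable ASCII or a newline (the function name and that definition are
mine, and the bytes below are just the file contents shown above typed in
as a literal):

```python
def unexpected_bytes(data):
    """Return (offset, byte) pairs that are neither printable ASCII nor LF."""
    return [
        (offset, byte)
        for offset, byte in enumerate(data)
        if not (byte == 0x0A or 0x20 <= byte <= 0x7E)
    ]


QUERY = (
    b">abc123|my random label\n"
    b"actacgactagcatcagctacgctagatgactacgatcagctacgactagcatcgactacg\n"
)

problems = unexpected_bytes(QUERY)
if problems:
    for offset, byte in problems:
        print(f"offset {offset:#x}: byte {byte:#04x}")
else:
    print("only printable ASCII and newlines found")
```

To run it against the real file, read it in binary mode, for example
open("tempdna", "rb").read(), and pass the result to unexpected_bytes().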

but honestly I don't think the query makes much difference - I first
noticed the problem with users not getting their results back from the
cluster for various queries.

The nt database I am currently using I downloaded from
ftp.ncbi.nlm.nih.gov/blast/db/nt*

It's the June 15 version when unpacked, but the one before went wrong too,
and I think I started seeing this about 1 month ago, without tying it
down.

I issue

/blastall -p blastn -a 2 -d /usr/blasttest/nt -i /usr/blasttest/tempdna

I'll try the strace shortly.
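
For reference, this is roughly how the -a 1 and -a 2 runs, and the strace
wrapper, could be assembled from a script. It is only a sketch: the helper
names are hypothetical, the blastall path is taken from the ldd output
above and may not match where the binary is actually run from, and the
script only prints the command lines, it does not execute anything.

```python
import shlex


def blastall_cmd(threads,
                 blastall="/usr/local/bin/blastall",
                 database="/usr/blasttest/nt",
                 query="/usr/blasttest/tempdna"):
    """Build the blastn command line with the given -a (thread) value."""
    return [
        blastall,
        "-p", "blastn",
        "-a", str(threads),
        "-d", database,
        "-i", query,
    ]


def with_strace(cmd, trace_file="trace.blastall"):
    """Prefix a command with strace, following children, logging to a file."""
    return ["strace", "-f", "-o", trace_file] + list(cmd)


working = blastall_cmd(1)               # the run that behaves
failing = with_strace(blastall_cmd(2))  # the run that goes wrong, traced

for cmd in (working, failing):
    print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the two command lines identical apart from the -a value means any
difference in behaviour can only come from the threaded code path.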

Justin

On Wed, 23 Jun 2004, Joe Landman wrote:

> On Wed, 2004-06-23 at 10:02, Justin Powell wrote:
> > Hi Joe,
> >
> > Thanks for the info.  I've tested with the -a 1, it does indeed only go
> > wrong with -a 2, so I've kludged it for the time being.  However as to
>
> Interesting.  This does implicate the threading somehow.  The "-a N"
> invokes the pthread library paths.
>
> > your theory about RedHat9 NPTL being involved, I also get exactly the same
> > behaviour on a RedHat7.1 system running ncbi blast 2.2.6. (i.e. goes wrong
> > on nt database but not est database, and only if -a 2, not if -a 1).
>
> These are SMP systems I presume.
>
> > So I guess if the -a switch changes things it's not likely to be bad RAM?
>
> Well, I would think that it makes that possibility more remote.  It is a
> good idea to beat on the systems overnight or over a weekend with
> Memtest86 3.1 just to be sure (not a guarantee, but a good filter for
> failing stuff).
>
> Is this behavior seen with other compiled binary versions of the
> libraries?  If you could wrap the blastall execution with a
>
> 	strace -f -o trace.blastall ...
>
> where ... is the blastall -a 2 [ ] command you have which fails.  The
> trace.blastall will be pretty large.  Don't post it here, try
> compressing it and mailing, or let me know and I can enable a one off
> ftp.
>
> The idea is that with -a 2 the program tries to use the threading
> library.  If it is dying, it should return an error message in the
> threading calls, which we are not seeing.  You might also wish to make
> sure the code sees all the libraries it needs by doing a
>
> 	ldd /path/to/blastall
>
> and making sure that none of the libraries say "not found".  That output
> would be interesting to see here.
>
>
> >
> > In reply to your other questions, the output from swapon -s is
> >
> > Filename			Type		Size	Used	Priority
> > /dev/sda2                       partition	1807304	15036	-1
>
> Ok, this is good.
>
> >
> > for the rh7.1 system
> >
> > Filename			Type		Size	Used	Priority
> > /dev/sda3                       partition	1020116	10496	-1
> >
> > for the rh9 system.
> >
>
> Interesting.  You have enough swap.  It seems to be unlikely to be a VM
> issue.
>
> > Adding a name line to the query makes no difference.
> >
> > Neither system is overclocked. I've not run the memory checker yet, but I
> > have two identical Redhat9 boxes and they both do it. So that makes 3
> > systems, and I can test a 4th shortly too.
> >
> > I've not had time to run the graphical debugger - I'm pretty snowed under
> > till Monday.
>
> Ok.  Have you isolated it to a single sequence and a single db?  This
> would let some of us try it.
>
> Joe