[Bioclusters] ncbi blast

Aaron Darling bioclusters@bioinformatics.org
Fri, 18 Jun 2004 15:26:54 -0500 (CDT)


The fact that blastall is crashing in different places points towards
faulty hardware.  You may be experiencing overheating in your machines,
especially if problems arise only during CPU-intensive tasks.
I had an AMD K6-2 box that ran fine most of the time but couldn't finish a
kernel compile because the CPU would heat up.  The gcc process
would segfault, and I had to wait 10 minutes for it to cool off before
restarting the build.
memtest86 is an excellent tool for diagnosing memory problems, but may not
stress your machines' CPUs enough to induce the overheating.  Even if it
does find problems, make sure your memory sticks are truly faulty rather
than faulty only when running at 75C...
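
If you want something that loads the CPU harder than memtest86 does, a
rough sketch along these lines may help.  It assumes Python 3 is available
on the nodes; the function name burn_and_verify and its parameters are made
up for illustration.  It repeats the same deterministic hashing job and
counts rounds whose result differs from the first one.  Healthy hardware
should always report zero mismatches, so a nonzero count (or a crash
partway through) points at the machine rather than at blastall.

```python
import hashlib


def burn_and_verify(rounds=10, block_size=1 << 20, passes=20):
    """Run the same CPU-heavy hash job repeatedly and count mismatches.

    Hypothetical helper: every round hashes identical input, so every
    round must yield an identical digest on sound hardware.
    """
    # Fixed, reproducible input block (no randomness involved).
    block = bytes(i % 251 for i in range(block_size))
    reference = None
    mismatches = 0
    for _round in range(rounds):
        hasher = hashlib.sha256()
        for _pass in range(passes):
            hasher.update(block)  # the CPU-intensive part
        digest = hasher.hexdigest()
        if reference is None:
            reference = digest  # first round sets the expected value
        elif digest != reference:
            mismatches += 1  # same input, different output: suspect hardware
    return mismatches
```

Call it with a large round count, e.g. burn_and_verify(rounds=100000), and
start one copy per CPU so the box actually warms up.  This is only a crude
check and no substitute for memtest86 or for watching the temperature
sensors.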

Good luck
-Aaron

On Fri, 18 Jun 2004, Steve O wrote:

> Hi,
> I was running into similar problems with dual athlons.
> Run memtest86 overnight and likely you'll have test 5 fail.
>    http://www.memtest86.com/#trouble
> Removing some sticks of ram fixed my errors.
> -steve
>
> Chris Dwan wrote:
>
> >
> > Justin,
> >
> > I've poked around a bit, and run your queries on a variety of machines
> > (P-III and Athlon...as well as a few others) which I have sitting
> > around the shop here.  I was unable to replicate your observed behavior.
> >
> > The main difference between my test machines and yours is that I only
> > have 2GB of RAM.  That, and your observed intermittent behavior makes me
> > suspect some sort of evil high-memory behavior is behind your core
> > dump.  Because you're able to replicate it on various machines, it's
> > most likely not a bad memory chip.  There are a lot of kernel / extended
> > memory gurus on this list.  I'd love to hear their thoughts.
> >
> > You are correct that thousands of us run millions of BLASTs vs. NCBI
> > formatted NT every day...however there have been several instances where
> > a particular hardware / query combination is rare enough that only a few
> > people ever see it.  What's more fun (to my mind) is the ratio of people
> > whose results are erroneous to those who notice the problem.  Depending
> > on your parser, job failed to run ~= no hits found.
> >
> > Good luck with this.
> >
> > -Chris Dwan
> >
> > On Jun 16, 2004, at 10:46 AM, Justin Powell wrote:
> >
> >>
> >> Hi Chris
> >>
> >> A short query which goes wrong is
> >>
> >> actacgactagcatcagctacgctagatgactacgatcagctacgactagcatcgactacg
> >>
> >> I just have this in a text file on its own with no name line. The nt
> >> database I'm using is from the ncbi ftp site blast/db directory and the
> >> unzipped database files have the date June 11 2004.
> >>
> >> I've found the intermittency varies. Sometimes it seems it can be
> >> provoked by running a blast against est first, and sometimes it seems
> >> to work correctly time after time.
> >>
> >> A second longer sequence I've had go wrong is
> >>
> >> TCCCCCGAATTTAAACGCGTTGAAAGGGTCATCCTTACTAGAAAAGAGAGTTG
> >> ATTCTCTCCGACAGCTTAACACTACCACGGTTAACCAGCTGCTGGGGTTGCCGGGGATGACCTCTACATT
> >> CACGGCTCCGCAACTGTTGCAGTTAAGAATAATAGCTATAACTGCGTCTGCCGTGTCCCTTATTGCCGGT
> >> TGCCTCGGAATGTTCTTCCTTTCTAAAATGGATAAGAGACGAAAAGTCTTCAGACATGATCTCATCGCAT
> >> TTTTGATAATTTGCGACTTTCTTAAAGCTTTTATTCTGATGATTTATCCCATGATTATCCTTATTAATAA
> >> TAGTGTGTATGCAACACCTGCATTTTTTAATACCTTGGGTTGGTTTACGGCCTTTGCCATCGAAGGTGCA
> >> GACATGGCCATAATGATATTCGCCATACATTTTGCTATTTTGATCTTCAAGCCTAATTGGAAATGGCGAA
> >> ATAAAAGATCGGGAAATATGGAGGGTGGCTTGTACAAAAAAAGGTCATATATCTGGCCAATTACTGCATT
> >> AGTACCTGCCATTTTAGCAAGCTTAGCCTTCATTAATTATAATAAACTCAATGACGATTCTGACACCACT
> >> ATTATACTGGATAATAATAACTACAACTTTCCCGATTCTCCCAGGCAAGGTGGCTACAAACCTTGGAGTG
> >> CATGGTGCTATTTACCACCCAAGCCGTACTGGTATAAAATTGTTTTAAGCTGGGGTCCCAGATATTTCAT
> >> TATTATTTTCATATTTGCAGTCTACCTCAGTATTTATATTTTCATTACCAGTGAAAGTAAAAGAATTAAA
> >> GCGCAAATTGGAGACTTTAACC
> >>
> >>
> >> I've tried recompiling with the -g flag on (and the -O3 flag off) and run
> >> gdb on the coredump. However I'm not a C programmer (though I did once
> >> read a book on it) and am not at all familiar with C, gdb or even
> >> the details of the call stack, so I'm not sure I've done all this
> >> correctly. An example backtrace is like this, though others I've had
> >> looked different:
> >>
> >> [root@prada bin]# gdb blastall core.9520
> >> GNU gdb Red Hat Linux (5.3post-0.20021129.18rh)
> >> Copyright 2003 Free Software Foundation, Inc.
> >> GDB is free software, covered by the GNU General Public License, and you are
> >> welcome to change it and/or distribute copies of it under certain conditions.
> >> Type "show copying" to see the conditions.
> >> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> >> This GDB was configured as "i386-redhat-linux-gnu"...
> >> Core was generated by `./blastall -p blastn -a 2 -d /usr/blasttest/nt -i /usr/blasttest/tempdna'.
> >> Program terminated with signal 11, Segmentation fault.
> >> Reading symbols from /lib/tls/libm.so.6...done.
> >> Loaded symbols for /lib/tls/libm.so.6
> >> Reading symbols from /lib/tls/libpthread.so.0...done.
> >> Loaded symbols for /lib/tls/libpthread.so.0
> >> Reading symbols from /lib/tls/libc.so.6...done.
> >> Loaded symbols for /lib/tls/libc.so.6
> >> Reading symbols from /lib/ld-linux.so.2...done.
> >> Loaded symbols for /lib/ld-linux.so.2
> >> Reading symbols from /lib/libnss_files.so.2...done.
> >> Loaded symbols for /lib/libnss_files.so.2
> >> #0  0x0805ea52 in BlastNtWordFinder (search=0x84363e8, lookup=0x842e6b8)
> >>     at blast.c:9265
> >> 9265             next_lindex = (((lookup_index) & mask)<<char_size) + *(s+1);
> >> (gdb) backtrace
> >> #0  0x0805ea52 in BlastNtWordFinder (search=0x84363e8, lookup=0x842e6b8)
> >>     at blast.c:9265
> >> #1  0x0805a473 in BlastWordFinder (search=0x84363e8) at blast.c:6847
> >> #2  0x0805a336 in BlastExtendWordSearch (search=0x84363e8,
> >>     multiple_hits=0 '\0') at blast.c:6803
> >> #3  0x08059d7c in BLASTPerformFinalSearch (search=0x84363e8,
> >>     subject_length=117793,
> >>     subject_seq=0x7e12b129 <Address 0x7e12b129 out of bounds>) at blast.c:6612
> >> #4  0x080596c8 in BLASTPerformSearch (search=0x84363e8, subject_length=117793,
> >>     subject_seq=0x7e12b129 <Address 0x7e12b129 out of bounds>) at blast.c:6365
> >> #5  0x0805967b in BLASTPerformSearchWithReadDb (search=0x84363e8,
> >>     sequence_number=1629625) at blast.c:6344
> >> #6  0x0805066f in do_blast_search (ptr=0x84363e8) at blast.c:3335
> >> #7  0x0804d600 in NlmThreadWrapper (wrapper_arg=0x8439c80) at ncbithr.c:647
> >> #8  0x400522b6 in start_thread () from /lib/tls/libpthread.so.0
> >> (gdb) quit
> >>
> >>
> >> Hopefully this is of some use.
> >>
> >> Justin
> >>
> >>
> >> On Wed, 16 Jun 2004, Chris Dwan wrote:
> >>
> >>>
> >>> Please forward an example of the error-provoking query sequence.  I'm
> >>> curious to see if I can replicate this behavior.
> >>>
> >>> -Chris Dwan
> >>>   University of Minnesota
> >>>
> >>>> I'm experiencing trouble with blastall 2.2.9 running blastn on a linux
> >>>> cluster against a recently downloaded version of the 'nt' database from
> >>>> ncbi.  Intermittently I get a segmentation fault partway through the
> >>>> search.
> >>>>
> >>>> This happens both with precompiled blast and blast I compile myself. It
> >>>> happens on two dual xeon systems running redhat9.0 and a dual athlon
> >>>> system running redhat7.1.  Both systems have 4GB ram. It happens with
> >>>> several different query sequences, but never with the est nucleotide
> >>>> database. It also happens if I use fastacmd to dump the ncbi nt
> >>>> database into fasta format and then formatdb it myself. Blastdbs are
> >>>> kept locally so it's not a networking issue.
> >>>>
> >>>> Strangely this also happens with blastall 2.2.6 on the athlon system;
> >>>> I've not tested it on the xeon systems (or other releases).
> >>>>
> >>>> So I would guess, given the variety of systems, that it's a bug which nt
> >>>> provokes specifically - but then I assume huge numbers of people must
> >>>> use blast to search nt on linux boxes and would have noticed already
> >>>> if this were the case. Anyone have any ideas what might be going on?
> >>>>
> >>>>
> >>>> Justin
> >>>> jacp1@mole.bio.cam.ac.uk
> >>>>
> >>>> _______________________________________________
> >>>> Bioclusters maillist  -  Bioclusters@bioinformatics.org
> >>>> https://bioinformatics.org/mailman/listinfo/bioclusters
> >>>>
> >>>
> >>>
> >>
> >
> >
>
>