[Bioclusters] ncbi blast

Joe Landman bioclusters@bioinformatics.org
Wed, 23 Jun 2004 12:22:33 -0400


On Wed, 23 Jun 2004 10:56:58 -0400, Susan Chacko wrote
> I'm seeing exactly this same problem on our systems. It does not 

Hi Susan and Justin:

  Ok.  Which binary are you using?  Did you build it yourself?  Is it one of the
precompiled ones (including the Scalable Informatics LLC version)?  Would you
mind sharing it?  

> appear to be related to hardware, as I see it on all of the following:
> - 2.8 GHz Xeon, 2 GB RAM
> - 2.8 GHz Xeon, 4 GB RAM
> - 1.8 GHz Athlon, 2 GB RAM
> - 1.4 GHz Athlon, 2 GB RAM
> - 866 MHz Pentium III, 1 GB RAM

This strongly suggests a software problem, in either the application or the OS.

> 
> They're all running RH 7.1 with updated kernels. I'm using the NCBI 
> blastall and the NCBI nt db. There aren't any missing libraries, 
> according to ldd.
> 
> Our databases sit on a Netapp 960 Filer, and at first I thought that 
> was the problem, but I still see the failures when I copy the db to 
> local scratch on the nodes.

Ok.

What happens if you boot one of the units with the "mem=1024M" kernel option,
which forces Linux to use only 1 GB of RAM?  Some oddities show up around the
4 GB boundary (really about 3.8 GB, depending on which kernel you are using).
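
After rebooting with "mem=1024M", it is worth confirming that the kernel
really is limited to about 1 GB before rerunning the test.  A minimal sketch in
Python, assuming the usual "MemTotal: <n> kB" line in /proc/meminfo (the
function name mem_total_mb is only illustrative):

```python
def mem_total_mb(meminfo_text):
    """Return the MemTotal value from /proc/meminfo-style text, in whole MB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The value on this line is reported in kB.
            return int(line.split()[1]) // 1024
    raise ValueError("no MemTotal line found")

# Typical use on a Linux node:
#   with open("/proc/meminfo") as f:
#       print(mem_total_mb(f.read()))
```

The reported figure will normally be somewhat below 1024 MB, since the kernel
reserves some memory for itself.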

> The problem appears _only_ with the combination of the nt database and 
> the '-a 2' flag on Blast. It is random, in that I get somewhere 
> between 5 and 20 failures out of 20 Blast runs with the same db+query. 
> I get no failures with other dbs (e.g. est) or if I don't use the -a flag.

Ok, worth a test on my systems as well.  If I build a static binary of blastall,
would you be willing to give it a try?  RH 7.1 probably means i686 optimizations
at best.  It also means that GCC 2.96 was probably used.  That compiler had
some problems generating good (and in some instances, correct) code.
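
For comparing binaries, the repeat-run test described above (20 runs of the
same db+query, counting failures) can be scripted.  This is only a rough sketch
in Python, not anyone's actual test harness; the function name count_failures
is made up, and the query file name in the usage note is a placeholder.

```python
import signal
import subprocess

def count_failures(cmd, runs=20):
    """Run cmd `runs` times; return (failed_runs, segfaulted_runs).

    A run counts as failed if its exit status is nonzero.  On POSIX systems
    a negative return code means the process was killed by that signal
    number, so -SIGSEGV indicates a segmentation fault.
    """
    failed = 0
    segfaulted = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failed += 1
        if result.returncode == -signal.SIGSEGV:
            segfaulted += 1
    return failed, segfaulted
```

A call along the lines of
count_failures(["blastall", "-p", "blastn", "-d", "nt", "-i", "query.fa", "-a", "2"])
would exercise the failing combination; dropping "-a", "2" or swapping nt for
est gives the control runs.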

> I've also run Blast with the same db+query on an SGI Origin 3400 with 
> no failures, using -a 2, -a 3, -a 4.

I presume this is with the native MIPSpro compilers (very good compilers BTW)
and not gcc?

> 
> I've emailed NCBI about this and am waiting for a response.

Ok, please let me know if you would like help.  Thanks.

> 
> Susan.
> 
> On Jun 16, 2004, at 11:21 AM, Justin Powell wrote:
> 
> > I'm experiencing trouble with blastall 2.2.9 running blastn on a linux
> > cluster against a recently downloaded version of the 'nt' database from
> > ncbi.  Intermittently I get a segmentation fault partway through the
> > search.
> >
> > This happens both with precompiled blast and blast I compile myself. It
> > happens on two dual Xeon systems running Red Hat 9 and a dual Athlon
> > system running Red Hat 7.1.  Both systems have 4 GB RAM. It happens with
> > several different query sequences, but never with the est nucleotide
> > database. It also happens if I use fastacmd to dump the NCBI nt database
> > into FASTA format and then formatdb it myself. Blastdbs are kept locally,
> > so it's not a networking issue.
> >
> > Strangely this also happens with blastall 2.2.6 on the Athlon system; I've
> > not tested it on the Xeon systems (or other releases).
> >
> > So I would guess, given the variety of systems, that it's a bug which nt
> > provokes specifically - but then I assume huge numbers of people must use
> > blast to search nt on linux boxes and would have noticed already if this
> > were the case. Anyone have any ideas what might be going on?
> >
> >
> > Justin
> > jacp1@mole.bio.cam.ac.uk
> >
> > _______________________________________________
> > Bioclusters maillist  -  Bioclusters@bioinformatics.org
> > https://bioinformatics.org/mailman/listinfo/bioclusters
> 


--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615