[Bioclusters] questions with mpiBLAST

Jeremy Mann bioclusters@bioinformatics.org
Wed, 7 May 2003 18:37:49 -0500 (CDT)


> How many nodes are you running mpiblast on? Notice that splitting up a
> database via mpiformatdb -N X ... produces X + 1 database fragments. If
> you are running on fewer than X + 2 nodes (X + 1 workers and 1 master),
> then at least one node will have to process more than one database
> fragment.

The cluster has 20 nodes. The primary database is protein nr, so I have
nr.00 through nr.19 in shared storage.
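
As a rough sanity check on the node/fragment arithmetic, here is a minimal
Python sketch. It assumes, as the quoted text does, that one node acts as
master and the remaining nodes are workers. The function name is made up
for illustration and is not part of mpiBLAST.

```python
import math

def max_fragments_per_worker(total_nodes, fragments, masters=1):
    """Smallest possible value for the largest number of fragments
    any single worker must search, given an even spread."""
    workers = total_nodes - masters
    if workers < 1:
        raise ValueError("need at least one worker node")
    return math.ceil(fragments / workers)

# 20 nodes (1 master + 19 workers) and 20 fragments (nr.00 .. nr.19)
print(max_fragments_per_worker(20, 20))  # prints 2
```

Under that assumption, 20 fragments on 20 nodes means at least one worker
ends up with two fragments; with 21 nodes each worker could hold just one.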

> I was under the impression that mpiblast first checks to see if the
> correct database fragment is present in the local storage directory. If
> you run mpiblast with the --debug flag it should tell you which fragments
> are being staged out to the worker nodes.

I thought so too, but if you run another sequence, additional segments get
copied to that node as well. For example, our node2 now has nr.01, nr.03,
nr.08 and nr.11. I expected only that node's own segment, nr.01, to be
present, but that is not the case.
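
One way to see which segments have piled up on a node is to scan the file
names in its local storage directory. The sketch below is only an
illustration: it assumes the segment files are named like nr.01 or
nr.01.<extension>, and the function name is hypothetical rather than
anything mpiBLAST provides.

```python
import re

def local_fragments(filenames, db_name="nr"):
    """Return the distinct fragment numbers (as strings) found in a
    list of file names, e.g. 'nr.01.psq' -> '01'."""
    # Assumed naming: <db_name>.<digits>, optionally followed by .<ext>
    pattern = re.compile(r"^" + re.escape(db_name) + r"\.(\d+)(?:\.\w+)?$")
    found = set()
    for name in filenames:
        match = pattern.match(name)
        if match:
            found.add(match.group(1))
    return sorted(found, key=int)

# File names resembling what node2 holds in the example above
names = ["nr.01.phr", "nr.03.pin", "nr.08.psq", "nr.11.phr",
         "nr.01.psq", "notes.txt"]
print(local_fragments(names))  # prints ['01', '03', '08', '11']
```

In practice the list of names would come from something like
os.listdir() on the node's local storage directory.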

> Restrict each node to run only one blast job at a time (via PBS, Condor,
> etc.). Multiple blast jobs running on the same node will compete for
> memory and CPU resources and remove the superlinear scaling of mpiblast.

We are not at that point yet; we are mainly in the testing stage. I am the
test user, running samples provided by another researcher. The problem is
that if I run sequences, I own the copied segments.


-- 
Jeremy Mann
jeremy@biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672