[Bioclusters] mpiBLAST configuration issues

29 Mar 2004 13:53:37 +0100

Hi Lucas,

Do you mean I should treat the cluster as though it consists of only 6
nodes (because we have the three dual-processor nodes reserved for short
jobs)? Wouldn't that leave the other nodes unused when they are
free?

If PBS treats my multi-sequence query file as a single job, this means
that on our cluster the job will go to a single node as soon as there is
one available and then the job will run there. What happens if this
really is the only node available? Will mpiBLAST then run everything on
this single node sequentially?

cheers

Micha

On Mon, 2004-03-29 at 13:37, Lucas Carey wrote:
> On Monday, March 29, 2004 at 12:58 +0100, Micha Bayer wrote:
> > Hi,
> > We have three nodes reserved for jobs of less than one hour's wall time.
> > I am part of the bio group and we have a share of 20% of the total
> > compute time on this cluster. Jobs get submitted and queued via the
> > OpenPBS batch system. The queue priority is worked out by a formula
> > which among other things takes into account recent usage (if you had
> > lots of jobs recently you get penalised) and job size (if your job is
> > small it gets a higher priority).
> > 
> > Questions:
> > 
> > 1. How many database fragments should I generate?
> You should generate 5 fragments and always run with '-np 6'. If you instead want to run with a variable number of CPUs (<= 6), creating 15 fragments should let you do so with good load balancing. There is a small performance hit in moving from 5 to 15 fragments, but 15 could be faster, depending on both the database and the queries.
> > 
> > 2. How will the spasmodic traffic on the cluster affect the performance
> > of mpiBLAST? 
> Once the fragments have been distributed to the nodes, it shouldn't matter at all. As long as you keep running queries against the same database(s) and the fragments remain on local storage on those 3 nodes, mpiBLAST does very little communication.
> > 
> > 3. How are jobs partitioned for queuing with PBS (given an input file
> > with one sequence and a different scenario where the input file contains
> > multiple query sequences)?
> One 'run' of mpiBLAST will process an entire query file with multiple individual queries. PBS views this as a single job, no matter how many individual queries the file contains.
> > 
> > 4. When I issue the mpirun command and I specify the number of nodes to
> > be used, what does that do? Will this actually work on a cluster like
> > this where I don't have any control over the scheduling process?
> In the documentation, a node refers to a CPU. As far as both mpiBLAST and PBS are concerned, your cluster has 6 nodes reserved for short jobs.
> 
> -Lucas
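
The fragment-count advice in the answer to question 1 can be checked with a little arithmetic. The sketch below is only an illustration and makes two assumptions that are not stated in the thread: that one MPI process acts as a master and does no searching (which is what the pairing of 5 fragments with '-np 6' suggests), and that fragments are shared out as evenly as possible among the remaining workers. The function name fragments_per_worker is made up for this example.

```python
# Sketch only. Assumes "-np N" gives N - 1 searching workers (one master),
# and that fragments are split as evenly as possible among those workers.
# Both are simplifying assumptions, not taken from the mpiBLAST documentation.

def fragments_per_worker(num_fragments, np_value):
    """Return the number of fragments each worker gets, largest share first."""
    workers = np_value - 1  # assumed: one process is reserved as the master
    if workers < 1:
        raise ValueError("need at least one worker (np >= 2)")
    base, extra = divmod(num_fragments, workers)
    # 'extra' workers receive one fragment more than the others
    return [base + 1] * extra + [base] * (workers - extra)


if __name__ == "__main__":
    for fragments in (5, 15):
        for np_value in range(2, 7):
            shares = fragments_per_worker(fragments, np_value)
            balance = "even" if len(set(shares)) == 1 else "uneven"
            print(f"{fragments} fragments, -np {np_value}: {shares} ({balance})")
```

Under these assumptions, 5 fragments split evenly only for 1 or 5 workers, whereas 15 fragments split evenly for 1, 3 or 5 workers and differ by at most one fragment otherwise, which fits the suggestion to use 15 fragments when the number of CPUs varies.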