[Bioclusters] mpiBLAST configuration issues

Micha Bayer bioclusters@bioinformatics.org
29 Mar 2004 12:58:10 +0100


Following on from my previous message on BLAST parallelisation, I have
managed to persuade our cluster manager to install MPI support for us,
so I can now use mpiBLAST after all.

I have a few questions regarding its configuration that I have not been
able to find answered elsewhere.

Our Linux cluster is a shared facility, currently 118 CPUs on 59
dual-processor machines, which is used heavily by particle physicists
running jobs that often take days or even weeks. Traffic is spasmodic,
but most of the time it is fairly heavy.

We have three nodes reserved for jobs of less than one hour's wall time.
I am part of the bio group and we have a share of 20% of the total
compute time on this cluster. Jobs get submitted and queued via the
OpenPBS batch system. The queue priority is worked out by a formula
which, among other things, takes into account recent usage (if you have
run lots of jobs recently you get penalised) and job size (small jobs
get a higher priority).


1. How many database fragments should I generate?

2. How will the spasmodic traffic on the cluster affect the performance
of mpiBLAST?

3. How are jobs partitioned for queuing with PBS, both when the input
file contains a single query sequence and when it contains multiple
query sequences?

4. When I issue the mpirun command and specify the number of nodes to
be used, what does that actually do? Will it work on a cluster like
this, where I have no control over the scheduling process?
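
To make question 4 concrete, here is a rough sketch of the kind of PBS
job script I have in mind. Everything specific in it is an assumption
for illustration only: the resource request, the job name, the database
name "nt", the file names, and an MPICH-style mpirun that accepts
-machinefile. It only prints the command it would run, so it is safe to
try outside the queue.

```shell
#!/bin/sh
# Hypothetical PBS job script; the resource values below are placeholders.
#PBS -N mpiblast_sketch
#PBS -l nodes=4:ppn=2
#PBS -l walltime=00:59:00

# Inside a PBS job, $PBS_NODEFILE names a file with one line per allocated
# CPU slot. Outside PBS the variable is unset, so fall back to /dev/null,
# which gives a process count of 0 for a dry run.
NODEFILE="${PBS_NODEFILE:-/dev/null}"

# Count the allocated slots; tr strips padding that some wc versions add.
NP=$(wc -l < "$NODEFILE" | tr -d ' ')

# Assumed inputs: a database already fragmented under the name "nt" and a
# FASTA query file. Both names are placeholders.
DB="nt"
QUERY="query.fas"
OUT="results.txt"

# Build the launch line. -machinefile is the MPICH-style option and is an
# assumption; other MPI implementations pass the host list differently.
CMD="mpirun -np $NP -machinefile $NODEFILE mpiblast -p blastn -d $DB -i $QUERY -o $OUT"

# Print rather than execute, so the sketch has no side effects.
echo "$CMD"
```

Submitted with qsub, the scheduler rather than mpirun would decide when
and where the job starts; the script simply takes whatever slots it has
been given from $PBS_NODEFILE.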



Dr Micha M Bayer
Grid Developer, BRIDGES Project
National e-Science Centre, Glasgow Hub
246c Kelvin Building
University of Glasgow
Glasgow G12 8QQ
Scotland, UK
Email: michab@dcs.gla.ac.uk
Project home page: http://www.brc.dcs.gla.ac.uk/projects/bridges/
Personal Homepage: http://www.brc.dcs.gla.ac.uk/~michab/
Tel.: +44 (0)141 330 2958