[Bioclusters] Questions on mpiBLAST

Thu Feb 3 15:39:11 EST 2005

I'd like to make a brief addendum to Jason's excellent reply...

Jason Gans wrote:

> Hello,
>
> There are a number of reasons for the results you show below.
>
> 1) Load balancing.
>
> The latest version of mpiBLAST uses a master node and
> a scheduler node. Hence if you run mpiBLAST on 16 nodes, only 14 worker
> nodes will being performing the actual BLAST search (i.e. the heavy 
> lifting).
>
> If you format your database into 16 fragments, 12 worker nodes will be
> assigned 1 fragment each and 2 worker nodes will get 2 fragments. This 
> is fine
> for a large query (and may actually improve load balancing) but for a 
> small query
> the nodes that must search 2 fragments will be the rate limiting step 
> in your calculation.
>
> You're better off formatting your database into 14 fragments (so that 
> every worker
> node searches a single fragment).

The "scheduler" process performs almost no work, so to really optimize 
performance on a 16 node cluster one could try formatting the database 
into 15 fragments and running 17 processes.  Of course, care must be 
taken that the node which runs two processes is running a scheduler and 
either a worker or output.  The best way I can think of to achieve this 
would be adding the following at line 178 of mpiblast.cpp (version 1.3.0):
scheduler_process = node_count - 1;

That will set the last MPI process to be the scheduler.  AFAIK, mpich 
(and possibly other MPI implementations) will wrap around to the first 
node when assigning processes beyond the number of nodes given in the 
mpich configuration.  The net result being that the scheduler process 
and writer process end up on the same cluster node.

>
> 2) Run time depends not just on the length of the query, but on the 
> sequence composition of
> the query as well.
>
> A query sequence that is "similar" to a large number of database 
> sequences will take longer to
> search than a query sequence that is "similar" to a only small number 
> of database sequences.
>
> The reason for this is two-fold: (a) The BLAST algorithm only fully 
> aligns two sequences if it first
> identifies identical sub-sequences of length W or greater. (b) The 
> time that mpiBLAST spends
> formatting the BLAST output is proportional to the number of database 
> entires that match the
> query (not the query length).
>

One additional factor that can significantly impact the run time is the 
length of DB sequences that your queries hit.  By default, versions 
1.2.x and 1.3.0 of mpiBLAST transmit the *entire* database sequence over 
the wire, not just the portion of the sequence used in the resulting 
alignment.  Nucleotide databases like nt or the human chromosome DB 
contain sequences several MB in length, which can result in LOTS of 
network traffic.  Long sequences are not usually a problem with protein 
sequence databases.  Fortunately, a workaround exists for blastn 
searches.  The command-line option --disable-mpi-db will prevent workers 
from transmitting sequences over the network.  Instead, the writer 
process reads only the necessary parts of the sequence from the database 
on shared storage (e.g. it reads a small amount of data from NFS instead 
of a large amount of data from worker nodes).

Summary: to get good performance, always use --disable-mpi-db when 
performing blastn searches on databases with large sequence entries like 
nt and human chromosomes.

A nice feature for a future mpiBLAST release would be workers 
transmitting only the aligned portion of the bioseq to the writer 
instead of the entire bioseq...

-Aaron