[Bioclusters] Questions on mpiBLAST

Jason Gans jgans at lanl.gov
Thu Feb 3 14:36:07 EST 2005


Hello,

There are a number of reasons for the results you show below.

1) Load balancing.

The latest version of mpiBLAST uses a master node and a scheduler node.
Hence if you run mpiBLAST on 16 nodes, only 14 worker nodes will be
performing the actual BLAST search (i.e. the heavy lifting).

If you format your database into 16 fragments, 12 worker nodes will be
assigned 1 fragment each and 2 worker nodes will get 2 fragments. This is
fine for a large query (and may actually improve load balancing), but for
a small query the nodes that must search 2 fragments will be the
rate-limiting step in your calculation.

You're better off formatting your database into 14 fragments (i.e. passing
-N 14 rather than -N 16 to mpiformatdb), so that every worker node
searches a single fragment.
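
The arithmetic above can be checked with a short Python sketch. It is only
an illustration: the function name and the reserved_nodes parameter are
made up for this example, and it assumes fragments are spread as evenly as
possible over the workers, which may not match what the real mpiBLAST
scheduler does at run time.

```python
def fragments_per_worker(total_nodes, num_fragments, reserved_nodes=2):
    """Return how many database fragments each worker would search.

    reserved_nodes=2 models the master node and the scheduler node.
    Assumption: fragments are distributed as evenly as possible.
    """
    workers = total_nodes - reserved_nodes
    if workers <= 0:
        raise ValueError("need at least one worker node")
    base, extra = divmod(num_fragments, workers)
    # 'extra' workers carry one fragment more than the rest.
    return [base + 1] * extra + [base] * (workers - extra)


if __name__ == "__main__":
    for n_frag in (16, 14):
        loads = fragments_per_worker(16, n_frag)
        print(n_frag, "fragments: busiest worker searches", max(loads))
```

With 16 nodes and 16 fragments the busiest worker searches 2 fragments;
with 14 fragments every worker searches exactly 1.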

2) Run time depends not just on the length of the query, but on the
sequence composition of the query as well.

A query sequence that is "similar" to a large number of database sequences
will take longer to search than a query sequence that is "similar" to only
a small number of database sequences.

The reason for this is two-fold: (a) The BLAST algorithm only fully aligns
two sequences if it first identifies identical sub-sequences of length W
or greater, so a query that shares such sub-sequences with many database
entries triggers many more full alignments. (b) The time that mpiBLAST
spends formatting the BLAST output is proportional to the number of
database entries that match the query (not the query length).
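
To make point (a) concrete, here is a rough Python sketch of the seeding
test. This is not BLAST's actual implementation, just the idea: a pair of
sequences only qualifies for a full alignment if they share an identical
word of length W. The function name is hypothetical, and w=11 is assumed
here as the word size.

```python
def shares_seed(query, subject, w=11):
    """Return True if query and subject share an identical word of length w.

    Illustrative only: real BLAST uses indexed lookup tables and then
    extends each seed hit, which is where most of the time goes.
    """
    if len(query) < w or len(subject) < w:
        return False
    # Collect every length-w word that occurs in the query.
    query_words = {query[i:i + w] for i in range(len(query) - w + 1)}
    # Scan the subject for any word that also occurs in the query.
    return any(subject[j:j + w] in query_words
               for j in range(len(subject) - w + 1))


if __name__ == "__main__":
    print(shares_seed("ACGTACGTAAAC", "TTTACGTACGTAAACGG"))
    print(shares_seed("AAAAAAAAAAAA", "CCCCCCCCCCCC"))
```

The more database sequences pass this test for a given query, the more
full alignments have to be computed, regardless of the query length.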

Regards,

Jason


>Hi Everyone:
>
>We have a 16-node Xserve cluster, with 2GB memory on each node and dual
>processors.  I was able to install mpiBLAST on it, along with LAM/MPI.
>However, the performance that I saw with some test runs has not been that
>good and quite confusing.  Here is what I did:
>
>
>1.) I formatted the nt database:
>
>mpiformatdb -N 16 -i nt
>
>2.) I ran the mpiblast on one, two, five, ten, twenty, and more sequences
>(about 500bp each) and with the command:
>
>time mpirun N mpiblast -p blastn -d nt -i single.fa -o blast_results.
>
>Here are the numbers:
>
>Single: 1m39.054s
>Two: 0m11.009s
>Five: 0m16.021s
>Ten: 0m46.591s
>twenty: 3m7.541s
>..
<snip>



