[Bioclusters] mpiBLAST Performance

Jason D. Gans bioclusters@bioinformatics.org
Wed, 14 May 2003 09:43:10 -0600


"Osborne, John" wrote:

<snip>

> I'm still wondering though why mpiblast doesn't assign each node a specific
> piece of the database in local storage.  I am looking at the local storage
> area on one of my nodes (n2) and there is nr.00, nr.05 and nr.13 indices.
> Should each node just get one piece?  Or does every node eventually get the
> entire thing as you run it over and over again?
> 

Each node will get one piece. If you fragment the database using the command

mpiblast -N 20 ...

you will get 21 fragments. When you run mpiblast, however, you should provide
22 machines (21 workers + 1 master). If you specify fewer than 22 nodes, at
least one node will have to process more than one fragment, with the
associated cost of copying the needed database fragment to that node. Over
repeated runs, multiple database fragments then accumulate on multiple nodes,
which is probably why n2 currently holds three of them.
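The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration, not part of mpiBLAST: the function name is hypothetical, and it assumes one node is reserved as the master and that fragments are spread as evenly as possible over the remaining workers, which may not match how mpiBLAST actually schedules them.

```python
import math

def fragments_on_busiest_worker(num_fragments, num_nodes):
    """Largest number of fragments any single worker has to search.

    Assumption: one node acts as the master and the fragments are
    divided as evenly as possible among the other nodes.
    """
    workers = num_nodes - 1  # one node is reserved for the master
    if workers < 1:
        raise ValueError("need at least one worker besides the master")
    return math.ceil(num_fragments / workers)

# 21 fragments, as in the example above, with different cluster sizes.
for nodes in (22, 12, 8):
    busiest = fragments_on_busiest_worker(21, nodes)
    print(nodes, "nodes:", busiest, "fragment(s) on the busiest worker")
```

With 22 nodes every worker holds exactly one fragment; with fewer nodes at least one worker has to fetch and search additional fragments.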

Also, while not a factor when blasting against the nr database, shuffling the 
nt database yields a substantial speed increase in blast searches (I have obtained
a 28% decrease in wall clock time for certain nucleotide queries).
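
For anyone who wants to try this, here is a minimal sketch of one way to shuffle a FASTA file before formatting it. It assumes that "shuffling" means randomizing the order of the sequence records, that the input is a plain FASTA file, and that it fits in memory; the function names are hypothetical and this is not a tool shipped with mpiBLAST.

```python
import random

def read_fasta_records(lines):
    """Group FASTA lines into records: a header line plus its sequence lines."""
    records, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if current:
                records.append(current)
            current = [line]
        elif line and current:
            current.append(line)
    if current:
        records.append(current)
    return records

def shuffle_fasta(lines, seed=None):
    """Return the FASTA lines with whole records in random order.

    Only the order of the records changes; each sequence stays attached
    to its own header.
    """
    records = read_fasta_records(lines)
    random.Random(seed).shuffle(records)
    return [line for record in records for line in record]

example = [">seq1", "ACGT", "ACGT", ">seq2", "GGCC", ">seq3", "TTAA"]
for line in shuffle_fasta(example, seed=1):
    print(line)
```

The shuffled lines would be written to a new file, and that file would then be fragmented in place of the original.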

Jason Gans

B-1 Div
Los Alamos National Lab