"Osborne, John" wrote: > > >Each node will get one piece. If you fragment the database using the > command > >mpiblast -N 20 ... > > I thought fragmenting was done only by mpiformatdb? You are correct! My bad -- I mistyped (should have been "mpiformatdb -N ..."). Sorry about that. > >you will get 21 fragments. When you run mpiblast, however, you should > provide 22 machines > >(21 workers + 1 master). If you specify less than 22 nodes, at least one > >node will have to process more than one fragment (with the associated cost > of > >of copying the needed database fragment and the accumulation of multiple > >database fragments on multiple nodes). > > > I'm not sure how you provide the master node exactly, I have just included > mine > making it node 0. Why do you provide 22 machines for 21 fragments? mpiblast will attempt to create a worker process (running on its own node) for each database fragment in addition to a single master process (that is responsible for distributing work to the worker nodes and assembling the final output) running on its own node. > >Also, while not a factor when blasting against the nr database, shuffling > the > >nt database yields a substantial speed increase in blast searches (I have > obtained > >a 28% decrease in wall clock time for certain nucleotide queries). > > > Shuffling? > I should have been more clear here. By "shuffling" I mean randomizing the order of the sequences in the database file in order to improve load balancing. The running time of mpiblast is limited by the time is takes to for the slowest worker to finish its task (assuming one fragment per worker). Since sequences in a particular database (nt for instance) may be ordered according to biological relevance (i.e. pathway, sequence similarity, ...) a query sequence may generate a lot of hits against a cluster of sequences in the database. 
This will slow down the worker node that happens to have this cluster in
its database fragment (and limit the overall speed of mpiblast). To
prevent clusters of similar sequences from showing up in the same
fragment, one can randomize the order of sequences in the database.

Regards,

Jason
B-1 Div
Los Alamos National Lab
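
For illustration, the shuffling described above could be sketched roughly
as follows in Python. This is only a minimal sketch, not the script that
produced the 28% figure: the function names (read_fasta, shuffle_fasta)
and the file paths are hypothetical, it assumes the database is a plain
FASTA text file, and it holds every record in memory, which may not be
practical for a database the size of nt. The shuffled file would
presumably then be fragmented with mpiformatdb as usual.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence_lines) records from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif line and header is not None:
            seq.append(line)
    if header is not None:
        yield header, seq


def shuffle_fasta(in_path, out_path, seed=None):
    """Write the records of in_path to out_path in random order.

    Returns the number of records written. All records are held in
    memory at once (an assumption that limits this to modest files).
    """
    with open(in_path) as handle:
        records = list(read_fasta(handle))

    # A dedicated Random instance keeps the shuffle reproducible per seed.
    random.Random(seed).shuffle(records)

    with open(out_path, "w") as out:
        for header, seq in records:
            out.write(header + "\n")
            for chunk in seq:
                out.write(chunk + "\n")
    return len(records)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python shuffle_fasta.py nt nt.shuffled
    if len(sys.argv) == 3:
        count = shuffle_fasta(sys.argv[1], sys.argv[2])
        print("wrote", count, "records")
```

Only the order of whole records changes; each header stays attached to
its own sequence lines, so the set of sequences in the database is the
same before and after.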