I'd like to make a brief addendum to Jason's excellent reply... Jason Gans wrote: > Hello, > > There are a number of reasons for the results you show below. > > 1) Load balancing. > > The latest version of mpiBLAST uses a master node and > a scheduler node. Hence if you run mpiBLAST on 16 nodes, only 14 worker > nodes will being performing the actual BLAST search (i.e. the heavy > lifting). > > If you format your database into 16 fragments, 12 worker nodes will be > assigned 1 fragment each and 2 worker nodes will get 2 fragments. This > is fine > for a large query (and may actually improve load balancing) but for a > small query > the nodes that must search 2 fragments will be the rate limiting step > in your calculation. > > You're better off formatting your database into 14 fragments (so that > every worker > node searches a single fragment). The "scheduler" process performs almost no work, so to really optimize performance on a 16 node cluster one could try formatting the database into 15 fragments and running 17 processes. Of course, care must be taken that the node which runs two processes is running a scheduler and either a worker or output. The best way I can think of to achieve this would be adding the following at line 178 of mpiblast.cpp (version 1.3.0): scheduler_process = node_count - 1; That will set the last MPI process to be the scheduler. AFAIK, mpich (and possibly other MPI implementations) will wrap around to the first node when assigning processes beyond the number of nodes given in the mpich configuration. The net result being that the scheduler process and writer process end up on the same cluster node. > > 2) Run time depends not just on the length of the query, but on the > sequence composition of > the query as well. > > A query sequence that is "similar" to a large number of database > sequences will take longer to > search than a query sequence that is "similar" to a only small number > of database sequences. > > The reason for this is two-fold: (a) The BLAST algorithm only fully > aligns two sequences if it first > identifies identical sub-sequences of length W or greater. (b) The > time that mpiBLAST spends > formatting the BLAST output is proportional to the number of database > entires that match the > query (not the query length). > One additional factor that can significantly impact the run time is the length of DB sequences that your queries hit. By default, versions 1.2.x and 1.3.0 of mpiBLAST transmit the *entire* database sequence over the wire, not just the portion of the sequence used in the resulting alignment. Nucleotide databases like nt or the human chromosome DB contain sequences several MB in length, which can result in LOTS of network traffic. Long sequences are not usually a problem with protein sequence databases. Fortunately, a workaround exists for blastn searches. The command-line option --disable-mpi-db will prevent workers from transmitting sequences over the network. Instead, the writer process reads only the necessary parts of the sequence from the database on shared storage (e.g. it reads a small amount of data from NFS instead of a large amount of data from worker nodes). Summary: to get good performance, always use --disable-mpi-db when performing blastn searches on databases with large sequence entries like nt and human chromosomes. A nice feature for a future mpiBLAST release would be workers transmitting only the aligned portion of the bioseq to the writer instead of the entire bioseq... -Aaron