[Bioclusters] parallel blast search

Ivo Grosse bioclusters@bioinformatics.org
Mon, 24 Jun 2002 10:12:34 -0400


Hi Arun,

You pose an interesting question, and if you check the archives of this 
list, you may find some partial answers.

The essence is: 

- Splitting the query sequence into small fragments and BLASTing each 
of those fragments against the (entire) database is very easy to 
implement.  This makes the whole BLAST job trivially parallel, and 
getting the final P-values and E-values right is straightforward.

- Splitting the database is also easy, but getting the P-values and 
E-values right is not, because they depend on the size of the database 
that was searched.  There are commercial versions of BLAST that do 
exactly that, and you may decide whether it is easier for you to buy 
one of those programs or to develop one yourself.
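
To make the two options a bit more concrete, here is a minimal Python 
sketch of the bookkeeping around BLAST, not of BLAST itself.  All 
function names, the fragment overlap, and the (subject, E-value, 
part size) hit format are assumptions made for illustration.  The 
rescaling relies only on the Karlin-Altschul relation that E grows 
linearly with the database size, and it ignores the effective-length 
(edge-effect) corrections that BLAST applies, so the merged E-values 
are approximate.  That gap is exactly why the second option is harder 
than it looks.

```python
import math


def fragment_query(seq, size, overlap=0):
    """Cut one query into fragments; return (offset, fragment) pairs.

    The offset lets you shift hit coordinates back onto the full query.
    The overlap is an assumption, meant to keep hits that span a cut.
    """
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    fragments = []
    start = 0
    while True:
        fragments.append((start, seq[start:start + size]))
        if start + size >= len(seq):
            break
        start += step
    return fragments


def rescale_evalue(e_part, part_letters, total_letters):
    """Scale an E-value from one database part to the full database.

    E is proportional to the number of database letters searched, so
    the correction is a simple ratio (edge effects are ignored here).
    """
    return e_part * total_letters / part_letters


def evalue_to_pvalue(e):
    """P = 1 - exp(-E), computed stably for small E."""
    return -math.expm1(-e)


def merge_hits(hits, total_letters):
    """Merge hits from all database parts, best E-value first.

    hits: iterable of (subject_id, e_part, part_letters), a
    hypothetical format that your BLAST output parser would produce.
    """
    rescaled = [
        (sid, rescale_evalue(e, n, total_letters)) for sid, e, n in hits
    ]
    return sorted(rescaled, key=lambda hit: hit[1])
```

Each fragment, or each database part, would then be handed to an 
ordinary BLAST run on one node, and merge_hits() would be called once 
on the collected, parsed results.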

Best regards, Ivo


"Arun M" <arun_mah@mec.ac.in> wrote on Mon, 24 Jun 2002:

> Hi,
> 
> We have a Beowulf cluster at our college and we are looking for ways to
> parallelize and perform BLAST searches on the Beowulf cluster. We have seen
> that it is hopeless trying to understand the BLAST source code and work on
> it (to parallelize it). Can anyone give any code for running Blast on the
> cluster or give (implementation) details on "parallel Blast" - especially
> how you split the database into subdomains, how to wrap the BLAST code to
> carry out the BLAST subsearches, and finally to parse and merge the
> subsearch results?
> 
> Thanks
> 
> Arun