[Bioclusters] free parallel versions of BLAST

Dan Bolser bioclusters@bioinformatics.org
Thu, 26 Feb 2004 12:25:49 +0000 (GMT)

I wrote a basic BLAST job parallelizer that works from a common file
system (NFS-mounted on each node). Nodes communicated via a MySQL database
to avoid a potential race condition with result file locks.

Depending on your network this may be the best solution. Alternatively, I
think each node will need the whole target database, and you can do crude
load balancing by query sequence length.
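
One way the crude load balancing could look, as a Python sketch: hand out
queries longest first, always to the node with the smallest total length
so far. The function name and the dict-of-lengths input are assumptions
for illustration, and query length is only a rough proxy for BLAST run
time.

```python
import heapq


def balance_by_length(seq_lengths, n_nodes):
    """Assign sequence ids to nodes, longest first, to the least-loaded node.

    seq_lengths maps a sequence id to its length in residues.
    Returns a list with one list of sequence ids per node.
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    # Heap of (total length assigned so far, node index).
    heap = [(0, i) for i in range(n_nodes)]
    heapq.heapify(heap)
    bins = [[] for _ in range(n_nodes)]
    # Sort by descending length; ties are broken by id for determinism.
    ordered = sorted(seq_lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    for seq_id, length in ordered:
        load, i = heapq.heappop(heap)
        bins[i].append(seq_id)
        heapq.heappush(heap, (load + length, i))
    return bins
```

For example, balance_by_length({"a": 900, "b": 500, "c": 400, "d": 100}, 2)
puts a and d on one node and b and c on the other.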

My approach was very basic because we had one central file system, so I
didn't have to worry about splitting up the query or target databases
(which could get really complex).

The scripts are written in Perl and cover several steps in the pipeline:
make the db non-redundant, formatdb, split the querydb into individual
files, run jobs (very flexible via MySQL communication), parse results
into MySQL.
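
To give an idea of the query-splitting step, here is a short Python
sketch (again, not the Perl scripts themselves). It assumes the query
database is a plain multi-FASTA file; the function name and the
query_NNNNNN.fa output naming are made up for the example.

```python
import os


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file.

    Returns the list of files written, in input order.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    try:
        with open(fasta_path) as fh:
            for line in fh:
                if line.startswith(">"):
                    # A header line starts a new record and a new file.
                    if out is not None:
                        out.close()
                    name = "query_%06d.fa" % (len(paths) + 1)
                    path = os.path.join(out_dir, name)
                    out = open(path, "w")
                    paths.append(path)
                # Lines before the first header are skipped.
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return paths
```

Each resulting file can then be queued as one job.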

If you want I can give you these scripts.


On 26 Feb 2004, Micha Bayer wrote:

> Hi,
> does anyone know of a non-commercial, open source/free package that
> provides a parallelisation of BLAST (apart from mpiBLAST which is not
> suitable for us)?
> I am interested in something that would split input files into single
> query sequences, partition the database and collate the results (ideally
> with an adjustment of the e-values etc).
> It looks like some of the commercial packages like Paracel do all of the
> above but I really need an open source version and before I get writing
> my own I want to make sure I have tried all the available options.
> I am looking to run a service both on a Windows XP based Condor pool and
> on a cluster that uses OpenPBS but has no message passing capabilities
> to speak of.
> cheers
> Micha