[Bioclusters] How can I make blast job running short time on Gridengine

Elia Stupka bioclusters@bioinformatics.org
Fri, 22 Nov 2002 14:20:41 +0800 (SGT)


Hi Grace, some further thoughts:

>     A popular method for running jobs on two different machines at once is
> to divide the input into parts, send each part to a different machine, run
> the program on each machine using the segment of the input on that machine,

The best approach is to split the input (i.e. the sequences to be
analysed) but not the blast database. In other words, if possible, give
each node enough local storage to hold the complete blast databases, so
that you don't have to split them, and all you split is the (small) set
of sequences to be analysed.
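
To make the splitting step concrete, here is a minimal sketch in Python.
It assumes the query sequences are in a multi-FASTA file; the function
names read_fasta_records and split_fasta are made up for illustration and
are not part of blast or Grid Engine.

```python
from typing import Iterable, Iterator, List


def read_fasta_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one FASTA record (header line plus its sequence lines) at a time."""
    record: List[str] = []
    for line in lines:
        # A new header closes the record collected so far.
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        # Skip blank lines; make sure every kept line ends with a newline.
        if line.strip():
            record.append(line if line.endswith("\n") else line + "\n")
    if record:
        yield "".join(record)


def split_fasta(lines: Iterable[str], n_chunks: int) -> List[str]:
    """Distribute whole records round-robin over at most n_chunks chunks.

    Hypothetical helper: only the query file is split, never the database.
    Empty chunks are dropped, so fewer than n_chunks may be returned.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[str]] = [[] for _ in range(n_chunks)]
    for i, record in enumerate(read_fasta_records(lines)):
        chunks[i % n_chunks].append(record)
    return ["".join(chunk) for chunk in chunks if chunk]


if __name__ == "__main__":
    # Toy input with three query sequences, split into two chunks.
    queries = ">a\nACGT\n>b\nGG\nTT\n>c\nCCC\n"
    for n, chunk in enumerate(split_fasta(queries.splitlines(keepends=True), 2)):
        print(f"--- chunk {n} ---")
        print(chunk, end="")
```

Each chunk could then be written to its own file and submitted as a
separate job, with every job searching the same unsplit database on its
node's local disk.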

> Combining parallel (embarrassingly parallel) job execution with
> scheduling/load-balancing features of DRM tools is really the key to
> achieving the efficiency in a cluster that makes it a valuable
> resource for doing things like BLAST.  

It's exactly for this integration that we've been developing BioPipe
(www.biopipe.org): it makes use of a chosen load-sharing system, as well
as commonly used software such as bioperl, ensembl, biosql, etc., and
manages a bioinformatics workflow through a combination of these in a
parallel, load-balanced fashion. You might want to take a look at it; we
are currently using it for genome annotation and it serves us well. Of
course it is open source and in continuous development, so feel free to
try it and shout at us ;)

Elia

********************************
* http://www.fugu-sg.org/~elia *
* tel:    +65 6874 1467        *
* mobile: +65 9030 7613        *
* fax:    +65 6779 1117        *
********************************