[Bioclusters] BLAST job time estimates

Tim Cutts bioclusters@bioinformatics.org
Tue, 1 Jun 2004 14:00:07 +0100


On 1 Jun 2004, at 1:33 pm, Micha Bayer wrote:

> Hi Tim,
>
> thanks for that. Can you just clarify what n and m are in your response
> below?

For a given pair of sequences being aligned, n & m are the lengths of 
the two sequences.  So in the case of your blast search, you need to 
know the lengths of the largest query sequence and the largest target 
sequence.
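
As a rough illustration of how you might pull n and m out of your input files, here is a minimal Python sketch. The function names are made up for this example, and it assumes plain FASTA input. The product n*m is the usual worst-case cost of a pairwise dynamic-programming alignment; BLAST is heuristic, so treat it as a pessimistic scaling guide rather than a prediction of actual run time.

```python
def max_sequence_length(fasta_lines):
    """Return the length of the longest sequence in an iterable of FASTA lines."""
    longest = 0
    current = 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A header line ends the previous record; keep its length if it is the longest.
            longest = max(longest, current)
            current = 0
        else:
            current += len(line)
    # The final record has no header after it, so compare once more.
    return max(longest, current)


def worst_case_cells(query_lines, target_lines):
    """Return (n, m, n * m) for the longest query and the longest target."""
    n = max_sequence_length(query_lines)
    m = max_sequence_length(target_lines)
    return n, m, n * m


# Hypothetical toy data, only to show the call.
queries = [">q1", "ACGT", "AC", ">q2", "ACG"]
targets = [">t1", "ACGTACGTAC"]
print(worst_case_cells(queries, targets))
```

For real files you would pass open file handles instead of lists, since the functions only need something that yields lines.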

> It looks like I am stuck with doing the time prediction because we are
> plugging into an existing cluster with existing rules, much as I would
> like to avoid this issue altogether.... :-)

All I can suggest then is an iterative procedure - submit jobs with a 
very conservative (i.e. generous) estimate of CPU time.  They'll get low 
priority, but that's better than them being killed because they've been 
running too long.  Then reduce the estimate once you've got a feel for 
how much CPU time the jobs really need.
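
The iterative procedure could be sketched roughly like this in Python. Everything here is an assumption for illustration: the function name, the safety factor of 2, and the idea that you can collect the CPU seconds actually used by finished jobs from your scheduler's accounting. It does not call any real scheduler API.

```python
def next_cpu_limit(current_limit, observed_cpu_seconds, safety_factor=2.0):
    """Suggest the CPU-time request (in seconds) for the next batch of jobs.

    current_limit        -- the request used for the previous batch
    observed_cpu_seconds -- CPU seconds actually used by jobs that finished
    safety_factor        -- assumed headroom multiplier on the slowest job
    """
    if not observed_cpu_seconds:
        # Nothing has finished yet, so keep the conservative request.
        return current_limit
    proposed = max(observed_cpu_seconds) * safety_factor
    # Only ever tighten the request; never raise it above the current one.
    return min(current_limit, proposed)


# Start with a generous one-day request, then tighten after a first batch.
limit = 24 * 60 * 60
limit = next_cpu_limit(limit, [1200, 1800, 1500])
print(limit)  # 3600.0
```

Sizing the request from the slowest observed job rather than the average is deliberate: the point is to avoid jobs being killed, so the worst case matters more than the typical one.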

Tim

PS.  I wish the powers that be would let me be as draconian with our 
cluster as your guys are.  It would solve a whole heap of trouble.  :-)

-- 
Dr Tim Cutts
Informatics Systems Group
Wellcome Trust Sanger Institute
Hinxton, Cambridge, CB10 1SA, UK