On 1 Jun 2004, at 11:57 am, Micha Bayer wrote:

> A formula would presumably take into account things like the length and
> number of the input queries, the size and makeup of the target database
> (i.e. number and length of sequences contained in this), the
> similarities between the query and the target sequences and local
> hardware parameters (processors, memory, local network speeds etc).

I think it's very difficult to predict. I'm pretty certain the algorithm
is O(n*m) in both memory and time, but a meaningful prediction of real
run time is very difficult indeed, since the number of HSPs found makes
an enormous difference to both memory use and time. And the number of
HSPs you find can vary enormously depending on the exact parameters you
give to BLAST, even with identical input sequences.

We don't bother with this sort of estimation. LSF can improve its
scheduling by using such estimates, but we just use LSF's fairshare
mechanism instead. If a user is submitting very long-running jobs, their
priority will dynamically fall off to give other users a crack at the
CPUs, so it all works out OK in the end.

Rather more important, in our experience, is estimating how much RAM the
job is going to require. Memory overcommits are one of our biggest
problems now, especially on our larger SMP boxes. We have a 32-way
machine with 192 GB of memory, which *regularly* runs out of virtual
memory. The LSF queue that services that machine now has an esub in
place to force people to provide LSF with an estimate of how much memory
the job will use, but there's still no way we can force them to be
accurate! We've ended up putting strict memory use limits on the LSF
queues, and jobs which exceed those limits get killed.

Tim

--
Dr Tim Cutts
Informatics Systems Group
Wellcome Trust Sanger Institute
Hinxton, Cambridge, CB10 1SA, UK
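[Editor's illustration, not part of the original message.] The O(n*m) scaling mentioned above comes from the dynamic-programming alignment that BLAST's HSP extension approximates: scoring a query of length n against a target of length m fills an n-by-m matrix, so both time and memory grow with the product of the lengths. A minimal Smith-Waterman sketch makes that visible:

```python
def smith_waterman_score(query, target, match=1, mismatch=-1, gap=-2):
    """Local alignment score via Smith-Waterman: O(n*m) time and memory."""
    n, m = len(query), len(target)
    # (n+1) x (m+1) score matrix -- this is the O(n*m) memory cost
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):        # every cell visited once: O(n*m) time
        for j in range(1, m + 1):
            s = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # match / mismatch
                          H[i - 1][j] + gap,    # gap in target
                          H[i][j - 1] + gap)    # gap in query
            best = max(best, H[i][j])
    return best
```

BLAST itself is a heuristic that avoids filling the whole matrix in typical cases, which is exactly why its actual run time depends so strongly on how many HSPs the parameters let through.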
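[Editor's illustration, not part of the original message.] An esub of the kind described is a site-local hook that LSF runs at submission time: it reads the job's parameters from the file named by $LSB_SUB_PARM_FILE and can reject the job by exiting with $LSB_SUB_ABORT_VALUE. The sketch below is a simplified, hypothetical version of such a check (the exact parameter-file format varies by LSF version), rejecting submissions that carry no rusage[mem=...] reservation:

```python
# Hypothetical esub-style check: refuse jobs that give LSF no memory
# estimate.  Simplified; real esub scripts must follow the site's LSF
# version and conventions.
import os
import sys


def has_memory_estimate(parms: str) -> bool:
    """True if the submission text reserves memory via rusage[mem=...]."""
    return "mem=" in parms


if __name__ == "__main__":
    parm_file = os.environ.get("LSB_SUB_PARM_FILE")
    if parm_file:
        with open(parm_file) as f:
            if not has_memory_estimate(f.read()):
                sys.stderr.write(
                    "Please supply a memory estimate, "
                    "e.g. -R 'rusage[mem=4000]'\n")
                # Non-zero LSB_SUB_ABORT_VALUE tells LSF to reject the job
                sys.exit(int(os.environ.get("LSB_SUB_ABORT_VALUE", "97")))
```

As the message notes, this only forces users to supply *some* estimate; the hard per-queue memory limits are what actually kill jobs whose estimates turn out to be wrong.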