Hi Tim,

thanks for that. Can you just clarify what n and m are in your response below? It looks like I'm stuck with doing the time prediction because we are plugging into an existing cluster with existing rules, much as I would like to avoid this issue altogether... :-)

cheers

Micha

On Tue, 2004-06-01 at 13:10, Tim Cutts wrote:
> On 1 Jun 2004, at 11:57 am, Micha Bayer wrote:
>
> > A formula would presumably take into account things like the length and
> > number of the input queries, the size and makeup of the target database
> > (i.e. number and length of sequences contained in this), the
> > similarities between the query and the target sequences and local
> > hardware parameters (processors, memory, local network speeds etc).
>
> I think it's very difficult to predict. I'm pretty certain the
> algorithm is O(n*m) in both memory and time, but a meaningful
> prediction of a real time is very difficult indeed, since the number of
> HSPs found will make an enormous difference to both memory use and
> time. And the number of HSPs you find can vary enormously depending on
> the exact parameters you give to BLAST, even with identical input
> sequences.
>
> We don't bother with this sort of estimation. LSF can improve its
> scheduling by using such estimates, but we don't bother, and just use
> LSF's fairshare mechanism. If a user is submitting very long-running
> jobs, their priority will dynamically fall off to give other users a
> crack at the CPUs, so it all works out OK in the end.
>
> Rather more important, in our experience, is estimating how much RAM
> the job is going to require. Memory overcommits are one of our biggest
> problems now, especially on our larger SMP boxes. We have a 32-way
> machine with 192 GB of memory, which *regularly* runs out of virtual
> memory.
> The LSF queue that services that machine now has an esub in
> place to force people to provide LSF with an estimate of how much
> memory the job will use, but there's still no way we can force them to
> be accurate! We've ended up putting strict memory use limits on the
> LSF queues, and jobs which exceed those limits get killed.
>
> Tim

--
--------------------------------------------------
Dr Micha M Bayer
Grid Developer, BRIDGES Project
National e-Science Centre, Glasgow Hub
246c Kelvin Building
University of Glasgow
Glasgow G12 8QQ
Scotland, UK
Email: michab@dcs.gla.ac.uk
Project home page: http://www.brc.dcs.gla.ac.uk/projects/bridges/
Personal Homepage: http://www.brc.dcs.gla.ac.uk/~michab/
Tel.: +44 (0)141 330 2958
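The following is a minimal sketch of the naive time estimate discussed in this thread, intended to make Tim's O(n*m) remark concrete. It assumes that n is the total length of the input queries and m is the total number of residues in the target database. That is the usual way BLAST's search space is described, but the thread does not confirm it. The sketch times one reference run on the local hardware and scales that time linearly in n*m. As Tim points out, the number of HSPs (and therefore the BLAST parameters) can swamp this, so treat the result as an order-of-magnitude guess at best. The names Calibration, search_space and estimate_seconds, and all the numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Calibration:
    """A reference BLAST run timed on the local hardware (hypothetical values)."""
    query_residues: int   # n for the reference run
    db_residues: int      # m for the reference run
    wall_seconds: float   # measured wall-clock time of that run


def search_space(query_lengths: list[int], db_residues: int) -> int:
    """Naive search space n*m: total query residues times database residues."""
    return sum(query_lengths) * db_residues


def estimate_seconds(query_lengths: list[int], db_residues: int,
                     cal: Calibration) -> float:
    """Scale the calibration run linearly in n*m.

    Assumption: cost is proportional to n*m. HSP counts, BLAST
    parameters and I/O are ignored, and any of them can dominate.
    """
    ref = cal.query_residues * cal.db_residues
    if ref <= 0:
        raise ValueError("calibration run must have a non-empty search space")
    return cal.wall_seconds * search_space(query_lengths, db_residues) / ref


if __name__ == "__main__":
    # Hypothetical calibration: one 300-residue query against 1e8 residues took 12 s.
    cal = Calibration(query_residues=300, db_residues=100_000_000, wall_seconds=12.0)
    # Ten 600-residue queries against the same database: 20x the search space.
    print(estimate_seconds([600] * 10, 100_000_000, cal))  # 240.0
```

Calibrating against a local run folds the hardware parameters Micha lists into a single constant. Choosing a calibration run with similar BLAST parameters and sequence makeup is probably more important than the formula itself.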
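Tim describes fairshare as a mechanism where a user's priority falls off as their jobs consume CPU. A toy share-over-usage priority can illustrate this. In the sketch below, a user's configured shares are divided by their weighted CPU time, run time and running-job count. That shape is loosely modelled on the dynamic priority formula in the LSF fairshare documentation. However, the weights, the omission of any decay of historical usage, and all the names here are assumptions made for illustration. Check the documentation for your LSF version for the real definition.

```python
from dataclasses import dataclass


@dataclass
class UserUsage:
    name: str
    shares: float        # configured share allocation
    cpu_seconds: float   # accumulated CPU time of recent jobs
    run_seconds: float   # wall-clock time of currently running jobs
    running_jobs: int    # number of jobs currently running


def dynamic_priority(u: UserUsage, cpu_factor: float = 0.7,
                     run_factor: float = 0.7, job_factor: float = 3.0) -> float:
    """Shares divided by weighted usage: heavier recent use, lower priority.

    The weights are illustrative assumptions, not a real LSF configuration.
    """
    # With job_factor > 0, the (1 + running_jobs) term keeps the
    # denominator positive even for a user with no usage at all.
    usage = (u.cpu_seconds * cpu_factor
             + u.run_seconds * run_factor
             + (1 + u.running_jobs) * job_factor)
    return u.shares / usage


def next_user(users: list[UserUsage]) -> str:
    """Pick the user whose pending job would be dispatched next."""
    return max(users, key=dynamic_priority).name


if __name__ == "__main__":
    # Equal shares, but "heavy" has been running long jobs.
    heavy = UserUsage("heavy", shares=10, cpu_seconds=36_000,
                      run_seconds=36_000, running_jobs=8)
    light = UserUsage("light", shares=10, cpu_seconds=600,
                      run_seconds=0, running_jobs=1)
    print(next_user([heavy, light]))  # light
```

Because usage sits in the denominator, no runtime estimate is needed. The user submitting long jobs simply loses out to others until their usage counts drop, which matches Tim's "it all works out OK in the end".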
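Tim describes a two-stage memory policy. At submission time, an esub insists on a memory estimate. While the job runs, hard queue limits kill any job whose actual usage is too high. The sketch below expresses that policy as two small checks. It does not use LSF's real esub interface, which is a separate executable that LSF runs at submission time. The JobRequest type, the function names, and the 8 GB limit are hypothetical. The second check exists because, as Tim says, nothing forces the estimate to be accurate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRequest:
    job_id: str
    mem_estimate_mb: Optional[int]  # user's declared estimate; None if omitted


def admit(job: JobRequest, queue_limit_mb: int) -> tuple[bool, str]:
    """Submission-time check, in the spirit of the esub described above."""
    if job.mem_estimate_mb is None:
        return False, "rejected: no memory estimate supplied"
    if job.mem_estimate_mb > queue_limit_mb:
        return False, "rejected: estimate exceeds queue memory limit"
    return True, "accepted"


def jobs_to_kill(observed_mb: dict[str, int], queue_limit_mb: int) -> list[str]:
    """Run-time enforcement: flag jobs whose actual usage exceeds the limit."""
    return sorted(job_id for job_id, mb in observed_mb.items()
                  if mb > queue_limit_mb)


if __name__ == "__main__":
    limit = 8_192  # hypothetical per-job queue limit in MB
    print(admit(JobRequest("j1", None), limit))
    # (False, 'rejected: no memory estimate supplied')
    print(admit(JobRequest("j2", 2_000), limit))  # (True, 'accepted')
    # j2 declared 2 GB but actually grew to 9.5 GB, so it is flagged.
    print(jobs_to_kill({"j2": 9_500, "j3": 1_200}, limit))  # ['j2']
```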