Thanks Chris, that's useful to know. What do you do when the target
databases change in size (as they do with every update) - have you
developed a formula for adjusting the runtimes then? Also, what is
runtime in this context - CPU time or walltime?

Since I wrote the original message I have run a few test runs myself and
I have actually found that the size of the target database is a much
stronger predictor of the wall time the job takes than the query size.
Times seem pretty consistent across different length queries run against
the same target (I randomly generate my test queries now).

cheers
Micha

On Mon, 2004-06-07 at 17:04, Chris Dwan wrote:
> > It looks like I stuck with doing the time prediction because we are
> > plugging into an existing cluster with existing rules, much as I would
> > like to avoid this issue altogether.... :-)
>
> I find that BLAST run time prediction is pretty consistent (within 5%
> or so) based only on query length, provided that you're allowed to run
> a set of tests on the exact target in question, on the exact machines
> in question. I've got an instrumented version of the EnsEMBL pipeline
> which saves runtimes (and queue waits, and all sorts of other goodies)
> for later perusal.
> On an analysis containing 627 contigs from
> Medicago truncatula the times for blastn vs NCBI NT on our Xserves
> (bins of 10,000bp length) look like this:
>
> mysql> select count(distinct(contig_id)) as num_contigs, floor(length /
> 10000) as bp, avg(runtime), \
>     std(runtime), run_queue from contig, input_id_analysis
> where input_id = name and analysis_id = 3 \
>     and run_queue = "CCGB_XSERVE" group by bp, run_queue order
> by run_queue, bp;
> +-------------+------+--------------+--------------+-------------+
> | num_contigs | bp   | avg(runtime) | std(runtime) | run_queue   |
> +-------------+------+--------------+--------------+-------------+
> |           3 |    0 |     246.0000 |       6.5320 | CCGB_XSERVE |
> |           4 |    1 |     424.7500 |      39.3407 | CCGB_XSERVE |
> |           1 |    2 |     803.0000 |       0.0000 | CCGB_XSERVE |
> |           3 |    3 |     790.6667 |      65.6523 | CCGB_XSERVE |
> |           6 |    4 |    1063.8333 |      64.2117 | CCGB_XSERVE |
> |           5 |    5 |    1217.8000 |      90.1341 | CCGB_XSERVE |
> |           5 |    6 |    1354.4000 |      65.0372 | CCGB_XSERVE |
> |           8 |    7 |    1630.7500 |      70.1334 | CCGB_XSERVE |
> |           5 |    8 |    1886.8000 |      70.8220 | CCGB_XSERVE |
> |           7 |    9 |    2065.2857 |      99.2928 | CCGB_XSERVE |
> |          20 |   10 |    2299.0000 |      99.3700 | CCGB_XSERVE |
> |          18 |   11 |    2523.0000 |     125.7763 | CCGB_XSERVE |
> |          23 |   12 |    2714.9565 |     157.0911 | CCGB_XSERVE |
> |          14 |   13 |    3016.5714 |      81.9658 | CCGB_XSERVE |
> |           6 |   14 |    3264.6667 |      64.0356 | CCGB_XSERVE |
> |           1 |   16 |    3817.0000 |       0.0000 | CCGB_XSERVE |
> +-------------+------+--------------+--------------+-------------+
>
> BLASTX vs Uniref looks like:
>
> mysql> select count(distinct(contig_id)) as num_contigs, floor(length /
> 10000) as bp, avg(runtime), \
>     std(runtime), run_queue from contig, input_id_analysis
> where input_id = name and analysis_id = 14 \
>     and run_queue = "CCGB_XSERVE" group by bp, run_queue order
> by run_queue, bp;
> +-------------+------+--------------+--------------+-------------+
> | num_contigs | bp   | avg(runtime) | std(runtime) | run_queue   |
> +-------------+------+--------------+--------------+-------------+
> |           4 |    0 |      96.2500 |      18.9126 | CCGB_XSERVE |
> |           5 |    1 |     515.2000 |      85.6491 | CCGB_XSERVE |
> |           2 |    2 |     810.0000 |       2.0000 | CCGB_XSERVE |
> |           2 |    3 |    1326.0000 |     115.0000 | CCGB_XSERVE |
> |           3 |    4 |    1931.6667 |      46.6571 | CCGB_XSERVE |
> |           5 |    5 |    2712.2000 |     150.0325 | CCGB_XSERVE |
> |           3 |    6 |    3104.0000 |      99.5624 | CCGB_XSERVE |
> |           6 |    7 |    3799.5000 |     218.6342 | CCGB_XSERVE |
> |           3 |    8 |    5052.0000 |     205.0870 | CCGB_XSERVE |
> |           7 |    9 |    5697.5714 |     480.6186 | CCGB_XSERVE |
> |           7 |   10 |    6887.2857 |     385.0632 | CCGB_XSERVE |
> |          19 |   11 |    7707.6316 |     342.8089 | CCGB_XSERVE |
> |          18 |   12 |    8812.0000 |     502.8817 | CCGB_XSERVE |
> |          10 |   13 |    9638.8000 |     726.1260 | CCGB_XSERVE |
> |           6 |   14 |   10457.5000 |     742.9995 | CCGB_XSERVE |
> |           2 |   16 |   13521.0000 |      58.0000 | CCGB_XSERVE |
> +-------------+------+--------------+--------------+-------------+
>
> -Chris Dwan
>   The University of Minnesota
>
> _______________________________________________
> Bioclusters maillist  -  Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
--
--------------------------------------------------
Dr Micha M Bayer
Grid Developer, BRIDGES Project
National e-Science Centre, Glasgow Hub
246c Kelvin Building
University of Glasgow
Glasgow G12 8QQ
Scotland, UK
Email: michab@dcs.gla.ac.uk
Project home page: http://www.brc.dcs.gla.ac.uk/projects/bridges/
Personal Homepage: http://www.brc.dcs.gla.ac.uk/~michab/
Tel.: +44 (0)141 330 2958
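
As a rough illustration of the kind of formula asked about above, the
binned blastn vs NT averages in the quoted table can be fitted with a
straight line. This is only a sketch under stated assumptions: each bin
is represented by its midpoint length, the bins are not weighted by
num_contigs, and the names fit_line, predict_runtime and db_growth are
made up for this example. The db_growth factor in particular assumes
that runtime scales in proportion to target database size, which nobody
in this thread has measured. Runtimes are in whatever units the runtime
column uses; the thread does not say.

```python
# Average blastn-vs-NT runtimes copied from the quoted table, as
# (bin, avg_runtime) with bin = floor(length / 10000).
BLASTN_BINS = [
    (0, 246.0), (1, 424.75), (2, 803.0), (3, 790.6667),
    (4, 1063.8333), (5, 1217.8), (6, 1354.4), (7, 1630.75),
    (8, 1886.8), (9, 2065.2857), (10, 2299.0), (11, 2523.0),
    (12, 2714.9565), (13, 3016.5714), (14, 3264.6667), (16, 3817.0),
]
BIN_WIDTH_BP = 10_000


def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# ASSUMPTION: each bin is represented by its midpoint length in bp.
POINTS = [((b + 0.5) * BIN_WIDTH_BP, rt) for b, rt in BLASTN_BINS]
INTERCEPT, SLOPE = fit_line(POINTS)


def predict_runtime(query_length_bp, db_growth=1.0):
    """Estimate the runtime of one query from its length in bp.

    db_growth is a HYPOTHETICAL knob: new database size divided by the
    size at calibration time. Scaling the estimate linearly by it is an
    untested assumption; re-running the calibration after a database
    update is the safer route.
    """
    return (INTERCEPT + SLOPE * query_length_bp) * db_growth


if __name__ == "__main__":
    print(f"extra runtime per 10 kb of query: {SLOPE * BIN_WIDTH_BP:.1f}")
    print(f"estimate for a 105,000 bp query:  {predict_runtime(105_000):.1f}")
```

Fitting the per-contig rows directly, or weighting each bin by
num_contigs, would likely give a better line than these unweighted bin
averages, and the BLASTX numbers would need their own fit.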