> It looks like I stuck with doing the time prediction because we are
> plugging into an existing cluster with existing rules, much as I would
> like to avoid this issue altogether.... :-)

I find that BLAST run time prediction is pretty consistent (within 5% or
so) based only on query length, provided that you're allowed to run a set
of tests on the exact target in question, on the exact machines in
question. I've got an instrumented version of the EnsEMBL pipeline which
saves runtimes (and queue waits, and all sorts of other goodies) for
later perusal.

On an analysis containing 627 contigs from Medicago truncatula the times
for blastn vs NCBI NT on our Xserves (bins of 10,000bp length) look like
this:

mysql> select count(distinct(contig_id)) as num_contigs, floor(length / 10000) as bp, avg(runtime), \
       std(runtime), run_queue from contig, input_id_analysis where input_id = name and analysis_id = 3 \
       and run_queue = "CCGB_XSERVE" group by bp, run_queue order by run_queue, bp;
+-------------+------+--------------+--------------+-------------+
| num_contigs | bp   | avg(runtime) | std(runtime) | run_queue   |
+-------------+------+--------------+--------------+-------------+
|           3 |    0 |     246.0000 |       6.5320 | CCGB_XSERVE |
|           4 |    1 |     424.7500 |      39.3407 | CCGB_XSERVE |
|           1 |    2 |     803.0000 |       0.0000 | CCGB_XSERVE |
|           3 |    3 |     790.6667 |      65.6523 | CCGB_XSERVE |
|           6 |    4 |    1063.8333 |      64.2117 | CCGB_XSERVE |
|           5 |    5 |    1217.8000 |      90.1341 | CCGB_XSERVE |
|           5 |    6 |    1354.4000 |      65.0372 | CCGB_XSERVE |
|           8 |    7 |    1630.7500 |      70.1334 | CCGB_XSERVE |
|           5 |    8 |    1886.8000 |      70.8220 | CCGB_XSERVE |
|           7 |    9 |    2065.2857 |      99.2928 | CCGB_XSERVE |
|          20 |   10 |    2299.0000 |      99.3700 | CCGB_XSERVE |
|          18 |   11 |    2523.0000 |     125.7763 | CCGB_XSERVE |
|          23 |   12 |    2714.9565 |     157.0911 | CCGB_XSERVE |
|          14 |   13 |    3016.5714 |      81.9658 | CCGB_XSERVE |
|           6 |   14 |    3264.6667 |      64.0356 | CCGB_XSERVE |
|           1 |   16 |    3817.0000 |       0.0000 | CCGB_XSERVE |
+-------------+------+--------------+--------------+-------------+
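
To give a feel for how the blastn bin averages above could feed a scheduler's time estimate, here is a minimal Python sketch. It is only an illustration: the straight-line model, the use of bin midpoints as the representative query length, and the names BLASTN_BINS, fit_line and predict_runtime are all assumptions of this sketch, not part of the instrumented pipeline. Runtimes are in whatever units the runtime column records.

```python
# Illustrative sketch only: fit runtime ~ slope * length + intercept to the
# binned blastn averages. The linear model and all names here are assumptions.

# (bin index, num_contigs, avg runtime), copied from the blastn vs NT table.
BLASTN_BINS = [
    (0, 3, 246.0000), (1, 4, 424.7500), (2, 1, 803.0000), (3, 3, 790.6667),
    (4, 6, 1063.8333), (5, 5, 1217.8000), (6, 5, 1354.4000), (7, 8, 1630.7500),
    (8, 5, 1886.8000), (9, 7, 2065.2857), (10, 20, 2299.0000),
    (11, 18, 2523.0000), (12, 23, 2714.9565), (13, 14, 3016.5714),
    (14, 6, 3264.6667), (16, 1, 3817.0000),
]
BIN_WIDTH = 10000  # bp per bin, matching floor(length / 10000) in the query


def fit_line(bins, bin_width=BIN_WIDTH):
    """Weighted least-squares line through the bin averages.

    Each bin is weighted by its contig count, and its midpoint is used as
    the representative query length (an assumption of this sketch).
    """
    xs = [b * bin_width + bin_width / 2 for b, _, _ in bins]
    ws = [n for _, n, _ in bins]
    ys = [t for _, _, t in bins]
    w_total = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / w_total
    y_mean = sum(w * y for w, y in zip(ws, ys)) / w_total
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def predict_runtime(length_bp, slope, intercept):
    """Predicted runtime for a query of length_bp, in the table's units."""
    return slope * length_bp + intercept


slope, intercept = fit_line(BLASTN_BINS)
print(f"slope per bp: {slope:.5f}, intercept: {intercept:.1f}")
print(f"predicted runtime for a 100,000bp query: "
      f"{predict_runtime(100000, slope, intercept):.0f}")
```

A fit like this only holds for the same database on the same machines it was measured on, so the numbers would have to be regenerated for any other target or queue. A safety margin on top of the prediction (for example, a couple of standard deviations from the std(runtime) column) is probably wise before handing it to a scheduler with hard limits.
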

BLASTX vs Uniref looks like:

mysql> select count(distinct(contig_id)) as num_contigs, floor(length / 10000) as bp, avg(runtime), \
       std(runtime), run_queue from contig, input_id_analysis where input_id = name and analysis_id = 14 \
       and run_queue = "CCGB_XSERVE" group by bp, run_queue order by run_queue, bp;
+-------------+------+--------------+--------------+-------------+
| num_contigs | bp   | avg(runtime) | std(runtime) | run_queue   |
+-------------+------+--------------+--------------+-------------+
|           4 |    0 |      96.2500 |      18.9126 | CCGB_XSERVE |
|           5 |    1 |     515.2000 |      85.6491 | CCGB_XSERVE |
|           2 |    2 |     810.0000 |       2.0000 | CCGB_XSERVE |
|           2 |    3 |    1326.0000 |     115.0000 | CCGB_XSERVE |
|           3 |    4 |    1931.6667 |      46.6571 | CCGB_XSERVE |
|           5 |    5 |    2712.2000 |     150.0325 | CCGB_XSERVE |
|           3 |    6 |    3104.0000 |      99.5624 | CCGB_XSERVE |
|           6 |    7 |    3799.5000 |     218.6342 | CCGB_XSERVE |
|           3 |    8 |    5052.0000 |     205.0870 | CCGB_XSERVE |
|           7 |    9 |    5697.5714 |     480.6186 | CCGB_XSERVE |
|           7 |   10 |    6887.2857 |     385.0632 | CCGB_XSERVE |
|          19 |   11 |    7707.6316 |     342.8089 | CCGB_XSERVE |
|          18 |   12 |    8812.0000 |     502.8817 | CCGB_XSERVE |
|          10 |   13 |    9638.8000 |     726.1260 | CCGB_XSERVE |
|           6 |   14 |   10457.5000 |     742.9995 | CCGB_XSERVE |
|           2 |   16 |   13521.0000 |      58.0000 | CCGB_XSERVE |
+-------------+------+--------------+--------------+-------------+

-Chris Dwan
 The University of Minnesota