[Bioclusters] BLAST job time estimates

Chris Dwan bioclusters@bioinformatics.org
Mon, 7 Jun 2004 11:04:51 -0500


> It looks like I'm stuck with doing the time prediction because we are
> plugging into an existing cluster with existing rules, much as I would
> like to avoid this issue altogether.... :-)

I find that BLAST run time is pretty consistent (within 5% or so) when
predicted from query length alone, provided that you're allowed to run
a set of tests against the exact target database in question, on the
exact machines in question.  I've got an instrumented version of the
EnsEMBL pipeline which saves runtimes (and queue waits, and all sorts
of other goodies) for later perusal.  On an analysis containing 627
contigs from Medicago truncatula, the times for blastn vs NCBI NT on
our Xserves look like this (the "bp" column is contig length in bins
of 10,000 bp, i.e. floor(length / 10000)):

mysql> select count(distinct(contig_id)) as num_contigs,
    ->        floor(length / 10000) as bp,
    ->        avg(runtime), std(runtime), run_queue
    -> from contig, input_id_analysis
    -> where input_id = name
    ->   and analysis_id = 3
    ->   and run_queue = "CCGB_XSERVE"
    -> group by bp, run_queue
    -> order by run_queue, bp;
+-------------+------+--------------+--------------+-------------+
| num_contigs | bp   | avg(runtime) | std(runtime) | run_queue   |
+-------------+------+--------------+--------------+-------------+
|           3 |    0 |     246.0000 |       6.5320 | CCGB_XSERVE |
|           4 |    1 |     424.7500 |      39.3407 | CCGB_XSERVE |
|           1 |    2 |     803.0000 |       0.0000 | CCGB_XSERVE |
|           3 |    3 |     790.6667 |      65.6523 | CCGB_XSERVE |
|           6 |    4 |    1063.8333 |      64.2117 | CCGB_XSERVE |
|           5 |    5 |    1217.8000 |      90.1341 | CCGB_XSERVE |
|           5 |    6 |    1354.4000 |      65.0372 | CCGB_XSERVE |
|           8 |    7 |    1630.7500 |      70.1334 | CCGB_XSERVE |
|           5 |    8 |    1886.8000 |      70.8220 | CCGB_XSERVE |
|           7 |    9 |    2065.2857 |      99.2928 | CCGB_XSERVE |
|          20 |   10 |    2299.0000 |      99.3700 | CCGB_XSERVE |
|          18 |   11 |    2523.0000 |     125.7763 | CCGB_XSERVE |
|          23 |   12 |    2714.9565 |     157.0911 | CCGB_XSERVE |
|          14 |   13 |    3016.5714 |      81.9658 | CCGB_XSERVE |
|           6 |   14 |    3264.6667 |      64.0356 | CCGB_XSERVE |
|           1 |   16 |    3817.0000 |       0.0000 | CCGB_XSERVE |
+-------------+------+--------------+--------------+-------------+
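
For what it's worth, here is a rough sketch of how bin averages like
these could be turned into a per-job estimate.  It fits a straight line
(weighted least squares, each bin placed at its midpoint and weighted by
its contig count) to the blastn numbers above.  The function names, the
midpoint choice, and the assumption that a straight line is good enough
are all mine for illustration; none of this is part of the instrumented
pipeline, and the runtime unit is simply whatever the pipeline logged.

```python
# (bin, num_contigs, avg_runtime) copied from the blastn vs NCBI NT table above.
# bin = floor(length / 10000); runtime is in whatever unit the pipeline logged.
BLASTN_BINS = [
    (0, 3, 246.0), (1, 4, 424.75), (2, 1, 803.0), (3, 3, 790.6667),
    (4, 6, 1063.8333), (5, 5, 1217.8), (6, 5, 1354.4), (7, 8, 1630.75),
    (8, 5, 1886.8), (9, 7, 2065.2857), (10, 20, 2299.0), (11, 18, 2523.0),
    (12, 23, 2714.9565), (13, 14, 3016.5714), (14, 6, 3264.6667),
    (16, 1, 3817.0),
]
BIN_SIZE = 10000  # bp per bin, as in the query


def fit_runtime_model(bins, bin_size=BIN_SIZE):
    """Weighted least-squares line: runtime = slope * length_bp + intercept.

    Each bin sits at its midpoint and is weighted by its contig count.
    """
    xs = [(b + 0.5) * bin_size for b, _, _ in bins]
    ws = [n for _, n, _ in bins]
    ys = [avg for _, _, avg in bins]
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_runtime(length_bp, slope, intercept):
    """Estimated runtime for a query of length_bp under the fitted line."""
    return slope * length_bp + intercept


slope, intercept = fit_runtime_model(BLASTN_BINS)
print(f"slope = {slope:.4f} per bp, intercept = {intercept:.1f}")
print(f"estimate for 105,000 bp: {predict_runtime(105000, slope, intercept):.0f}")
```

A fit like this only holds for the database and hardware it was
measured on, so it would need to be redone whenever either changes.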

BLASTX vs Uniref looks like:

mysql> select count(distinct(contig_id)) as num_contigs,
    ->        floor(length / 10000) as bp,
    ->        avg(runtime), std(runtime), run_queue
    -> from contig, input_id_analysis
    -> where input_id = name
    ->   and analysis_id = 14
    ->   and run_queue = "CCGB_XSERVE"
    -> group by bp, run_queue
    -> order by run_queue, bp;
+-------------+------+--------------+--------------+-------------+
| num_contigs | bp   | avg(runtime) | std(runtime) | run_queue   |
+-------------+------+--------------+--------------+-------------+
|           4 |    0 |      96.2500 |      18.9126 | CCGB_XSERVE |
|           5 |    1 |     515.2000 |      85.6491 | CCGB_XSERVE |
|           2 |    2 |     810.0000 |       2.0000 | CCGB_XSERVE |
|           2 |    3 |    1326.0000 |     115.0000 | CCGB_XSERVE |
|           3 |    4 |    1931.6667 |      46.6571 | CCGB_XSERVE |
|           5 |    5 |    2712.2000 |     150.0325 | CCGB_XSERVE |
|           3 |    6 |    3104.0000 |      99.5624 | CCGB_XSERVE |
|           6 |    7 |    3799.5000 |     218.6342 | CCGB_XSERVE |
|           3 |    8 |    5052.0000 |     205.0870 | CCGB_XSERVE |
|           7 |    9 |    5697.5714 |     480.6186 | CCGB_XSERVE |
|           7 |   10 |    6887.2857 |     385.0632 | CCGB_XSERVE |
|          19 |   11 |    7707.6316 |     342.8089 | CCGB_XSERVE |
|          18 |   12 |    8812.0000 |     502.8817 | CCGB_XSERVE |
|          10 |   13 |    9638.8000 |     726.1260 | CCGB_XSERVE |
|           6 |   14 |   10457.5000 |     742.9995 | CCGB_XSERVE |
|           2 |   16 |   13521.0000 |      58.0000 | CCGB_XSERVE |
+-------------+------+--------------+--------------+-------------+
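
If you want to check the spread per bin rather than eyeball it, a small
sketch like the one below computes std/avg for each BLASTX bin and a
padded time request (average plus k standard deviations).  The padding
rule and the default k are hypothetical choices of mine, not something
the scheduler or the pipeline prescribes.

```python
# (bin, avg_runtime, std_runtime) copied from the BLASTX vs Uniref table above.
BLASTX_BINS = [
    (0, 96.25, 18.9126), (1, 515.2, 85.6491), (2, 810.0, 2.0),
    (3, 1326.0, 115.0), (4, 1931.6667, 46.6571), (5, 2712.2, 150.0325),
    (6, 3104.0, 99.5624), (7, 3799.5, 218.6342), (8, 5052.0, 205.0870),
    (9, 5697.5714, 480.6186), (10, 6887.2857, 385.0632),
    (11, 7707.6316, 342.8089), (12, 8812.0, 502.8817),
    (13, 9638.8, 726.1260), (14, 10457.5, 742.9995), (16, 13521.0, 58.0),
]


def relative_spread(rows):
    """Map each bin to std/avg (the coefficient of variation)."""
    return {b: std / avg for b, avg, std in rows}


def padded_request(avg, std, k=3.0):
    """Hypothetical time request: average plus k standard deviations."""
    return avg + k * std


if __name__ == "__main__":
    spread = relative_spread(BLASTX_BINS)
    for b, avg, std in BLASTX_BINS:
        print(f"bin {b:2d}: spread {spread[b]:6.1%}, "
              f"request {padded_request(avg, std):9.1f}")
```

Bins with only a handful of contigs give a shaky std, so the padding for
those is only as good as the sample behind it.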

-Chris Dwan
   The University of Minnesota