> What do you do when the target databases change in size (as they do
> with every update) - have you developed a formula for adjusting the
> runtimes then?

Short answer: No, and I don't need to.  Only one of the clusters on
which I run requires a time estimate, and there is no penalty for
overestimating other than scheduling priority (I'm charged for what I
use, not what I reserve).  Therefore, I say "12 hours" for all blastn
vs NT or blastx vs Uniref, and "1 hour" for all the rest (an
assortment of chromosomes, TIGR gene indices, and full-length cDNA
sequences).

My questions would be "how accurate do you need to be?" and "is there
a penalty for overestimating?"  I don't use the runtime queries I
shared for anything other than after-the-fact analysis.

> I have actually found that the size of the target database is a much
> stronger predictor of the wall time the job takes than the query
> size.  Times seem pretty consistent across different length queries
> run against the same target (I randomly generate my test queries
> now).

This is true as long as your query sequences are small (under
~10,000 bp).  My queries are BACs, up to 160,000 bp in length.  There
is a large runtime difference between a query of 10,000 bp and one of
100,000 bp.

A plot of wallclock runtime in seconds as a function of query size in
bp for BLASTN (for a variety of processors, all single thread, lots of
uncontrolled variables, some limitations may apply):

http://ccgb.umn.edu/~cdwan/benchmarks/image007.gif

It's pretty well known that (within certain reasonable limits) blastn
is limited by how efficiently you can get the target out of storage,
and blastx is limited by the clock speed of the processor.

Once upon a time, I tried to find a query length that would maximize
query bp analyzed per second.  It turned out to be a more efficient
use of my time to go looking for more processors instead.  :)

-Chris Dwan
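Since randomly generated test queries came up above, here is a minimal
sketch of one way to produce them in Python.  Everything here is
illustrative rather than the script actually used on the cluster, and
the uniform base composition is an assumption -- real genomic sequence
is biased, which can itself affect BLAST timings.

```python
import random

def random_query(length, seed=None):
    # Uniform, independent bases (illustrative assumption; real
    # genomes have skewed composition and repeats).
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))

def as_fasta(name, seq, width=60):
    # Wrap the sequence into FASTA format at the given line width.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">%s\n%s" % (name, "\n".join(lines))

# Example: a reproducible 100 bp test query (the seed pins it down).
print(as_fasta("test_query_100bp", random_query(100, seed=42)))
```

Fixing the seed makes a benchmark repeatable across runs, while varying
it gives an ensemble of queries of the same length.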
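If you do want a formula rather than a flat "12 hours", a simple
least-squares fit of walltime against query size (per target database)
is about as fancy as it needs to be for scheduling purposes.  This is
only a sketch: the sample numbers below are invented for illustration
(they are not read off the plot linked above), and the 1.5x padding
factor is an arbitrary safety margin.

```python
def fit_line(xs, ys):
    # Ordinary least-squares fit y = a + b * x; returns (a, b).
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# (query_bp, walltime_seconds) pairs -- made-up illustrative data,
# as if measured against one fixed target database.
samples = [(10_000, 120.0), (40_000, 390.0),
           (100_000, 930.0), (160_000, 1470.0)]
a, b = fit_line([q for q, _ in samples], [t for _, t in samples])

# Pad the prediction before handing it to the scheduler: here,
# overestimating only costs priority, while underestimating can get
# the job killed.
def estimate(query_bp, padding=1.5):
    return padding * (a + b * query_bp)
```

Because blastn times depend so strongly on the target, you would fit
one (a, b) pair per target database and redo the fit when the database
is updated.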