[Bioclusters] Web-accessible Parallel BLAST Services -- capacity limits

Kravitz, Saul skravitz at jcvi.org
Tue Dec 4 15:08:58 EST 2007


We have developed and deployed a web-accessible parallel implementation
of BLAST as part of the CAMERA project (we welcome registrations from
the bioinformatics.org community at http://camera.calit2.net).  Our
implementation involves segmentation of both query sequences and subject
databases, and runs on top of SGE.  Submissions are orchestrated via an
application running on JBoss.  Although we realize that there are many
parallel implementations of BLAST, I am interested in the capabilities
of public web-accessible deployments of parallel BLAST.  Specifically,
their capacities as measured in terms of:
   - Maximum number of query sequences
   - Maximum total size of the query sequences
   - Maximum number or data volume of hits returned 
   - Maximum Total Search Time  (NCBI is 1 Hr of cumulative CPU time)
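
For anyone who wants to report comparable numbers, the first two
measures can be read straight off a FASTA query file. The following is
only a minimal sketch, not CAMERA code: the function names and the
example input are hypothetical, and the limits passed in at the end are
simply the CAMERA test figures quoted in this message.

```python
import io


def query_set_stats(handle):
    """Return (number_of_sequences, total_residues) for FASTA-formatted text."""
    n_seqs = 0
    total = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            n_seqs += 1  # each header line starts a new query sequence
        else:
            total += len(line)  # sequence lines contribute residues
    return n_seqs, total


def within_limits(n_seqs, total, max_seqs, max_total):
    """True if a query set fits under both (hypothetical) service limits."""
    return n_seqs <= max_seqs and total <= max_total


# Tiny made-up query set: two sequences, nine bases in total.
example = io.StringIO(">q1\nACGT\nAC\n>q2\nGGG\n")
n_seqs, total = query_set_stats(example)

# Compare against the CAMERA test figures: 250k sequences, 100 Mbp.
fits = within_limits(n_seqs, total, 250_000, 100_000_000)
```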
 
In CAMERA's case, the numbers today are:
   - Maximum number of query sequences -- we've tested to 250k
   - Maximum total size of the query sequences -- we've tested to 100Mbp
   - Maximum number or data volume of hits returned -- we've tested to
     several hundred thousand
   - Maximum Total Search Time  (None so far)
 
I inquired of NCBI regarding their capabilities and got the following
response from Wayne Matten at the NCBI Service Desk: 

	This is impossible to say due to the variability in queries,
databases and the output generated. Our only set limit is on cpu time
(about 1 hour cumulative, over several machines). In other words, you
can only determine the practical limits of your searches by testing
them.
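
To make the "cumulative, over several machines" point concrete for a
search that is segmented on both the query and the database side, here
is a rough sketch. It is an illustration under stated assumptions, not
how CAMERA or NCBI actually schedule work: the chunk size, the segment
names and the per-task CPU times are all hypothetical, and the one-hour
budget is just the figure from the reply above.

```python
from itertools import product


def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def task_grid(query_ids, db_segments, queries_per_chunk):
    """Pair every query chunk with every database segment (one task each)."""
    return list(product(chunk(query_ids, queries_per_chunk), db_segments))


def cumulative_cpu_seconds(per_task_seconds):
    """Cumulative CPU time is the sum over tasks, wherever they ran."""
    return sum(per_task_seconds)


CPU_BUDGET_SECONDS = 3600  # "about 1 hour cumulative" from the NCBI reply

# Hypothetical job: 5 queries in chunks of 2, against 2 database segments.
tasks = task_grid(["q1", "q2", "q3", "q4", "q5"], ["db.00", "db.01"], 2)

# Hypothetical measurement: every task used 500 CPU seconds.
per_task = [500.0] * len(tasks)
used = cumulative_cpu_seconds(per_task)
over_budget = used > CPU_BUDGET_SECONDS
```

The number of tasks grows as (query chunks) x (database segments), so
the cumulative CPU figure can reach a fixed budget long before any
single machine has been busy for an hour.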

 
Regards,
 
Saul Kravitz
J. Craig Venter Institute

 


 

