[Bioclusters] BLAST/ PBS / Grid Engine
Ivo Grosse
bioclusters@bioinformatics.org
Fri, 17 May 2002 18:56:59 -0400
Hi Steve,
we don't run a Blast web server here, but we have multiple users
submitting thousands (or tens of thousands) of Blast jobs to the cluster,
and I think from a computational standpoint that's the same. So how do
we distribute those 10^4 or 10^5 Blast jobs from the login node over
the cluster nodes? We use SGE to distribute the jobs, and we do not
partition the database into fragments. Instead, we put identical
copies of the database (or databases) on each slave node, and whenever
a new job arrives, the database is read from the local disk. Of
course that requires that the database fit in RAM, so you should
plan on at least 1 GB of RAM per processor.
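A rough sketch of that submission scheme, one SGE job per query-fragment file, might look like the following. The database path, fragment file names, and BLAST command-line options are illustrative assumptions on my part, not details from our setup:

```python
# Hypothetical sketch: submit one SGE job per query-fragment file via qsub.
# Paths, file patterns, and blastall options are assumptions for illustration.
import glob
import subprocess

def submit_fragments(pattern="query_*.fa", db="/local/db/mouse", dry_run=True):
    """Build one qsub command line per fragment file matching `pattern`.

    With dry_run=True the commands are only printed; set it to False
    to actually hand them to SGE. Returns the list of command lines.
    """
    cmds = []
    for q in sorted(glob.glob(pattern)):
        cmd = ["qsub", "-b", "y", "-cwd",          # -b y: treat blastall as a binary
               "blastall", "-p", "blastn", "-d", db,
               "-i", q, "-o", q + ".blast"]
        cmds.append(cmd)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds
```

Since each node holds its own database copy, the job script needs no staging step; the scheduler just has to spread the jobs across nodes.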
What we partition is the query sequence, typically into fragments of
101 kb overlapping by 1 kb, or into fragments of 1001 kb overlapping by
1 kb. That partitioning will allow you, for example, to Blast the
human genome against the mouse genome in a few days, so in your case it
may prevent the situation where many nodes are idle while one node is
busy for months. (I guess that if the comparison of mouse to human
were assigned to just one node, it might indeed run for several months.)
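The partitioning itself is simple: cut the query into fixed-size windows where each new fragment starts (size - overlap) bases after the previous one, so no hit spanning a cut point of up to the overlap length is lost. A minimal sketch (function and parameter names are mine, not from any script of ours):

```python
# Hypothetical sketch of the query partitioning described above:
# fragments of `size` bases, consecutive fragments overlapping by
# `overlap` bases (e.g. 101 kb fragments overlapping by 1 kb).

def split_query(seq, size=101_000, overlap=1_000):
    """Return (start_offset, fragment) pairs covering `seq`.

    Each fragment after the first starts size - overlap bases
    after the previous one, so adjacent fragments share `overlap`
    bases and the start offset lets hits be mapped back to the
    original coordinates.
    """
    step = size - overlap
    fragments = []
    start = 0
    while start < len(seq):
        fragments.append((start, seq[start:start + size]))
        if start + size >= len(seq):
            break
        start += step
    return fragments
```

Each fragment is then written to its own FASTA file and submitted as an independent job, which is what keeps all the nodes busy.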
Best regards, Ivo