[Bioclusters] requesting design advice on grid-optimized genome annotation system

Sun Oct 23 14:46:26 EDT 2005

Hey Gang,

I'm designing a system for automatic prokaryotic genome annotation.  The 
system will need to annotate coding regions (typically several thousand 
per genome), in part by BLASTing them against multiple reference 
databases, like COGs, UNIPROT, NCBI nr, etc.  I'm wondering about the 
most efficient way to do this using my Xserve cluster and mpi-blast.  
I'm cool with prestaging the 
mpi-blast-formatted databases onto the compute nodes, and my intuition 
tells me it would be best to blast the set of coding regions against one 
reference database at a time, i.e., blast all coding regions against COGs, 
then again against UNIPROT, etc.  That way each reference database can 
stay resident in RAM for the entire blast run against the genome coding 
regions.  Does this sound right?  Will the database actually stay cached 
in RAM between queries?  Would I call the mpi-blast executable once on 
the entire list of coding regions, or would multiple mpi-blast calls (one 
per coding region) achieve the same thing (keeping the database resident 
in RAM)?  Any advice on how to implement this system for optimal 
mpi-blasting would be sincerely appreciated.
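
To make the first option concrete, here is a rough sketch of the "one 
call per database, all coding regions in one query file" layout.  It only 
builds and prints the command lines, it does not run anything.  The file 
name genome_cds.faa, the database names, and the process count are 
placeholders, and the blastall-style flags (-p, -d, -i, -o) are what I 
understand mpi-blast to accept, so check them against the installed 
version:

```python
# Hypothetical placeholders: database names, file names, and process
# count are illustrative and do not come from a real cluster setup.
DATABASES = ["COGs", "UNIPROT", "nr"]


def build_commands(query_fasta, databases, nprocs=16, program="blastp"):
    """Return one mpiblast command line per reference database.

    Every command receives the whole multi-FASTA file of coding regions,
    so each database is searched once for the entire genome instead of
    once per coding region.
    """
    commands = []
    for db in databases:
        out = f"{query_fasta}.{db}.out"  # one result file per database
        commands.append([
            "mpirun", "-np", str(nprocs),
            "mpiblast",
            "-p", program,      # assumed blastall-style flag: program
            "-d", db,           # assumed blastall-style flag: database
            "-i", query_fasta,  # assumed blastall-style flag: query file
            "-o", out,          # assumed blastall-style flag: output file
        ])
    return commands


if __name__ == "__main__":
    # Print the plan; a real driver could pass each list to subprocess.run.
    for cmd in build_commands("genome_cds.faa", DATABASES):
        print(" ".join(cmd))
```

The alternative (one call per coding region) would be the same loop with 
an inner loop over single-sequence query files, which is several thousand 
job launches per database.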

May the force be with you,

g.
-- 
Gary Van Domselaar, PhD
Associate Director, Bioinformatics.Org
gary at bioinformatics.org