[Bioclusters] Versions of Blast that run on a cluster?

Wed Jan 5 13:39:56 EST 2005

Hi Malay:

Is there any documentation, or are there any papers, describing such a setup?
I would assume that there would be general interest in seeing how such a
setup could be implemented.

I was thinking, instead of duplicating ALL the available databases to
the local HD, could some file-staging utility be used to stage only the
database to be BLASTed against?  Obviously the file-staging utility has
to work really quickly on the cluster for this method to be viable.
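
To make the staging idea concrete, here is a minimal sketch in Python. It
assumes a database is a set of files sharing a prefix (e.g. nt.nhr, nt.nin,
nt.nsq) on a shared filesystem; the function name stage_database and the
directory arguments are hypothetical, not part of any existing utility. It
copies only files that are missing or out of date on the node, so a repeated
job against the same database pays the copy cost once.

```python
import shutil
from pathlib import Path


def stage_database(shared_dir, local_dir, db_name):
    """Copy one database's files to node-local disk, skipping current ones.

    Returns the names of the files that were actually copied.
    """
    shared = Path(shared_dir)
    local = Path(local_dir)
    local.mkdir(parents=True, exist_ok=True)

    copied = []
    # Assumption: every file of the database is named "<db_name>.<something>".
    for src in sorted(shared.glob(db_name + ".*")):
        dst = local / src.name
        src_stat = src.stat()
        if dst.exists():
            dst_stat = dst.stat()
            up_to_date = (
                dst_stat.st_size == src_stat.st_size
                and int(dst_stat.st_mtime) >= int(src_stat.st_mtime)
            )
            if up_to_date:
                continue  # local copy is already current
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(src.name)
    return copied
```

A job script would call this before running BLAST and point BLAST at the
local directory. Whether this is fast enough depends entirely on the shared
filesystem and the database size.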

Thanks,

Bernard 

> -----Original Message-----
> From: bioclusters-bounces at bioinformatics.org 
> [mailto:bioclusters-bounces at bioinformatics.org] On Behalf Of Malay
> Sent: Wednesday, January 05, 2005 10:23
> To: Clustering, compute farming & distributed computing in 
> life science informatics
> Subject: Re: [Bioclusters] Versions of Blast that run on a cluster?
> 
> Bernard Li wrote:
> > Hi Malay:
> > 
> > 
> >>Oops I forgot to mention the third option. This is for a production
> >>machine for very high-end scaling up and requires an ample amount of
> >>disc space in each node. This is to have each node keep its own local
> >>copy of the database, and use input spitting through SGE. This is the
> >>best way to scale up to ~1000 jobs at a time. But because of the
> >>database maintenance issue, this method is advisable only for a
> >>dedicated BLAST farm.
> > 
> > 
> > You meant 'input splitting' right?  And how would you accomplish that
> > using SGE?  By scripting it in your job script?
> > 
> 
> I meant submitting each sequence as a separate job.
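
One way to prepare "one sequence per job" is sketched below in Python. It
only splits a multi-sequence FASTA input into one file per record; handing
each file to the scheduler is left out. The names split_fasta and
write_job_inputs and the query_NNNNNN.fa naming scheme are hypothetical.

```python
from pathlib import Path


def split_fasta(fasta_text):
    """Return a list of single-record FASTA strings."""
    records = []
    current = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # A header line starts a new record; flush the previous one.
            if current:
                records.append("\n".join(current) + "\n")
            current = [line]
        elif line.strip() and current:
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")
    return records


def write_job_inputs(fasta_text, out_dir):
    """Write each record to its own file and return the file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, record in enumerate(split_fasta(fasta_text), start=1):
        path = out / f"query_{index:06d}.fa"
        path.write_text(record)
        paths.append(path)
    return paths
```

Each returned path would then become the input of one submitted job.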
> 
> There is one more way of doing it, which is called the "pull technique":
> you store each sequence in an RDBMS. A daemon runs on each node, pulls a
> sequence from the RDBMS, runs it against its own local BLAST database,
> stores the result in an accessible place, and marks the job in the RDBMS
> as "done". A designated node then queries the RDBMS for jobs marked done
> and pulls the results from that place. This method is the most
> efficient of them all, and is used in the BLAST server at NCBI.
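
The pull technique described above can be sketched roughly as follows.
This is an illustration under stated assumptions, not the NCBI
implementation: SQLite stands in for the RDBMS, the table layout and all
function names are hypothetical, and the actual BLAST run is represented
by a caller-supplied search function that returns where the result was
stored.

```python
import sqlite3


def open_queue(path):
    """Open the job table, creating it if needed (autocommit mode)."""
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " sequence TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " result TEXT)"
    )
    return conn


def add_sequence(conn, sequence):
    """Store one query sequence as a pending job and return its id."""
    return conn.execute(
        "INSERT INTO jobs (sequence) VALUES (?)", (sequence,)
    ).lastrowid


def claim_next(conn):
    """Atomically take one pending job, or return None if there is none."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    row = conn.execute(
        "SELECT id, sequence FROM jobs WHERE status = 'pending'"
        " ORDER BY id LIMIT 1"
    ).fetchone()
    if row is not None:
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    conn.execute("COMMIT")
    return row


def worker_pass(conn, search):
    """Node-side loop: pull jobs until none are pending; return the count."""
    handled = 0
    while True:
        job = claim_next(conn)
        if job is None:
            return handled
        job_id, sequence = job
        location = search(sequence)  # hypothetical: runs BLAST locally
        conn.execute(
            "UPDATE jobs SET status = 'done', result = ? WHERE id = ?",
            (location, job_id),
        )
        handled += 1


def collect_done(conn):
    """Designated-node side: fetch finished jobs and mark them collected."""
    rows = conn.execute(
        "SELECT id, result FROM jobs WHERE status = 'done' ORDER BY id"
    ).fetchall()
    conn.executemany(
        "UPDATE jobs SET status = 'collected' WHERE id = ?",
        [(job_id,) for job_id, _ in rows],
    )
    return rows
```

A real daemon would sleep and poll again instead of returning when the
queue is empty, and would need some way to recover jobs left in the
'running' state by a crashed node.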
> 
> 
> -Malay
> 