[Bioclusters] blast and nfs

Hunter Matthews bioclusters@bioinformatics.org
21 Apr 2003 17:32:21 -0400


On Mon, 2003-04-21 at 15:18, Chris Dagdigian wrote:

> 
> All you need to do is have enough local disk in each of your compute 
> nodes to hold all (or some) of your BLAST datasets. The idea is that you 
> use the NFS mounted blast databases only as a 'staging area' for 
> rsync'ing or copying your files to scratch or temp space on your compute 
> nodes. Given the low cost of 40-80 GB IDE disk drives, this is a quick 
> and easy way to get around NFS-related bottlenecks.
> 
> Each search can then be done against local disk on each compute node 
> rather than all nodes hitting the NFS fileserver and beating it to death...
> 
> This is generally what most BLAST farm operators will do as a "first 
> pass" approach. It works very well and is pretty much standard practice 
> these days.
> 
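
For concreteness, the staging step described above can be sketched
roughly as follows. This is only a minimal sketch, not anyone's
production script: the function name stage_databases and both directory
arguments are hypothetical, it assumes the formatted database files sit
in one flat directory on the NFS mount, and it uses a plain
size-and-mtime comparison as a stand-in for what rsync would do.

```python
import os
import shutil


def stage_databases(nfs_dir, scratch_dir):
    """Copy new or changed files from nfs_dir into scratch_dir.

    Returns the sorted list of file names that were copied.
    """
    os.makedirs(scratch_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(nfs_dir)):
        src = os.path.join(nfs_dir, name)
        dst = os.path.join(scratch_dir, name)
        if not os.path.isfile(src):
            continue  # this sketch only handles a flat directory of files
        if os.path.exists(dst):
            s, d = os.stat(src), os.stat(dst)
            # Same size and a local copy at least as new: assume it is current.
            if s.st_size == d.st_size and int(d.st_mtime) >= int(s.st_mtime):
                continue
        # Copy to a temporary name, then rename, so a search running on this
        # node never sees a half-written database file.
        tmp = dst + ".part"
        shutil.copy2(src, tmp)  # copy2 preserves the modification time
        os.replace(tmp, dst)
        copied.append(name)
    return copied
```

Each compute node would run something like this before its searches
(from cron or a job prologue, say) and then point BLAST at the scratch
directory instead of the NFS mount.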

Are there any scripts or instructions available for either the
first-pass or the second-pass setup?

I'm afraid I'm enough of a Unix admin to do the work, but not enough of
a biologist to always understand what NCBI BLAST wants (especially for
the second approach).

> The "second pass" approach is more complicated and involves splitting up 
> your blast datasets into RAM-sized chunks, distributing them across the 
> nodes in your cluster and then multiplexing your query across all the 
> nodes to get faster throughput times. This is harder to implement and is 
> useful only for long queries against big databases as there is a certain 
> amount of overhead required to merge your multiplexed query results back 
> into one human- or machine-parsable file.
> 
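
The splitting step described above might look something like the sketch
below. It assumes the dataset starts out as a FASTA file and that
"RAM-sized" can be approximated by a byte limit per chunk; the function
name split_fasta and the max_bytes parameter are hypothetical. Each
chunk would still have to be written out and formatted as its own BLAST
database (with formatdb, for example) before it could be searched.

```python
def split_fasta(lines, max_bytes):
    """Group FASTA records into chunks of at most max_bytes each.

    `lines` is any iterable of text lines (an open file works). A record is
    never split; one larger than max_bytes gets a chunk to itself.
    Returns a list of chunks, each chunk being a list of lines.
    """
    chunks, current, current_size = [], [], 0
    record, record_size = [], 0

    def flush_record():
        # Move the finished record into the current chunk, starting a new
        # chunk first if the record would push it over the limit.
        nonlocal current, current_size, record, record_size
        if not record:
            return
        if current and current_size + record_size > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.extend(record)
        current_size += record_size
        record, record_size = [], 0

    for line in lines:
        if line.startswith(">"):  # a header line starts a new record
            flush_record()
        record.append(line)
        record_size += len(line)  # characters; equals bytes for ASCII FASTA
    flush_record()
    if current:
        chunks.append(current)
    return chunks
```

Cutting only at record boundaries keeps every sequence intact within a
single chunk, at the price of chunks that are not all exactly the same
size.
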
> People only implement the 'second pass' approach when they really need 
> to. Usually in places where pipelines are constantly repeating the same 
> big searches over and over again.
> 
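
The merge overhead mentioned above comes from recombining the per-chunk
results. A minimal sketch of that step, assuming each chunk search has
already been parsed into (query_id, subject_id, bit_score) tuples; the
function name merge_hits, the tuple layout and the top_n parameter are
all hypothetical. It ranks by bit score because E-values depend on the
size of the database searched, so E-values computed against individual
chunks are not directly comparable with those from a search of the
whole database.

```python
import heapq
from collections import defaultdict


def merge_hits(per_chunk_hits, top_n=10):
    """Merge hit lists from several chunk searches into one ranking per query.

    per_chunk_hits: iterable of lists of (query_id, subject_id, bit_score).
    Returns {query_id: [(subject_id, bit_score), ...]}, best score first.
    """
    by_query = defaultdict(list)
    for hits in per_chunk_hits:
        for query_id, subject_id, bit_score in hits:
            by_query[query_id].append((subject_id, bit_score))
    # Keep only the top_n highest-scoring hits for each query.
    return {
        q: heapq.nlargest(top_n, hits, key=lambda h: h[1])
        for q, hits in by_query.items()
    }
```
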
> 
> My $.02 of course
> 
> -Chris
> www.bioteam.net
> 
-- 
Hunter Matthews                          Unix / Network Administrator
Office: BioScience 145/244               Duke Univ. Biology Department
Key: F0F88438 / FFB5 34C0 B350 99A4 BB02  9779 A5DB 8B09 F0F8 8438
Never take candy from strangers. Especially on the internet.