[Bioclusters] Request for discussions-How to build a biocluster Part 5 (BLAST/DB management)

Sylvain Foisy bioclusters@bioinformatics.org
Thu, 2 May 2002 13:57:15 -0400


Hi,

A reminder: this is coming from a total newbie at this BioCluster stuff. 
It is also meant to serve as the seed of a tutorial/history-of-building 
site for our creation. I am a total newbie in UNIX administration and 
installation. This is why we will get a system administrator to help us 
out. But I still have to figure out the right questions to ask!

BLAST

OK, which version of BLAST should we use: NCBI or WU? I have used both 
and, quite frankly, for most uses they are pretty much equal, although WU 
seems to be faster. Does either one have a particular feature that could 
be helpful to specific users?

Also, can BLAST be part of a system image that could be installed from 
the head node to any other node? Or can it be installed on the local disk 
and then be accessed by the system in memory?

THE GENBANK DATABASE

BLAST without the data, what for? OK, what should be downloaded: the 
GenBank database in its own format, or the FASTA-formatted one that is 
found in the BLAST folder at NCBI? In both cases it is a lot of data. 
The idea would be for a user to get the whole GenBank record for a 
particular sequence. However, I think that it could be done either way 
with scripts.
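
As one illustration of the "with scripts" route, here is a minimal Python 
sketch that indexes a GenBank flat file by accession so that a whole 
record can be pulled out on demand. It relies only on the flat-file 
layout (each record starts with a LOCUS line and ends with a "//" line). 
The function names index_genbank and fetch_record are hypothetical, and 
mapping a BLAST hit ID back to an accession is left to the caller.

```python
import io
from typing import BinaryIO, Dict, Tuple


def index_genbank(handle: BinaryIO) -> Dict[str, Tuple[int, int]]:
    """Map primary accession -> (byte offset, byte length) of its record."""
    index: Dict[str, Tuple[int, int]] = {}
    start = None      # offset of the current record's LOCUS line
    accession = None  # primary accession of the current record
    while True:
        pos = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b"LOCUS"):
            start, accession = pos, None
        elif line.startswith(b"ACCESSION") and start is not None and accession is None:
            fields = line.split()
            if len(fields) > 1:
                # The first accession on the line is the primary one.
                accession = fields[1].decode("ascii")
        elif line.startswith(b"//") and start is not None:
            if accession is not None:
                index[accession] = (start, handle.tell() - start)
            start = None
    return index


def fetch_record(handle: BinaryIO, index: Dict[str, Tuple[int, int]], accession: str) -> str:
    """Return the full text of one record, from LOCUS through the closing //."""
    start, length = index[accession]
    handle.seek(start)
    return handle.read(length).decode("ascii", errors="replace")


if __name__ == "__main__":
    # Tiny made-up file, for demonstration only (not real GenBank data).
    demo = io.BytesIO(
        b"file header text\n\n"
        b"LOCUS       DEMO0001   8 bp\nACCESSION   DEMO0001\nORIGIN\n"
        b"        1 acgtacgt\n//\n"
        b"LOCUS       DEMO0002   4 bp\nACCESSION   DEMO0002\nORIGIN\n"
        b"        1 ttaa\n//\n"
    )
    idx = index_genbank(demo)
    print(fetch_record(demo, idx, "DEMO0002"))
```

With an index like this, the cluster could search the FASTA-formatted 
data and still hand the user the full record, provided the flat files 
are kept somewhere the script can read them.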

How should the local database be administered? Reading the archive, I 
think that the consensus is that the DB has to be split into n pieces 
(n = number of nodes), each piece sent to a particular node and processed 
with formatdb. Or have I got everything wrong? I would be worried that 
the nodes that get the human sequences or the EST sequences would be 
working very hard while the ones with the vector sequences sit idle. Is 
it feasible to divide the DB so as to split the load evenly over the 
nodes?
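
One possible way to even out the load, sketched in Python below: instead 
of cutting the FASTA file by division (human, EST, vector), deal the 
sequences out so that every piece holds about the same number of 
residues. This assumes that search time on a node grows roughly with the 
number of residues in its piece, which is an assumption and not 
something measured here. The function names read_fasta and 
split_balanced are hypothetical. The method is a simple greedy one: 
longest sequence first, always into the currently smallest piece.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (defline without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (defline, sequence) pairs from the lines of a FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_balanced(records: Iterable[Record], n_pieces: int) -> List[List[Record]]:
    """Distribute records over n_pieces so total residues per piece are close."""
    if n_pieces < 1:
        raise ValueError("n_pieces must be at least 1")
    pieces: List[List[Record]] = [[] for _ in range(n_pieces)]
    # Min-heap of (residues assigned so far, piece number).
    heap = [(0, i) for i in range(n_pieces)]
    heapq.heapify(heap)
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, i = heapq.heappop(heap)
        pieces[i].append(rec)
        heapq.heappush(heap, (total + len(rec[1]), i))
    return pieces


if __name__ == "__main__":
    # Made-up sequences, for demonstration only.
    demo = [
        ">seq1", "ACGTACGTAC",
        ">seq2", "ACGTACGT",
        ">seq3", "ACGTA",
        ">seq4", "ACGT",
        ">seq5", "ACG",
    ]
    for n, piece in enumerate(split_balanced(read_fasta(demo), 2)):
        print(n, sum(len(seq) for _, seq in piece), [name for name, _ in piece])
```

Each piece would then be written back out as FASTA, copied to its node, 
and run through formatdb there. Sorting loads every sequence into memory, 
so for the full GenBank a real script would more likely sort on 
(length, file offset) pairs and copy the sequences in a second pass.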

How should the daily updates be performed? The same question applies 
because if the same node(s) always get the daily updates, users coming 
with daily jobs will push those nodes hard.

Am I missing something?

This is open for helpful and constructive discussion.

Sylvain

++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Sylvain Foisy, Ph. D.
Manager
BIONEQ - Le Reseau quebecois de bioinformatique
Genome-Quebec
Tel.: (514) 343-6111 poste 5188
E-mail: foisys@medcn.umontreal.ca
++++++++++++++++++++++++++++++++++++++++++++++++++++++++