[Bioclusters] Grid BLAST

david coornaert dcoorna at dbm.ulb.ac.be
Mon Sep 26 06:59:51 EDT 2005


All right, I understand the argument passed to BLAST to specify the
actual size of the DB, instead of letting BLAST assume the size from
the chunk it is running against.

Since I'm playing with this at the moment, I was wondering what
solution people use when users ask for BLAST jobs against a set of DBs,
each of them split across the nodes.
Of course you just need to add up the sizes, but the point was: is there
an existing tool?
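
To make the "add up the sizes" step concrete, here is a minimal sketch
in Python. It assumes the legacy blastall "-z" option is the
effective-database-length argument referred to above; the chunk names,
their lengths and the query file name are hypothetical placeholders.

```python
def total_db_length(chunk_lengths):
    """Sum the lengths (in residues/bases) of every chunk of every DB searched."""
    return sum(chunk_lengths.values())


def blastall_args(program, chunk_path, query, effective_length):
    """Build one blastall command line for a single chunk.

    Assumption: "-z" is the legacy blastall option that sets the
    effective database length used for the statistics.
    """
    return ["blastall", "-p", program, "-d", chunk_path,
            "-i", query, "-z", str(effective_length)]


# Hypothetical chunk lengths for a job mixing a public and a private DB.
chunk_lengths = {
    "public_embl_est_pln.00": 120_000_000,
    "public_embl_est_pln.01": 118_500_000,
    "my_private_db.00": 2_300_000,
}

total = total_db_length(chunk_lengths)
commands = [blastall_args("blastn", name, "query.fa", total)
            for name in sorted(chunk_lengths)]
```

Every chunk gets the same total, so E-values computed on different
nodes (and for different DBs) refer to the same search space.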

Your previous message seemed to indicate that you were doing some
"posterior" statistical reworking.

For the moment I am using an old version of blastmerge.c (2003), which
merges regular text BLAST output quite nicely but unfortunately is not
meant to merge BLAST chunks from different DBs.

This is not only a theoretical problem: users *do* run BLAST jobs
against mixed sets of DBs, for example against public_embl_est_pln and
*my_secret_private_hot_db* together.

I've been planning to rewrite a "blastmerge" that takes BLAST chunks in
XML output, which would in turn be extremely easy to merge. My main
problem is that I need to produce regular text BLAST output (and stats),
and I am entirely *clueless* regarding this issue.
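
As a rough illustration of why XML chunks are easy to merge, here is a
sketch in Python using only the standard library. It assumes the
element names of the NCBI BLAST XML output (Hit, Hit_def, Hsp,
Hsp_evalue, Hsp_bit-score) and that every chunk was run with the same
effective database size, so that E-values are comparable. The two
sample chunks are made-up fragments, not real BLAST output, and the
sketch does not attempt the text rendering.

```python
import xml.etree.ElementTree as ET


def hsp_fields(hsp):
    """Map each child tag of an <Hsp> element to its text."""
    return {child.tag: child.text for child in hsp}


def hits_from_xml(xml_text, db_label):
    """Yield (evalue, bit_score, hit_def, db_label) for the best HSP of each hit."""
    root = ET.fromstring(xml_text)
    for hit in root.iter("Hit"):
        hsps = [hsp_fields(h) for h in hit.iter("Hsp")]
        if not hsps:
            continue
        best = min(hsps, key=lambda f: float(f["Hsp_evalue"]))
        yield (float(best["Hsp_evalue"]), float(best["Hsp_bit-score"]),
               hit.findtext("Hit_def"), db_label)


def merge_chunks(chunks):
    """Merge (db_label, xml_text) pairs; sort by E-value, then by descending bit score."""
    merged = []
    for db_label, xml_text in chunks:
        merged.extend(hits_from_xml(xml_text, db_label))
    merged.sort(key=lambda h: (h[0], -h[1]))
    return merged


def hit_xml(name, bits, evalue):
    """Build a made-up <Hit> fragment for the demonstration below."""
    return ("<Hit><Hit_def>%s</Hit_def><Hit_hsps><Hsp>"
            "<Hsp_bit-score>%s</Hsp_bit-score><Hsp_evalue>%s</Hsp_evalue>"
            "</Hsp></Hit_hsps></Hit>" % (name, bits, evalue))


chunk_a = ("<BlastOutput>" + hit_xml("est_A", "80.5", "1e-20")
           + hit_xml("est_B", "40.1", "3e-5") + "</BlastOutput>")
chunk_b = "<BlastOutput>" + hit_xml("priv_X", "110.0", "1e-30") + "</BlastOutput>"

merged = merge_chunks([("public_embl_est_pln", chunk_a),
                       ("my_private_db", chunk_b)])
```

Keeping the DB label on each hit means the merged list can still say
which database a hit came from when it is finally written out as text.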

Does anyone who has experimented with mpiBLAST know how this mixed-DB
stats problem is managed there?
Does anyone have any clue regarding a "BLAST XML output" to
"BLAST text output" converter? In BioPerl?



===============================================
David Coornaert    (dcoorna at dbm.ulb.ac.be)

Belgian Embnet Node (http://www.be.embnet.org)
Université Libre de Bruxelles

Laboratoire de Bioinformatique
12, Rue des Professeurs Jeener & Brachet
6041  Gosselies
BELGIQUE

Tél:  +3226509975
Fax:  +3226509998
===============================================



Tim Cutts wrote:

>
> On 26 Sep 2005, at 10:22 am, david coornaert wrote:
>
>> What are you using to merge the outputs ? (and to manage the stats...)
>>
>
> Hey, I'm just the sysadmin.  :-)  I think the output is merged  
> usually by perl code, using the BioPerl BLAST parsers.  Stats can be  
> dealt with, I understand, in BLAST itself; there's a parameter to  
> tell it actually how large the total database is, not just the  
> segment it's currently running against.  There are people on this  
> list far better qualified than I to give you the details...
>
> Tim
>
>

