[Bioclusters] Parallel blast

Fri, 07 Jun 2002 07:56:17 -0400

Hi Wim,

This will be a quickie response...

With newer versions of ncbi-blast, two things have made it far easier to 
split up the target database so that your query can be multiplexed 
across multiple searches and machines:

o The "-z" option switch (used to be undocumented, I think?) lets you 
tell the blastall binary the effective size of the database. 
If you feed the original (large) value to the blastall binary while 
searching against a small slice, you will at least get back the correct 
scores and statistics.  This is a huge time and accuracy saver, as trying 
to parse and adjust these values after the fact is a giant, error-prone 
exercise in pain.

o XML output of results
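
As a rough sketch of how the two points above combine, here is one way 
to build a blastall command line per database slice. The query file, 
slice names and database size are made-up placeholders, and the helper 
name is hypothetical; "-z" and "-m 7" (XML output) are the blastall 
switches in question.

```python
def blastall_commands(query, slices, full_db_size, program="blastp"):
    """Build one blastall argument list per database slice.

    full_db_size is the size of the original, unsplit database; passing
    the same value to every slice via -z keeps the statistics consistent.
    """
    commands = []
    for db in slices:
        commands.append([
            "blastall",
            "-p", program,            # search program, e.g. blastp or blastn
            "-i", query,              # query file (placeholder name)
            "-d", db,                 # one formatted slice of the database
            "-o", db + ".xml",        # one result file per slice
            "-z", str(full_db_size),  # effective size of the FULL database
            "-m", "7",                # XML output
        ])
    return commands


# Hypothetical slice names and size, for illustration only.
for cmd in blastall_commands("query.fa", ["nr.00", "nr.01"], 500000000):
    print(" ".join(cmd))
```

Each argument list can be handed to subprocess.run() or written into a 
job script for whatever scheduler the cluster uses.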

Having the scores and statistics correct while getting the results back 
in a way that is far easier to parse than the human readable version is 
95% of the battle.  Everything else is fairly simple.
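
For the "everything else" part, a minimal sketch of putting the pieces 
back together: read the XML from each slice, keep the best HSP per hit, 
and sort the combined list by E-value. This assumes a single query 
sequence per search and that every slice was run with the same "-z" 
value, since otherwise the E-values are not comparable. The function 
names are my own; the element names (Hit, Hit_id, Hsp, Hsp_evalue, 
Hsp_bit-score) are the ones used in the NCBI BLAST XML output.

```python
import xml.etree.ElementTree as ET


def best_hsps(xml_text):
    """Yield (hit_id, evalue, bit_score) for the best HSP of every hit."""
    root = ET.fromstring(xml_text)
    for hit in root.iter("Hit"):
        hsps = [
            (float(h.findtext("Hsp_evalue")),
             float(h.findtext("Hsp_bit-score")))
            for h in hit.iter("Hsp")
        ]
        if hsps:
            # Lowest E-value wins; ties go to the higher bit score.
            evalue, bits = min(hsps, key=lambda p: (p[0], -p[1]))
            yield hit.findtext("Hit_id"), evalue, bits


def merge_slices(xml_texts):
    """Combine hits from all slices into one list ordered by E-value."""
    merged = []
    for text in xml_texts:
        merged.extend(best_hsps(text))
    merged.sort(key=lambda h: (h[1], -h[2]))
    return merged
```

Call merge_slices() with the contents of the per-slice XML files; the 
result is a single ranked hit list, as if one search had been run 
against the whole database.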

-Chris

Wim Glassee wrote:

><snip>
>
>I've noticed some people cut their databases and query sequences to
>smaller pieces, with or without overlap, and perform separate blasts.
>But how do you put them back together again? And are the results the
>same?
>
>Wim
>

-- 
Chris Dagdigian, <dag@sonsorol.org>
Life Science IT & Research Computing Consultant
Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
Work: http://bioteam.net PGP KeyID: 83D4310E  Yahoo IM: craffi