[Bioclusters] Apple/Genentech's new version of Blast

Ian Korf bioclusters@bioinformatics.org
Fri, 22 Nov 2002 08:00:04 +0000 (GMT)

On Fri, 22 Nov 2002, Elia Stupka wrote:

> > Has anyone had a chance to take a look at Apple/Genentech's new
> > version of Blast?  Are the performance gains as great as they say?
> We've been playing with them for a while. blastn is definitely well
> optimized, the others are not yet G4 optimized though the default
> performance is already quite good. One note you might want to bear in mind
> is that using 2 processors instead of 1 does not help, in fact it seems to
> hamper the performance. Besides that we are quite happy with the
> benchmarks of blast and many other apps on Xserve. 

My understanding is that the AltiVec optimizations arise from using
pre-fetching instructions, so the improvement will only be seen on really
insensitive searches. I've got both AG-BLAST and WU-BLAST on my computer.
As far as I can tell, the only reason to use AG-BLAST is when you're
looking for exact nucleotide matches, for example finding a specific
100-mer in a genome. This is the only time when using absurdly large word
sizes, like 20, makes sense. You just wouldn't want to do any comparative
genomic studies with -W 20. Reasonable cross-species word sizes remove
any AG-BLAST benefits.
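
To make the word-size trade-off concrete, here is a minimal Python sketch
of exact word seeding. It is only a toy illustration, not how AG-BLAST or
WU-BLAST is implemented; the function names, the random "genome", and the
every-8th-base substitution used to mimic cross-species divergence are all
assumptions made up for this example.

```python
import random


def word_index(seq, w):
    """Map every length-w word in seq to the list of its start positions."""
    index = {}
    for i in range(len(seq) - w + 1):
        index.setdefault(seq[i:i + w], []).append(i)
    return index


def count_seed_hits(query, subject, w):
    """Count exact length-w word matches (seeds) between query and subject."""
    index = word_index(subject, w)
    return sum(len(index.get(query[i:i + w], ()))
               for i in range(len(query) - w + 1))


def mutate_every(seq, step):
    """Substitute a different base at every step-th position (toy divergence)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return "".join(swap[b] if (i + 1) % step == 0 else b
                   for i, b in enumerate(seq))


random.seed(0)  # fixed seed so the hypothetical toy genome is reproducible
genome = "".join(random.choice("ACGT") for _ in range(5000))
exact_100mer = genome[1000:1100]                 # verbatim copy of a 100-mer
diverged_100mer = mutate_every(exact_100mer, 8)  # 12 substitutions, 88% identity

# Print: word size, seeds for the exact copy, seeds for the diverged copy.
for w in (20, 11, 7):
    print(w,
          count_seed_hits(exact_100mer, genome, w),
          count_seed_hits(diverged_100mer, genome, w))
```

With the exact copy, a word size of 20 still produces a seed at every
position along the match. The diverged copy has no identical run longer
than 7 bases, so along the true alignment only the short word size can
seed it, at the cost of many more words to look up and extend.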

I've tried, but I can't make WU-BLASTN as fast as AG-BLASTN even using
larger word sizes, WINK, and HITDIST. But for cross-species comparisons,
you're much better off with WU-BLAST because you can set your word size as
low as you like, SMP works properly, and you can use substitution

If you want to do the same kind of fast identity search with proteins, you
have to use WU-BLAST and not any NCBI-BLAST derivative that I know of.
Setting W=5 or higher in WU-BLAST turns off neighborhood words and you get
BLASTN-like behavior for word matching. So you can do protein searches
with W=10 and fly. You can use neighborhood words with large values of W
by explicitly setting T, but you'll probably run out of memory if you try